
95 Commits

Author SHA1 Message Date
Evan Tseng
a34d054f45 Merge pull request #332 from gijsk/fix-readability-determination-in-generate-testcase
Use jsdom for parsing the document to determine readability (fixes #325), r=evanxd
2016-12-15 17:30:24 +08:00
Evan Tseng
63230a307a Bug 1142312 - Add two more types of unlikely candidates: cover-wrap and yom-remote, r=Gijs 2016-12-15 11:30:40 +08:00
Gijs Kruitbosch
46842048c1 Use jsdom for parsing the document to determine readability (fixes #325) 2016-12-14 13:44:29 +00:00
Gijs Kruitbosch
0ab4ac8556 Fix test failures caused by timeout still being too low 2016-12-14 11:41:38 +00:00
Evan Tseng
e84c0c3f07 Bug 1285543 - Only use "og:title" or "twitter:title" if _getArticleTitle does not return a valid title, r=Gijs 2016-12-14 11:34:15 +00:00
Gijs
c2f7db51f5 Remove textContent from metadata file (fixes #324) (#326) 2016-12-09 13:28:56 -10:00
Evan Tseng
33dc8fa023 Bug 1255978 - Remove legends candidate, r=Gijs 2016-11-25 11:12:47 +00:00
Evan Tseng
af0aa5c59f Bug 1173548 - Find out text direction from ancestors of final candidate, r=Gijs 2016-11-25 10:24:41 +00:00
Evan Tseng
ece0d1ecea Bug 1317930 - Tests for msn.com, r=Gijs 2016-11-16 10:32:55 +00:00
Evan Tseng
1b694cf650 Bug 1310075 - Tests for qq.com. r=Gijs 2016-11-09 12:02:44 +00:00
Evan Tseng
522f39617f Bug 1310074 - Tests for yahoo.com. r=Gijs 2016-11-02 08:54:57 +00:00
Evan Tseng
4fa0d1b207 Bug 1177619 - Score div nodes which have br nodes. r=Gijs 2016-11-01 10:25:57 +00:00
Evan Tseng
8bfd2a978d Bug 1310073 - Tests for wikipedia.org. r=Gijs 2016-10-28 10:59:56 +01:00
Gijs
1a12befa41 Fix code style, tighten up eslint rules (#301) 2016-07-19 21:44:27 +01:00
Gijs Kruitbosch
46b08a5ea5 Address issue #277 by marking 'modal' unlikely+negative 2016-03-17 10:53:57 +00:00
Gijs Kruitbosch
a4d1e9ca12 Fix oversight in comment removal code exposed by better/newer jsdom implementation 2016-03-10 15:09:36 +00:00
Gijs Kruitbosch
e830ac9dd8 Fix eslint issues identified in m-c 2016-03-08 14:49:51 +00:00
Gijs Kruitbosch
dffa760c04 Fix issue #267 by ignoring hash URIs when making URIs absolute 2016-03-07 10:32:09 +00:00
Wes Johnston
f87a12400b Reuse test from pull request #239 which passes without modifications (modified by @gijsk to pass in the current XHTML test environment) 2016-01-25 11:18:47 +00:00
Gijs Kruitbosch
2e1cb3f467 Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 2016-01-22 19:57:45 +00:00
Gijs
a801846a45 Merge pull request #204 from mozilla/tweak-great-grandparent-scoring
Updated great grandparent node scoring.
2015-05-05 23:02:09 +01:00
Nicolas Perriault
ae0833522c Improved embedded video elements detection. 2015-05-05 22:11:11 +02:00
Nicolas Perriault
46304bb5fe Updated great grandparent node scoring. 2015-05-05 18:12:17 +02:00
Nicolas Perriault
88ef3893b5 Fixes #180 - Score intermediary headings. 2015-05-04 08:59:05 +02:00
Nicolas Perriault
dc1b2c9fa0 Refs #195 - Exclude nodes likely to be related content. 2015-05-04 08:51:45 +02:00
Nicolas Perriault
cc18cb5787 Ref #195 - Add support for dailymotion videos. 2015-04-30 15:02:52 +02:00
Nicolas Perriault
9dbc009376 Fixes #113 - Recursive node ancestor scoring. 2015-04-29 22:51:45 +02:00
Nicolas Perriault
44879722b6 Fixes #183 - Preserve list items. 2015-04-28 16:32:04 +02:00
Alexis Métaireau
5912e0c872 Add Firefox User-Agent when generating the test case. 2015-04-28 08:45:42 +02:00
Gijs
79aa2fca87 Merge pull request #189 from mozilla/dont-remove-headings
Fixes #150 - Keep article intermediary headings.
2015-04-27 23:36:39 +01:00
Margaret Leibovic
af6da2a87d Merge pull request #190 from mozilla/improved-author-meta-extraction
Improved author metadata detection.
2015-04-27 09:11:30 -07:00
Nicolas Perriault
7aee44adb2 Improved author metadata detection. 2015-04-27 17:03:23 +02:00
Gijs Kruitbosch
5f184053cd Make isProbablyReaderable include <pre>, and deal with long <br>-separated paragraphs and/or shorter-than-5-paragraph text and such. 2015-04-27 15:49:03 +01:00
Nicolas Perriault
2451a07a7d Fixes #150 - Keep article intermediary headings. 2015-04-27 15:15:52 +02:00
Margaret Leibovic
319a50b4f0 Fixes #184 - Don't strip class names from article content 2015-04-24 14:49:30 -07:00
Gijs
49e40768aa Merge pull request #185 from mozilla/score-section-tags-by-default
Fixes #139 #143: Added more weight to section tags.
2015-04-24 20:12:11 +01:00
Nicolas Perriault
f6ffa6acde Fixes #139 #143: Added more weight to section tags. 2015-04-24 19:55:51 +02:00
Nicolas Perriault
58cd789cd3 Improved title extraction 'algorithm'. 2015-04-24 16:16:10 +01:00
Nicolas Perriault
de89036cd5 Fixes #130 - Using js-beautify for HTML formatting. 2015-04-21 10:30:48 +02:00
Gijs
b37ff08bc7 Merge pull request #169 from mozilla/clean-footer-tags
Fixes #163 - Avoid including footer tag contents.
2015-04-17 16:53:51 +01:00
Nicolas Perriault
12c6a11f67 Fixes #163 - Avoid including footer tag contents. 2015-04-17 17:33:04 +02:00
Nicolas Perriault
6eeabf90c1 Fixes #164 - Add support for title alt semantic metadata. 2015-04-17 15:38:25 +02:00
Gijs Kruitbosch
0ff82de0f4 Implement createTextNode, do more relaxed escaping there, update testcase. 2015-04-13 14:32:49 +01:00
Margaret Leibovic
37a8cd4171 Bug 1147584 - Don't remove unlikely <a> tags, and replace <a> tags with their text content if they won't be useful links 2015-04-09 17:19:59 -07:00
Gijs
a6014f5854 Merge pull request #132 from gijsk/heise-ad-prioritization
Don't look at banners and skyscrapers, remove <noscript> elements
2015-04-09 20:12:01 +01:00
Gijs Kruitbosch
a6346a0ad4 Don't look at banners and skyscrapers, remove <noscript> elements 2015-04-09 20:02:46 +01:00
Nicolas Perriault
4424b0bad7 Refs #128 - Add support for options to Readability constructor. r=@gijsk 2015-04-09 11:56:58 +02:00
Gijs Kruitbosch
c53ca31907 Fixed test result output being sent at once 2015-04-08 15:24:34 +01:00
Nicolas Perriault
4d41f5e4ed Refs #117 - Drop social/share buttons. 2015-04-07 23:00:52 +02:00
Nicolas Perriault
d725ebc953 Fixes #99: JSDOMParser tag name case handling. r=@gijsk 2015-04-07 14:19:54 +02:00