Gijs Kruitbosch | 9baea36169 | Fix bug 1322674 by not removing content in data tables | 2017-01-30 15:19:10 +01:00
Evan Tseng | 131d923d38 | Bug 1167568 - Find a better topCandidate if there are other nodes scores are high enough, r=Gijs | 2017-01-17 11:29:57 +08:00
Sebastian Hengst | 5e9c7a3910 | Bug 1322327 - Only remove images which are not descendants of a figure if there is more than one image, update affected tests | 2016-12-21 16:31:25 +00:00
andrei-ch | c5ff44d8fe | Clean <input>,<textarea>,<select>,<button> elements | 2016-12-17 13:37:27 +00:00
Evan Tseng | a34d054f45 | Merge pull request #332 from gijsk/fix-readability-determination-in-generate-testcase: Use jsdom for parsing the document to determine readability (fixes #325), r=evanxd | 2016-12-15 17:30:24 +08:00
Evan Tseng | 63230a307a | Bug 1142312 - Add two more types of unlikely candidates: cover-wrap and yom-remote, r=Gijs | 2016-12-15 11:30:40 +08:00
Gijs Kruitbosch | 46842048c1 | Use jsdom for parsing the document to determine readability (fixes #325) | 2016-12-14 13:44:29 +00:00
Gijs Kruitbosch | 0ab4ac8556 | Fix test failures caused by timeout still being too low | 2016-12-14 11:41:38 +00:00
Evan Tseng | e84c0c3f07 | Bug 1285543 - Only use "og:title" or "twitter:title" if _getArticleTitle does not return a valid title, r=Gijs | 2016-12-14 11:34:15 +00:00
Gijs | c2f7db51f5 | Remove textContent from metadata file (fixes #324) (#326) | 2016-12-09 13:28:56 -10:00
Evan Tseng | 33dc8fa023 | Bug 1255978 - Remove legends candidate, r=Gijs | 2016-11-25 11:12:47 +00:00
Evan Tseng | af0aa5c59f | Bug 1173548 - Find out text direction from ancestors of final candidate, r=Gijs | 2016-11-25 10:24:41 +00:00
Evan Tseng | ece0d1ecea | Bug 1317930 - Tests for msn.com, r=Gijs | 2016-11-16 10:32:55 +00:00
Evan Tseng | 1b694cf650 | Bug 1310075 - Tests for qq.com. r=Gijs | 2016-11-09 12:02:44 +00:00
Evan Tseng | 522f39617f | Bug 1310074 - Tests for yahoo.com. r=Gijs | 2016-11-02 08:54:57 +00:00
Evan Tseng | 4fa0d1b207 | Bug 1177619 - Score div nodes which have br nodes. r=Gijs | 2016-11-01 10:25:57 +00:00
Evan Tseng | 8bfd2a978d | Bug 1310073 - Tests for wikipedia.org. r=Gijs | 2016-10-28 10:59:56 +01:00
Gijs | 1a12befa41 | Fix code style, tighten up eslint rules (#301) | 2016-07-19 21:44:27 +01:00
Gijs Kruitbosch | 46b08a5ea5 | Address issue #277 by marking 'modal' unlikely+negative | 2016-03-17 10:53:57 +00:00
Gijs Kruitbosch | a4d1e9ca12 | Fix oversight in comment removal code exposed by better/newer jsdom implementation | 2016-03-10 15:09:36 +00:00
Gijs Kruitbosch | e830ac9dd8 | Fix eslint issues identified in m-c | 2016-03-08 14:49:51 +00:00
Gijs Kruitbosch | dffa760c04 | Fix issue #267 by ignoring hash URIs when making URIs absolute | 2016-03-07 10:32:09 +00:00
Wes Johnston | f87a12400b | Reuse test from pull request #239 which passes without modifications (modified by @gijsk to pass in the current XHTML test environment) | 2016-01-25 11:18:47 +00:00
Gijs Kruitbosch | 2e1cb3f467 | Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble | 2016-01-22 19:57:45 +00:00
Gijs | a801846a45 | Merge pull request #204 from mozilla/tweak-great-grandparent-scoring: Updated great grandparent node scoring. | 2015-05-05 23:02:09 +01:00
Nicolas Perriault | ae0833522c | Improved embedded video elements detection. | 2015-05-05 22:11:11 +02:00
Nicolas Perriault | 46304bb5fe | Updated great grandparent node scoring. | 2015-05-05 18:12:17 +02:00
Nicolas Perriault | 88ef3893b5 | Fixes #180 - Score intermediary headings. | 2015-05-04 08:59:05 +02:00
Nicolas Perriault | dc1b2c9fa0 | Refs #195 - Exclude nodes likely to be related content. | 2015-05-04 08:51:45 +02:00
Nicolas Perriault | cc18cb5787 | Ref #195 - Add support for dailymotion videos. | 2015-04-30 15:02:52 +02:00
Nicolas Perriault | 9dbc009376 | Fixes #113 - Recursive node ancestor scoring. | 2015-04-29 22:51:45 +02:00
Nicolas Perriault | 44879722b6 | Fixes #183 - Preserve list items. | 2015-04-28 16:32:04 +02:00
Alexis Métaireau | 5912e0c872 | Add Firefox User-Agent when generating the test case. | 2015-04-28 08:45:42 +02:00
Gijs | 79aa2fca87 | Merge pull request #189 from mozilla/dont-remove-headings: Fixes #150 - Keep article intermediary headings. | 2015-04-27 23:36:39 +01:00
Margaret Leibovic | af6da2a87d | Merge pull request #190 from mozilla/improved-author-meta-extraction: Improved author metadata detection. | 2015-04-27 09:11:30 -07:00
Nicolas Perriault | 7aee44adb2 | Improved author metadata detection. | 2015-04-27 17:03:23 +02:00
Gijs Kruitbosch | 5f184053cd | Make isProbablyReaderable include <pre>, and deal with long <br>-separated paragraphs and/or shorter-than-5-paragraph text and such. | 2015-04-27 15:49:03 +01:00
Nicolas Perriault | 2451a07a7d | Fixes #150 - Keep article intermediary headings. | 2015-04-27 15:15:52 +02:00
Margaret Leibovic | 319a50b4f0 | Fixes #184 - Don't strip class names from article content | 2015-04-24 14:49:30 -07:00
Gijs | 49e40768aa | Merge pull request #185 from mozilla/score-section-tags-by-default: Fixes #139 #143: Added more weight to section tags. | 2015-04-24 20:12:11 +01:00
Nicolas Perriault | f6ffa6acde | Fixes #139 #143: Added more weight to section tags. | 2015-04-24 19:55:51 +02:00
Nicolas Perriault | 58cd789cd3 | Improved title extraction 'algorithm'. | 2015-04-24 16:16:10 +01:00
Nicolas Perriault | de89036cd5 | Fixes #130 - Using js-beautify for HTML formatting. | 2015-04-21 10:30:48 +02:00
Gijs | b37ff08bc7 | Merge pull request #169 from mozilla/clean-footer-tags: Fixes #163 - Avoid including footer tag contents. | 2015-04-17 16:53:51 +01:00
Nicolas Perriault | 12c6a11f67 | Fixes #163 - Avoid including footer tag contents. | 2015-04-17 17:33:04 +02:00
Nicolas Perriault | 6eeabf90c1 | Fixes #164 - Add support for title alt semantic metadata. | 2015-04-17 15:38:25 +02:00
Gijs Kruitbosch | 0ff82de0f4 | Implement createTextNode, do more relaxed escaping there, update testcase. | 2015-04-13 14:32:49 +01:00
Margaret Leibovic | 37a8cd4171 | Bug 1147584 - Don't remove unlikely <a> tags, and replace <a> tags with their text content if they won't be useful links | 2015-04-09 17:19:59 -07:00
Gijs | a6014f5854 | Merge pull request #132 from gijsk/heise-ad-prioritization: Don't look at banners and skyscrapers, remove <noscript> elements | 2015-04-09 20:12:01 +01:00
Gijs Kruitbosch | a6346a0ad4 | Don't look at banners and skyscrapers, remove <noscript> elements | 2015-04-09 20:02:46 +01:00