182 Commits

Author SHA1 Message Date
Wes Johnston
f87a12400b Reuse test from pull request #239 which passes without modifications (modified by @gijsk to pass in the current XHTML test environment) 2016-01-25 11:18:47 +00:00
Gijs
b1d360168b Merge pull request #252 from gijsk/fix-delayed-closing-tags
Fix issue #251 by making JSDOMParser deal with non-self-closed-things
2016-01-23 15:26:28 +00:00
Gijs Kruitbosch
2e1cb3f467 Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 2016-01-22 19:57:45 +00:00
Gijs
d360226f8c Merge pull request #260 from gijsk/hid-class
Fix bug 1230050 by checking for the 'hid' class specifically, r?MattN
2016-01-12 18:41:44 +00:00
Gijs Kruitbosch
a9597efc17 Fix bug 1230050 by checking for the 'hid' class specifically, r?MattN 2016-01-12 15:24:27 +00:00
Gijs
30d6db3c11 Merge pull request #256 from brendanlong/spdx-license
Fix package.json's license to be in SPDX format ("Apache-2.0").
2015-11-16 20:13:15 +00:00
Brendan Long
c59a054f78 Fix package.json's license to be in SPDX format ("Apache-2.0").
See: https://docs.npmjs.com/files/package.json#license
2015-11-16 15:03:01 -05:00
Gijs
e5a6d628f4 Merge pull request #254 from hsemarap/readme-change-to-fix-scrambling-of-dom
Readme change to advise about DOM modification effects of parse(). Fixes #250
2015-10-21 20:21:03 +01:00
Parameswaran D
a812b329ea moved the sample code under Optional subsection 2015-10-21 21:03:48 +05:30
Parameswaran D
d1e4ef0dcd Fixes #250 : scrambling of DOM on parse 2015-10-21 12:20:57 +05:30
Parameswaran D
0b5dd0a6fb Fixes #250 : scrambling of DOM on parse 2015-10-21 12:15:40 +05:30
Gijs
8510106638 Merge pull request #211 from mozilla/add-support-for-wbr-tag
Added support for the wbr html tag to JSDOMParser.
2015-05-06 14:22:11 +01:00
Nicolas Perriault
8806e999d1 Added support for the wbr html tag to JSDOMParser. 2015-05-06 14:53:04 +02:00
Gijs
a801846a45 Merge pull request #204 from mozilla/tweak-great-grandparent-scoring
Updated great grandparent node scoring.
2015-05-05 23:02:09 +01:00
Gijs
5bf56177be Merge pull request #207 from mozilla/better-dm
Improved embedded video elements detection.
2015-05-05 23:01:32 +01:00
Nicolas Perriault
ae0833522c Improved embedded video elements detection. 2015-05-05 22:11:11 +02:00
Nicolas Perriault
46304bb5fe Updated great grandparent node scoring. 2015-05-05 18:12:17 +02:00
Nicolas Perriault
66071e573d Merge pull request #194 from mozilla/score-intermediary-headers
Fixes #180 - Score intermediary headings.
2015-05-04 09:00:36 +02:00
Nicolas Perriault
88ef3893b5 Fixes #180 - Score intermediary headings. 2015-05-04 08:59:05 +02:00
Nicolas Perriault
6344b3f736 Merge pull request #196 from mozilla/strip-related-contents
Refs #195 - Exclude nodes likely to be related content.
2015-05-04 08:53:11 +02:00
Nicolas Perriault
dc1b2c9fa0 Refs #195 - Exclude nodes likely to be related content. 2015-05-04 08:51:45 +02:00
Margaret Leibovic
affa0edbdd Merge pull request #197 from mozilla/support-dailymotion-videos
Ref #195 - Add support for dailymotion videos.
2015-04-30 08:33:21 -07:00
Nicolas Perriault
cc18cb5787 Ref #195 - Add support for dailymotion videos. 2015-04-30 15:02:52 +02:00
Nicolas Perriault
4721837e27 Merge pull request #193 from mozilla/score-great-grandparent-nodes
Fixes #113 - Score great grandparent nodes.
2015-04-29 22:56:16 +02:00
Nicolas Perriault
9dbc009376 Fixes #113 - Recursive node ancestor scoring. 2015-04-29 22:51:45 +02:00
Gijs
f71ec9ceae Merge pull request #191 from mozilla/preserve-list-items
Fixes #183 - Preserve list items.
2015-04-28 15:34:58 +01:00
Nicolas Perriault
44879722b6 Fixes #183 - Preserve list items. 2015-04-28 16:32:04 +02:00
Alexis Métaireau
5912e0c872 Add Firefox User-Agent when generating the test case. 2015-04-28 08:45:42 +02:00
Gijs
79aa2fca87 Merge pull request #189 from mozilla/dont-remove-headings
Fixes #150 - Keep article intermediary headings.
2015-04-27 23:36:39 +01:00
Margaret Leibovic
af6da2a87d Merge pull request #190 from mozilla/improved-author-meta-extraction
Improved author metadata detection.
2015-04-27 09:11:30 -07:00
Nicolas Perriault
0d696051e9 Merge pull request #188 from gijsk/improve-isprobably-readerable
Make isProbablyReaderable include <pre>, and deal with long <br>-separat...
2015-04-27 17:09:54 +02:00
Nicolas Perriault
7aee44adb2 Improved author metadata detection. 2015-04-27 17:03:23 +02:00
Gijs Kruitbosch
5f184053cd Make isProbablyReaderable include <pre>, and deal with long <br>-separated paragraphs and/or shorter-than-5-paragraph text and such. 2015-04-27 15:49:03 +01:00
Gijs Kruitbosch
d9a475e8d4 Fix benchmark script, add isProbablyReaderable benchmark 2015-04-27 15:49:03 +01:00
Nicolas Perriault
2451a07a7d Fixes #150 - Keep article intermediary headings. 2015-04-27 15:15:52 +02:00
Gijs
62f5d43c70 Merge pull request #187 from leibovic/classnames
Fixes #184 - Don't strip class names from article content
2015-04-25 00:13:30 +01:00
Margaret Leibovic
319a50b4f0 Fixes #184 - Don't strip class names from article content 2015-04-24 14:49:30 -07:00
Gijs
49e40768aa Merge pull request #185 from mozilla/score-section-tags-by-default
Fixes #139 #143: Added more weight to section tags.
2015-04-24 20:12:11 +01:00
Nicolas Perriault
f6ffa6acde Fixes #139 #143: Added more weight to section tags. 2015-04-24 19:55:51 +02:00
Gijs
32d8a526f9 Merge pull request #175 from mozilla/improve-title-extraction
Fixes #174 - Remove aggressive article title formatting rule.
2015-04-24 16:17:00 +01:00
Nicolas Perriault
58cd789cd3 Improved title extraction 'algorithm'. 2015-04-24 16:16:10 +01:00
Gijs
647658a47b Merge pull request #172 from mozilla/js-beautify
Fixes #130 - Using js-beautify for HTML formatting.
2015-04-21 12:26:01 +01:00
Nicolas Perriault
de89036cd5 Fixes #130 - Using js-beautify for HTML formatting. 2015-04-21 10:30:48 +02:00
Gijs
b37ff08bc7 Merge pull request #169 from mozilla/clean-footer-tags
Fixes #163 - Avoid including footer tag contents.
2015-04-17 16:53:51 +01:00
Nicolas Perriault
12c6a11f67 Fixes #163 - Avoid including footer tag contents. 2015-04-17 17:33:04 +02:00
Gijs
87c0bc0144 Merge pull request #167 from mozilla/better-headline-extraction
Fixes #164 - Add support for title alt semantic metadata.
2015-04-17 16:28:21 +01:00
Nicolas Perriault
6eeabf90c1 Fixes #164 - Add support for title alt semantic metadata. 2015-04-17 15:38:25 +02:00
Margaret Leibovic
eb7ec7231e Merge pull request #135 from gijsk/links
Bug 1147584 - Don't strip unlikely <a>s, and replace useless <a>s with textContent
2015-04-13 07:00:10 -07:00
Gijs Kruitbosch
0ff82de0f4 Implement createTextNode, do more relaxed escaping there, update testcase. 2015-04-13 14:32:49 +01:00
Margaret Leibovic
37a8cd4171 Bug 1147584 - Don't remove unlikely <a> tags, and replace <a> tags with their text content if they won't be useful links 2015-04-09 17:19:59 -07:00