PalmerAL
b551f1cf6e
Fix missing content on Wikipedia articles (#560)
2019-09-30 19:25:29 +01:00
Joe Winett
60f470c4bb
Remove aria-hidden="true" nodes (fixes #541) (#555)
2019-08-29 08:33:28 +01:00
Jordy van den Aardweg
2982216913
Added "keepClasses" option to prevent cleaning of classes (#552)
2019-08-04 08:56:27 +01:00
Gijs
f33a6c2a23
Switch to a newer node.js to fix build issues (#551)
2019-07-15 14:53:42 +01:00
Gijs
234f420279
Clarify security implications of using readability
2019-07-15 14:40:34 +01:00
PalmerAL
9092b2a29c
Remove sharing elements in fewer situations (#545)
...
* remove fewer share elements
* simplify and fix social-buttons testcase
2019-05-22 23:53:51 +01:00
PalmerAL
814f0a3884
Add support for detecting lazy-loaded images (#542)
...
Add support for detecting lazy-loaded images using `src` or `srcset` attributes.
2019-05-08 23:48:37 +01:00
Mozilla-GitHub-Standards
26379fe62e
Add Mozilla Code of Conduct file
...
Fixes #537.
_(Message COC002)_
2019-03-29 12:24:48 +00:00
Gijs Kruitbosch
cb5771fd4a
Add nested font tags to test _setNodeTag on those (see #59)
2019-03-15 12:02:21 +00:00
Radhi
9009f64f9c
Fix table header missing (#530)
2019-03-07 13:09:21 +00:00
Radhi
6761a7e412
Fix embedded videos getting removed (#526)
2019-03-07 13:02:15 +00:00
PalmerAL
f5c46a7b14
fix formatting
2019-03-05 01:33:00 +00:00
PalmerAL
681bf0c47b
use default threshold for share elements
2019-03-05 01:33:00 +00:00
PalmerAL
b9cece3e58
add test
2019-03-05 01:33:00 +00:00
PalmerAL
e76aba3485
only remove sharing elements if they contain <500 characters
2019-03-05 01:33:00 +00:00
PalmerAL
27ee1e947e
update regexes in readerable.js
2019-03-01 11:04:58 +00:00
PalmerAL
a014e0c9c8
exclude graphs from nytimes articles
2019-03-01 11:04:58 +00:00
Radhi Fadlillah
c942b32945
Revert source files and fix expected results
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
bd5087d2f1
fix error in testing "wikipedia"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
3e025d58e5
fix error in testing "lwn-01"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
df95c9d717
fix error in testing "keep-tabular-data"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
6a5066abe2
Fix tabular data getting removed
2019-03-01 11:02:48 +00:00
PalmerAL
f70d36852b
check itemprop when determining whether a node is a byline
2019-02-23 18:26:15 +00:00
EvsChen
b9f47bcc8d
fix(test-util): fix generate testcase tool
2019-02-15 13:14:30 +00:00
Andres Rey
d41de78c26
Close img tag
2019-02-11 22:06:35 +00:00
Andres Rey
1187b2dae1
Update test expectations
2019-02-11 22:06:35 +00:00
Andres Rey
3ca8c12d87
Update test expectations
2019-02-11 22:06:35 +00:00
Andres Rey
f836a8f291
Add "gdpr" to the list of negative tags
2019-02-11 22:06:35 +00:00
Andres Rey
4ffd482004
Add medicalnewstoday test case with incorrect results
2019-02-11 22:06:35 +00:00
Taylor Buley
c0c097c930
update JSDOM example for node
2019-01-29 12:06:26 +00:00
Gijs Kruitbosch
60ef565b67
Don't choke on <meta> tags that do not have a content attribute
2019-01-28 15:55:07 +00:00
Gijs
878545f64d
Make usage sections in README more discoverable
...
This just reorders some of the content and reduces duplication.
2019-01-07 18:56:27 +00:00
Gijs Kruitbosch
30f9670a5f
Avoid setAttribute errors from invalid attributes, fixes #392
2019-01-07 18:53:24 +00:00
Gijs
15d411a865
Add comment to indicate duplicate regexes
...
This comment was added in mozilla-central and seems useful, adding it to keep m-c and github in sync.
2019-01-03 14:27:18 +00:00
Gijs Kruitbosch
d8c837012b
Fix benchmark script for script split and new JSDOM version
2018-12-29 18:22:14 +00:00
Gijs Kruitbosch
512e1c18a7
Update to latest JSDOM
2018-12-29 18:22:14 +00:00
Gijs Kruitbosch
977be42d1f
Fix link normalization for live HTMLCollections
...
Newer versions of JSDOM implement getElementsByTagName correctly.
This means it returns a live node list. When calling
`Element.replaceChild` for links inside the loop over that
collection, elements disappear from the list, meaning we miss
every other item. Without this fix, the `clean-links` testcase
breaks.
2018-12-29 18:22:14 +00:00
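The live-collection problem described in the commit above can be reproduced without a DOM. The sketch below is illustrative only: `FakeParent`, `getLiveByTag`, and both `replaceLinks*` functions are hypothetical stand-ins, not JSDOM or Readability code, and snapshotting the collection into an array is one common workaround, not necessarily the change this commit made.

```javascript
// Hypothetical stand-in for a DOM parent node (not JSDOM or Readability code).
class FakeParent {
  constructor(tags) {
    this.children = tags.map((tag) => ({ tag }));
  }

  // Returns a "live" view: length and item() are recomputed on every access,
  // which mimics how a live HTMLCollection tracks the current tree.
  getLiveByTag(tag) {
    const parent = this;
    return {
      get length() {
        return parent.children.filter((c) => c.tag === tag).length;
      },
      item(i) {
        return parent.children.filter((c) => c.tag === tag)[i];
      },
    };
  }

  replaceChild(newChild, oldChild) {
    const idx = this.children.indexOf(oldChild);
    this.children[idx] = newChild;
  }
}

// Buggy pattern: replacing items while indexing into the live view.
// Each replacement shrinks the view, so the loop skips every other link.
function replaceLinksNaive(parent) {
  const links = parent.getLiveByTag("a");
  let replaced = 0;
  for (let i = 0; i < links.length; i++) {
    parent.replaceChild({ tag: "span" }, links.item(i));
    replaced++;
  }
  return replaced;
}

// Workaround (assumed, for illustration): copy the live view into a plain
// array first, then mutate the tree while iterating over the copy.
function replaceLinksSnapshot(parent) {
  const live = parent.getLiveByTag("a");
  const snapshot = [];
  for (let i = 0; i < live.length; i++) {
    snapshot.push(live.item(i));
  }
  for (const link of snapshot) {
    parent.replaceChild({ tag: "span" }, link);
  }
  return snapshot.length;
}

console.log(replaceLinksNaive(new FakeParent(["a", "a", "a", "a"])));    // 2
console.log(replaceLinksSnapshot(new FakeParent(["a", "a", "a", "a"]))); // 4
```

With four links, the naive loop replaces only two of them, which matches the "miss every other item" symptom in the commit message.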
Gijs Kruitbosch
e8bb7f722f
Fix whitespace normalization in title metadata
...
Newer versions of JSDOM are more literal about listing whitespace
as part of textContent: they include newlines and do not
normalize multiple spaces.
It seems prudent to just always normalize whitespace for titles,
which are guaranteed to be pretty short anyway.
2018-12-29 18:22:14 +00:00
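A minimal sketch of the title whitespace normalization described in the commit above. The function name `normalizeTitleWhitespace` is hypothetical, and collapsing runs of whitespace to a single space plus trimming the ends is an assumption about what "normalize" means here, not the code from the commit.

```javascript
// Hypothetical helper: collapse any run of whitespace (spaces, tabs,
// newlines) into one space and strip leading/trailing whitespace.
function normalizeTitleWhitespace(title) {
  return title.replace(/\s+/g, " ").trim();
}

console.log(normalizeTitleWhitespace("  Hello\n   world  ")); // "Hello world"
```

Because titles are short, running the regex unconditionally costs very little.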
Gijs Kruitbosch
3610476663
Remove CSS that jsdom struggles to parse
2018-12-29 18:22:14 +00:00
Gijs Kruitbosch
2620542dd1
Split off isProbablyReaderable implementation
2018-12-29 18:22:14 +00:00
Maria Luiza Soares
8c41d92560
Assert on siteName in all test cases
2018-12-21 18:28:28 +00:00
Maria Luiza Soares
1bac47c70d
Add newly generated test case
2018-12-21 18:28:28 +00:00
Maria Luiza Soares
262fffd703
Retrieve site name on parse, based on meta og:site_name
2018-12-21 18:28:28 +00:00
Gijs
876c81f710
Update sorting function in Readability.js
...
Simplify sorting function also considering case where arguments are equal
Co-Authored-By: jemrobinson <james.em.robinson@gmail.com>
2018-11-20 12:08:07 +00:00
James Robinson
ee18c21fc2
Switched sort function from boolean to explicit -1 and 1, thus avoiding failures to sort when false is evaluated as 0
2018-11-20 12:08:07 +00:00
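The two sort commits above concern comparators passed to `Array.prototype.sort`. The sketch below shows why a boolean comparator is unreliable and what an explicit one looks like. The names `booleanCompare` and `descending`, the sample data, and the descending direction are all assumptions for illustration, not the code in Readability.js.

```javascript
const scores = [3, 10, 1, 7];

// Boolean comparator: the result is coerced to 1 (true) or 0 (false).
// It can never be negative, so "a sorts before b" is indistinguishable
// from "a equals b", and the resulting order is engine-dependent.
const booleanCompare = (a, b) => a < b;

// Explicit comparator: returns -1, 0, or 1, covering the equal case too.
function descending(a, b) {
  if (a === b) {
    return 0;
  }
  return a < b ? 1 : -1;
}

console.log(Number(booleanCompare(1, 2))); // 1
console.log(Number(booleanCompare(2, 1))); // 0
console.log([...scores].sort(descending)); // [ 10, 7, 3, 1 ]
```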
Dan Burzo
44e90de00b
Elements that have no .style (e.g. mathml) are probably visible; fixes #493
2018-11-07 13:29:41 +00:00
Hugo Locurcio
9fbe42683a
Add .gitattributes file
...
This ignores HTML (test data) so the repository is considered
to use JavaScript instead of HTML on GitHub.
2018-11-01 15:41:20 +00:00
Daniel Aleksandersen
3be1aaa01c
Recognize Sina Weibo meta tags
...
http://open.weibo.com/wiki/Weibo_meta_tag
2018-08-28 11:04:29 +01:00
Daniel Aleksandersen
5a69d4a8eb
Improve metadata extraction (#478)
...
* Improve metadata extraction
* Recognize meta[property] as a space-separated list
* Recognize Dublin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now only used as fallback if document
doesn't provide good metadata.
2018-08-25 00:28:00 +01:00
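The bullet points in the commit above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `collectMeta`, `pickTitle`, `PREFERRED_PREFIXES`, the plain-object input shape, and the `title` key for the HTML case are all hypothetical, and the real extraction in Readability.js may differ.

```javascript
// Assumed priority order, highest first: Dublin Core, Open Graph, Twitter.
const PREFERRED_PREFIXES = ["dc", "dcterm", "og", "twitter"];

// Hypothetical collector. `tags` is an array of { property, content }
// objects standing in for <meta> elements. A property such as
// "og:title twitter:title" is treated as a space-separated list.
function collectMeta(tags) {
  const values = {};
  for (const { property, content } of tags) {
    if (!property || !content) {
      continue; // tolerate tags that lack either attribute
    }
    for (const name of property.trim().split(/\s+/)) {
      values[name.toLowerCase()] = content.trim();
    }
  }
  return values;
}

// Hypothetical picker: try each prefixed title in priority order, then the
// plain HTML "title" entry, and use the fallback only when nothing matched.
function pickTitle(meta, fallbackTitle) {
  for (const prefix of PREFERRED_PREFIXES) {
    const value = meta[prefix + ":title"];
    if (value) {
      return value;
    }
  }
  return meta["title"] || fallbackTitle;
}

const meta = collectMeta([
  { property: "og:title twitter:title", content: "Open Graph title" },
  { property: "dc:title", content: "Dublin Core title" },
]);
console.log(pickTitle(meta, "fallback")); // "Dublin Core title"
```

Here the `fallbackTitle` argument plays the role the commit assigns to `_getArticleTitle()`: it is consulted only when the document provides no usable metadata.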
Daniel Aleksandersen
0449dbf186
Recognize more iframe video embed services
...
* TenCent QQ Video, Alexa Rank 8
* Twitch clips and streams, Alexa Rank 33
* Internet Archive, Alexa Rank 265
* Wikimedia, Alexa Rank 347
2018-08-22 16:08:46 +01:00