397 Commits

Author SHA1 Message Date
Radhi
52ab9b5c89 Fix lazy-loaded images are not visible in Kinja sites (#590)
* Add initial test case for kinja's lazy image

* Implement method to remove small data uri image

* Convert relative uri in poster and srcset of media nodes

* Eslint doesn't like arrow function

* Unescape HTML entities in metadata

* Fix wrong regex for parsing srcset urls

* Remove line to check data url since it already handled by new URL

* Replace String.matchAll since it only supported in Node 12+

* Use numeric code when unescaping HTML

* Don't remove data URL src if it's svg

* Don't remove b64 src if it's the only attr that contains image

* Make the comma part non-optional in regex for srcset url

* Fix wrong code for unescaping HTML

* Don't capture comma and semicolon in data URL regex
2020-04-13 14:40:37 +01:00
Gijs Kruitbosch
d5621f85e7 Fix #585 - remove nodes with role=complementary 2020-04-07 15:29:26 +02:00
Radhi Fadlillah
668a3a1010 Minor cchange in comments 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
3976fa34e9 Don't use data-old- prefix if old img attr not exists 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
7d74395b7b Feed semicolon to eslint 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
d8366f0686 Keep all attributes that might contain image 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
e85122e8d7 Make eslint happy 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
c8eab07661 Stop using live list while removing nodes 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
1277d22b81 Keep old img src as data attribute 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
6fed28610d Simplify loop for unwrapping noscript 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
adc6accaec Fix grammar issues in comments 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
89572ad29a Update test for several pages 2020-04-03 09:20:55 +01:00
Radhi Fadlillah
d784bf7e20 Add method to unwrap img inside noscript 2020-04-03 09:20:55 +01:00
Gijs Kruitbosch
b2f3a43f9f Detect 'trailing' content when comparing DOMs 2020-03-30 23:14:12 +01:00
Gijs Kruitbosch
dc34dfd8fa Fix #580 by not using live node lists when removing items 2020-02-28 18:28:44 +00:00
Gijs
630681bd26 Add some indenting back 2020-02-27 15:18:03 +00:00
PalmerAL
61ef00a853 add exception for wikimedia math images 2020-02-27 15:18:03 +00:00
Gijs
56ecc4d4ba Fix eslint issues. 2020-02-27 15:07:44 +00:00
PalmerAL
7c91bdd275 preserve children when removing javascript: links 2020-02-27 15:07:44 +00:00
Gijs
d6fc38c4b4 Fix #564 by allowing 'content' as an indicator of readable content (#565)
This avoid `contentWithSidebar` causing complete removal of the content.
As a side-effect, it slightly improves byline detection by not removing
content as early on as before.
2019-10-21 15:13:55 +01:00
PalmerAL
b551f1cf6e Fix missing content on Wikipedia articles (#560) 2019-09-30 19:25:29 +01:00
Joe Winett
60f470c4bb Remove aria-hidden="true" nodes (fixes #541) (#555)
2019-08-29 08:33:28 +01:00
Jordy van den Aardweg
2982216913 Added "keepClasses" option to prevent cleaning of classes (#552) 2019-08-04 08:56:27 +01:00
Gijs
f33a6c2a23 Switch to a newer node.js to fix build issues (#551) 2019-07-15 14:53:42 +01:00
Gijs
234f420279 Clarify security implications of using readability 2019-07-15 14:40:34 +01:00
PalmerAL
9092b2a29c Remove sharing elements in fewer situations (#545)
* remove fewer share elements

* simplify and fix social-buttons testcase
2019-05-22 23:53:51 +01:00
PalmerAL
814f0a3884 Add support for detecting lazy-loaded images (#542)
Add support for detecting lazy-loaded images using `src` or `srcset` attributes.
2019-05-08 23:48:37 +01:00
Mozilla-GitHub-Standards
26379fe62e Add Mozilla Code of Conduct file
Fixes #537.

_(Message COC002)_
2019-03-29 12:24:48 +00:00
Gijs Kruitbosch
cb5771fd4a Add nested font tags to test _setNodeTag on those (see #59) 2019-03-15 12:02:21 +00:00
Radhi
9009f64f9c Fix table header missing (#530) 2019-03-07 13:09:21 +00:00
Radhi
6761a7e412 Fix embedded videos getting removed (#526)
2019-03-07 13:02:15 +00:00
PalmerAL
f5c46a7b14 fix formatting 2019-03-05 01:33:00 +00:00
PalmerAL
681bf0c47b use default threshold for share elements 2019-03-05 01:33:00 +00:00
PalmerAL
b9cece3e58 add test 2019-03-05 01:33:00 +00:00
PalmerAL
e76aba3485 only remove sharing elements if they contain <500 characters 2019-03-05 01:33:00 +00:00
PalmerAL
27ee1e947e update regexes in readerable.js 2019-03-01 11:04:58 +00:00
PalmerAL
a014e0c9c8 exclude graphs from nytimes articles 2019-03-01 11:04:58 +00:00
Radhi Fadlillah
c942b32945 Revert source files and fix expected results 2019-03-01 11:02:48 +00:00
Radhi Fadlillah
bd5087d2f1 fix error in testing "wikipedia" 2019-03-01 11:02:48 +00:00
Radhi Fadlillah
3e025d58e5 fix error in testing "lwn-01" 2019-03-01 11:02:48 +00:00
Radhi Fadlillah
df95c9d717 fix error in testing "keep-tabulard-data" 2019-03-01 11:02:48 +00:00
Radhi Fadlillah
6a5066abe2 Fix tabular data got removed 2019-03-01 11:02:48 +00:00
PalmerAL
f70d36852b check itemprop when determining whether a node is a byline 2019-02-23 18:26:15 +00:00
EvsChen
b9f47bcc8d fix(test-util): fix generate testcase tool 2019-02-15 13:14:30 +00:00
Andres Rey
d41de78c26 Close img tag 2019-02-11 22:06:35 +00:00
Andres Rey
1187b2dae1 Update test expectations 2019-02-11 22:06:35 +00:00
Andres Rey
3ca8c12d87 Update test expectations 2019-02-11 22:06:35 +00:00
Andres Rey
f836a8f291 Add "gdpr" to the list of negative tags 2019-02-11 22:06:35 +00:00
Andres Rey
4ffd482004 Add medicalnewstoday test case with incorrect results 2019-02-11 22:06:35 +00:00
Taylor Buley
c0c097c930 update JSDOM example for node 2019-01-29 12:06:26 +00:00