PalmerAL
d5eea06a00
exclude additional elements based on their role (#619)
2020-08-24 16:00:23 +01:00
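Role-based exclusion, as in the commit above, comes down to an attribute check. Below is a minimal sketch. It uses a plain object in place of an element's attributes and treats `role` as a single token. The role list is an illustrative assumption (it includes `complementary`, per #585 further down) and may not match the exact list Readability ships:

```javascript
// Hypothetical list of ARIA roles treated as "unlikely to be article content".
// Illustrative assumption, not necessarily Readability's exact list.
const UNLIKELY_ROLES = new Set([
  "menu",
  "menubar",
  "complementary",
  "navigation",
  "alert",
  "alertdialog",
  "dialog",
]);

// `attrs` is a plain object standing in for an element's attributes.
function hasUnlikelyRole(attrs) {
  const role = (attrs.role || "").trim().toLowerCase();
  return UNLIKELY_ROLES.has(role);
}

// Keep only candidates whose role does not mark them as page chrome.
function filterByRole(candidates) {
  return candidates.filter((attrs) => !hasUnlikelyRole(attrs));
}

const nodes = [
  { tag: "nav", role: "navigation" },
  { tag: "aside", role: "complementary" },
  { tag: "div" },
  { tag: "div", role: "main" },
];
console.log(filterByRole(nodes).map((n) => n.tag)); // [ 'div', 'div' ]
```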
Garrett Xu
3fe82816af
Add support for author array in JSON-LD. #617 (#618)
2020-08-21 11:33:44 +01:00
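In JSON-LD the `author` property can be a string, a `Person` object, or an array of either. Below is a minimal sketch that flattens all of these shapes into a single byline. The function name `jsonLdAuthorToByline` and the comma-separated output are hypothetical choices, not Readability's API:

```javascript
// Hypothetical helper: normalize a JSON-LD `author` value to a byline string.
// Accepts a string, an object with a `name`, or an array mixing both.
function jsonLdAuthorToByline(author) {
  if (author == null) return null;
  const list = Array.isArray(author) ? author : [author];
  const names = list
    .map((a) => (typeof a === "string" ? a : a && typeof a.name === "string" ? a.name : ""))
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return names.length > 0 ? names.join(", ") : null;
}

const ld = JSON.parse(`{
  "@type": "NewsArticle",
  "author": [
    { "@type": "Person", "name": " Ada Lovelace " },
    { "@type": "Person", "name": "Alan Turing" }
  ]
}`);
console.log(jsonLdAuthorToByline(ld.author)); // Ada Lovelace, Alan Turing
```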
PalmerAL
3844d8f05b
Include more ancestors in candidate scoring (#611)
* include more ancestors in candidate scoring
* fix medium-3 testcase
The original source file contained two copies of the document, which
was causing incorrect results
* remove unnecessary nested elements
* fix removal of empty elements
* add option to regenerate all testcases
* update tests
* fix quanta testcase
* fix creating testcase from network
* fix early exit in testcase generation
* format HTML before comparing while testing
* upgrade js-beautify
* don't merge outer readability div
2020-08-21 10:16:58 +01:00
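Candidate scoring adds each paragraph's score to its ancestors, and an ancestor's share shrinks the further up the tree it sits. "Including more ancestors" means walking further up. Below is a sketch over plain parent links. The divider scheme (1 for the parent, 2 for the grandparent, 3 * level beyond that) and the default depth of 5 are assumptions for illustration:

```javascript
// Each node is a plain object with an optional `parent` link.
function getAncestors(node, maxDepth) {
  const ancestors = [];
  let current = node.parent;
  while (current && ancestors.length < maxDepth) {
    ancestors.push(current);
    current = current.parent;
  }
  return ancestors;
}

// Add `contentScore` to each ancestor, dividing it more the further up we go.
// The divider scheme and default depth are illustrative assumptions.
function propagateScore(node, contentScore, scores, maxDepth = 5) {
  getAncestors(node, maxDepth).forEach((ancestor, level) => {
    const divider = level === 0 ? 1 : level === 1 ? 2 : level * 3;
    scores.set(ancestor, (scores.get(ancestor) || 0) + contentScore / divider);
  });
}

const body = { name: "body" };
const article = { name: "article", parent: body };
const section = { name: "section", parent: article };
const div = { name: "div", parent: section };
const p = { name: "p", parent: div };

const scores = new Map();
propagateScore(p, 18, scores);
for (const [node, score] of scores) console.log(node.name, score);
// div 18, section 9, article 3, body 2 (one per line)
```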
Gijs Kruitbosch
80d818aaa6
Don't publish git attributes or travis config to npm
2020-08-05 23:03:04 +01:00
Dan Burzo
2ca98284e9
Prefer JSON-LD metadata object, when present (#609)
* Prefer JSON-LD metadata object, when present
* Log JSON-LD parsing error
* Trim all JSON-LD fields
2020-08-05 22:39:31 +01:00
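Below is a sketch of the "prefer JSON-LD, fall back to meta tags" order. Every field is trimmed, and a parse error is logged instead of thrown. The function takes the raw text of an `application/ld+json` script and a plain object of values already read from `<meta>` tags. The function name and the output fields (`title`, `byline`, `excerpt`, `siteName`) are assumptions for illustration. The JSON-LD property names `headline`, `description`, and `publisher.name` come from schema.org:

```javascript
// Hypothetical helper: prefer JSON-LD values, fall back to <meta>-derived ones.
// `jsonLdText` is the raw text of an application/ld+json script element.
function getArticleMetadata(jsonLdText, metaValues) {
  let ld = {};
  try {
    const parsed = JSON.parse(jsonLdText);
    if (parsed && typeof parsed === "object") ld = parsed;
  } catch (err) {
    // Malformed JSON-LD should not abort extraction: log it and fall back.
    console.error("Error parsing JSON-LD:", err.message);
  }

  // Trim every field; treat empty or non-string values as missing.
  const clean = (v) => (typeof v === "string" && v.trim() ? v.trim() : undefined);
  const pick = (fromLd, fromMeta) => clean(fromLd) || clean(fromMeta) || null;

  return {
    title: pick(ld.headline, metaValues.title),
    byline: pick(ld.author && ld.author.name, metaValues.author),
    excerpt: pick(ld.description, metaValues.description),
    siteName: pick(ld.publisher && ld.publisher.name, metaValues.siteName),
  };
}

const meta = { title: "Meta title", author: "Meta Author", description: "  From meta  " };
const ok = getArticleMetadata('{"headline": "  LD headline ", "author": {"name": "Jane Doe"}}', meta);
console.log(ok.title, "|", ok.byline, "|", ok.excerpt); // LD headline | Jane Doe | From meta

const bad = getArticleMetadata("{not json", meta); // logs the parse error
console.log(bad.title); // Meta title
```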
Gijs Kruitbosch
914307a90b
Increment version before publishing on npm
2020-08-05 22:33:52 +01:00
Dan Burzo
1a61a23f68
Readability on npm (#608)
* Initial work on preparing Readability for npm
* Adjust some require()s
* Point package.json to index.js
* Add Node.js instructions to README
* Use ES6 in eslint
2020-08-05 12:17:05 +01:00
S Nikhill
59570ba7fc
Replace a Dead Link in Comment (#606)
* Update Links in Comments
Update a link in comments to point to a better source.
Remove a dead link. (Link Removed: http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/#ropes)
2020-08-05 10:53:38 +01:00
Dan Burzo
b1d15c0ef9
Add option.serializer, fixes #605 (#607)
2020-08-04 23:05:17 +01:00
Radhi
52ab9b5c89
Fix lazy-loaded images not being visible on Kinja sites (#590)
* Add initial test case for kinja's lazy image
* Implement method to remove small data uri image
* Convert relative uri in poster and srcset of media nodes
* Eslint doesn't like arrow function
* Unescape HTML entities in metadata
* Fix wrong regex for parsing srcset urls
* Remove line to check data url since it's already handled by new URL
* Replace String.matchAll since it's only supported in Node 12+
* Use numeric code when unescaping HTML
* Don't remove data URL src if it's svg
* Don't remove b64 src if it's the only attr that contains image
* Make the comma part non-optional in regex for srcset url
* Fix wrong code for unescaping HTML
* Don't capture comma and semicolon in data URL regex
2020-04-13 14:40:37 +01:00
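Two items in this change fit together: fixing the `srcset` URL regex, and dropping `String.prototype.matchAll`, which per the commit is only supported in Node 12+. A global regex passed to `String.prototype.replace` with a callback visits every candidate without needing `matchAll`. Below is a sketch that rewrites relative `srcset` URLs to absolute ones with the WHATWG `URL` constructor. The regex is an illustrative assumption. Like any single regex, it does not cover every case the `srcset` grammar allows. For example, it cannot split candidates that are separated by a bare comma, with no whitespace and no descriptor:

```javascript
// Hypothetical srcset pattern: a URL, an optional width/density descriptor,
// then a mandatory separator (a comma or the end of the string).
const SRCSET_CANDIDATE = /(\S+)(\s+[\d.]+[xw])?(\s*(?:,|$))/g;

// Resolve every URL in a srcset value against `baseURI`, keeping descriptors.
function absolutizeSrcset(srcset, baseURI) {
  return srcset.replace(SRCSET_CANDIDATE, (_, url, descriptor = "", separator) => {
    let absolute = url;
    try {
      absolute = new URL(url, baseURI).href;
    } catch (e) {
      // Leave unparseable URLs untouched.
    }
    return absolute + descriptor + separator;
  });
}

const base = "https://example.com/blog/post.html";
console.log(absolutizeSrcset("small.jpg 480w, /img/large.jpg 1024w", base));
// https://example.com/blog/small.jpg 480w, https://example.com/img/large.jpg 1024w
```

Because the descriptor group is optional, the callback receives `undefined` for it when it does not match, and the default parameter turns that into an empty string.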
Gijs Kruitbosch
d5621f85e7
Fix #585 - remove nodes with role=complementary
2020-04-07 15:29:26 +02:00
Radhi Fadlillah
668a3a1010
Minor change in comments
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
3976fa34e9
Don't use data-old- prefix if old img attr doesn't exist
2020-04-03 09:20:55 +01:00
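The noscript-image commits in this series unwrap the real `<img>` from `<noscript>` and copy image-bearing attributes over from the lazy-loading placeholder. A clashing old value is kept under a `data-old-` prefix, but only when the new image already has that attribute. Below is a sketch of that merge, using plain objects as attribute maps. The `/\.(jpg|jpeg|png|webp)/i` test for "might contain an image" is an assumption, and `mergeImageAttributes` is a hypothetical name:

```javascript
// Hypothetical test for attribute values that look like image URLs.
const IMAGE_URL = /\.(jpg|jpeg|png|webp)/i;

// Copy image-related attributes from the placeholder onto the new image.
// Returns a new attribute map; neither input is mutated.
function mergeImageAttributes(placeholderAttrs, newImgAttrs) {
  const merged = { ...newImgAttrs };
  for (const [name, value] of Object.entries(placeholderAttrs)) {
    if (value === "") continue;
    const mightBeImage = name === "src" || name === "srcset" || IMAGE_URL.test(value);
    if (!mightBeImage) continue;
    if (newImgAttrs[name] === value) continue; // identical, nothing to keep
    // Only use the data-old- prefix when the new image already defines this attribute.
    const target = name in newImgAttrs ? `data-old-${name}` : name;
    merged[target] = value;
  }
  return merged;
}

const placeholder = { src: "data:image/gif;base64,R0lGOD", "data-src": "photo.jpg", class: "lazy" };
const realImg = { src: "photo.jpg" };
// data-old-src keeps the placeholder's src; data-src is copied without a prefix; class is ignored.
console.log(mergeImageAttributes(placeholder, realImg));
```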
Radhi Fadlillah
7d74395b7b
Feed semicolon to eslint
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
d8366f0686
Keep all attributes that might contain image
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
e85122e8d7
Make eslint happy
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
c8eab07661
Stop using live list while removing nodes
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
1277d22b81
Keep old img src as data attribute
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
6fed28610d
Simplify loop for unwrapping noscript
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
adc6accaec
Fix grammar issues in comments
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
89572ad29a
Update test for several pages
2020-04-03 09:20:55 +01:00
Radhi Fadlillah
d784bf7e20
Add method to unwrap img inside noscript
2020-04-03 09:20:55 +01:00
Gijs Kruitbosch
b2f3a43f9f
Detect 'trailing' content when comparing DOMs
2020-03-30 23:14:12 +01:00
Gijs Kruitbosch
dc34dfd8fa
Fix #580 by not using live node lists when removing items
2020-02-28 18:28:44 +00:00
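Live node lists break removal loops for a simple reason. A live `NodeList` or `HTMLCollection` updates as soon as a node leaves the document, so a forward index loop skips the element that slides into the removed slot. The sketch below simulates a live list with a plain array, where removing an entry shifts the later ones down, and compares that with removing from a static snapshot. No real DOM is involved, and the helper names are hypothetical:

```javascript
// Buggy pattern: the index advances even though the list just shrank,
// so the element that moves into slot `i` is never examined.
function removeWhileIteratingLive(liveList, shouldRemove) {
  for (let i = 0; i < liveList.length; i++) {
    if (shouldRemove(liveList[i])) liveList.splice(i, 1);
  }
}

// Fix: iterate over a static snapshot and remove from the live structure.
function removeUsingSnapshot(liveList, shouldRemove) {
  for (const item of Array.from(liveList)) {
    if (shouldRemove(item)) liveList.splice(liveList.indexOf(item), 1);
  }
}

const isAd = (name) => name.startsWith("ad");

const a = ["p1", "ad1", "ad2", "p2"];
removeWhileIteratingLive(a, isAd);
console.log(a); // [ 'p1', 'ad2', 'p2' ]  (ad2 was skipped)

const b = ["p1", "ad1", "ad2", "p2"];
removeUsingSnapshot(b, isAd);
console.log(b); // [ 'p1', 'p2' ]
```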
Gijs
630681bd26
Add some indenting back
2020-02-27 15:18:03 +00:00
PalmerAL
61ef00a853
add exception for wikimedia math images
2020-02-27 15:18:03 +00:00
Gijs
56ecc4d4ba
Fix eslint issues.
2020-02-27 15:07:44 +00:00
PalmerAL
7c91bdd275
preserve children when removing javascript: links
2020-02-27 15:07:44 +00:00
Gijs
d6fc38c4b4
Fix #564 by allowing 'content' as an indicator of readable content (#565)
This avoids `contentWithSidebar` causing complete removal of the content.
As a side-effect, it slightly improves byline detection by not removing
content as early on as before.
2019-10-21 15:13:55 +01:00
PalmerAL
b551f1cf6e
Fix missing content on Wikipedia articles (#560)
2019-09-30 19:25:29 +01:00
Joe Winett
60f470c4bb
Remove aria-hidden="true" nodes (fixes #541) (#555)
Remove aria-hidden="true" nodes (fixes #541)
2019-08-29 08:33:28 +01:00
Jordy van den Aardweg
2982216913
Added "keepClasses" option to prevent cleaning of classes (#552)
2019-08-04 08:56:27 +01:00
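By default, class attributes are stripped from the output except for a short preserve list. The `keepClasses` option added here skips that cleaning entirely. Below is a sketch of that decision for a single class string. The `classesToPreserve` default of `["page"]` and the function name are assumptions for illustration:

```javascript
// Hypothetical: return the class string to keep, or null to drop the attribute.
function cleanClassName(className, { keepClasses = false, classesToPreserve = ["page"] } = {}) {
  if (keepClasses) return className; // option set: leave classes untouched
  const kept = className
    .split(/\s+/)
    .filter((cls) => classesToPreserve.includes(cls));
  return kept.length > 0 ? kept.join(" ") : null;
}

console.log(cleanClassName("page article-body dark")); // page
console.log(cleanClassName("article-body dark")); // null
console.log(cleanClassName("article-body dark", { keepClasses: true })); // article-body dark
```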
Gijs
f33a6c2a23
Switch to a newer node.js to fix build issues (#551)
2019-07-15 14:53:42 +01:00
Gijs
234f420279
Clarify security implications of using readability
2019-07-15 14:40:34 +01:00
PalmerAL
9092b2a29c
Remove sharing elements in fewer situations (#545)
* remove fewer share elements
* simplify and fix social-buttons testcase
2019-05-22 23:53:51 +01:00
PalmerAL
814f0a3884
Add support for detecting lazy-loaded images (#542)
Add support for detecting lazy-loaded images using `src` or `srcset` attributes.
2019-05-08 23:48:37 +01:00
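Lazy-loading scripts often put the real image URL in a non-standard attribute, such as `data-src` or `data-srcset`, and leave a placeholder in `src`. Below is a sketch of recovering those URLs from an attribute map. A value that looks like a single image URL becomes a `src` candidate, and a value that looks like a `srcset` list becomes a `srcset` candidate. Both regexes and the function name are assumptions, not the patterns Readability itself uses:

```javascript
// Hypothetical patterns: a lone image URL, or an image URL followed by a descriptor.
const LOOKS_LIKE_SRC = /^\s*\S+\.(jpg|jpeg|png|webp)\S*\s*$/i;
const LOOKS_LIKE_SRCSET = /\.(jpg|jpeg|png|webp)\s+\d/i;

// Return { src, srcset } recovered from non-standard attributes, if any.
function findLazyImageSources(attrs) {
  const found = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (name === "src" || name === "srcset") continue; // only look at other attributes
    if (!found.srcset && LOOKS_LIKE_SRCSET.test(value)) found.srcset = value;
    else if (!found.src && LOOKS_LIKE_SRC.test(value)) found.src = value;
  }
  return found;
}

const lazy = {
  src: "placeholder.gif",
  "data-src": "https://cdn.example.com/photo.jpg",
  "data-srcset": "photo-480.jpg 480w, photo-960.jpg 960w",
};
// src is recovered from data-src, srcset from data-srcset.
console.log(findLazyImageSources(lazy));
```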
Mozilla-GitHub-Standards
26379fe62e
Add Mozilla Code of Conduct file
Fixes #537.
_(Message COC002)_
2019-03-29 12:24:48 +00:00
Gijs Kruitbosch
cb5771fd4a
Add nested font tags to test _setNodeTag on those (see #59)
2019-03-15 12:02:21 +00:00
Radhi
9009f64f9c
Fix table header missing (#530)
2019-03-07 13:09:21 +00:00
Radhi
6761a7e412
Fix embedded videos getting removed (#526)
Fix embedded videos getting removed
2019-03-07 13:02:15 +00:00
PalmerAL
f5c46a7b14
fix formatting
2019-03-05 01:33:00 +00:00
PalmerAL
681bf0c47b
use default threshold for share elements
2019-03-05 01:33:00 +00:00
PalmerAL
b9cece3e58
add test
2019-03-05 01:33:00 +00:00
PalmerAL
e76aba3485
only remove sharing elements if they contain <500 characters
2019-03-05 01:33:00 +00:00
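The rule introduced here drops a "share" element only when it is short. Article text that happens to sit inside a share-classed wrapper therefore survives. Below is a sketch of that predicate on plain objects. The regex for share-like class and id names is an assumption. The 500-character limit comes from the commit message; another commit in the same series switches to a default threshold whose value is not given here:

```javascript
// Hypothetical pattern for class/id names that mark sharing widgets.
const SHARE_NAME = /(\b|_)(share|sharedaddy)(\b|_)/i;
const SHARE_MAX_CHARS = 500; // per the commit message: "<500 characters"

// `el` is a plain object: { className, id, textContent }.
function isRemovableShareElement(el) {
  const matchString = `${el.className || ""} ${el.id || ""}`;
  return SHARE_NAME.test(matchString) && (el.textContent || "").length < SHARE_MAX_CHARS;
}

const widget = { className: "share-buttons", textContent: "Share on Twitter Facebook" };
const wrapper = { className: "entry share", textContent: "x".repeat(2000) };
console.log(isRemovableShareElement(widget)); // true
console.log(isRemovableShareElement(wrapper)); // false
```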
PalmerAL
27ee1e947e
update regexes in readerable.js
2019-03-01 11:04:58 +00:00
PalmerAL
a014e0c9c8
exclude graphs from nytimes articles
2019-03-01 11:04:58 +00:00
Radhi Fadlillah
c942b32945
Revert source files and fix expected results
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
bd5087d2f1
fix error in testing "wikipedia"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
3e025d58e5
fix error in testing "lwn-01"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
df95c9d717
fix error in testing "keep-tabulard-data"
2019-03-01 11:02:48 +00:00