PalmerAL
b9cece3e58
add test
2019-03-05 01:33:00 +00:00
PalmerAL
e76aba3485
only remove sharing elements if they contain <500 characters
2019-03-05 01:33:00 +00:00
PalmerAL
27ee1e947e
update regexes in readerable.js
2019-03-01 11:04:58 +00:00
PalmerAL
a014e0c9c8
exclude graphs from nytimes articles
2019-03-01 11:04:58 +00:00
Radhi Fadlillah
c942b32945
Revert source files and fix expected results
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
bd5087d2f1
fix error in testing "wikipedia"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
3e025d58e5
fix error in testing "lwn-01"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
df95c9d717
fix error in testing "keep-tabular-data"
2019-03-01 11:02:48 +00:00
Radhi Fadlillah
6a5066abe2
Fix tabular data being removed
2019-03-01 11:02:48 +00:00
PalmerAL
f70d36852b
check itemprop when determining whether a node is a byline
2019-02-23 18:26:15 +00:00
EvsChen
b9f47bcc8d
fix(test-util): fix generate testcase tool
2019-02-15 13:14:30 +00:00
Andres Rey
d41de78c26
Close img tag
2019-02-11 22:06:35 +00:00
Andres Rey
1187b2dae1
Update test expectations
2019-02-11 22:06:35 +00:00
Andres Rey
3ca8c12d87
Update test expectations
2019-02-11 22:06:35 +00:00
Andres Rey
f836a8f291
Add "gdpr" to the list of negative tags
2019-02-11 22:06:35 +00:00
Andres Rey
4ffd482004
Add medicalnewstoday test case with incorrect results
2019-02-11 22:06:35 +00:00
Taylor Buley
c0c097c930
update JSDOM example for node
2019-01-29 12:06:26 +00:00
Gijs Kruitbosch
60ef565b67
Don't choke on <meta> tags that do not have a content attribute
2019-01-28 15:55:07 +00:00
Gijs
878545f64d
Make usage sections in README more discoverable
...
This just reorders some of the content and reduces duplication.
2019-01-07 18:56:27 +00:00
Gijs Kruitbosch
30f9670a5f
Avoid setAttribute errors from invalid attributes, fixes #392
2019-01-07 18:53:24 +00:00
Gijs
15d411a865
Add comment to indicate duplicate regexes
...
This comment was added in mozilla-central and seems useful, adding it to keep m-c and github in sync.
2019-01-03 14:27:18 +00:00
Gijs Kruitbosch
d8c837012b
Fix benchmark script for script split and new JSDOM version
2018-12-29 18:22:14 +00:00
Gijs Kruitbosch
512e1c18a7
Update to latest JSDOM
2018-12-29 18:22:14 +00:00
Gijs Kruitbosch
977be42d1f
Fix link normalization for live HTMLCollections
...
Newer versions of JSDOM implement getElementsByTagName correctly.
This means it returns a live node list. When calling
`Element.replaceChild` for links inside the loop over that
collection, elements disappear from the list, meaning we miss
every other item. Without this fix, the `clean-links` testcase
breaks.
2018-12-29 18:22:14 +00:00
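The "every other item" behaviour described in the live HTMLCollection commit above can be reproduced without a DOM. The sketch below is only an illustration: `makeLiveCollection`, `replaceLinksNaively`, and `replaceLinksFromSnapshot` are hypothetical helpers, the plain objects stand in for elements, and copying the collection into an array first is one common remedy, not necessarily the change made in that commit.

```javascript
// Hypothetical stand-in for a live collection: `length` and `item(i)` are
// recomputed from the backing array on every access, like a live HTMLCollection.
function makeLiveCollection(backing, predicate) {
  return {
    get length() {
      return backing.filter(predicate).length;
    },
    item(i) {
      return backing.filter(predicate)[i];
    },
  };
}

const isLink = (node) => node.tag === "a";

// Buggy pattern: mutate while indexing into the live collection.
function replaceLinksNaively(nodes) {
  const links = makeLiveCollection(nodes, isLink);
  for (let i = 0; i < links.length; i++) {
    const link = links.item(i);
    // Replacing the link drops it from the live collection, so the next
    // link slides into index i and is skipped when i increments.
    nodes[nodes.indexOf(link)] = { tag: "span", text: link.text };
  }
  return nodes;
}

// Remedy (an assumption, see above): snapshot the collection before mutating.
function replaceLinksFromSnapshot(nodes) {
  const links = makeLiveCollection(nodes, isLink);
  const snapshot = [];
  for (let i = 0; i < links.length; i++) {
    snapshot.push(links.item(i));
  }
  for (const link of snapshot) {
    nodes[nodes.indexOf(link)] = { tag: "span", text: link.text };
  }
  return nodes;
}

const makeNodes = () => ["a", "b", "c", "d"].map((text) => ({ tag: "a", text }));
const naive = replaceLinksNaively(makeNodes());
const fixed = replaceLinksFromSnapshot(makeNodes());

console.log(naive.filter(isLink).length, fixed.filter(isLink).length);
```

In the naive version two of the four links survive, which matches the "miss every other item" symptom; the snapshot version replaces all of them.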
Gijs Kruitbosch
e8bb7f722f
Fix whitespace normalization in title metadata
...
When switching to a newer version of JSDOM, it is more literal
about listing whitespace as part of textContent, including
newlines and not normalizing multiple spaces.
It seems prudent to just always normalize whitespace for titles,
which are guaranteed to be pretty short anyway.
2018-12-29 18:22:14 +00:00
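A minimal sketch of the kind of title whitespace normalization described in the commit above, assuming it amounts to trimming and collapsing whitespace runs. `normalizeTitleWhitespace` is a hypothetical name, and the real change in Readability.js may differ.

```javascript
// Hypothetical helper: trim the title and collapse every run of whitespace
// (spaces, tabs, newlines) into a single space.
function normalizeTitleWhitespace(title) {
  return title.trim().replace(/\s+/g, " ");
}

// textContent from a stricter DOM implementation may keep newlines and
// repeated spaces exactly as they appear in the markup.
const rawTitle = "\n   Some   Article\n  Title \n";
const normalizedTitle = normalizeTitleWhitespace(rawTitle);

console.log(JSON.stringify(normalizedTitle));
```

Because titles are short, running this on every title costs very little.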
Gijs Kruitbosch
3610476663
Remove CSS that jsdom struggles to parse
2018-12-29 18:22:14 +00:00
Gijs Kruitbosch
2620542dd1
Split off isProbablyReaderable implementation
2018-12-29 18:22:14 +00:00
Maria Luiza Soares
8c41d92560
Assert on siteName in all test cases
2018-12-21 18:28:28 +00:00
Maria Luiza Soares
1bac47c70d
Add newly generated test case
2018-12-21 18:28:28 +00:00
Maria Luiza Soares
262fffd703
Retrieve site name on parse, based on meta og:site_name
2018-12-21 18:28:28 +00:00
Gijs
876c81f710
Update sorting function in Readability.js
...
Simplify sorting function also considering case where arguments are equal
Co-Authored-By: jemrobinson <james.em.robinson@gmail.com>
2018-11-20 12:08:07 +00:00
James Robinson
ee18c21fc2
Switched sort function from boolean to explicit -1 and 1 thus avoiding failures to sort when false is evaluated as 0
2018-11-20 12:08:07 +00:00
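The two sorting commits above concern comparators that return a boolean. A comparator passed to `Array.prototype.sort` is expected to return a negative number, zero, or a positive number; a boolean coerces to 1 or 0 and can never signal "a comes first". The sketch below illustrates the difference with a hypothetical `score` field and a descending order chosen only for the example; it is not the code from either commit.

```javascript
// Boolean comparator: the result coerces to 1 or 0, never a negative number,
// so the sort cannot be told that `a` should come before `b`.
const booleanCompare = (a, b) => a.score < b.score;

// Explicit comparator: returns 1, -1, or 0 (for equal arguments).
const explicitCompare = (a, b) => {
  if (a.score === b.score) {
    return 0;
  }
  return a.score < b.score ? 1 : -1;
};

const candidates = [{ score: 3 }, { score: 10 }, { score: 7 }];

// Sort a copy in descending score order with the explicit comparator.
const sortedScores = candidates
  .slice()
  .sort(explicitCompare)
  .map((candidate) => candidate.score);

console.log(sortedScores);
```

The outcome of sorting with `booleanCompare` depends on the engine's sort algorithm, which is why the example only relies on the explicit version.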
Dan Burzo
44e90de00b
Elements that have no .style (e.g. mathml) are probably visible; fixes #493
2018-11-07 13:29:41 +00:00
Hugo Locurcio
9fbe42683a
Add .gitattributes file
...
This ignores HTML (test data) so the repository is considered
to use JavaScript instead of HTML on GitHub.
2018-11-01 15:41:20 +00:00
Daniel Aleksandersen
3be1aaa01c
Recognize Sina Weibo meta tags
...
http://open.weibo.com/wiki/Weibo_meta_tag
2018-08-28 11:04:29 +01:00
Daniel Aleksandersen
5a69d4a8eb
Improve metadata extraction ( #478 )
...
* Improve metadata extraction
* Recognize meta[property] as a space-separated list
* Recognize Dublin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now only used as fallback if document doesn't provide good metadata.
2018-08-25 00:28:00 +01:00
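The preference order listed in the metadata commit above (Dublin Core, Open Graph, Twitter, then HTML) can be sketched as a simple priority lookup. Everything here is an illustrative assumption: the key names, the `pickPreferredTitle` helper, and the flat `metadata` object are hypothetical and are not taken from Readability.js.

```javascript
// Hypothetical lookup order; the key names are assumptions for illustration.
const TITLE_KEYS_BY_PRIORITY = [
  "dc:title",
  "dcterm:title",
  "og:title",
  "twitter:title",
  "title",
];

// Return the first non-empty value in priority order, or null if none exists.
function pickPreferredTitle(metadata) {
  for (const key of TITLE_KEYS_BY_PRIORITY) {
    const value = metadata[key];
    if (typeof value === "string" && value.trim()) {
      return value.trim();
    }
  }
  // A caller would fall back to a heuristic title at this point.
  return null;
}

const pickedTitle = pickPreferredTitle({
  "twitter:title": "Twitter title",
  "og:title": "Open Graph title",
  "title": "HTML title",
});

console.log(pickedTitle);
```

With no Dublin Core value present, the Open Graph title wins over the Twitter and HTML ones.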
Daniel Aleksandersen
0449dbf186
Recognize more iframe video embed services
...
* TenCent QQ Video, Alexa Rank 8
* Twitch clips and streams, Alexa Rank 33
* Internet Archive, Alexa Rank 265
* Wikimedia, Alexa Rank 347
2018-08-22 16:08:46 +01:00
Gijs Kruitbosch
f782bc5f06
Avoid global flag when looking for metadata using regexes
2018-08-21 17:56:25 +02:00
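The hazard behind the global-flag commit above is general JavaScript behaviour: a regex with the `g` flag keeps state in `lastIndex`, so repeated `test()` calls on the same pattern can give different answers for the same input. The pattern and input string below are made up for illustration and are not the regexes used in Readability.js.

```javascript
// With the global flag, test() resumes from lastIndex after a match.
const withGlobal = /title/g;
const firstGlobal = withGlobal.test("og:title");  // matches; lastIndex moves to 8
const secondGlobal = withGlobal.test("og:title"); // search starts at 8, so no match

// Without the global flag, every call starts from the beginning.
const withoutGlobal = /title/;
const firstPlain = withoutGlobal.test("og:title");
const secondPlain = withoutGlobal.test("og:title");

console.log(firstGlobal, secondGlobal, firstPlain, secondPlain);
```

Dropping the flag is the simplest way to make a reused pattern stateless when only a yes/no match is needed.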
Johann Hofmann
93a2f1b026
Merge pull request #471 from gijsk/moar-eslint
...
Add more eslint rules (fixes #457 )
2018-07-16 07:15:08 +02:00
Gijs Kruitbosch
30611cc57f
Fix quotes issues in test and benchmark files
2018-07-15 15:43:50 +01:00
Gijs Kruitbosch
f511d1aa2b
Enable eslint checks for quotes and single-line loops/conditionals
2018-07-14 22:09:14 +01:00
Gijs Kruitbosch
7cf95bd427
Fix same-line loops and if statements
2018-07-14 22:09:00 +01:00
Gijs Kruitbosch
d9f7bb2965
Fix quotes
2018-07-14 14:28:41 +01:00
Gijs Kruitbosch
7d03bec52d
Fix issues with finding nytimes content caused by in-article ads
2018-07-10 14:13:01 +01:00
tmm2018
076bf2017b
[docs] - mozilla/readability - README.md - fixing tiny little issues (grammar, rhetoric, spelling, etc.) ( #462 )
...
* [docs] - mozilla/readability - README.md - add articles to the description of the properties of the Readability output
2018-06-13 08:14:36 -07:00
Gijs
4b193ccd6a
Include URI information for jsdom in the README.
...
See #453 for an example of where this led to confusion.
2018-06-12 10:12:55 -07:00
Gijs Kruitbosch
8fec62d246
Strip XML namespaces from tag names to deal with broken serializations
2018-06-09 09:51:16 +01:00
Gijs Kruitbosch
8e92a1fa19
Reuse textNode variable for CDATA blocks, too
2018-06-09 09:49:25 +01:00
David A Roberts
ea4165721f
Remove single-cell tables
2018-06-09 09:49:01 +01:00
David A Roberts
bf64b58d90
Update tests
2018-06-08 11:06:01 +01:00