Commit Graph

331 Commits (9fbe42683a23eac8a5c479cd640c5bf8044c9185)
 

Author SHA1 Message Date
Hugo Locurcio 9fbe42683a Add .gitattributes file
This ignores HTML (test data) so the repository is considered
to use JavaScript instead of HTML on GitHub.
6 years ago
Daniel Aleksandersen 3be1aaa01c Recognize Sina Weibo meta tags
http://open.weibo.com/wiki/Weibo_meta_tag
6 years ago
Daniel Aleksandersen 5a69d4a8eb Improve metadata extraction (#478)
* Improve metadata extraction

* Recognize meta[property] as a space-separated list
* Recognize Dublin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now used only as a fallback if the document
 doesn't provide good metadata.
6 years ago
Daniel Aleksandersen 0449dbf186 Recognize more iframe video embed services
* TenCent QQ Video, Alexa Rank 8
* Twitch clips and streams, Alexa Rank 33
* Internet Archive, Alexa Rank 265
* Wikimedia, Alexa Rank 347
6 years ago
Gijs Kruitbosch f782bc5f06 Avoid global flag when looking for metadata using regexes 6 years ago
Johann Hofmann 93a2f1b026 Merge pull request #471 from gijsk/moar-eslint
Add more eslint rules (fixes #457)
6 years ago
Gijs Kruitbosch 30611cc57f Fix quotes issues in test and benchmark files 6 years ago
Gijs Kruitbosch f511d1aa2b Enable eslint checks for quotes and single-line loops/conditionals 6 years ago
Gijs Kruitbosch 7cf95bd427 Fix same-line loops and if statements 6 years ago
Gijs Kruitbosch d9f7bb2965 Fix quotes 6 years ago
Gijs Kruitbosch 7d03bec52d Fix issues with finding nytimes content caused by in-article ads 6 years ago
tmm2018 076bf2017b [docs] - mozilla/readability - README.md - fixing tiny little issues (grammar, rhetoric, spelling, etc.) (#462)
* [docs] - mozilla/readability - README.md - add articles to the description of the properties of the Readability output
6 years ago
Gijs 4b193ccd6a Include URI information for `jsdom` in the README.
See #453 for an example of where this led to confusion.
6 years ago
Gijs Kruitbosch 8fec62d246 Strip XML namespaces from tag names to deal with broken serializations 6 years ago
Gijs Kruitbosch 8e92a1fa19 Reuse textNode variable for CDATA blocks, too 6 years ago
David A Roberts ea4165721f Remove single-cell tables 6 years ago
David A Roberts bf64b58d90 Update tests 6 years ago
David A Roberts 72bd1a8532 Don't nest paragraphs 6 years ago
David A Roberts 68c9af4ffa Use numeric encoding for non-XML entities
JSDOMParser can't handle HTML named entities like `&nbsp;`
6 years ago
David A Roberts 611e9e3a6f JSDOMParser: handle CDATA sections 6 years ago
David A Roberts afcc4b8e49 Fix titles not being trimmed sometimes 6 years ago
Gijs Kruitbosch d4b842c82a Match headings on trimmed strings to avoid whitespace causing mismatches 6 years ago
Gijs Kruitbosch 8c02a0d34c Fix #283 and remove hidden nodes 6 years ago
David A Roberts 656a6673d9 Don't put non-phrasing content into paragraphs 6 years ago
David A Roberts 5ae90930cd Don't convert DIVs to Ps when more than 25% links 6 years ago
David A Roberts 9f2c5cb42e Put phrasing content into paragraphs
This removes the need for `p.readability-styled` elements.
6 years ago
David A Roberts c823a6efb2 Fix generate-testcase.js 6 years ago
Gijs Kruitbosch f4ab856992 Check for a document being passed
This provides a descriptive error message if no document is passed, and
ignores the first argument if the second argument looks like
a reasonable DOM document instance.
6 years ago
David A Roberts 7a24801958 Don't include root html node in candidates
Fixes #435
6 years ago
David A Roberts acfd3759a1 Generate XHTML-compatible input for test cases
Fixes the bug noted in the README
6 years ago
David A Roberts d60184966c Remove unused URI parameter from constructor 6 years ago
David A Roberts 5ee03bc960 Stop Readability depending on Node.* constants 6 years ago
Andres Rey 3c76104adb Fix engadget test case 6 years ago
Andres Rey 4b99f41ec9 Add engadget test case 6 years ago
Andres Rey 6c5bc62959 Remove aside tags on test cases 6 years ago
Andres Rey 6fd816496c Clean <aside> tags on _prepArticle 6 years ago
David A Roberts f8d9b1c224 Update test expectations 6 years ago
David A Roberts 8414158fa9 Fix _replaceBrs
Previously, `nextElem` did not actually advance to the next element, so the paragraph was aborted at the first `<br>` (rather than at the first `<br><br>`, as the comment indicates).
6 years ago
Joan Espasa Arxer 3ff9a166fb Changed wordThreshold to charThreshold to better reflect the semantics. 6 years ago
Brad Philips 8525c6af36 Fix relative URIs given <base> tags (#422) 6 years ago
Gijs Kruitbosch d598baf02b Improve URL handling in JSDOMParser and Readability.js
This change raises the required Node.js version to 7.0 because it relies on the built-in url module.

We now pass a url when constructing a jsdom document or JSDOMParser document.
Because this is an API change, I'm increasing the package version.

Ultimately, I would like to remove the  argument from the readability constructor. It should
use the documentURI from the document it is passed.
6 years ago
Andres Rey 834672ef86 Return longest text after failing to detect text longer than the configured value (#423)
Save extracted text across attempts and return the longest one when all attempts fail, and add a test case from hukumusume
6 years ago
Tom Z?hner 264b8e8968 Remove link elements when preparing article for display 6 years ago
Thomas Jaggi fd1557560a [Docs] Fixed JSDOM usage note 7 years ago
Andres Rey fa9d8bda48 Add la-nacion test case 7 years ago
Andres Rey 01ffd0c617 Remove "modal" from strings to remove 7 years ago
Gijs 8da91b9eed Fix omitted semicolon 7 years ago
Gijs 0a30527c85 Explicitly mention lack of `Node` in `node.js` environments 7 years ago
Gijs Kruitbosch 807bf05aa3 Fix className usage so it deals correctly with SVG nodes (fixes #412). 7 years ago
Gijs Kruitbosch c586aeb404 Fix generate-testcase.js script so it keeps `caption` classes 7 years ago