Commit Graph

7 Commits (9fbe42683a23eac8a5c479cd640c5bf8044c9185)

Author SHA1 Message Date
Daniel Aleksandersen 5a69d4a8eb Improve metadata extraction (#478)
* Improve metadata extraction

* Recognize meta[property] as a space-separated list
* Recognize Dublin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now only used as a fallback if the document
 doesn't provide good metadata.
6 years ago
Gijs Kruitbosch 8c02a0d34c Fix #283 and remove hidden nodes 6 years ago
David A Roberts 5ae90930cd Don't convert DIVs to Ps when more than 25% links 6 years ago
David A Roberts 9f2c5cb42e Put phrasing content into paragraphs
This removes the need for `p.readability-styled` elements.
6 years ago
Gijs Kruitbosch ad4dd26448 Update test expectations 7 years ago
Cameron McCormack 5ad448f831 Update test expectations. 7 years ago
Evan Tseng 131d923d38 Bug 1167568 - Find a better topCandidate if other nodes' scores are high enough, r=Gijs 8 years ago