readability/test/test-pages
Daniel Aleksandersen 5a69d4a8eb Improve metadata extraction (#478)
* Improve metadata extraction

* Recognize meta[property] as a space-separated list
* Recognize Dublin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now only used as fallback if document
 doesn't provide good metadata.
6 years ago
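
The rules listed in the commit message above can be illustrated with a small sketch. This is not Readability's actual code: `pickTitle`, the plain-object stand-ins for `<meta>` elements, and the exact key names are assumptions made for illustration. It shows a `property` attribute being treated as a space-separated list, and the title being chosen in the order Dublin Core, Open Graph, Twitter, plain HTML, with a caller-supplied fallback when no metadata is usable.

```javascript
// Hypothetical helper, not part of Readability's API.
// `metas` is an array of plain objects standing in for <meta> elements,
// e.g. { property: "og:title", content: "Some title" }.
function pickTitle(metas, fallbackTitle) {
  const values = {};
  for (const meta of metas) {
    if (!meta.content) continue;
    // A property attribute may hold several space-separated names.
    const keys = (meta.property || meta.name || "")
      .toLowerCase()
      .split(/\s+/)
      .filter(Boolean);
    for (const key of keys) {
      values[key] = meta.content.trim();
    }
  }
  // Preference order taken from the commit message:
  // Dublin Core, then Open Graph, then Twitter, then plain HTML.
  const order = ["dc:title", "dcterm:title", "og:title", "twitter:title", "title"];
  for (const key of order) {
    if (values[key]) return values[key];
  }
  // Nothing usable in the metadata: use the fallback title.
  return fallbackTitle;
}
```

Under these assumptions, a page that declares both `og:title` and `dc:title` yields the Dublin Core value, and a page with no usable metadata yields whatever fallback the caller passes in, which mirrors the role the commit message gives `_getArticleTitle()`.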
..
001 Update test expectations. 7 years ago
002 Improve metadata extraction (#478) 6 years ago
003-metadata-preferred Improve metadata extraction (#478) 6 years ago
004-metadata-space-separated-properties Improve metadata extraction (#478) 6 years ago
aclu Don't include root html node in candidates 6 years ago
ars-1 Don't convert DIVs to Ps when more than 25% links 6 years ago
base-url Bug 1259763 - Remove h2 when there is only one h2 and its text content substantially equals article title, r=Gijs 8 years ago
base-url-base-element Fix relative URIs given <base> tags (#422) 6 years ago
base-url-base-element-relative Fix relative URIs given <base> tags (#422) 6 years ago
basic-tags-cleaning Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 9 years ago
bbc-1 Improve metadata extraction (#478) 6 years ago
blogger Avoid global flag when looking for metadata using regexes 6 years ago
breitbart Improve metadata extraction (#478) 6 years ago
bug-1255978 Improve metadata extraction (#478) 6 years ago
buzzfeed-1 Improve metadata extraction (#478) 6 years ago
clean-links Remove single-cell tables 6 years ago
cnet Update test expectations. 7 years ago
cnet-svg-classes Put phrasing content into paragraphs 6 years ago
cnn Fix #283 and remove hidden nodes 6 years ago
comment-inside-script-parsing Fixes #130 - Using js-beautify for HTML formatting. 9 years ago
daringfireball-1 Update test expectations 7 years ago
ehow-1 Improve metadata extraction (#478) 6 years ago
ehow-2 Improve metadata extraction (#478) 6 years ago
embedded-videos Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 9 years ago
engadget Improve metadata extraction (#478) 6 years ago
gmw Put phrasing content into paragraphs 6 years ago
heise Improve metadata extraction (#478) 6 years ago
herald-sun-1 Improve metadata extraction (#478) 6 years ago
hidden-nodes Fix #283 and remove hidden nodes 6 years ago
hukumusume Remove single-cell tables 6 years ago
iab-1 Improve metadata extraction (#478) 6 years ago
ietf-1 Improve metadata extraction (#478) 6 years ago
keep-images Improve metadata extraction (#478) 6 years ago
la-nacion Improve metadata extraction (#478) 6 years ago
lemonde-1 Update test expectations 7 years ago
liberation-1 Update test expectations 7 years ago
lifehacker-post-comment-load Remove aside tags on test cases 6 years ago
lifehacker-working Remove aside tags on test cases 6 years ago
links-in-tables Avoid global flag when looking for metadata using regexes 6 years ago
lwn-1 Put phrasing content into paragraphs 6 years ago
medium-1 Improve metadata extraction (#478) 6 years ago
medium-2 Improve metadata extraction (#478) 6 years ago
medium-3 Improve metadata extraction (#478) 6 years ago
missing-paragraphs Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 9 years ago
mozilla-1 Improve metadata extraction (#478) 6 years ago
mozilla-2 Avoid global flag when looking for metadata using regexes 6 years ago
msn Update test expectations. 7 years ago
normalize-spaces Bug 1259763 - Remove h2 when there is only one h2 and its text content substantially equals article title, r=Gijs 8 years ago
nytimes-1 Put phrasing content into paragraphs 6 years ago
nytimes-2 Put phrasing content into paragraphs 6 years ago
pixnet Update test expectations 7 years ago
qq Put phrasing content into paragraphs 6 years ago
remove-extra-brs Don't put non-phrasing content into paragraphs 6 years ago
remove-extra-paragraphs Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 9 years ago
remove-script-tags Fix issue #251 by making JSDOMParser expect XML and stop making excuses for 'self-closed' things, when all that does is cause trouble 9 years ago
reordering-paragraphs Update test expectations 7 years ago
replace-brs Put phrasing content into paragraphs 6 years ago
replace-font-tags Bug 1259763 - Remove h2 when there is only one h2 and its text content substantially equals article title, r=Gijs 8 years ago
rtl-1 Bug 1300697 - Reader View missed first few paragraphs on New York Times website, r=Gijs 8 years ago
rtl-2 Bug 1300697 - Reader View missed first few paragraphs on New York Times website, r=Gijs 8 years ago
rtl-3 Bug 1300697 - Reader View missed first few paragraphs on New York Times website, r=Gijs 8 years ago
rtl-4 Bug 1300697 - Reader View missed first few paragraphs on New York Times website, r=Gijs 8 years ago
salon-1 Improve metadata extraction (#478) 6 years ago
simplyfound-1 Improve metadata extraction (#478) 6 years ago
social-buttons Update test expectations. 7 years ago
style-tags-removal Bug 1259763 - Remove h2 when there is only one h2 and its text content substantially equals article title, r=Gijs 8 years ago
svg-parsing Update test expectations 7 years ago
table-style-attributes Remove single-cell tables 6 years ago
telegraph Match headings on trimmed strings to avoid whitespace causing mismatches 6 years ago
title-and-h1-discrepancy Add test case for title and h1 discrepancy 7 years ago
tmz-1 Put phrasing content into paragraphs 6 years ago
tumblr Improve metadata extraction (#478) 6 years ago
wapo-1 Don't convert DIVs to Ps when more than 25% links 6 years ago
wapo-2 Put phrasing content into paragraphs 6 years ago
webmd-1 Don't convert DIVs to Ps when more than 25% links 6 years ago
webmd-2 Don't convert DIVs to Ps when more than 25% links 6 years ago
wikia Fix relative URIs given <base> tags (#422) 6 years ago
wikipedia Fix #283 and remove hidden nodes 6 years ago
wordpress Improve metadata extraction (#478) 6 years ago
yahoo-1 Put phrasing content into paragraphs 6 years ago
yahoo-2 Improve metadata extraction (#478) 6 years ago
yahoo-3 Improve metadata extraction (#478) 6 years ago
yahoo-4 Improve metadata extraction (#478) 6 years ago
youth Update test expectations 7 years ago