readability

Commit Graph

Author	SHA1	Message	Date
PalmerAL	3844d8f05b	Include more ancestors in candidate scoring (#611) * include more ancestors in candidate scoring * fix medium-3 testcase The original source file contained two copies of the document, which was causing incorrect results * remove unnecessary nested elements * fix removal of empty elements * add option to regenerate all testcases * update tests * fix quanta testcase * fix creating testcase from network * fix early exit in testcase generation * format HTML before comparing while testing * upgrade js-beautify * don't merge outer readability div	4 years ago
Dan Burzo	2ca98284e9	Prefer JSON-LD metadata object, when present (#609) * Prefer JSON-LD metadata object, when present * Log JSON-LD parsing error * Trim all JSON-LD fields	4 years ago
Radhi	52ab9b5c89	Fix lazy-loaded images are not visible in Kinja sites (#590) * Add initial test case for kinja's lazy image * Implement method to remove small data uri image * Convert relative uri in poster and srcset of media nodes * Eslint doesn't like arrow function * Unescape HTML entities in metadata * Fix wrong regex for parsing srcset urls * Remove line to check data url since it already handled by new URL * Replace String.matchAll since it only supported in Node 12+ * Use numeric code when unescaping HTML * Don't remove data URL src if it's svg * Don't remove b64 src if it's the only attr that contains image * Make the comma part non-optional in regex for srcset url * Fix wrong code for unescaping HTML * Don't capture comma and semicolon in data URL regex	5 years ago
Gijs Kruitbosch	d5621f85e7	Fix #585 - remove nodes with role=complementary	5 years ago
Radhi Fadlillah	3976fa34e9	Don't use data-old- prefix if old img attr not exists	5 years ago
Radhi Fadlillah	d8366f0686	Keep all attributes that might contain image	5 years ago
Radhi Fadlillah	1277d22b81	Keep old img src as data attribute	5 years ago
Radhi Fadlillah	6fed28610d	Simplify loop for unwrapping noscript	5 years ago
Radhi Fadlillah	89572ad29a	Update test for several pages	5 years ago
Radhi Fadlillah	d784bf7e20	Add method to unwrap img inside noscript	5 years ago
Gijs Kruitbosch	b2f3a43f9f	Detect 'trailing' content when comparing DOMs	5 years ago
PalmerAL	61ef00a853	add exception for wikimedia math images	5 years ago
PalmerAL	7c91bdd275	preserve children when removing javascript: links	5 years ago
Gijs	d6fc38c4b4	Fix #564 by allowing 'content' as an indicator of readable content (#565) This avoids `contentWithSidebar` causing complete removal of the content. As a side effect, it slightly improves byline detection by not removing content as early on as before.	5 years ago
PalmerAL	b551f1cf6e	Fix missing content on Wikipedia articles (#560)	5 years ago
Joe Winett	60f470c4bb	Remove aria-hidden="true" nodes (fixes #541) (#555)	5 years ago
PalmerAL	9092b2a29c	Remove sharing elements in fewer situations (#545) * remove fewer share elements * simplify and fix social-buttons testcase	5 years ago
PalmerAL	814f0a3884	Add support for detecting lazy-loaded images (#542) Add support for detecting lazy-loaded images using `src` or `srcset` attributes.	5 years ago
Gijs Kruitbosch	cb5771fd4a	Add nested font tags to test _setNodeTag on those (see #59)	6 years ago
Radhi	9009f64f9c	Fix table header missing (#530)	6 years ago
Radhi	6761a7e412	Fix embedded videos getting removed (#526)	6 years ago
PalmerAL	b9cece3e58	add test	6 years ago
PalmerAL	a014e0c9c8	exclude graphs from nytimes articles	6 years ago
Radhi Fadlillah	c942b32945	Revert source files and fix expected results	6 years ago
Radhi Fadlillah	bd5087d2f1	fix error in testing "wikipedia"	6 years ago
Radhi Fadlillah	3e025d58e5	fix error in testing "lwn-01"	6 years ago
Radhi Fadlillah	df95c9d717	fix error in testing "keep-tabulard-data"	6 years ago
Radhi Fadlillah	6a5066abe2	Fix tabular data got removed	6 years ago
PalmerAL	f70d36852b	check itemprop when determining whether a node is a byline	6 years ago
Andres Rey	d41de78c26	Close img tag	6 years ago
Andres Rey	1187b2dae1	Update test expectations	6 years ago
Andres Rey	3ca8c12d87	Update test expectations	6 years ago
Andres Rey	4ffd482004	Add medicalnewstoday test case with incorrect results	6 years ago
Gijs Kruitbosch	60ef565b67	Don't choke on <meta> tags that do not have a content attribute	6 years ago
Gijs Kruitbosch	e8bb7f722f	Fix whitespace normalization in title metadata When switching to a newer version of JSDOM, it is more literal about listing whitespace as part of textContent, including newlines and not normalizing multiple spaces. It seems prudent to just always normalize whitespace for titles, which are guaranteed to be pretty short anyway.	6 years ago
Gijs Kruitbosch	3610476663	Remove CSS that jsdom struggles to parse	6 years ago
Maria Luiza Soares	8c41d92560	Assert on siteName in all test cases	6 years ago
Maria Luiza Soares	1bac47c70d	Add newly generated test case	6 years ago
Daniel Aleksandersen	5a69d4a8eb	Improve metadata extraction (#478) * Improve metadata extraction * Recognize meta[property] as a space-separated list * Recognize Dublin Core (dc|dcterm): metadata. * Prefer Dublin Core, Open Graph, Twitter, and HTML in that order. * _getArticleTitle() is now only used as fallback if document doesn't provide good metadata.	6 years ago
Gijs Kruitbosch	f782bc5f06	Avoid global flag when looking for metadata using regexes	6 years ago
David A Roberts	ea4165721f	Remove single-cell tables	6 years ago
David A Roberts	bf64b58d90	Update tests	6 years ago
Gijs Kruitbosch	d4b842c82a	Match headings on trimmed strings to avoid whitespace causing mismatches	6 years ago
Gijs Kruitbosch	8c02a0d34c	Fix #283 and remove hidden nodes	6 years ago
David A Roberts	656a6673d9	Don't put non-phrasing content into paragraphs	6 years ago
David A Roberts	5ae90930cd	Don't convert DIVs to Ps when more than 25% links	6 years ago
David A Roberts	9f2c5cb42e	Put phrasing content into paragraphs This removes the need for `p.readability-styled` elements.	6 years ago
David A Roberts	7a24801958	Don't include root html node in candidates Fixes #435	6 years ago
Andres Rey	3c76104adb	Fix engadget test case	7 years ago
Andres Rey	4b99f41ec9	Add engadget test case	7 years ago


153 Commits (3844d8f05b3f114e3df16c3bc3caf44e5ba52181)