readability

Archives/readability

Commit Graph

Author

SHA1

Message

Date

PalmerAL

3844d8f05b

Include more ancestors in candidate scoring (#611)

* include more ancestors in candidate scoring

* fix medium-3 testcase

The original source file contained two copies of the document, which
was causing incorrect results

* remove unnecessary nested elements

* fix removal of empty elements

* add option to regenerate all testcases

* update tests

* fix quanta testcase

* fix creating testcase from network

* fix early exit in testcase generation

* format HTML before comparing while testing

* upgrade js-beautify

* don't merge outer readability div

2020-08-21 10:16:58 +01:00

Daniel Aleksandersen

5a69d4a8eb

Improve metadata extraction (#478)

* Improve metadata extraction

* Recognize meta[property] as a space-separated list
* Recognize Dublin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now only used as fallback if document
 doesn't provide good metadata.

2018-08-25 00:28:00 +01:00
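
The precedence rules listed in commit 5a69d4a8eb can be illustrated with a minimal sketch. This is not the code from the commit: `collectMetadata`, `pickField`, `PREFIX_ORDER`, and the `{ key, content }` input shape are hypothetical names chosen for illustration, and the exact prefix spellings are assumed from the commit message.

```javascript
// Hypothetical sketch of the precedence described in the commit message;
// not Readability's actual implementation.
// Assumed prefix order: Dublin Core, Open Graph, Twitter, then plain HTML.
const PREFIX_ORDER = ["dc:", "dcterm:", "og:", "twitter:", ""];

// `metaTags` is an array of { key, content } objects, where `key` stands in
// for a <meta> element's `property` or `name` attribute.
function collectMetadata(metaTags) {
  const values = new Map();
  for (const { key, content } of metaTags) {
    if (!key || !content) continue;
    // Treat the attribute as a space-separated list of names.
    for (const name of key.trim().toLowerCase().split(/\s+/)) {
      // Keep the first value seen for each name.
      if (!values.has(name)) values.set(name, content.trim());
    }
  }
  return values;
}

// Return the first non-empty value for `field` in preference order.
// `fallback` is only called when no metadata matched, mirroring the note
// that _getArticleTitle() is used only as a fallback.
function pickField(values, field, fallback) {
  for (const prefix of PREFIX_ORDER) {
    const value = values.get(prefix + field);
    if (value) return value;
  }
  return fallback ? fallback() : null;
}

const exampleTags = [
  { key: "twitter:title", content: "Twitter title" },
  { key: "og:title dc:title", content: "Shared title" },
  { key: "description", content: "Plain description" },
];
const exampleValues = collectMetadata(exampleTags);
console.log(pickField(exampleValues, "title")); // "Shared title"
console.log(pickField(exampleValues, "description")); // "Plain description"
```

Splitting the attribute on whitespace lets one tag such as `og:title dc:title` feed several vocabularies at once, and keeping the lookup order in one array makes the preference easy to change.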

2 Commits