mercury-parser/src
John Brayton e217648c0b
feat: ma.ttias.be extractor (#551)
* feat: Add a custom extractor for ma.ttias.be.

When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows:

* Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3".
* Add class="entry-content-asset" to "ul" elements to avoid them being removed.

* removed redundant comment.

Co-authored-by: John Holdun <john@johnholdun.com>
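
The three changes listed in the commit message can be sketched as a custom extractor object. This is a minimal illustration, not the committed file: the `.content` selector is an assumption, and the transforms assume the usual Mercury Parser convention in which each transform receives a cheerio-style node and may return a new tag name to rename it.

```javascript
// Hypothetical sketch of a custom extractor for ma.ttias.be.
// The content selector is an assumption and may differ from the real extractor.
const MaTtiasBeExtractor = {
  domain: 'ma.ttias.be',

  content: {
    // Assumed container for the article body.
    selectors: [['.content']],

    transforms: {
      // Remove the "id" attribute, which would otherwise give the
      // heading a low weight. Mercury Parser demotes h1 to h2 itself.
      h1: $node => {
        $node.removeAttr('id');
      },

      // Remove "id" as well, and return 'h3' so the node is renamed
      // and stays one level below the demoted h1 headings.
      h2: $node => {
        $node.removeAttr('id');
        return 'h3';
      },

      // Tag lists with a class so they are not removed during cleaning.
      ul: $node => {
        $node.attr('class', 'entry-content-asset');
      },
    },
  },
};
```

Each transform only touches the node it is given, so the three rules can be checked in isolation with a stub that implements `attr` and `removeAttr`.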
2 years ago
cleaners feat: Add custom parser for Reddit (#307) 5 years ago
extractors feat: ma.ttias.be extractor (#551) 2 years ago
resource fix: don't try to re-decode prepared response (#498) 2 years ago
shims deps: upgrade (#218) 5 years ago
utils fix: skip absolutizing invalid srcsets (#386) 5 years ago
mercury.js feat: ability to add custom extractors via api (#484) 5 years ago
mercury.test.js feat: ability to add custom extractors via api (#484) 5 years ago
test-helpers.js feat: Various Character Encoding Improvements (#270) 5 years ago