Archives/mercury-parser - mercury-parser - blob42 source forge

Author	SHA1	Message	Date
John Brayton	e217648c0b	feat: ma.ttias.be extractor (#551) * feat: Add a custom extractor for ma.ttias.be. When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows: * Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight. * Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3". * Add class="entry-content-asset" to "ul" elements to avoid them being removed. * removed redundant comment. Co-authored-by: John Holdun <john@johnholdun.com>	2 years ago
Nitin Khanna	8c9982247b	feat: Ladbible.com extractor (#624) * Ladbible.com extractors and test * CircleCI says timezone needs to be Europe/London aka BST Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com> Co-authored-by: Jad Termsani <32297675+JadTermsani@users.noreply.github.com>	3 years ago
Nitin Khanna	30d6f472ee	feat: Times of India extractor (#503) * Adding custom parser for Times of India * moved transforms to clean The transforms were just working as cleans. Moved things around as per recommendations. Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com>	3 years ago
Sven Wiegand	f95947fe88	Implemented custom extractor epaper.zeit.de (#488)	5 years ago
david0leong	911b0f87c8	Add custom extractor for biorxiv.org (#467) * Add custom extractor for biorxiv.org * Fix content selector * Improve content selector	5 years ago
Ben Ubois	0942c37876	feat: custom parser for phoronix.com. (#431)	5 years ago
Michael P. Geraci	571a913745	feat: pitchfork extractor (#439) * generate the custom extractor and get the first test to pass * add the basic extractors (title, author, date, etc) * select the score as well as the review text, and break the content test * prepend the score to the content * get the date from the datetime attribute * mangle this test a little, but just a little (it does work properly) * move from prepending the score to the review text to adding it as a custom field in the extractor	5 years ago
david0leong	694ea820aa	Custom Extractor for clinicaltrials.gov (#305) * Add prototype of custom extractor for clinicaltrials.gov * Add .DS_Store to gitignore * Make tests for title, author and date_published selectors pass * Make content selector test pass * Fix date_published test * Rebuild * Remove .DS_Store from gitignore * Improve extractor and test/fixture of clinicaltrials.gov	5 years ago
Wajeeh Zantout	e66ad8b81c	feat: add le monde extractor (#415)	5 years ago
kik0220	f81dc63617	feat: add rbbtoday.com custom parser (#411) * feat: add rbbtoday.com custom parser * fix: content test * fix: dek and content	5 years ago
kik0220	5e1113b3a9	feat: add japan.zdnet.com custom parser (#410) * feat: add japan.zdnet.com custom parser * fix: author and date_published selector	5 years ago
kik0220	77e3bc00e2	feat: add wired.jp custom parser (#409) * feat: add wired.jp custom parser * fix: author test * fix: date_published selector * test: fix dek and content * test: fix content (without clean dek)	5 years ago
kik0220	0b36c96de0	feat: add techlog.iij.ad.jp custom parser (#405) * feat: add techlog.iij.ad.jp custom parser * fix: date_published and content selector	5 years ago
kik0220	406bf1b1a9	feat: add weekly.ascii.jp custom parser (#401) * feat: add weekly.ascii.jp custom parser * fix: title and date_published selector	5 years ago
kik0220	216bfade00	feat: add www.ipa.go.jp custom parser (#408)	5 years ago
kik0220	3ae8f3bde3	feat: add www.oreilly.co.jp custom parser (#407)	5 years ago
kik0220	7396e81b72	feat: add sect.iij.ad.jp custom parser (#404)	5 years ago
kik0220	3f1d9030ee	feat: add www.lifehacker.jp custom parser (#403)	5 years ago
kik0220	b077000c4a	feat: add getnews.jp custom parser (#402)	5 years ago
kik0220	b5425c3e8a	feat: add www.gizmodo.jp custom parser (#400)	5 years ago
kik0220	a38c727a0a	feat: add deadline.com custom parser (#383) * feat: add deadline.com custom parser * fix: timezone * fix: date_published selectors * fix: title and author selector * test: transform .embed-twitter * fix: regenerate the fixture and fix content selector	5 years ago
kik0220	74a3c49a3c	feat: add japan.cnet.com custom parser (#382) * feat: add japan.cnet.com custom parser * fix: remove transform	5 years ago
kik0220	7b07f88448	feat: add www.yomiuri.co.jp custom parser (#381)	5 years ago
kik0220	8ca2894751	feat: add bookwalker.jp custom parser (#374)	5 years ago
kik0220	a5f06ce27a	feat: add takagi-hiromitsu.jp custom parser (#364)	5 years ago
kik0220	b9c57dbc2f	feat: add www.publickey1.jp custom parser (#365) * feat: add www.publickey1.jp custom parser * fix: date_published selector	5 years ago
kik0220	d7dbea8a95	feat: add www.itmedia.co.jp custom parser (#366) * feat: add www.itmedia.co.jp custom parser * feat: add nlab.itmedia.co.jp support * fix: title selectors	5 years ago
kik0220	9218f80da6	feat: add www.moongift.jp custom parser (#367) * feat: add www.moongift.jp custom parser * fix: date_published selectors * fix: pass test * fix: add timezone	5 years ago
kik0220	4eb73dffb0	feat: add www.infoq.com custom parser (#368) * feat: add www.infoq.com custom parser * fix: date_published selector	5 years ago
kik0220	ce5cd2dd0d	feat: add phpspot.org custom parser (#369) * feat: add phpspot.org custom parser * fix: date_published selector	5 years ago
kik0220	73be0c5a10	feat: add www.jnsa.org custom parser (#346) * feat: add www.jnsa.org custom parser	5 years ago
Adam Pash	eacd1ee97f	feat: custom genius parser. (#284) also adds ability to transform value returned by an attribute selector	5 years ago
kik0220	c389c966d7	feat: add jvndb.jvn.jp custom parser (#345)	5 years ago
kik0220	8493d05cb5	feat: add scan.netsecurity.ne.jp custom parser (#347)	5 years ago
kik0220	2a76c6c212	feat: add www.elecom.co.jp custom parser (#348)	5 years ago
kik0220	a9e010b718	feat: add www.sanwa.co.jp custom parser (#349)	5 years ago
kik0220	1639eae324	feat: add www.asahi.com custom parser (#350)	5 years ago
kik0220	21f7de70c1	feat: add buzzap.jp custom parser (#351)	5 years ago
kik0220	f3a7e393a3	feat: add www.ossnews.jp custom parser (#352)	5 years ago
kik0220	c309bdb373	feat: add otrs.com custom parser (#353)	5 years ago
Ben Ubois	a7e4c67d1d	Extract content from GitHub repos. (#306) * Extract content from GitHub repos. * Add published and dek. * Timezone fix.	5 years ago
Toufic Mouallem	7844129fda	feat: Add custom parser for Reddit (#307)	5 years ago
Jordan Hotmann	83d1c2401b	feat: add custom extractor for blisterreview.com (#299)	5 years ago
kik0220	d9a1e7b22b	feat: add news.mynavi.jp custom parser (#287)	5 years ago
Wajeeh Zantout	1ccd14e1e9	feat: add fortinet custom parser (#188) * feat: add fortinet custom parser * fix: eslint error * fix: transform noscript images * feat: add fortinet custom parser * fix: eslint error * fix: transform noscript images * fix: transform method * test: transform method * fix: fs import	5 years ago
Wajeeh Zantout	9b36003b62	feat: add fastcompany custom parser (#191) * feat: add fastcompany custom parser * fix: eslint error * fix: test for date_published * feat: add fastcompany custom parser * fix: eslint error * fix: test for date_published * fix: fs import	5 years ago
Janet	f13bb721f6	feat: prospect magazine parser (#147) * feat: prospect magazine parser Couldn’t find a way to parse the date but I think it’s good otherwise. * fix: pulls date * fix: add timezone * fix: generalize	7 years ago
Kevin Ngao	1b28713cf5	feat: fool.com parser (#158) * feat: add fool.com custom parser	7 years ago
Janet	c18959779d	feat: forward.com parser (#144) * feat: forward.com parser LGTM although image didn’t show up in preview * feat: also pull image into content * fix: generalize selectors * fix: generalize selector	7 years ago
Janet	50e548bac2	feat: qdaily parser (#146) * feat: qdaily parser Firstly, I accidentally tried to generate the parser on the master branch, and I’m not sure where it is, maybe floating in the nether world. On to the parser: this one was a bit tricky because things were in Chinese! The content appears to be parsing (as seen in preview) but it’s not passing the test. I noticed the second “ ‘ “ mark isn’t appearing on the parser side. Additionally, some of the lazy loading images aren’t appearing in the preview (I cleaned the wrong lazy load images that appeared), so someone will probably have to work on that (I don’t know how to do transforms yet). * fix tests * fix: selector generalization	7 years ago
