mercury-parser

Commit Graph

Author	SHA1	Message	Date
Janet	e035f36361	feat: reuters parser (#78 ) * feat: reuters parser Date parses correctly but fails test because of format discrepancy. Author tags are nested within the content, which is why the author names are appearing twice. I wasn’t sure how to address this. Additionally, the location appears twice, so I cleaned the location tags from the content. * test: fix date format * transform .article-subtitle to h4; cleaning author but leaving location	8 years ago
Janet	dec49ab073	feat: mashable parser (#76 ) * feat: mashable parser As usual the date is giving me issues because of formatting discrepancies: AssertionError: '2016-12-13T22:33:06.000Z' == '2016-12-14T03:33:06.000Z' Not sure how we wanna deal with Twitter card embeds that don’t show up? Also, image credits did not show up in preview. * test: fixed date format * transforming .image-credit to figcaption	8 years ago
Janet	cddc1afb69	feat: chicago tribune parser (#75 ) * feat: chicago tribune parser Date is parsing but failing the test because: AssertionError: '2016-12-13T21:45:00.000Z' == '2016-12-13T13:45:00-0800' I tried to insert a line of code for Time Zone but I’m a n00b so I don’t think I did it right. No image showing up in the preview. * fix: remove timezone from date_published extractor * test: update unit tests to assert the correct value for date_published	8 years ago
Janet	aff651c2d8	feat: hellogiggles parser (#107 ) Looks good to me!	8 years ago
Janet	11ad7b9a92	feat: thought catalog parser (#102 ) Looks good!	8 years ago
Janet	aa43a6091c	feat: cnbc parser (#96 ) Should be good to go!	8 years ago
Janet	cd245f7980	feat: popsugar parser (#93 ) I think this one is good to go!	8 years ago
Janet	a8ab7135e1	feat: observer parser (#91 ) no problems	8 years ago
Janet	3bee7224cb	feat: nbc news parser (#74 )	8 years ago
Janet	88242dd233	feat: nj.com parser (#73 )	8 years ago
Janet	1ac5670a54	feat: inquisitor parser (#72 )	8 years ago
Janet	9e5b91ed8b	feat: refinery29 parser (#71 )	8 years ago
Janet	b78c58c43a	feat: miami herald parser (#69 )	8 years ago
Janet	aedf83edc6	feat: eonline parser (#68 )	8 years ago
Janet	a20da5eb31	uproxx extractor (#66 )	8 years ago
Janet	87c42b6358	feat: 247sports.com extractor (#64 )	8 years ago
Janet	22e6c884fb	feat: rolling stone extractor (#65 )	8 years ago
Janet	6337231697	feat: usmagazine extractor (#63 )	8 years ago
Janet	c06b19efe7	feat: people extractor (#70 ) No major problems!	8 years ago
Janet	3cf2bb78c4	feat: vox custom parser (#67 )	8 years ago
Adam Pash	a710efd2d5	release: 1.0.3 (#62 )	8 years ago
Janet	861c5f0dcb	feat: bustle extractor (#60 )	8 years ago
Adam Pash	06397a4360	feat: browser-friendly selector for medium (#61 )	8 years ago
Adam Pash	3297ab079d	feat: bloomberg extractor (#59 ) Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means. It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.	8 years ago
Janet	e55e9da534	feat: sbnation extractor (#55 )	8 years ago
Adam Pash	8070e4790b	test: streamlined guardian tests w/new single-extraction (#58 )	8 years ago
Adam Pash	bdb751fb53	feat: more cleaning for wired (#56 )	8 years ago
Janet	e7e41bd242	feat: the guardian custom extractor (#41 )	8 years ago
Adam Pash	332f85928f	release: 1.0.2 (#54 )	8 years ago
Adam Pash	81aa89f2c1	feat: youtube custom extractor (#53 )	8 years ago
Adam Pash	2fb47640f2	Feat: detect platforms (#52 ) Detectors for matching extractors for publishing platforms. Currently supporting Medium and Blogger.	8 years ago
Adam Pash	64c0fad2fd	fix: preserve whitespace (#51 ) No longer normalizing whitespace in html	8 years ago
Adam Pash	15656cb3e1	Refactor: running tests more efficiently (#49 ) Only running one parser per page we're testing rather than a parser per field we're testing.	8 years ago
Adam Pash	edcb7295d1	release: 1.0.1 (#48 )	8 years ago
Adam Pash	f9902cfa05	Fix: extension bugs (#47 ) * feat: lead image on atlantic stories now included * feat: supporting buzzfeed "longform" template * feat: cleaning .parter-box from the atlantic	8 years ago
Adam Pash	16860f1d85	feat: improved nyt parser (#46 ) NYT was one of the first, and its test was stale and it didn't have all of its fields well defined.	8 years ago
Adam Pash	d0453efbf8	feat: improvements for nyer magazine articles (#45 ) adds dek and date_published for magazine template	8 years ago
Adam Pash	00f8965c1f	fix: cleaning up deks (#44 ) We've solidified what we consider a dek. This PR removes the dek selectors that do not fit that mold.	8 years ago
Janet	b415d1d37c	feat: aol custom extractor (#42 ) * feat: aol custom parser * removed work from other commits. merged with latest master	8 years ago
Matt	4cc3b68b5e	feat: remove footer links (#40 ) the links at the bottom of the stories feel a little spammy because of how we treat links vs. the way they are displayed on the Times, would like to clean them	8 years ago
Adam Pash	e9a36d6ebd	release: 1.0.0 so we can start doing proper releases (#39 )	8 years ago
Adam Pash	ff1963bdca	feat: new cleaner for wapo (#38 )	8 years ago
Adam Pash	0e6ccdf622	fix: browser cleanup (#35 ) Cleaning up after the parser when it's done in the browser, before returning result.	8 years ago
Adam Pash	bd0694fbba	feat: preview with optional rebuild (#36 ) Now the preview script has an optional build step. Adding --no-rebuild as an argument to the script will skip the rebuild step and just show a preview of the parse as is with the current build.	8 years ago
Adam Pash	181b39b238	feat: ci speedup (#37 ) minor speedup to see failing tests. linting happens first	8 years ago
Silas Burton	c3d98a0d76	Feat cnn extractor (#34 ) * wip: cnn custom extractor * wip: cnn works except first paragraph * final touches on cnn parser * cleanup	8 years ago
Silas Burton	a0570f8e94	feat: extractor for the verge (#33 ) * feat: extractor for the verge's standard article template * feat: basic support for the verge feature template * feat: allow multiple links to be previewed * feat: content selector arrays Content selector arrays allow custom parsers to select multiple elements to match and include in the result. * feat: updated verge parser to use multimatch selectors * lint fix * cleanup test builds	8 years ago
Adam Pash	233ca11a33	fix: added timezone to new republic date (#32 )	8 years ago
Adam Pash	cfe7f34be4	fix: normalizing spaces for authors/dek/title (#31 ) * fix: normalizing spaces for authors/dek/title	8 years ago
Adam Pash	9a23b24a89	feat: adjustment for huffpo. skipping overly aggressive default cleaners (#30 )	8 years ago

543 Commits (025261c12040ef4092d6bf5cd77a3c1b8d22999b)