mercury-parser

mirror of https://github.com/postlight/mercury-parser synced 2024-11-17 03:25:31 +00:00

Author	SHA1	Message	Date
Janet	4c48acba59	feat: fusion parser Looks okay — image did not load in preview.	2017-02-02 10:54:49 -07:00
Janet	d292d8ef3a	feat: ny daily news parser (#87) * feat: ny daily news parser	2017-02-02 12:30:16 -05:00
Janet	385b9d76a3	feat: sciencefly extractor (#116) * feat: sciencefly extractor, use loading image rather than 404'ing meta	2017-02-02 11:26:29 -05:00
Adam Pash	6bd6278a07	feat: custom parser for wh blog (#130)	2017-01-31 15:50:39 -08:00
Adam Pash	aa682d71e8	fix: medium bug (#129) * fix: improved medium parser for images and multi-section content * fix: duplicate video	2017-01-31 15:28:25 -08:00
Adam Pash	31eb4f9222	Feat: LinkedIn parser (#123) * feat: rebuild custom parser * feat: linkedin custom parser	2017-01-26 10:11:10 -08:00
Janet	7709d69379	feat: npr parser (#86) * feat: npr parser Lead image appears in preview, but the test fails for some reason. AssertionError: null == 'https://media.npr.org/assets/img/2016/12/15/gettyimages-540681598_wide-8b160732b96c083dc115134c3c019f3ac73586ba.jpg?s=1400' Looks okay otherwise. * feat: transformed figures/figcaptions, improved date_published and addressed NPR's bad image metadata	2017-01-23 17:23:02 -08:00
Janet	8a82f2c0ab	feat: recode parser (#85) * feat: recode parser Thumbs up, as far as I can tell. Note: No image appeared in the preview. * feat: pulling in lead image	2017-01-23 17:02:33 -08:00
Janet	ad29acd7b7	feat: fortune parser (#84) * feat: fortune parser For some reason, the dek doesn’t appear in the local version of the article I selected. I tried parsing the meta tag containing og:description but it’s not working, and the description is slightly longer than the dek in the original article. I’m not sure why, but for the lead image, the meta tag for og:image is not parsing the image url. :( * feat: fortune redesigned, so re-did extractor * fix: added timezone	2017-01-23 16:47:06 -08:00
Janet	c133ddf614	feat: qz parser (#81) * feat: qz parser I couldn’t figure out how to parse the date, but otherwise should be fine. I added a clean for the div.article-aside element based on what I saw in how the chrome extension worked. * feat: updated content to grab top image test: date is null :/	2017-01-23 16:08:07 -08:00
Janet	84312b6ef1	feat: dmagazine parser (#80) * feat: dmagazine parser I’m sorry to have failed you. :-( These are the issues I encountered: 1) author - does not have a unique selector to distinguish it from the date, couldn’t parse it 2) date - no meta data in the head 3) no meta og:image in the head (my go to), so I couldn’t get the image test to pass, but it appears to be parsing. The caption below it is the same size as the body copy in the preview. I couldn’t figure out how to “transform” it to caption size. * feat: update date, image, and author selectors and corresponding tests * feat: generalized content selector	2017-01-23 15:52:05 -08:00
Janet	e035f36361	feat: reuters parser (#78) * feat: reuters parser Date parses correctly but fails test because of format discrepancy. Author tags are nested within the content, which is why the author names are appearing twice. I wasn’t sure how to address this. Additionally, the location appears twice, so I cleaned the location tags from the content. * test: fix date format * transform .article-subtitle to h4; cleaning author but leaving location	2017-01-23 15:16:37 -08:00
Janet	dec49ab073	feat: mashable parser (#76) * feat: mashable parser As usual the date is giving me issues because of formatting discrepancies: AssertionError: '2016-12-13T22:33:06.000Z' == '2016-12-14T03:33:06.000Z' Not sure how we wanna deal with Twitter card embeds that don’t show up? Also, image credits did not show up in preview. * test: fixed date format * transforming .image-credit to figcaption	2017-01-23 15:00:18 -08:00
Janet	cddc1afb69	feat: chicago tribune parser (#75) * feat: chicago tribune parser Date is parsing but failing the test because: AssertionError: '2016-12-13T21:45:00.000Z' == '2016-12-13T13:45:00-0800' I tried to insert a line of code for Time Zone but I’m a n00b so I don’t think I did it right. No image showing up in the preview. * fix: remove timezone from date_published extractor * test: update unit tests to assert the correct value for date_published	2017-01-22 12:18:10 -05:00
Janet	aff651c2d8	feat: hellogiggles parser (#107) Looks good to me!	2017-01-21 14:07:20 -05:00
Janet	11ad7b9a92	feat: thought catalog parser (#102) Looks good!	2017-01-21 13:52:00 -05:00
Janet	aa43a6091c	feat: cnbc parser (#96) Should be good to go!	2017-01-21 13:25:23 -05:00
Janet	cd245f7980	feat: popsugar parser (#93) I think this one is good to go!	2017-01-21 13:11:00 -05:00
Janet	a8ab7135e1	feat: observer parser (#91) no problems	2017-01-21 12:47:26 -05:00
Janet	3bee7224cb	feat: nbc news parser (#74)	2017-01-18 17:28:21 -08:00
Janet	88242dd233	feat: nj.com parser (#73)	2017-01-18 16:49:05 -08:00
Janet	1ac5670a54	feat: inquisitor parser (#72)	2017-01-18 16:34:22 -08:00
Janet	9e5b91ed8b	feat: refinery29 parser (#71)	2016-12-21 21:57:13 -08:00
Janet	b78c58c43a	feat: miami herald parser (#69)	2016-12-21 21:35:34 -08:00
Janet	aedf83edc6	feat: eonline parser (#68)	2016-12-21 21:24:14 -08:00
Janet	a20da5eb31	uproxx extractor (#66)	2016-12-21 21:05:10 -08:00
Janet	87c42b6358	feat: 247sports.com extractor (#64)	2016-12-21 20:52:23 -08:00
Janet	22e6c884fb	feat: rolling stone extractor (#65)	2016-12-21 20:30:34 -08:00
Janet	6337231697	feat: usmagazine extractor (#63)	2016-12-21 20:06:47 -08:00
Janet	c06b19efe7	feat: people extractor (#70) No major problems!	2016-12-21 19:46:48 -08:00
Janet	3cf2bb78c4	feat: vox custom parser (#67)	2016-12-15 17:48:15 -08:00
Janet	861c5f0dcb	feat: bustle extractor (#60)	2016-12-08 15:32:08 -05:00
Adam Pash	3297ab079d	feat: bloomberg extractor (#59) Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means. It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.	2016-12-07 14:39:00 -05:00
Janet	e55e9da534	feat: sbnation extractor (#55)	2016-12-07 14:25:57 -05:00
Adam Pash	81aa89f2c1	feat: youtube custom extractor (#53)	2016-12-06 12:36:51 -05:00
Adam Pash	f9902cfa05	Fix: extension bugs (#47) * feat: lead image on atlantic stories now included * feat: supporting buzzfeed "longform" template * feat: cleaning .parter-box from the atlantic	2016-12-02 16:02:00 -08:00
Adam Pash	d0453efbf8	feat: improvements for nyer magazine articles (#45) adds dek and date_published for magazine template	2016-12-02 15:30:09 -08:00
Janet	b415d1d37c	feat: aol custom extractor (#42) * feat: aol custom parser * removed work from other commits. merged with latest master	2016-12-01 17:05:15 -08:00
Silas Burton	c3d98a0d76	Feat cnn extractor (#34) * wip: cnn custom extactor * wip: cnn works except first paragraph * final touches on cnn parser * cleanup	2016-11-30 14:55:04 -08:00
Silas Burton	a0570f8e94	feat: extractor for the verge (#33) * feat: extractor for the verge's standard article template * feat: basic support for the verge feature template * feat: allow multiple links to be previewed * feat: content selector arrays Content selector arrays allow custom parsers to select multiple elements to match and include in the result. * feat: updated verge parser to use multimatch selectors * lint fix * cleanup test builds	2016-11-30 14:08:56 -08:00
Silas Burton	be2e4b5c80	Feat: huffington post extractor (#28) * wip: huffpo custom extractor * wip: some huffpo cleanup	2016-11-29 15:50:48 -08:00
Adam Pash	94198c0a65	feat: new republic custom extractor (#25) * wip: new republic custom extractor * feat: new republic article extractor * feat: new republic minutes article extractor	2016-11-29 15:30:52 -08:00
Janet	c4d72fb735	feat: add money.cnn custom parser (#26) * feat: add money.cnn custom parser * added timezone to cnn custom parser	2016-11-29 15:13:29 -08:00
Adam Pash	a8face796a	Fix extension bugs (#23) * feat: cleaning supplemental elements in nytimes (visible in web only) closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102 * wip * fix: more generous date published bits * feat: added washington post extractor (including figure transforms) closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100 * feat: cleaning zoom lightbox from gizmodo/kinja * lint fix	2016-11-28 16:58:21 -08:00
Adam Pash	3a2f32b0eb	feat: added tmz custom parser (#22)	2016-11-28 15:10:28 -08:00
Adam Pash	7411922c55	feat: encoding response body based on content-type charset (#21) Also some small code organization	2016-11-22 10:44:27 -08:00
Adam Pash	629eada1f7	feat: recording/playing back network requests with nock (#18) * feat: recording/playing back network requests with nock * lint fix	2016-10-28 14:54:12 -07:00
Adam Pash	d038a36544	feat: custom medium extractor	2016-10-27 08:47:25 -07:00
Adam Pash	40768fa188	feat: support lazy loading video on deadspin	2016-10-26 11:53:42 -07:00
Drew Bell	76db95e884	feat: Add custom extrator for Apartment Therapy	2016-10-17 10:35:22 -05:00


66 Commits
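
Note on the date_published failures reported above: the chicago tribune commit quotes an assertion comparing '2016-12-13T21:45:00.000Z' with '2016-12-13T13:45:00-0800'. These two strings name the same instant and differ only in format and offset. The sketch below is illustrative only and is not taken from the repository; `toUtcIso` is a hypothetical helper name. It shows one way such values could be normalized to UTC before a test compares them.

```javascript
// Hypothetical helper, not part of mercury-parser: normalize a date string
// to a UTC ISO 8601 string so two spellings of the same instant compare equal.
function toUtcIso(dateString) {
  // Rewrite a trailing "-0800"-style offset as "-08:00", the form used by
  // the ECMAScript date-time string format.
  const normalized = dateString.replace(/([+-]\d{2})(\d{2})$/, '$1:$2');
  const date = new Date(normalized);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Unparseable date: ${dateString}`);
  }
  return date.toISOString();
}

// The two values quoted in the chicago tribune commit message.
const expected = '2016-12-13T21:45:00.000Z';
const extracted = '2016-12-13T13:45:00-0800';
console.log(toUtcIso(extracted) === toUtcIso(expected)); // true
```

This does not cover the mashable failure, where the two quoted timestamps are five hours apart and so are different instants, most likely because a local time was parsed without its timezone.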