mercury-parser

mirror of https://github.com/postlight/mercury-parser synced 2024-10-31 03:20:40 +00:00

Author	SHA1	Message	Date
Kevin Ngao	b4fec6af98	feat: androidcentral parser (#119) * feat: androidcentral parser	2017-02-07 18:20:04 -05:00
Janet	beb0b89a4f	feat: pagesix parser (#97) * feat: pagesix parser	2017-02-07 17:38:09 -05:00
Janet	f2160eb5b6	feat: si parser (#118) * feat: si parser	2017-02-07 16:52:11 -05:00
Janet	2af0f6179a	feat: rawstory parser (#109) * feat: rawstory parser Finished, with a little help from Frankie (thanks Frankie!) * fix: date_published timezone	2017-02-07 12:53:05 -07:00
Janet	765032452d	feat: thefederalistpapers parser (#101) * feat: thefederalistpapers parser	2017-02-07 14:30:52 -05:00
Janet	fb5eb2e104	feat: cnet parser (#104) * feat: cnet parser Date test fail - please take a look! Also, image didn’t load in preview. * fix: timezone * fix: image lead	2017-02-07 11:55:04 -07:00
Janet	3c5fa28f10	feat: cbs sports parser (#98) * feat: cbs sports parser	2017-02-07 10:45:48 -05:00
Janet	3cf2d0d3ef	feat: msnbc parser (#100) * feat: msnbc parser	2017-02-06 18:08:49 -05:00
Janet	f9ab9eb885	feat: howtogeek extractor (#108) * feat: howtogeek extractor This one is a bit tricky - the author and date info appear in a comment section at the bottom. Was able to parse the author but not the date info. Halp * howtogeek update Thanks to @fdsimms I was able to parse the date, but not sure what to test it against, so I left it blank. * fix: date_published assertion, it was comparing against empty string * fix: timezone * amend: generalize author selector	2017-02-06 15:23:15 -07:00
Janet	258acdfd02	feat: opposing views parser (#103) * feat: opposing views parser	2017-02-06 12:22:42 -05:00
Janet	b63dd33579	feat: today parser (#106) * feat: today parser This looks fine — there are a couple of lines of “Related” but they are within the body (and don’t have their own classes) so I couldn’t clean them out. * fix: fix content assertion	2017-02-06 09:20:12 -07:00
Janet	c94eee7f92	feat: cinema blend parser (#105) * feat: cinema blend parser all systems go * fix: timezone	2017-02-06 09:02:11 -07:00
Janet	64e3c205e8	feat: the political insider parser (#99) * feat: the political insider parser with timezone	2017-02-03 16:25:16 -05:00
Janet	7b52d3d1fc	feat: al.com parser (#110) * feat: al.com parser I think this is good but could you pls double check time zone on the date? Thanks * fix: date_published timezone	2017-02-03 11:45:45 -07:00
Janet	15df58496f	feat: westernjournalism parser (#113) * feat: westernjournalism parser Adjacent sibling selector FTW! Image not displaying in preview. * feat: fix assertion, body does not include _Advertisement_ subtext	2017-02-03 11:15:50 -07:00
Janet	ae12a1d701	feat: mental floss parser (#94) * feat: mental floss parser	2017-02-03 11:40:01 -05:00
Janet	bf29291395	feat: thepennyhoarder parser (#112) * feat: thepennyhoarder parser Looks good, although no image in preview! * fix: adds selector for article lead image	2017-02-03 08:56:15 -07:00
Janet	fadd198d04	feat: abcnewsgo parser (#90) * feat: abcnewsgo parser	2017-02-02 17:43:35 -05:00
Adam Pash	25d9642ff9	feat: support cleaning and transforms for all fields (#138)	2017-02-02 13:42:49 -08:00
Janet	1054d854dd	feat: america now parser (#114) * feat: america now parser Looks good but lead image did not display in preview. * feat: adds selector for lead image	2017-02-02 13:46:20 -07:00
David A. Viramontes	7b3ad73282	Merge pull request #115 from postlight/feat-fusion-extractor feat: fusion parser	2017-02-02 12:53:48 -07:00
dviramontes	93c8ba0e56	feat: adds selector for lead image	2017-02-02 12:29:35 -07:00
dviramontes	f71fe7685d	feat: adds video embed transform	2017-02-02 12:11:29 -07:00
dviramontes	a77515d861	fix: author selector, less brittle	2017-02-02 12:10:48 -07:00
Janet	4c48acba59	feat: fusion parser Looks okay — image did not load in preview.	2017-02-02 10:54:49 -07:00
David A. Viramontes	fa71cacf5a	Merge pull request #137 from postlight/feat-the-verge-polygon-supported-domain feat: adds www.polygon.com to list of www.theverge.com supportedDomains	2017-02-02 10:50:36 -07:00
David A. Viramontes	c679e493de	Merge branch 'master' into feat-the-verge-polygon-supported-domain	2017-02-02 10:41:37 -07:00
Janet	d292d8ef3a	feat: ny daily news parser (#87) * feat: ny daily news parser	2017-02-02 12:30:16 -05:00
dviramontes	a53587acef	feat: adds www.polygon.com to list of www.theverge.com supportedDomains	2017-02-02 10:23:20 -07:00
Janet	385b9d76a3	feat: sciencefly extractor (#116) * feat: sciencefly extractor, use loading image rather than 404'ing meta	2017-02-02 11:26:29 -05:00
Adam Pash	601b0fac16	release: 1.0.5 (#136)	2017-02-01 15:39:19 -08:00
Adam Pash	6bd6278a07	feat: custom parser for wh blog (#130)	2017-01-31 15:50:39 -08:00
Adam Pash	aa682d71e8	fix: medium bug (#129) * fix: improved medium parser for images and multi-section content * fix: duplicate video	2017-01-31 15:28:25 -08:00
Adam Pash	4e049de61a	fix: i put a bad comment in .gitattributes (#125) * marking html fixtures as "vendored" * fix: bad comment	2017-01-27 10:26:03 -08:00
Adam Pash	8aa215c4c2	chore: marking html fixtures as "vendored" (#124)	2017-01-27 10:06:48 -08:00
Adam Pash	31eb4f9222	Feat: LinkedIn parser (#123) * feat: rebuild custom parser * feat: linkedin custom parser	2017-01-26 10:11:10 -08:00
Adam Pash	dbc706410b	release: 1.0.4 (#122)	2017-01-26 08:42:37 -08:00
Adam Pash	8662474d8a	feat: changed user agent to latest chrome (#121) * feat: changed user agent to latest chrome * removed dead link	2017-01-26 08:10:43 -08:00
Janet	7709d69379	feat: npr parser (#86) * feat: npr parser Lead image appears in preview, but the test fails for some reason. AssertionError: null == 'https://media.npr.org/assets/img/2016/12/15/gettyimages-540681598_wide-8b160732b96c083dc115134c3c019f3ac73586ba.jpg?s=1400' Looks okay otherwise. * feat: transformed figures/figcaptions, improved date_published and addressed NPR's bad image metadata	2017-01-23 17:23:02 -08:00
Janet	8a82f2c0ab	feat: recode parser (#85) * feat: recode parser Thumbs up, as far as I can tell. Note: No image appeared in the preview. * feat: pulling in lead image	2017-01-23 17:02:33 -08:00
Janet	ad29acd7b7	feat: fortune parser (#84) * feat: fortune parser For some reason, the dek doesn’t appear in the local version of the article I selected. I tried parsing the meta tag containing og:description but it’s not working, and the description is slightly longer than the dek in the original article. I’m not sure why, but for the lead image, the meta tag for og:image is not parsing the image url. :( * feat: fortune redesigned, so re-did extractor * fix: added timezone	2017-01-23 16:47:06 -08:00
Janet	c133ddf614	feat: qz parser (#81) * feat: qz parser I couldn’t figure out how to parse the date, but otherwise should be fine. I added a clean for the div.article-aside element based on what I saw in how the chrome extension worked. * feat: updated content to grab top image test: date is null :/	2017-01-23 16:08:07 -08:00
Janet	84312b6ef1	feat: dmagazine parser (#80) * feat: dmagazine parser I’m sorry to have failed you. :-( These are the issues I encountered: 1) author - does not have a unique selector to distinguish it from the date, couldn’t parse it 2) date - no meta data in the head 3) no meta og:image in the head (my go to), so I couldn’t get the image test to pass, but it appears to be parsing. The caption below it is the same size as the body copy in the preview. I couldn’t figure out how to “transform” it to caption size. * feat: update date, image, and author selectors and corresponding tests * feat: generalized content selector	2017-01-23 15:52:05 -08:00
Janet	e035f36361	feat: reuters parser (#78) * feat: reuters parser Date parses correctly but fails test because of format discrepancy. Author tags are nested within the content, which is why the author names are appearing twice. I wasn’t sure how to address this. Additionally, the location appears twice, so I cleaned the location tags from the content. * test: fix date format * transform .article-subtitle to h4; cleaning author but leaving location	2017-01-23 15:16:37 -08:00
Janet	dec49ab073	feat: mashable parser (#76) * feat: mashable parser As usual the date is giving me issues because of formatting discrepancies: AssertionError: '2016-12-13T22:33:06.000Z' == '2016-12-14T03:33:06.000Z' Not sure how we wanna deal with Twitter card embeds that don’t show up? Also, image credits did not show up in preview. * test: fixed date format * transforming .image-credit to figcaption	2017-01-23 15:00:18 -08:00
Janet	cddc1afb69	feat: chicago tribune parser (#75) * feat: chicago tribune parser Date is parsing but failing the test because: AssertionError: '2016-12-13T21:45:00.000Z' == '2016-12-13T13:45:00-0800' I tried to insert a line of code for Time Zone but I’m a n00b so I don’t think I did it right. No image showing up in the preview. * fix: remove timezone from date_published extractor * test: update unit tests to assert the correct value for date_published	2017-01-22 12:18:10 -05:00
Janet	aff651c2d8	feat: hellogiggles parser (#107) Looks good to me!	2017-01-21 14:07:20 -05:00
Janet	11ad7b9a92	feat: thought catalog parser (#102) Looks good!	2017-01-21 13:52:00 -05:00
Janet	aa43a6091c	feat: cnbc parser (#96) Should be good to go!	2017-01-21 13:25:23 -05:00
Janet	cd245f7980	feat: popsugar parser (#93) I think this one is good to go!	2017-01-21 13:11:00 -05:00
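
Several commit messages above report date_published assertions failing on timezone or format differences. The following is a minimal sketch using only the built-in JavaScript Date object; it is not code from this repository, and the variable names and the assumed US Eastern offset (UTC-05:00) are illustrative only. It suggests how the two values quoted in the mashable commit could arise from one wall-clock time read with two different UTC offsets, and shows that the two values quoted in the chicago tribune commit denote the same instant written in two formats.

```javascript
// Hypothetical illustration; none of these names come from mercury-parser.

// A wall-clock time with no offset attached, as a page might print it.
const wallClock = '2016-12-13T22:33:06';

// Read as UTC.
const asUtc = new Date(`${wallClock}Z`).toISOString();

// Read as US Eastern Standard Time (UTC-05:00), an assumed source zone.
const asEastern = new Date(`${wallClock}-05:00`).toISOString();

// A timestamp carrying its own offset, normalized to UTC.
// Written as -08:00 (with a colon) so it matches the ISO 8601 form
// that Date is specified to parse.
const pacific = new Date('2016-12-13T13:45:00-08:00').toISOString();

console.log(asUtc);     // 2016-12-13T22:33:06.000Z
console.log(asEastern); // 2016-12-14T03:33:06.000Z
console.log(pacific);   // 2016-12-13T21:45:00.000Z
```

Comparing normalized UTC strings, rather than the raw text found on the page, avoids both kinds of mismatch, provided the source timezone is known.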


286 Commits