mercury-parser

mirror of https://github.com/postlight/mercury-parser synced 2024-11-11 01:10:35 +00:00

Author	SHA1	Message	Date
Janet	87c42b6358	feat: 247sports.com extractor (#64 )	2016-12-21 20:52:23 -08:00
Janet	22e6c884fb	feat: rolling stone extractor (#65 )	2016-12-21 20:30:34 -08:00
Janet	6337231697	feat: usmagazine extractor (#63 )	2016-12-21 20:06:47 -08:00
Janet	c06b19efe7	feat: people extractor (#70 ) No major problems!	2016-12-21 19:46:48 -08:00
Janet	3cf2bb78c4	feat: vox custom parser (#67 )	2016-12-15 17:48:15 -08:00
Adam Pash	a710efd2d5	release: 1.0.3 (#62 )	2016-12-09 12:15:40 -05:00
Janet	861c5f0dcb	feat: bustle extractor (#60 )	2016-12-08 15:32:08 -05:00
Adam Pash	06397a4360	feat: browser-friendly selector for medium (#61 )	2016-12-07 17:58:29 -05:00
Adam Pash	3297ab079d	feat: bloomberg extractor (#59 ) Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means. It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.	2016-12-07 14:39:00 -05:00
Janet	e55e9da534	feat: sbnation extractor (#55 )	2016-12-07 14:25:57 -05:00
Adam Pash	8070e4790b	test: streamlined guardian tests w/new single-extraction (#58 )	2016-12-07 13:17:25 -05:00
Adam Pash	bdb751fb53	feat: more cleaning for wired (#56 )	2016-12-07 12:15:39 -05:00
Janet	e7e41bd242	feat: the guardian custom extractor (#41 )	2016-12-07 12:05:18 -05:00
Adam Pash	332f85928f	release: 1.0.2 (#54 )	2016-12-06 14:51:01 -05:00
Adam Pash	81aa89f2c1	feat: youtube custom extractor (#53 )	2016-12-06 12:36:51 -05:00
Adam Pash	2fb47640f2	Feat: detect platforms (#52 ) Detectors for matching extractors for publishing platforms. Currently supporting Medium and Blogger.	2016-12-06 12:17:03 -05:00
Adam Pash	64c0fad2fd	fix: preserve whitespace (#51 ) No longer normalizing whitespace in html	2016-12-06 11:31:50 -05:00
Adam Pash	15656cb3e1	Refactor: running tests more efficiently (#49 ) Only running one parser per page we're testing rather than a parser per field we're testing.	2016-12-05 15:39:45 -05:00
Adam Pash	edcb7295d1	release: 1.0.1 (#48 )	2016-12-02 16:14:07 -08:00
Adam Pash	f9902cfa05	Fix: extension bugs (#47 ) * feat: lead image on atlantic stories now included * feat: supporting buzzfeed "longform" template * feat: cleaning .parter-box from the atlantic	2016-12-02 16:02:00 -08:00
Adam Pash	16860f1d85	feat: improved nyt parser (#46 ) NYT was one of the first, and its test was stale and it didn't have all of its fields well defined.	2016-12-02 15:41:26 -08:00
Adam Pash	d0453efbf8	feat: improvements for nyer magazine articles (#45 ) adds dek and date_published for magazine template	2016-12-02 15:30:09 -08:00
Adam Pash	00f8965c1f	fix: cleaning up deks (#44 ) We've solidified what we consider a dek. This PR removes the dek selectors that do not fit that mold.	2016-12-02 15:17:49 -08:00
Janet	b415d1d37c	feat: aol custom extractor (#42 ) * feat: aol custom parser * removed work from other commits. merged with latest master	2016-12-01 17:05:15 -08:00
Matt	4cc3b68b5e	feat: remove footer links (#40 ) the links at the bottom of the stories feel a little spammy because of how we treat links vs. the way they are displayed on the Times, would like to clean them	2016-12-01 08:31:43 -08:00
Adam Pash	e9a36d6ebd	release: 1.0.0 so we can start doing proper releases (#39 )	2016-11-30 17:49:50 -08:00
Adam Pash	ff1963bdca	feat: new cleaner for wapo (#38 )	2016-11-30 17:01:53 -08:00
Adam Pash	0e6ccdf622	fix: browser cleanup (#35 ) Cleaning up after the parser when it's done in the browser, before returning result.	2016-11-30 16:49:18 -08:00
Adam Pash	bd0694fbba	feat: preview with optional rebuild (#36 ) Now the preview script has an optional build step. Adding --no-rebuild as an argument to the script will skip the rebuild step and just show a preview of the parse as is with the current build.	2016-11-30 16:37:42 -08:00
Adam Pash	181b39b238	feat: ci speedup (#37 ) minor speedup to see failing tests. linting happens first	2016-11-30 16:11:35 -08:00
Silas Burton	c3d98a0d76	Feat cnn extractor (#34 ) * wip: cnn custom extractor * wip: cnn works except first paragraph * final touches on cnn parser * cleanup	2016-11-30 14:55:04 -08:00
Silas Burton	a0570f8e94	feat: extractor for the verge (#33 ) * feat: extractor for the verge's standard article template * feat: basic support for the verge feature template * feat: allow multiple links to be previewed * feat: content selector arrays Content selector arrays allow custom parsers to select multiple elements to match and include in the result. * feat: updated verge parser to use multimatch selectors * lint fix * cleanup test builds	2016-11-30 14:08:56 -08:00
Adam Pash	233ca11a33	fix: added timezone to new republic date (#32 )	2016-11-29 16:54:52 -08:00
Adam Pash	cfe7f34be4	fix: normalizing spaces for authors/dek/title (#31 ) * fix: normalizing spaces for authors/dek/title	2016-11-29 16:43:46 -08:00
Adam Pash	9a23b24a89	feat: adjustment for huffpo. skipping overly aggressive default cleaners (#30 )	2016-11-29 16:16:39 -08:00
Silas Burton	be2e4b5c80	Feat: huffington post extractor (#28 ) * wip: huffpo custom extractor * wip: some huffpo cleanup	2016-11-29 15:50:48 -08:00
Adam Pash	94198c0a65	feat: new republic custom extractor (#25 ) * wip: new republic custom extractor * feat: new republic article extractor * feat: new republic minutes article extractor	2016-11-29 15:30:52 -08:00
Janet	c4d72fb735	feat: add money.cnn custom parser (#26 ) * feat: add money.cnn custom parser * added timezone to cnn custom parser	2016-11-29 15:13:29 -08:00
Adam Pash	6343946dd8	Feat: custom timezones (#29 ) * using moment-timezone to allow custom timezones * added tz to tmz, even though still so-so	2016-11-29 14:46:46 -08:00
Adam Pash	19e7345bfb	feat: test builds are created for preview purposes so we aren't committing dist every time (#27 )	2016-11-29 10:06:55 -08:00
Adam Pash	a8face796a	Fix extension bugs (#23 ) * feat: cleaning supplemental elements in nytimes (visible in web only) closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102 * wip * fix: more generous date published bits * feat: added washington post extractor (including figure transforms) closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100 * feat: cleaning zoom lightbox from gizmodo/kinja * lint fix	2016-11-28 16:58:21 -08:00
Adam Pash	3a2f32b0eb	feat: added tmz custom parser (#22 )	2016-11-28 15:10:28 -08:00
Adam Pash	783a9cfb2f	fix: changed overly liberal regex for removing transparent images	2016-11-28 12:10:57 -08:00
Adam Pash	7411922c55	feat: encoding response body based on content-type charset (#21 ) Also some small code organization	2016-11-22 10:44:27 -08:00
Adam Pash	88c125d022	chore: package upgrades	2016-11-22 08:45:57 -08:00
Adam Pash	c30fb2e4c0	chore: updated readme	2016-11-22 08:41:35 -08:00
Adam Pash	60a6861e18	Feat: browser support (#19 ) Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.	2016-11-21 14:17:06 -08:00
Adam Pash	eaea57461a	fix: servers returning bad headers was breaking request (#20 ) temporarily using fork with a fix for this until request merges the necessary pull request	2016-11-15 13:17:01 -08:00
Adam Pash	629eada1f7	feat: recording/playing back network requests with nock (#18 ) * feat: recording/playing back network requests with nock * lint fix	2016-10-28 14:54:12 -07:00
Adam Pash	6e29848e9c	feat: making yarn-friendly for package manager (#17 ) * updated several commands; some fixes exposed by yarn upgrade * removed unnec dep	2016-10-28 11:10:42 -07:00

328 Commits