mercury-parser

mirror of https://github.com/postlight/mercury-parser synced 2024-10-31 03:20:40 +00:00

Author	SHA1	Message	Date
kik0220	c309bdb373	feat: add otrs.com custom parser (#353 )	2019-04-09 11:17:58 +03:00
Toufic Mouallem	3ed778b53e	fix: Adapt CNBC extractor to article redesign (#336 )	2019-03-25 15:43:40 -07:00
Ben Ubois	a7e4c67d1d	Extract content from GitHub repos. (#306 ) * Extract content from GitHub repos. * Add published and dek. * Timezone fix.	2019-03-14 08:48:33 -07:00
Toufic Mouallem	7844129fda	feat: Add custom parser for Reddit (#307 )	2019-03-08 14:37:24 -08:00
Jordan Hotmann	83d1c2401b	feat: add custom extractor for blisterreview.com (#299 )	2019-03-01 16:48:26 -08:00
kik0220	d9a1e7b22b	feat: add news.mynavi.jp custom parser (#287 )	2019-03-01 16:45:32 -08:00
Adam Pash	9698d9a0c4	dx: comment on custom parser pr fix (#278 ) * dx: comment on custom parser pr fix * fix path * write json * chore: rename comment script	2019-02-28 11:11:03 -08:00
Ben Ubois	ed14203e97	fix: return early if creating the resource failed. (#285 )	2019-02-20 16:48:51 -08:00
Ben Ubois	0e27448866	feat: Various Character Encoding Improvements (#270 ) * Support HTML5 charset tag In HTML5 `<meta charset="">` is shorthand for `<meta http-equiv="content-type" content="">` https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta * Handle more character encoding declaration methods.	2019-02-12 15:15:19 -08:00
Wajeeh Zantout	1ccd14e1e9	feat: add fortinet custom parser (#188 ) * feat: add fortinet custom parser * fix: eslint error * fix: transform noscript images * feat: add fortinet custom parser * fix: eslint error * fix: transform noscript images * fix: transform method * test: transform method * fix: fs import	2019-01-30 09:33:36 +02:00
Wajeeh Zantout	9b36003b62	feat: add fastcompany custom parser (#191 ) * feat: add fastcompany custom parser * fix: eslint error * fix: test for date_published * feat: add fastcompany custom parser * fix: eslint error * fix: test for date_published * fix: fs import	2019-01-30 09:30:24 +02:00
Ralph Jbeily	f3f6e21fd8	fix: author and date published selectors (#189 )	2019-01-25 11:28:43 -08:00
Adam Pash	96640e3564	fix: failing fetchResource test (#187 ) I think it was a fixture problem	2018-12-20 10:06:16 -08:00
Adam Pash	5663660f76	fix: nytimes custom parser title selector (#181 ) * fix: nytimes custom parser title selector * upgrade node version * circle ci tweak	2018-10-12 13:39:41 -07:00
Adam Pash	b8aa87c777	feat: improve wh parser (#168 )	2017-03-24 14:41:40 -07:00
Adam Pash	61f0f4e1af	fix: kept elements being removed (#166 ) Elements marked to keep were removable under specific circumstances. This PR fixes these edge cases.	2017-03-23 13:16:21 -07:00
Adam Pash	453419de72	feat: improve wh.gov parser (#163 ) * feat: support youtube-nocookie domain * feat: updated wh.gov parser to support speeches	2017-03-22 13:16:54 -07:00
Janet	f13bb721f6	feat: prospect magazine parser (#147 ) * feat: prospect magazine parser Couldn’t find a way to parse the date but I think it’s good otherwise. * fix: pulls date * fix: add timezone * fix: generalize	2017-03-14 18:34:40 -04:00
Kevin Ngao	1b28713cf5	feat: fool.com parser (#158 ) * feat: add fool.com custom parser	2017-03-14 18:04:19 -04:00
Janet	c18959779d	feat: forward.com parser (#144 ) * feat: forward.com parser LGTM although image didn’t show up in preview * feat: also pull image into content * fix: generalize selectors * fix: generalize selector	2017-03-14 17:53:23 -04:00
Janet	50e548bac2	feat: qdaily parser (#146 ) * feat: qdaily parser Firstly — I accidentally tried to generate the parser on the master branch, and I’m not sure where it is, maybe floating in the nether world. On to the parser — this one was a bit tricky because things were in Chinese! The content appears to be parsing (as seen in preview) but it’s not passing the test. I noticed the second “ ‘ “ mark isn’t appearing on the parser side. Additionally, some of the lazy loading images aren’t appearing in the preview (I cleaned the wrong lazy load images that appeared), so someone will probably have to work on that (I don’t know how to do transforms yet). * fix tests * fix: selector generalization	2017-03-14 17:37:53 -04:00
Silas Burton	11382ce651	Feat: Slate extractor (#153 ) * feat: slate extractor * fix: generalize selectors * fix: add Slate timezone	2017-03-13 17:44:04 -04:00
Silas Burton	5acaa6ab56	feat: ici.radio-canada.ca extractor (#156 ) * feat: ici.radio-canada.ca extractor * fix: add timezone	2017-03-13 17:23:20 -04:00
Silas Burton	9b371e51ac	Feat: gothamist extractor (#151 ) * feat: gothamist extractor * feat: add other gothamist network sites * fix: try getting date another way * fix: add gothamist timezone * fix: generalize selectors * fix: h1 is inside entry-header, needs to be specific because of another h1 on the page * fix: general and specific selector	2017-03-09 13:13:46 -05:00
Kevin Ngao	afbef9bc39	Fix Encoding on Body (#143 ) * fix: check encoding on body	2017-03-06 11:36:56 -05:00
Janet	93d2baf5cf	feat: news.natgeo parser (#88 ) * feat: natgeo parser For some reason, the local copy of the article didn’t grab the author name in it, so I couldn’t figure out how to parse it. The generic parser took a name of an author of a paper mentioned in the article, and thought that was the author name, which was funny. I cleaned a large block quote that didn’t make sense as it was shown in the preview, although I noticed that the Mercury chrome extension didn’t even display it. * fix: add date_published transform * fix: date_published assertion * disable: author assertion, generalize author selector * rm: author assertion * fix: image lead * fix: guard against missing img url * fix: generalize dek and title selectors	2017-02-08 15:27:35 -07:00
Janet	2279c2d486	feat: natgeo parser (#89 ) * feat: natgeo parser Same as the news.nationalgeographic.com parser - for some reason the author name doesn’t appear to be getting pulled into the local copy of the file. * fix: content assertion * fix: generalize author byline * disable: author assertion * rm: author assertion * fix: image lead, handles image-group * fix: guard against missing img url * fix: generalize dek and title selectors	2017-02-08 15:01:55 -07:00
Janet	11f466ccb3	feat: latimes parser (#92 ) * feat: latimes parser	2017-02-08 11:29:03 -05:00
Kevin Ngao	26a8e4f75a	feat: macrumors parser (#120 ) * feat: add macrumors	2017-02-07 19:15:29 -05:00
Kevin Ngao	b4fec6af98	feat: androidcentral parser (#119 ) * feat: androidcentral parser	2017-02-07 18:20:04 -05:00
Janet	beb0b89a4f	feat: pagesix parser (#97 ) * feat: pagesix parser	2017-02-07 17:38:09 -05:00
Janet	f2160eb5b6	feat: si parser (#118 ) * feat: si parser	2017-02-07 16:52:11 -05:00
Janet	2af0f6179a	feat: rawstory parser (#109 ) * feat: rawstory parser Finished, with a little help from Frankie (thanks Frankie!) * fix: date_published timezone	2017-02-07 12:53:05 -07:00
Janet	765032452d	feat: thefederalistpapers parser (#101 ) * feat: thefederalistpapers parser	2017-02-07 14:30:52 -05:00
Janet	fb5eb2e104	feat: cnet parser (#104 ) * feat: cnet parser Date test fail - please take a look! Also, image didn’t load in preview. * fix: timezone * fix: image lead	2017-02-07 11:55:04 -07:00
Janet	3c5fa28f10	feat: cbs sports parser (#98 ) * feat: cbs sports parser	2017-02-07 10:45:48 -05:00
Janet	3cf2d0d3ef	feat: msnbc parser (#100 ) * feat: msnbc parser	2017-02-06 18:08:49 -05:00
Janet	f9ab9eb885	feat: howtogeek extractor (#108 ) * feat: howtogeek extractor This one is a bit tricky - the author and date info appear in a comment section at the bottom. Was able to parse the author but not the date info. Halp * howtogeek update Thanks to @fdsimms I was able to parse the date, but not sure what to test it against, so I left it blank. * fix: date_published assertion, it was comparing against empty string * fix: timezone * amend: generalize author selector	2017-02-06 15:23:15 -07:00
Janet	258acdfd02	feat: opposing views parser (#103 ) * feat: opposing views parser	2017-02-06 12:22:42 -05:00
Janet	b63dd33579	feat: today parser (#106 ) * feat: today parser This looks fine — there are a couple of lines of “Related” but they are within the body (and don’t have their own classes) so I couldn’t clean them out. * fix: fix content assertion	2017-02-06 09:20:12 -07:00
Janet	c94eee7f92	feat: cinema blend parser (#105 ) * feat: cinema blend parser all systems go * fix: timezone	2017-02-06 09:02:11 -07:00
Janet	64e3c205e8	feat: the political insider parser (#99 ) * feat: the political insider parser with timezone	2017-02-03 16:25:16 -05:00
Janet	7b52d3d1fc	feat: al.com parser (#110 ) * feat: al.com parser I think this is good but could you pls double check time zone on the date? Thanks * fix: date_published timezone	2017-02-03 11:45:45 -07:00
Janet	15df58496f	feat: westernjournalism parser (#113 ) * feat: westernjournalism parser Adjacent sibling selector FTW! Image not displaying in preview. * feat: fix assertion, body does not include _Advertisement_ subtext	2017-02-03 11:15:50 -07:00
Janet	ae12a1d701	feat: mental floss parser (#94 ) * feat: mental floss parser	2017-02-03 11:40:01 -05:00
Janet	bf29291395	feat: thepennyhoarder parser (#112 ) * feat: thepennyhoarder parser Looks good, although no image in preview! * fix: adds selector for article lead image	2017-02-03 08:56:15 -07:00
Janet	fadd198d04	feat: abcnewsgo parser (#90 ) * feat: abcnewsgo parser	2017-02-02 17:43:35 -05:00
Janet	1054d854dd	feat: america now parser (#114 ) * feat: america now parser Looks good but lead image did not display in preview. * feat: adds selector for lead image	2017-02-02 13:46:20 -07:00
Janet	4c48acba59	feat: fusion parser Looks okay — image did not load in preview.	2017-02-02 10:54:49 -07:00
Janet	d292d8ef3a	feat: ny daily news parser (#87 ) * feat: ny daily news parser	2017-02-02 12:30:16 -05:00


114 Commits