mercury-parser

Commit Graph

Author	SHA1	Message	Date
kik0220	c309bdb373	feat: add otrs.com custom parser (#353)	6 years ago
Toufic Mouallem	3ed778b53e	fix: Adapt CNBC extractor to article redesign (#336)	6 years ago
Drew Bell	b3e2a0ffd1	feat: extract custom types with extend option (#313) * feat: extract custom types with extend option Adds an `extend` option that lets you add custom types to be extracted and returned alongside the defaults, either in a call to `parse()` or in a custom extractor. ``` Mercury.parse(url, { extend: { last_edited: { selectors: ['#last-edited'], defaultCleaner: false } } }) ``` * chore: use Reflect.ownKeys * feat: add CLI options * doc: add extend param to cli help * refactor: extract selectExtendedTypes * feat: only overwrite null extended results * feat: add allowMultiple extraction option * feat: accept extendList CLI args * feat: allow attribute selectors in extends on CLI * test: update extend tests * fix: don't invoke cleaner for custom types * feat: always return array if allowMultiple * test: add test for array of single result * refactor: extract extractHtml * refactor: destructure allowMultiple * fix: wrap multiple matches in $ for cheerio shim * fix: find extended types before any other munging * feat: absolutize all links * fix: clean content more directly * doc: Update CLI docs in README * chore: update dist * doc: Document extend in custom extractor README	6 years ago
Ben Ubois	a7e4c67d1d	Extract content from GitHub repos. (#306) * Extract content from GitHub repos. * Add published and dek. * Timezone fix.	6 years ago
Toufic Mouallem	7844129fda	feat: Add custom parser for Reddit (#307)	6 years ago
Drew Bell	91fb0dfb46	fix: update parse signature in tests (#315)	6 years ago
Toufic Mouallem	9714cb70c5	feat: Use Deadspin parser for all Kinja websites (#304)	6 years ago
Jordan Hotmann	83d1c2401b	feat: add custom extractor for blisterreview.com (#299)	6 years ago
kik0220	d9a1e7b22b	feat: add news.mynavi.jp custom parser (#287)	6 years ago
Olli Sulopuisto	44a7ec791d	docs: typofix (#300)	6 years ago
Adam Pash	9bf88b0ba3	chore: refactor format output adjustments (#272) I had previously done this in an overly complicated manner. This PR cleans it up a bit.	6 years ago
Adam Pash	9b0664bc91	feat: add content format output options (#256)	6 years ago
Adam Pash	c6f42c1278	docs: cleanup and update docs (#238)	6 years ago
George Haddad	5c0325f5a7	feat: hook up ci to publish to npm (#226) * chore: add missing fields to package.json * feat: add postlight org scope to package name * feat: automate npm publish * test: npm publish without filters * fix: add docker image * test: change directory * test: add working directory * fix: defaults syntax * test: add workspace * fix: attach workspace * fix: use standard mercury email * fix: use ISO time format and preserve original timezone offset * fix: do not match time zone offset * chore: move babel runtime-corejs2 to prod deps * chore: uncomment config to deploy on git tag * feat: publish to npm public * adding browser-request It doesn't seem to impact the build, but technically it should be there so for good measure, why not... * chore: roll version back to original state	6 years ago
Adam Pash	663cc45bf4	fresh run of prettier; remove NOTES.md (#233)	6 years ago
Wajeeh Zantout	1ccd14e1e9	feat: add fortinet custom parser (#188) * feat: add fortinet custom parser * fix: eslint error * fix: transform noscript images * feat: add fortinet custom parser * fix: eslint error * fix: transform noscript images * fix: transform method * test: transform method * fix: fs import	6 years ago
Wajeeh Zantout	9b36003b62	feat: add fastcompany custom parser (#191) * feat: add fastcompany custom parser * fix: eslint error * fix: test for date_published * feat: add fastcompany custom parser * fix: eslint error * fix: test for date_published * fix: fs import	6 years ago
Ralph Jbeily	f3f6e21fd8	fix: author and date published selectors (#189)	6 years ago
Jad Termsani	28cf41304c	fix: timezone comparison (#222) * fix: use format() instead of toISOString() * fix: timezone comparison	6 years ago
Ralph Jbeily	ca44ce3dd1	docs: add install build and test guide (#215) * docs: add install build and test guide * docs: remove install build and test guides * docs: add installation guide	6 years ago
Ralph Jbeily	2e1e4d90c9	feat: add remarklint for md docs (#213) * feat: add remarklint for md docs * fix: remarkrc file and run linter on commit hook	6 years ago
Adam Pash	76d333f0be	deps: upgrade (#218)	6 years ago
George Haddad	56badb51f5	dx: remove unnec comments in source (#205 ) * dx: remove commented code and obvious comments that can be looked up * dx: remove commented out eslint options * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove test block as all its code was commented out * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove regex example comments * dx: remove commented out code * dx: remove commented out code * dx: remove commented out import * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * dx: remove commented out code * chore: remove empty files * chore: re-prettier code that may have missed it * added back nec comments	6 years ago
Adam Pash	e4b057f9ea	chore: update node and some deps (#209) * chore: update .nvmrc * added prettier and pre-commit hooks * update docker image to new node * add karma-cli to get web tests working * explicitly install karma... seems to fix problem * remove pre-built phantomjs * swap install order	6 years ago
Adam Pash	d850177b68	docs: Update README.md (#184)	6 years ago
Adam Pash	5663660f76	fix: nytimes custom parser title selector (#181) * fix: nytimes custom parser title selector * upgrade node version * circle ci tweak	6 years ago
Adam Pash	b8aa87c777	feat: improve wh parser (#168)	8 years ago
Adam Pash	61f0f4e1af	fix: kept elements being removed (#166) Elements marked to keep were removable under specific circumstances. This PR fixes these edge cases.	8 years ago
Adam Pash	453419de72	feat: improve wh.gov parser (#163) * feat: support youtube-nocookie domain * feat: updated wh.gov parser to support speeches	8 years ago
Janet	f13bb721f6	feat: prospect magazine parser (#147) * feat: prospect magazine parser Couldn’t find a way to parse the date but I think it’s good otherwise. * fix: pulls date * fix: add timezone * fix: generalize	8 years ago
Kevin Ngao	1b28713cf5	feat: fool.com parser (#158) * feat: add fool.com custom parser	8 years ago
Janet	c18959779d	feat: forward.com parser (#144) * feat: forward.com parser LGTM although image didn’t show up in preview * feat: also pull image into content * fix: generalize selectors * fix: generalize selector	8 years ago
Janet	50e548bac2	feat: qdaily parser (#146 ) * feat: qdaily parser Firstly — I accidentally tried to generate the parser on the master branch, and I’m not sure where it is, maybe floating in the nether world. On to the parser — this one was a bit tricky because things were in Chinese! The content appears to be parsing (as seen in preview) but it’s not passing the test. I noticed the second “ ‘ “ mark isn’t appearing on the parser side. Additionally, some of the lazy loading images aren’t appearing in the preview (I cleaned the wrong lazy load images that appeared), so someone will probably have to work on that (I don’t know how to do transforms yet). * fix tests * fix: selector generalization	8 years ago
Silas Burton	51a4d1d12f	feat: newrepublic parser shows image on page (#159)	8 years ago
Silas Burton	11382ce651	Feat: Slate extractor (#153) * feat: slate extractor * fix: generalize selectors * fix: add Slate timezone	8 years ago
Silas Burton	5acaa6ab56	feat: ici.radio-canada.ca extractor (#156) * feat: ici.radio-canada.ca extractor * fix: add timezone	8 years ago
Silas Burton	4509b341e6	feat: better cleanup of atlantic articles (#157)	8 years ago
Silas Burton	9b371e51ac	Feat: gothamist extractor (#151) * feat: gothamist extractor * feat: add other gothamist network sites * fix: try getting date another way * fix: add gothamist timezone * fix: generalize selectors * fix: h1 is inside entry-header, needs to be specific because of another h1 on the page * fix: general and specific selector	8 years ago
Janet	93d2baf5cf	feat: news.natgeo parser (#88) * feat: natgeo parser For some reason, the local copy of the article didn’t grab the author name in it, so I couldn’t figure out how to parse it. The generic parser took a name of an author of a paper mentioned in the article, and thought that was the author name, which was funny. I cleaned a large block quote that didn’t make sense as it was shown in the preview, although I noticed that the Mercury chrome extension didn’t even display it. * fix: add date_published transform * fix: date_published assertion * disable: author assertion, generalize author selector * rm: author assertion * fix: image lead * fix: guard against missing img url * fix: generalize dek and title selectors	8 years ago
Janet	2279c2d486	feat: natgeo parser (#89) * feat: natgeo parser Same as the news.nationalgeographic.com parser - for some reason the author name doesn’t appear to be getting pulled into the local copy of the file. * fix: content assertion * fix: generalize author byline * disable: author assertion * rm: author assertion * fix: image lead, handles image-group * fix: guard against missing img url * fix: generalize dek and title selectors	8 years ago
Adam Pash	08b5bb7ff1	feat: allow parser to define custom date formats (#141) * feat: allow parser to define custom date formats * feat: updating macrumors to test/verify format working correctly	8 years ago
Janet	11f466ccb3	feat: latimes parser (#92) * feat: latimes parser	8 years ago
Kevin Ngao	26a8e4f75a	feat: macrumors parser (#120) * feat: add macrumors	8 years ago
Kevin Ngao	b4fec6af98	feat: androidcentral parser (#119) * feat: androidcentral parser	8 years ago
Janet	beb0b89a4f	feat: pagesix parser (#97) * feat: pagesix parser	8 years ago
Janet	f2160eb5b6	feat: si parser (#118) * feat: si parser	8 years ago
Janet	2af0f6179a	feat: rawstory parser (#109) * feat: rawstory parser Finished, with a little help from Frankie (thanks Frankie!) * fix: date_published timezone	8 years ago
Janet	765032452d	feat: thefederalistpapers parser (#101) * feat: thefederalistpapers parser	8 years ago
Janet	fb5eb2e104	feat: cnet parser (#104) * feat: cnet parser Date test fail - please take a look! Also, image didn’t load in preview. * fix: timezone * fix: image lead	8 years ago
Janet	3c5fa28f10	feat: cbs sports parser (#98) * feat: cbs sports parser	8 years ago
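
The `extend` option described in commit b3e2a0ffd1 can be sketched as a plain options object. This is a minimal illustration, not the library's implementation: it assumes `extend` travels inside an options object given as the second argument to `parse()`, and `customTypeNames` is a hypothetical helper written only for this example. It uses `Reflect.ownKeys`, which the same commit mentions, to list the requested custom types.

```javascript
// Options object shaped like the example in commit b3e2a0ffd1's message.
// Assumption: `extend` sits inside the options object passed to parse().
const options = {
  extend: {
    last_edited: {
      selectors: ['#last-edited'], // CSS selectors to try for this custom type
      defaultCleaner: false,       // copied from the commit message example
    },
  },
};

// Hypothetical helper (not part of mercury-parser): return the names of the
// custom types requested under `extend`, or an empty list if there are none.
function customTypeNames(opts) {
  return Reflect.ownKeys(opts.extend || {});
}

console.log(customTypeNames(options)); // [ 'last_edited' ]
```

With the package installed, the same object would presumably be passed as `Mercury.parse(url, options)`; that call is left out here so the sketch runs without network access.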


203 Commits (c309bdb37359b67b247208a0e06abe599a4504db)