mercury-parser

Commit Graph

Author	SHA1	Message	Date
Sarah Doire	c0364ec52b	feat: update all fixtures and custom parsers to match (#713 ) * feat: Refactor and update fixtures This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. This has been changed so that the filename indicates the domain and the modified time of the file indicates how recently it was fetched. A fixture's filename can optionally include a modifier to distinguish between two different page types on the same domain, for example. Also included here are changes to the update-fixture script, both to accomodate the new filename scheme as well as to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced. Finally, all fixtures have been updated. * Remove reference to deleted extractor * feat: first batch of test and parser updates due to new fixtures * feat: update more custom parsers and unit tests * feat: update more custom parsers and unit tests and remove unnecessary parser * feat: update more custom parsers and unit tests * feat: update more parsers and add correct bloomberg html files * fix: remove console statement * feat: all parsers updated and tests passing * fix: update date_published tests to account for test server time difference * fix: cleanup remaining fixtures in folders * feat: move fixtures for newest custom parsers * feat: remove script changes * fix: update dist files to account for reverting script changes * adding .DS_Store to .gitignore * adding .DS_Store to .gitignore -- 2 * adding .DS_Store to .gitignore -- 3 lol * cleaning up some tests * fix: ran build:generator command to update generate-custom-parser dist file * fix: update rollup configs to generate source maps and update source maps * fix: use underscore in place of unused error variable * fix: remove unused fixture Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com> Co-authored-by: flbn <overasc@gmail.com>	1 year ago
Sarah Doire	7b68bcd94c	feat: remove obsolete custom extractors (#712 )	2 years ago
John Holdun	ad8d4aa268	release: 2.2.3 (#703 )	2 years ago
Sarah Doire	8ca8a5f7e5	feat: add postlight.com custom extractor (#695 )	2 years ago
John Holdun	39b9ff55c4	release: 2.2.2 (#689 )	2 years ago
John Holdun	0d2bad544c	chore: Update builds	2 years ago
Michael Ashley	56a19bf934	fix: updating generate-parser dist (#499 )	2 years ago
Jad Termsani	0ccb91a487	release: 2.2.1 (#631 )	3 years ago
Michael Ashley	c5c000586d	release: 2.2.0 (#496 ) * release: 2.2.0	5 years ago
Adam Pash	713de25751	release: 2.1.1 (#446 )	5 years ago
Adam Pash	ca47f9c7a7	release: 2.1.0 (#373 )	5 years ago
Toufic Mouallem	144a797564	feat: Support passing custom headers in requests (#337 )	5 years ago
Drew Bell	b3e2a0ffd1	feat: extract custom types with extend option (#313 ) * feat: extract custom types with extend option Adds an `extend` option that lets you add custom types to be extracted and returned alongside the defaults, either in a call to `parse()` or in a custom extractor. ``` Mercury.parse( url, extend: { last_edited: { selectors: ['#last-edited'], defaultCleaner: false } } ) ``` * chore: use Reflect.ownKeys * feat: add CLI options * doc: add extend param to cli help * refactor: extract selectExtendedTypes * feat: only overwrite null extended results * feat: add allowMultiple extraction option * feat: accept extendList CLI args * feat: allow attribute selectors in extends on CLI * test: update extend tests * fix: don't invoke cleaner for custom types * feat: always return array if allowMultiple * test: add test for array of single result * refactor: extract extractHtml * refactor: destructure allowMultiple * fix: wrap multiple matches in $ for cheerio shim * fix: find extended types before any other munging * feat: absolutize all links * fix: clean content more directly * doc: Update CLI docs in README * chore: update dist * doc: Document extend in custom extractor README	5 years ago
Adam Pash	b044cfa958	release: 2.0.0 (#275 )	5 years ago
Adam Pash	9bf88b0ba3	chore: refactor format output adjustments (#272 ) I had previously done this in an overly complicated manner. This PR cleans it up a bit.	5 years ago
Adam Pash	ab56ce0de3	fix: custom parser generator (#271 ) - swap fs import - fix rollup config	5 years ago
Adam Pash	9b0664bc91	feat: add content format output options (#256 )	5 years ago
Adam Pash	d884c3470c	release: 1.1.0 (#245 )	5 years ago
Adam Pash	76d333f0be	deps: upgrade (#218 )	5 years ago
Adam Pash	c643666c88	dx: automate fixture updates (#197 )	5 years ago
Adam Pash	fd6c9d4fa3	release: 1.0.13 (#183 )	6 years ago
Adam Pash	7fcd9b62eb	release: 1.0.12 (#173 )	7 years ago
Jeremy Mack	5fcea1c5c3	fix: PARSING_NODE undefined (#172 ) * fix: PARSING_NODE undefined * chore: remove unused cleanup function/call	7 years ago
Adam Pash	a51cc81c27	release: 1.0.11 (#171 )	7 years ago
Jeremy Mack	e92e798880	fix: viewport tags leaking to parent page (#170 ) * fix: scrub meta viewport tags They leak to the parent page when using the web version of Mercury Parser. * chore: build * fix: keep DOM in memory to avoid conflicts	7 years ago
Adam Pash	86d6bd1dc1	release: 1.0.10 (#169 )	7 years ago
Adam Pash	e56e8e24cd	release: 1.0.9 (#167 )	7 years ago
Adam Pash	321c087be6	release: 1.0.8 (#164 )	7 years ago
Adam Pash	e267d57d78	release: 1.0.7 (#160 )	7 years ago
Kevin Ngao	f2e3f055c2	Fixes an issue with encoding (#154 ) * fix: fixes an issue with encoding on the fetch level	7 years ago
Kevin Ngao	afbef9bc39	Fix Encoding on Body (#143 ) * fix: check encoding on body	7 years ago
Adam Pash	9d4c883d51	release: 1.0.6 (#142 )	7 years ago
Adam Pash	601b0fac16	release: 1.0.5 (#136 )	7 years ago
Adam Pash	31eb4f9222	Feat: LinkedIn parser (#123 ) * feat: rebuild custom parser * feat: linkedin custom parser	7 years ago
Adam Pash	dbc706410b	release: 1.0.4 (#122 )	7 years ago
Adam Pash	a710efd2d5	release: 1.0.3 (#62 )	8 years ago
Adam Pash	332f85928f	release: 1.0.2 (#54 )	8 years ago
Adam Pash	15656cb3e1	Refactor: running tests more efficiently (#49 ) Only running one parser per page we're testing rather than a parser per field we're testing.	8 years ago
Adam Pash	edcb7295d1	release: 1.0.1 (#48 )	8 years ago
Adam Pash	e9a36d6ebd	release: 1.0.0 so we can start doing proper releaes (#39 )	8 years ago
Janet	c4d72fb735	feat: add money.cnn custom parser (#26 ) * feat: add money.cnn custom parser * added timezone to cnn custom parser	8 years ago
Adam Pash	6343946dd8	Feat: custom timezones (#29 ) * using moment-timezone to allow custom timezones * added tz to tmz, even though still so-so	8 years ago
Adam Pash	a8face796a	Fix extension bugs (#23 ) * feat: cleaning supplemental elements in nytimes (visible in web only) closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102 * wip * fix: more generous date published bits * feat: added washington post extractor (including figure transforms) closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100 * feat: cleaning zoom lightbox from gizmodo/kinja * lint fix	8 years ago
Adam Pash	3a2f32b0eb	feat: added tmz custom parser (#22 )	8 years ago
Adam Pash	7411922c55	feat: encoding response body based on content-type charset (#21 ) Also some small code organization	8 years ago
Adam Pash	60a6861e18	Feat: browser support (#19 ) Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.	8 years ago
Adam Pash	eaea57461a	fix: servers returning bad headers was breaking request. temporarily (#20 ) using fork with a fix for this until request merges the necessary pull request	8 years ago
Adam Pash	6e29848e9c	feat: making yarn-friendly for package manager (#17 ) * updated several commands; some fixes exposed by yarn upgrade * removed unnec dep	8 years ago
Adam Pash	048d654417	feat: parser auto-generates name; lint is more specific	8 years ago
Adam Pash	4d1d950807	updated generator templates for new style of import/export. also some adjustments for usability	8 years ago


88 Commits (fix-remove-moment-js)