Archives/mercury-parser - mercury-parser - blob42 source forge

Author	SHA1	Message	Date
Sarah Doire	c0364ec52b	feat: update all fixtures and custom parsers to match (#713) * feat: Refactor and update fixtures This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. This has been changed so that the filename indicates the domain and the modified time of the file indicates how recently it was fetched. A fixture's filename can optionally include a modifier to distinguish between two different page types on the same domain, for example. Also included here are changes to the update-fixture script, both to accomodate the new filename scheme as well as to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced. Finally, all fixtures have been updated. * Remove reference to deleted extractor * feat: first batch of test and parser updates due to new fixtures * feat: update more custom parsers and unit tests * feat: update more custom parsers and unit tests and remove unnecessary parser * feat: update more custom parsers and unit tests * feat: update more parsers and add correct bloomberg html files * fix: remove console statement * feat: all parsers updated and tests passing * fix: update date_published tests to account for test server time difference * fix: cleanup remaining fixtures in folders * feat: move fixtures for newest custom parsers * feat: remove script changes * fix: update dist files to account for reverting script changes * adding .DS_Store to .gitignore * adding .DS_Store to .gitignore -- 2 * adding .DS_Store to .gitignore -- 3 lol * cleaning up some tests * fix: ran build:generator command to update generate-custom-parser dist file * fix: update rollup configs to generate source maps and update source maps * fix: use underscore in place of unused error variable * fix: remove unused fixture Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com> Co-authored-by: flbn <overasc@gmail.com>	1 year ago
Michael Ashley	ab401822aa	maintenance update - october 2022 (#696) * fix: add alternative word count method * fix: replace pages_rendered key with rendered_pages for consistency * fix: return first lead_image_url when multiple og:image present * fix: properly pull image src from lazy loaded img * fix: allow drop cap character in medium custom extractor * fix: refined medium parser	2 years ago
John Holdun	97472cf4f8	Change Name (#688) Mercury Parser is now Postlight Parser!	2 years ago
John Holdun	112846f74f	chore: Inline test fixtures (#683) Not to be confused with extractor fixtures, which are snapshots of a webpage. This change removes the pattern of separate JS files that provide "fixtures" for tests, which are used as provided or expected strings in tests. They were inconsistent and disorganized, and generally just served to add indirection to test files. So now all those strings are defined where they are used in their respective tests.	2 years ago
Michael Ashley	e12c916499	feat: ability to add custom extractors via api (#484) * feat: ability to add custom extractors via api * docs: updating readme * fix: example.com was being used in another test * fix: timezone was messing up date_published test * fix: using a unique site for testing * fix: updated custom extractor api * docs: updating readme * fix: removing unused fixture * fix: updating test description * feat: ability to add custom extractors via cli	5 years ago
Kirill Danshin	592f175270	tests: remove a duplicate test (#448)	5 years ago
Toufic Mouallem	262dda94b3	fix: explicity reject non-200 status codes (#342)	5 years ago
Drew Bell	b3e2a0ffd1	feat: extract custom types with extend option (#313) * feat: extract custom types with extend option Adds an `extend` option that lets you add custom types to be extracted and returned alongside the defaults, either in a call to `parse()` or in a custom extractor. ``` Mercury.parse( url, extend: { last_edited: { selectors: ['#last-edited'], defaultCleaner: false } } ) ``` * chore: use Reflect.ownKeys * feat: add CLI options * doc: add extend param to cli help * refactor: extract selectExtendedTypes * feat: only overwrite null extended results * feat: add allowMultiple extraction option * feat: accept extendList CLI args * feat: allow attribute selectors in extends on CLI * test: update extend tests * fix: don't invoke cleaner for custom types * feat: always return array if allowMultiple * test: add test for array of single result * refactor: extract extractHtml * refactor: destructure allowMultiple * fix: wrap multiple matches in $ for cheerio shim * fix: find extended types before any other munging * feat: absolutize all links * fix: clean content more directly * doc: Update CLI docs in README * chore: update dist * doc: Document extend in custom extractor README	5 years ago
Toufic Mouallem	136d6df798	feat: Return specific errors on failed parse attempts	5 years ago
Ben Ubois	ed14203e97	fix: return early if creating the resource failed. (#285)	5 years ago
Adam Pash	9bf88b0ba3	chore: refactor format output adjustments (#272) I had previously done this in an overly complicated manner. This PR cleans it up a bit.	5 years ago
Adam Pash	9b0664bc91	feat: add content format output options (#256)	5 years ago
Ralph Jbeily	f3f6e21fd8	fix: author and date published selectors (#189)	5 years ago
Adam Pash	e4b057f9ea	chore: update node and some deps (#209) * chore: update .nvmrc * added prettier and pre-commit hooks * update docker image to new node * add karma-cli to get web tests working * explictly install karma... seems to fix problem * remove pre-built phantomjs * swap install order	5 years ago
Adam Pash	629eada1f7	feat: recording/playing back network requests with nock (#18) * feat: recording/playing back network requests with nock * lint fix	8 years ago
Adam Pash	e325d860fd	Feat: improving ci (#16) This commit also swaps in yarn for npm and tweaks circle ci a bit. * appveyor.yml first go * changing node * ps * narrow it down * trying this * fix airbnb module * trying with yarn * logging * hybrid? * trying yarn w/circle * bump workers? * build off? * updating script * tweaking script for appveyor * bumping maxworkers * cleaning up * build step? * yarn it * added appveyor badge	8 years ago
Adam Pash	17317823de	fix: bug that stopped proper attr cleaning in certain cases	8 years ago
Adam Pash	d3b11be473	feat: keeping youtube and vimeo iframe embeds (#14) * feat: keeping youtube and vimeo iframe embeds * fix: removing class from article correctly	8 years ago
Adam Pash	173f885674	feat: custom parser + generator + detailed readme instructions Squashed commit of the following: commit 02563daa67712c3679258ebebac60dfa9568dffb Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 30 12:25:44 2016 -0400 updated readme, added newyorker parser for readme guide commit 0ac613ef823efbffbf4cc9a89e5cb2489d1c4f6f Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 30 11:16:52 2016 -0400 feat: updated parser so the saved fixture absolutizes urls commit 85c7a2660b21f95c2205ca4a4378a7570687fed0 Author: Adam Pash <adam.pash@gmail.com> Date: Fri Sep 30 10:15:26 2016 -0400 refactor: attribute selectors must be an array for custom extractors commit f60f93d5d3d9b2f2d9ec6f28d27ae9dcf16ef01e Author: Adam Pash <adam.pash@gmail.com> Date: Thu Sep 29 10:13:14 2016 -0400 fix: whitelisting srcset and alt attributes commit e31cb1f4e8a9fc9c3d9b20ef9f40ca6c8d6ad51a Author: Adam Pash <adam.pash@gmail.com> Date: Thu Sep 29 09:44:21 2016 -0400 some housekeeping for coverage tests commit 39eafe420c776a1fe7f9fea634fb529a3ed75a71 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 28 17:52:08 2016 -0400 fix: word count for multi-page articles commit b04e0066b52f190481b1b604c64e3d0b1226ff02 Author: Adam Pash <adam.pash@gmail.com> Date: Thu Sep 22 10:40:23 2016 -0400 major improvements to output commit 3f3a880b63b47fe21953485da670b6e291ac60e5 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 21 17:27:53 2016 -0400 updated test command commit 14503426557a870755453572221d95c92cff4bd2 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 21 16:00:30 2016 -0400 shortened generator command commit 5ebd8343cd4b87b3f5787dab665bff0de96846e1 Author: Adam Pash <adam.pash@gmail.com> Date: Wed Sep 21 15:59:14 2016 -0400 feat: can disable fallback to generic parser (this will be useful for testing custom parsers)	8 years ago
Adam Pash	75b1880f01	chore: cleaned up unused files, slight reorg	8 years ago
Adam Pash	ad42055f8f	feat: switched test framework to jest	8 years ago
Adam Pash	2ae2dba690	chore: renamed iris to mercury	8 years ago