22 Commits (fix-remove-moment-js)

Author SHA1 Message Date
Sarah Doire c0364ec52b
feat: update all fixtures and custom parsers to match (#713)
* feat: Refactor and update fixtures

This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. Now the filename identifies the domain, and the file's modified time indicates how recently it was fetched. A filename can optionally include a modifier, for example to distinguish between two different page types on the same domain.

Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced.

Finally, all fixtures have been updated.

* Remove reference to deleted extractor

* feat: first batch of test and parser updates due to new fixtures

* feat: update more custom parsers and unit tests

* feat: update more custom parsers and unit tests and remove unnecessary parser

* feat: update more custom parsers and unit tests

* feat: update more parsers and add correct bloomberg html files

* fix: remove console statement

* feat: all parsers updated and tests passing

* fix: update date_published tests to account for test server time difference

* fix: cleanup remaining fixtures in folders

* feat: move fixtures for newest custom parsers

* feat: remove script changes

* fix: update dist files to account for reverting script changes

* adding .DS_Store to .gitignore

* adding .DS_Store to .gitignore -- 2

* adding .DS_Store to .gitignore -- 3 lol

* cleaning up some tests

* fix: ran build:generator command to update generate-custom-parser dist file

* fix: update rollup configs to generate source maps and update source maps

* fix: use underscore in place of unused error variable

* fix: remove unused fixture

Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com>
Co-authored-by: flbn <overasc@gmail.com>
1 year ago
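
The fixture naming scheme described in the commit above (filename identifies the domain, modified time indicates when it was fetched, optional modifier for a second page type) could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names `fixturePath` and `fixtureAgeInDays`, the `fixtures` directory, the `.html` extension, and the `--` separator before the modifier are all assumptions.

```javascript
const path = require('node:path');
const fs = require('node:fs');

// Hypothetical helper: builds a fixture path from a URL so that the
// filename identifies the domain. The "--" separator for the optional
// modifier and the default "fixtures" directory are assumptions.
function fixturePath(url, modifier = null, dir = 'fixtures') {
  const { hostname } = new URL(url);
  const base = modifier ? `${hostname}--${modifier}` : hostname;
  return path.join(dir, `${base}.html`);
}

// Hypothetical helper: reads the file's modified time and reports how many
// whole days ago that was, so fixture age comes from the filesystem rather
// than from the filename.
function fixtureAgeInDays(file, now = Date.now()) {
  const { mtimeMs } = fs.statSync(file);
  return Math.floor((now - mtimeMs) / (24 * 60 * 60 * 1000));
}
```

Under these assumptions, `fixturePath('https://www.example.com/story', 'video')` yields `fixtures/www.example.com--video.html` on a POSIX system, and `fixtureAgeInDays` applied to that file tells an update script whether the snapshot is stale.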
Michael Ashley ab401822aa
maintenance update - october 2022 (#696)
* fix: add alternative word count method

* fix: replace pages_rendered key with rendered_pages for consistency

* fix: return first lead_image_url when multiple og:image present

* fix: properly pull image src from lazy loaded img

* fix: allow drop cap character in medium custom extractor

* fix: refined medium parser
2 years ago
John Holdun 97472cf4f8
Change Name (#688)
Mercury Parser is now Postlight Parser!
2 years ago
John Holdun 112846f74f
chore: Inline test fixtures (#683)
Not to be confused with extractor fixtures, which are snapshots of a webpage.

This change removes the pattern of separate JS files that provide "fixtures" for tests, which are used as provided or expected strings in tests. They were inconsistent and disorganized, and generally just served to add indirection to test files. So now all those strings are defined where they are used in their respective tests.
2 years ago
Michael Ashley e12c916499
feat: ability to add custom extractors via api (#484)
* feat: ability to add custom extractors via api

* docs: updating readme

* fix: example.com was being used in another test

* fix: timezone was messing up date_published test

* fix: using a unique site for testing

* fix: updated custom extractor api

* docs: updating readme

* fix: removing unused fixture

* fix: updating test description

* feat: ability to add custom extractors via cli
5 years ago
Kirill Danshin 592f175270 tests: remove a duplicate test (#448) 5 years ago
Toufic Mouallem 262dda94b3 fix: explicitly reject non-200 status codes (#342) 5 years ago
Drew Bell b3e2a0ffd1 feat: extract custom types with extend option (#313)
* feat: extract custom types with extend option

Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.

```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
});
```

* chore: use Reflect.ownKeys

* feat: add CLI options

* doc: add extend param to cli help

* refactor: extract selectExtendedTypes

* feat: only overwrite null extended results

* feat: add allowMultiple extraction option

* feat: accept extendList CLI args

* feat: allow attribute selectors in extends on CLI

* test: update extend tests

* fix: don't invoke cleaner for custom types

* feat: always return array if allowMultiple

* test: add test for array of single result

* refactor: extract extractHtml

* refactor: destructure allowMultiple

* fix: wrap multiple matches in $ for cheerio shim

* fix: find extended types before any other munging

* feat: absolutize all links

* fix: clean content more directly

* doc: Update CLI docs in README

* chore: update dist

* doc: Document extend in custom extractor README
5 years ago
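
The commit above says `extend` can also be set in a custom extractor. A minimal sketch of what such an extractor object might look like is given below. The `last_edited` entry comes from the commit message; the domain, the `categories` key, and its selector are made up for illustration, and registering the extractor with the parser is not shown.

```javascript
// Hypothetical custom extractor; only the `extend` block is the point here.
const ExampleExtractor = {
  domain: 'www.example.com', // assumed domain, for illustration only

  extend: {
    // Taken from the commit message: a single value, returned without
    // running the default cleaner.
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },

    // Assumed key and selector. Per the commit's bullets, allowMultiple
    // collects every match and the result is always an array.
    categories: { selectors: ['.post-taglist a'], allowMultiple: true },
  },
};

// List the custom types this extractor would add alongside the defaults.
const customTypes = Reflect.ownKeys(ExampleExtractor.extend);
```

`Reflect.ownKeys` is used because the commit mentions it. For this object it returns the two custom keys in insertion order.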
Toufic Mouallem 136d6df798
feat: Return specific errors on failed parse attempts 5 years ago
Ben Ubois ed14203e97 fix: return early if creating the resource failed. (#285) 5 years ago
Adam Pash 9bf88b0ba3
chore: refactor format output adjustments (#272)
I had previously done this in an overly complicated manner. This PR cleans
it up a bit.
5 years ago
Adam Pash 9b0664bc91
feat: add content format output options (#256) 5 years ago
Ralph Jbeily f3f6e21fd8 fix: author and date published selectors (#189) 5 years ago
Adam Pash e4b057f9ea
chore: update node and some deps (#209)
* chore: update .nvmrc

* added prettier and pre-commit hooks

* update docker image to new node

* add karma-cli to get web tests working

* explicitly install karma... seems to fix problem

* remove pre-built phantomjs

* swap install order
5 years ago
Adam Pash 629eada1f7 feat: recording/playing back network requests with nock (#18)
* feat: recording/playing back network requests with nock

* lint fix
8 years ago
Adam Pash e325d860fd Feat: improving ci (#16)
This commit also swaps in yarn for npm and tweaks circle ci a bit.

* appveyor.yml first go

* changing node

* ps

* narrow it down

* trying this

* fix airbnb module

* trying with yarn

* logging

* hybrid?

* trying yarn w/circle

* bump workers?

* build off?

* updating script

* tweaking script for appveyor

* bumping maxworkers

* cleaning up

* build step?

* yarn it

* added appveyor badge
8 years ago
Adam Pash 17317823de fix: bug that stopped proper attr cleaning in certain cases 8 years ago
Adam Pash d3b11be473 feat: keeping youtube and vimeo iframe embeds (#14)
* feat: keeping youtube and vimeo iframe embeds

* fix: removing class from article correctly
8 years ago
Adam Pash 173f885674 feat: custom parser + generator + detailed readme instructions
Squashed commit of the following:

commit 02563daa67712c3679258ebebac60dfa9568dffb
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 30 12:25:44 2016 -0400

    updated readme, added newyorker parser for readme guide

commit 0ac613ef823efbffbf4cc9a89e5cb2489d1c4f6f
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 30 11:16:52 2016 -0400

    feat: updated parser so the saved fixture absolutizes urls

commit 85c7a2660b21f95c2205ca4a4378a7570687fed0
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 30 10:15:26 2016 -0400

    refactor: attribute selectors must be an array for custom extractors

commit f60f93d5d3d9b2f2d9ec6f28d27ae9dcf16ef01e
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 29 10:13:14 2016 -0400

    fix: whitelisting srcset and alt attributes

commit e31cb1f4e8a9fc9c3d9b20ef9f40ca6c8d6ad51a
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 29 09:44:21 2016 -0400

    some housekeeping for coverage tests

commit 39eafe420c776a1fe7f9fea634fb529a3ed75a71
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 28 17:52:08 2016 -0400

    fix: word count for multi-page articles

commit b04e0066b52f190481b1b604c64e3d0b1226ff02
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 22 10:40:23 2016 -0400

    major improvements to output

commit 3f3a880b63b47fe21953485da670b6e291ac60e5
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 21 17:27:53 2016 -0400

    updated test command

commit 14503426557a870755453572221d95c92cff4bd2
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 21 16:00:30 2016 -0400

    shortened generator command

commit 5ebd8343cd4b87b3f5787dab665bff0de96846e1
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 21 15:59:14 2016 -0400

    feat: can disable fallback to generic parser (this will be useful for testing custom parsers)
8 years ago
Adam Pash 75b1880f01 chore: cleaned up unused files, slight reorg 8 years ago
Adam Pash ad42055f8f feat: switched test framework to jest 8 years ago
Adam Pash 2ae2dba690 chore: renamed iris to mercury 8 years ago