* feat: Refactor and update fixtures
This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. Now the filename indicates the domain, and the file's modified time indicates how recently it was fetched. A fixture's filename can optionally include a modifier, for example to distinguish between two different page types on the same domain.
Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced.
Finally, all fixtures have been updated.
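A rough sketch of the naming scheme described above, for illustration only. The `--` separator, the `.html` extension, and the helper names `fixtureFilename` and `fixtureAgeInDays` are assumptions, not necessarily what the update-fixture script does.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical naming scheme: "<hostname>[--<modifier>].html".
// The "--" separator and ".html" extension are assumptions for illustration.
function fixtureFilename(url, modifier) {
  const { hostname } = new URL(url);
  return modifier ? `${hostname}--${modifier}.html` : `${hostname}.html`;
}

// A fixture's age comes from the file's modified time, not from its name.
function fixtureAgeInDays(filePath, now = Date.now()) {
  const { mtimeMs } = fs.statSync(filePath);
  return (now - mtimeMs) / (24 * 60 * 60 * 1000);
}

// Usage: write a throwaway fixture to a temp dir and inspect it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fixtures-'));
const file = path.join(
  dir,
  fixtureFilename('https://www.example.com/post/1', 'article')
);
fs.writeFileSync(file, '<html></html>');
console.log(path.basename(file)); // www.example.com--article.html
console.log(fixtureAgeInDays(file) < 1); // true
```

Keeping the fetch date out of the filename means a refetch overwrites the file in place, so the diff shows only content changes.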
* Remove reference to deleted extractor
* feat: first batch of test and parser updates due to new fixtures
* feat: update more custom parsers and unit tests
* feat: update more custom parsers and unit tests and remove unnecessary parser
* feat: update more custom parsers and unit tests
* feat: update more parsers and add correct bloomberg html files
* fix: remove console statement
* feat: all parsers updated and tests passing
* fix: update date_published tests to account for test server time difference
* fix: cleanup remaining fixtures in folders
* feat: move fixtures for newest custom parsers
* feat: remove script changes
* fix: update dist files to account for reverting script changes
* adding .DS_Store to .gitignore
* adding .DS_Store to .gitignore -- 2
* adding .DS_Store to .gitignore -- 3 lol
* cleaning up some tests
* fix: ran build:generator command to update generate-custom-parser dist file
* fix: update rollup configs to generate source maps and update source maps
* fix: use underscore in place of unused error variable
* fix: remove unused fixture
Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com>
Co-authored-by: flbn <overasc@gmail.com>
* fix: add alternative word count method
* fix: replace pages_rendered key with rendered_pages for consistency
* fix: return first lead_image_url when multiple og:image present
* fix: properly pull image src from lazy loaded img
* fix: allow drop cap character in medium custom extractor
* fix: refined medium parser
This change removes the pattern of separate JS files that provide "fixtures" for tests, which are used as provided or expected strings in tests (not to be confused with extractor fixtures, which are snapshots of a webpage). They were inconsistent and disorganized, and generally just served to add indirection to test files. So now all those strings are defined where they are used in their respective tests.
* feat: ability to add custom extractors via api
* docs: updating readme
* fix: example.com was being used in another test
* fix: timezone was messing up date_published test
* fix: using a unique site for testing
* fix: updated custom extractor api
* docs: updating readme
* fix: removing unused fixture
* fix: updating test description
* feat: ability to add custom extractors via cli
* feat: extract custom types with extend option
Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.
```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
});
```
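For the custom extractor case, a sketch of what the same option might look like. The domain, the selectors, and the `tags` type are made up for illustration, and `allowMultiple` is shown on the assumption that it sits next to `selectors`.

```javascript
// Hypothetical custom extractor; the domain and every selector are made up.
const customExtractor = {
  domain: 'www.example.com',
  title: { selectors: ['h1.headline'] },
  content: { selectors: ['div.article-body'] },
  extend: {
    // Same shape as the `extend` option passed to parse() above.
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
    // Assumed placement: allowMultiple asks for every match as an array.
    tags: { selectors: ['.tag'], allowMultiple: true },
  },
};

// Names of the custom types this extractor adds alongside the defaults.
const extendedTypes = Reflect.ownKeys(customExtractor.extend);
console.log(extendedTypes); // [ 'last_edited', 'tags' ]
```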
* chore: use Reflect.ownKeys
* feat: add CLI options
* doc: add extend param to cli help
* refactor: extract selectExtendedTypes
* feat: only overwrite null extended results
* feat: add allowMultiple extraction option
* feat: accept extendList CLI args
* feat: allow attribute selectors in extends on CLI
* test: update extend tests
* fix: don't invoke cleaner for custom types
* feat: always return array if allowMultiple
* test: add test for array of single result
* refactor: extract extractHtml
* refactor: destructure allowMultiple
* fix: wrap multiple matches in $ for cheerio shim
* fix: find extended types before any other munging
* feat: absolutize all links
* fix: clean content more directly
* doc: Update CLI docs in README
* chore: update dist
* doc: Document extend in custom extractor README
* chore: update .nvmrc
* added prettier and pre-commit hooks
* update docker image to new node
* add karma-cli to get web tests working
* explicitly install karma... seems to fix problem
* remove pre-built phantomjs
* swap install order
This commit also swaps in yarn for npm and tweaks circle ci a bit.
* appveyor.yml first go
* changing node
* ps
* narrow it down
* trying this
* fix airbnb module
* trying with yarn
* logging
* hybrid?
* trying yarn w/circle
* bump workers?
* build off?
* updating script
* tweaking script for appveyor
* bumping maxworkers
* cleaning up
* build step?
* yarn it
* added appveyor badge