* feat: Refactor and update fixtures
This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. Now the filename indicates the domain and the file's modified time indicates how recently it was fetched. A fixture's filename can optionally include a modifier, for example to distinguish between two different page types on the same domain.
Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced.
Finally, all fixtures have been updated.
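The naming scheme described above could be read back roughly as follows. This is only a sketch: the `--` separator between domain and modifier, the `.html` extension, and the helper names `parseFixtureName` and `fixtureAgeInDays` are assumptions for illustration and are not taken from this patch.

```javascript
const fs = require('fs');
const path = require('path');

// ASSUMPTION: the commit message does not say how the optional modifier is
// separated from the domain; '--' is a hypothetical choice.
const MODIFIER_SEPARATOR = '--';

// Split a fixture filename into its domain and optional modifier.
function parseFixtureName(filePath) {
  const base = path.basename(filePath, path.extname(filePath));
  const [domain, modifier = null] = base.split(MODIFIER_SEPARATOR);
  return { domain, modifier };
}

// Age of a fixture in days, derived from the file's modified time
// rather than from anything encoded in its name.
function fixtureAgeInDays(filePath, now = Date.now()) {
  const { mtimeMs } = fs.statSync(filePath);
  return (now - mtimeMs) / (24 * 60 * 60 * 1000);
}
```

An update script could then walk the fixtures directory and refetch any file whose age exceeds some threshold, without needing a date in the filename.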
* Remove reference to deleted extractor
* feat: first batch of test and parser updates due to new fixtures
* feat: update more custom parsers and unit tests
* feat: update more custom parsers and unit tests and remove unnecessary parser
* feat: update more custom parsers and unit tests
* feat: update more parsers and add correct bloomberg html files
* fix: remove console statement
* feat: all parsers updated and tests passing
* fix: update date_published tests to account for test server time difference
* fix: cleanup remaining fixtures in folders
* feat: move fixtures for newest custom parsers
* feat: remove script changes
* fix: update dist files to account for reverting script changes
* adding .DS_Store to .gitignore
* adding .DS_Store to .gitignore -- 2
* adding .DS_Store to .gitignore -- 3 lol
* cleaning up some tests
* fix: ran build:generator command to update generate-custom-parser dist file
* fix: update rollup configs to generate source maps and update source maps
* fix: use underscore in place of unused error variable
* fix: remove unused fixture
Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com>
Co-authored-by: flbn <overasc@gmail.com>
* fix: add alternative word count method
* fix: replace pages_rendered key with rendered_pages for consistency
* fix: return first lead_image_url when multiple og:image present
* fix: properly pull image src from lazy loaded img
* fix: allow drop cap character in medium custom extractor
* fix: refined medium parser
This change removes the pattern of separate JS files that provide "fixtures" for tests, which are used as provided or expected strings in tests. They were inconsistent and disorganized, and generally just served to add indirection to test files. So now all those strings are defined where they are used in their respective tests.
These test fixtures are not to be confused with extractor fixtures, which are snapshots of a webpage.
* Add prototype of custom extractor for clinicaltrials.gov
* Add .DS_Store to gitignore
* Make tests for title, author and date_published selectors pass
* Make content selector test pass
* Fix date_published test
* Rebuild
* Remove .DS_Store from gitignore
* Improve extractor and test/fixture of clinicaltrials.gov
* chore: update .nvmrc
* added prettier and pre-commit hooks
* update docker image to new node
* add karma-cli to get web tests working
* explicitly install karma... seems to fix problem
* remove pre-built phantomjs
* swap install order
Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.
Squashed commit of the following:
commit 9638220124a325322d6cda7d16c645185d5fe827
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 11:02:29 2016 -0700
fix: removed eslint plugin that was adding unneeded async parens
commit ce2268c0f7c1b093c06f156730a0f1bc2aaba39c
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:47:36 2016 -0700
style: fix async in parens
commit 9591856915eddaf93170da1ce9225b8a378bdf55
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:37:11 2016 -0700
fix: remove parens around async
commit 6c56054717acc1f7e5499691780f8273f6d07bac
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:35:50 2016 -0700
fix msn fixture; adjusted yahoo test
commit 4fc117ad5fdc5528f29b0873d60a6a1709642f15
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:14:38 2016 -0700
removed dek and date_published tests; neither exist in littlethings
commit 401094b4abc52901255fd2461f5839624f11d8a3
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:08:44 2016 -0700
feat: updated buzzfeed for content extraction
commit 19548a5485f70ff9b65e3e725d2364d07734ac9c
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 09:54:30 2016 -0700
fix: generator should make transforms an object, not array
commit b92113f9f7c97aca9e6d3ce9243abac967d26b63
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:54:38 2016 -0700
feat: updated politico
commit c026591040f7671cb2a6dd5177a995e21d015482
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:48:52 2016 -0700
fix: typos
commit 14aa8fa4ce38ff1c2a212cd0225437ae3042c2c3
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:36:12 2016 -0700
fix: incorrect command in readme
commit fe260e6122877e2cb0130a1ecde0e503017057a3
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:31:11 2016 -0700
fix: removed dek test because there is no dek on wikia
Squashed commit of the following:
commit 9057d411a5458f80c316604559c469a239ef3a40
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 11:42:19 2016 -0400
feat: links are rewritten to absolute in cleaner
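Rewriting links to absolute comes down to resolving each `href` against the page URL. The sketch below shows only that resolution step using the standard `URL` constructor; the helper name `toAbsolute` is hypothetical, and how the cleaner actually walks the document and applies the result is not shown in this commit message.

```javascript
// Resolve a possibly relative href against the URL of the page it came from.
// ASSUMPTION: unparseable values are left untouched; the real cleaner may
// handle them differently.
function toAbsolute(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (_) {
    return href;
  }
}

const pageUrl = 'https://example.com/posts/1';
const links = ['/about', '../img/a.png', 'https://other.org/x'];
const absoluteLinks = links.map(href => toAbsolute(href, pageUrl));
```

Already-absolute links pass through unchanged, so the step can be applied to every link without checking first.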