48 Commits (fix-remove-moment-js)

Author SHA1 Message Date
Sarah Doire c0364ec52b
feat: update all fixtures and custom parsers to match (#713)
* feat: Refactor and update fixtures

This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. This has been changed so that the filename indicates the domain and the modified time of the file indicates how recently it was fetched. A fixture's filename can optionally include a modifier to distinguish between two different page types on the same domain, for example.

Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced.

Finally, all fixtures have been updated.

* Remove reference to deleted extractor

* feat: first batch of test and parser updates due to new fixtures

* feat: update more custom parsers and unit tests

* feat: update more custom parsers and unit tests and remove unnecessary parser

* feat: update more custom parsers and unit tests

* feat: update more parsers and add correct bloomberg html files

* fix: remove console statement

* feat: all parsers updated and tests passing

* fix: update date_published tests to account for test server time difference

* fix: cleanup remaining fixtures in folders

* feat: move fixtures for newest custom parsers

* feat: remove script changes

* fix: update dist files to account for reverting script changes

* adding .DS_Store to .gitignore

* adding .DS_Store to .gitignore -- 2

* adding .DS_Store to .gitignore -- 3 lol

* cleaning up some tests

* fix: ran build:generator command to update generate-custom-parser dist file

* fix: update rollup configs to generate source maps and update source maps

* fix: use underscore in place of unused error variable

* fix: remove unused fixture

Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com>
Co-authored-by: flbn <overasc@gmail.com>
2 years ago
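
The fixture naming scheme described in #713 (filename carries the domain plus an optional modifier, modified time carries the fetch date) could be read back roughly as follows. This is a minimal sketch, not the project's actual code: the helper names `parseFixtureName` and `fixtureAgeInDays`, the `--` separator before the modifier, and the `.html` extension are all assumptions made for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: split a fixture filename into its domain and an
// optional modifier. The "--" separator and ".html" extension are assumed.
function parseFixtureName(filePath) {
  const base = path.basename(filePath, ".html");
  const [domain, modifier = null] = base.split("--");
  return { domain, modifier };
}

// Hypothetical helper: how long ago the fixture was fetched, taken from the
// file's modified time rather than from anything encoded in its name.
function fixtureAgeInDays(filePath, now = Date.now()) {
  const { mtimeMs } = fs.statSync(filePath);
  return (now - mtimeMs) / (24 * 60 * 60 * 1000);
}
```

Under these assumptions, `parseFixtureName("fixtures/www.example.com--video.html")` yields the domain `www.example.com` and the modifier `video`, while a name without `--` yields a `null` modifier.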
Michael Ashley ab401822aa
maintenance update - october 2022 (#696)
* fix: add alternative word count method

* fix: replace pages_rendered key with rendered_pages for consistency

* fix: return first lead_image_url when multiple og:image present

* fix: properly pull image src from lazy loaded img

* fix: allow drop cap character in medium custom extractor

* fix: refined medium parser
2 years ago
John Holdun 112846f74f
chore: Inline test fixtures (#683)
Not to be confused with extractor fixtures, which are snapshots of a webpage.

This change removes the pattern of separate JS files that provide "fixtures" for tests, which are used as provided or expected strings in tests. They were inconsistent and disorganized, and generally just served to add indirection to test files. So now all those strings are defined where they are used in their respective tests.
2 years ago
Ethan Jucovy af9cfcd120
fix: don't try to re-decode prepared response (#498)
* fix: don't try to re-decode prepared response

* Remove stray console.log
2 years ago
Toufic Mouallem 939d181951 fix: support query strings in lazy-loaded srcsets (#387) 5 years ago
david0leong 694ea820aa Custom Extractor for clinicaltrials.gov (#305)
* Add prototype of custom extractor for clinicaltrials.gov

* Add .DS_Store to gitignore

* Make tests for title, author and date_published selectors pass

* Make content selector test pass

* Fix date_published test

* Rebuild

* Remove .DS-Store from gitignore

* Improve extractor and test/fixture of clinicaltrials.gov
5 years ago
John Holdun 437f50a5c8 fix: Initialize Content-Type as empty string if not present (#359) 5 years ago
Toufic Mouallem 262dda94b3 fix: explicitly reject non-200 status codes (#342) 5 years ago
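
For illustration only, rejecting non-200 responses explicitly might look like the sketch below. The function name `assertOkStatus` and the error wording are hypothetical; the `statusCode` property is the one exposed on Node's HTTP responses, which is assumed to be the shape the fetch layer hands back.

```javascript
// Hypothetical guard: throw unless the response status is exactly 200.
// Redirects are assumed to have been followed before this check runs.
function assertOkStatus(response) {
  if (response.statusCode !== 200) {
    throw new Error(
      `Resource returned status code ${response.statusCode}; only 200 is accepted.`
    );
  }
  return response;
}
```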
Toufic Mouallem 144a797564
feat: Support passing custom headers in requests (#337) 5 years ago
Toufic Mouallem 136d6df798
feat: Return specific errors on failed parse attempts 5 years ago
Toufic Mouallem 0940971069 fix: better handling for responsive images (#312) 5 years ago
Drew Bell 785a22245f feat: switch from forked request to postman-request (#319) 5 years ago
Adam Pash 2afd8c9fa8
fix: jquery doesn't like the case insensitive selector (#274) 5 years ago
Ben Ubois 0e27448866 feat: Various Character Encoding Improvements (#270)
* Support HTML5 charset tag

In HTML5 `<meta charset="">` is shorthand for `<meta http-equiv="content-type" content="">`
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta

* Handle more character encoding declaration methods.
5 years ago
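
The two declaration forms mentioned in #270 can be handled by one lookup, as in this minimal sketch. It is not the parser's actual implementation: `detectCharset` is a hypothetical name, and a regular expression over the raw markup is a simplification of whatever the real code does.

```javascript
// Hypothetical helper: find a charset declared in a <meta> tag.
// Matches both the HTML5 shorthand
//   <meta charset="utf-8">
// and the longer http-equiv form
//   <meta http-equiv="content-type" content="text/html; charset=utf-8">
function detectCharset(html) {
  const match = html.match(/<meta[^>]*charset\s*=\s*["']?([\w-]+)/i);
  return match ? match[1].toLowerCase() : null;
}
```

A caller would presumably fall back to the Content-Type response header, or to a default encoding, when this returns `null`.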
Adam Pash 663cc45bf4
fresh run of prettier; remove NOTES.md (#233) 5 years ago
Adam Pash 0e22947e2c
fix: non-forked packages breaking web build (#225) 5 years ago
Ralph Jbeily f3f6e21fd8 fix: author and date published selectors (#189) 5 years ago
Adam Pash 76d333f0be
deps: upgrade (#218) 5 years ago
Adam Pash e4b057f9ea
chore: update node and some deps (#209)
* chore: update .nvmrc

* added prettier and pre-commit hooks

* update docker image to new node

* add karma-cli to get web tests working

* explicitly install karma... seems to fix problem

* remove pre-built phantomjs

* swap install order
5 years ago
Adam Pash 96640e3564
fix: failing fetchResource test (#187)
I think this was a fixture problem
6 years ago
Kevin Ngao f2e3f055c2 Fixes an issue with encoding (#154)
* fix: fixes an issue with encoding on the fetch level
7 years ago
Kevin Ngao afbef9bc39 Fix Encoding on Body (#143)
* fix: check encoding on body
7 years ago
Adam Pash 8662474d8a feat: changed user agent to latest chrome (#121)
* feat: changed user agent to latest chrome

* removed dead link
7 years ago
Adam Pash 64c0fad2fd fix: preserve whitespace (#51)
No longer normalizing whitespace in html
8 years ago
Adam Pash 7411922c55 feat: encoding response body based on content-type charset (#21)
Also some small code organization
8 years ago
Adam Pash 60a6861e18 Feat: browser support (#19)
Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.
8 years ago
Adam Pash eaea57461a fix: servers returning bad headers was breaking request. temporarily (#20)
using fork with a fix for this until request merges the necessary pull request
8 years ago
Adam Pash 629eada1f7 feat: recording/playing back network requests with nock (#18)
* feat: recording/playing back network requests with nock

* lint fix
8 years ago
Adam Pash 65c641a879 feat: enforcing line break rules in linter 8 years ago
Adam Pash 63c06c8a00 fix: babel-polyfill mess (I think) 8 years ago
Adam Pash eb0aa0b1f6 feat: some small tweaks to toy's excellent parsers ☺️
Squashed commit of the following:

commit 9638220124a325322d6cda7d16c645185d5fe827
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 11:02:29 2016 -0700

    fix: removed eslint plugin that was adding unneeded async parens

commit ce2268c0f7c1b093c06f156730a0f1bc2aaba39c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 10:47:36 2016 -0700

    style: fix async in parens

commit 9591856915eddaf93170da1ce9225b8a378bdf55
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 10:37:11 2016 -0700

    fix: remove parens around async

commit 6c56054717acc1f7e5499691780f8273f6d07bac
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 10:35:50 2016 -0700

    fix msn fixture; adjusted yahoo test

commit 4fc117ad5fdc5528f29b0873d60a6a1709642f15
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 10:14:38 2016 -0700

    removed dek and date_published tests; neither exist in littlethings

commit 401094b4abc52901255fd2461f5839624f11d8a3
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 10:08:44 2016 -0700

    feat: updated buzzfeed for content extraction

commit 19548a5485f70ff9b65e3e725d2364d07734ac9c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 09:54:30 2016 -0700

    fix: generator should make transforms an object, not array

commit b92113f9f7c97aca9e6d3ce9243abac967d26b63
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 08:54:38 2016 -0700

    feat: updated politico

commit c026591040f7671cb2a6dd5177a995e21d015482
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 08:48:52 2016 -0700

    fix: typos

commit 14aa8fa4ce38ff1c2a212cd0225437ae3042c2c3
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 08:36:12 2016 -0700

    fix: incorrect command in readme

commit fe260e6122877e2cb0130a1ecde0e503017057a3
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Oct 10 08:31:11 2016 -0700

    fix: removed dek test because there is no dek on wikia
8 years ago
Adam Pash 75b1880f01 chore: cleaned up unused files, slight reorg 8 years ago
Adam Pash ad42055f8f feat: switched test framework to jest 8 years ago
Adam Pash cbd0636dcf chore: cleaned up python and other unneeded comments 8 years ago
Adam Pash bf13b38a9b feat: some basic error handling for bad urls 8 years ago
Adam Pash 4cdc4165d6 fix: encodeURI before fetching 8 years ago
Adam Pash 1343469b6c fix: explicit/better decoding of gzipped content 8 years ago
Adam Pash 7e2a34945f chore: refactored and linted 8 years ago
Adam Pash 8fe3bec6b6 fix: accepting cookies with request (required for sites like nytimes.com)
8 years ago
Adam Pash edfb54c532 feat: links are rewritten to absolute in cleaner
Squashed commit of the following:

commit 9057d411a5458f80c316604559c469a239ef3a40
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 11:42:19 2016 -0400

    feat: links are rewritten to absolute in cleaner
8 years ago
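
Rewriting a link to an absolute URL, as the commit above describes, can be sketched with the standard WHATWG `URL` constructor. The helper name `toAbsoluteUrl` is hypothetical, and the cleaner's real code may work differently.

```javascript
// Hypothetical helper: resolve an href against the URL of the page it
// appeared on. Already-absolute hrefs pass through unchanged, and an href
// that cannot be parsed is returned as-is rather than throwing.
function toAbsoluteUrl(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (_) {
    return href;
  }
}
```

In a cleaner, this would presumably be applied to each link's `href` attribute in the extracted content.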
Adam Pash 91881df523 refactor: cleaners now run on custom extractors
Squashed commit of the following:

commit e4c7d1d149d1846f0d589b3653655b81b477c682
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 19:29:26 2016 -0400

    refactor: cleaners now run on custom extractors

commit ca08d2482c54bf6a40f50758da9353f00987a4d7
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 14:42:19 2016 -0400

    moved cleaners, refactored as necessary

commit ec2c5d36410b255c6d8ee264deca990c46709c3c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 14:07:01 2016 -0400

    moved datePublished cleaner

commit 5e55e397eecb3e88d64cd2aa2c6071c9cffed272
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 13:34:21 2016 -0400

    moved dek cleaner

commit 2dfb0c44d7882336992fdc864792df6eac094c21
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 13:29:37 2016 -0400

    moved lead-image-url

commit cef7a213b80ddd671249225622f1388f9e68896c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 13:26:20 2016 -0400

    moved author
8 years ago
Adam Pash c40b702b93 clean formatting 8 years ago
Adam Pash dfb5334f18 fix: encoding request response as null
This fixes an issue with gzipped content
8 years ago
Adam Pash ddc684c7d3 updated constants 8 years ago
Adam Pash 189361dc20 cleanup 8 years ago
Adam Pash ac62e0fba0 fix: pre-loading html in resource 8 years ago
Adam Pash 86b2ee194c feat: can pass in raw html if already fetched 8 years ago
Adam Pash 8da2425e59 feat: resource fetches content from a URL and prepares for parsing
Squashed commit of the following:

commit 7ba2d2b36d175f5ccbc02f918322ea0dd44bf2c1
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 17:55:10 2016 -0400

    feat: resource fetches content from a URL and prepares for parsing

commit 0abdfa49eed5b363169070dac6d65d0a5818c918
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 17:54:07 2016 -0400

    fix: this was messing up double Esses ('ss', as in class => cla)

commit 9dc65a99631e3a68267a68b2b4629c4be8f61546
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 14:58:57 2016 -0400

    fix: test suite working w/new dirs

commit 993dc33a5229bfa22ea998e3c4fe105be9d91c21
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 14:49:39 2016 -0400

    feat: convertLazyLoadedImages puts img urls in the src

commit e7fb105443dd16d036e460ad21fbcb47191f475b
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 14:30:43 2016 -0400

    feat: makeLinksAbsolute to fully qualify urls

commit dbd665078af854efe84bbbfe9b55acd02e1a652f
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 13:38:33 2016 -0400

    feat: fetchResource to fetch a url and validate the response

commit 42d3937c8f0f8df693996c2edee93625f13dced7
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 6 10:25:34 2016 -0400

    feat: normalizing meta tags
8 years ago