
77 Commits

Author SHA1 Message Date
Toufic Mouallem
144a797564 feat: Support passing custom headers in requests (#337) 2019-03-26 13:48:41 +02:00
Drew Bell
b3e2a0ffd1 feat: extract custom types with extend option (#313)
* feat: extract custom types with extend option

Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.

```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
});
```

* chore: use Reflect.ownKeys

* feat: add CLI options

* doc: add extend param to cli help

* refactor: extract selectExtendedTypes

* feat: only overwrite null extended results

* feat: add allowMultiple extraction option

* feat: accept extendList CLI args

* feat: allow attribute selectors in extends on CLI

* test: update extend tests

* fix: don't invoke cleaner for custom types

* feat: always return array if allowMultiple

* test: add test for array of single result

* refactor: extract extractHtml

* refactor: destructure allowMultiple

* fix: wrap multiple matches in $ for cheerio shim

* fix: find extended types before any other munging

* feat: absolutize all links

* fix: clean content more directly

* doc: Update CLI docs in README

* chore: update dist

* doc: Document extend in custom extractor README
2019-03-25 15:36:20 -07:00
Adam Pash
b044cfa958 release: 2.0.0 (#275) 2019-02-13 15:46:45 -08:00
Adam Pash
9bf88b0ba3 chore: refactor format output adjustments (#272)
I had previously done this in an overly complicated way. This PR cleans
it up a bit.
2019-02-13 13:30:49 -08:00
Adam Pash
ab56ce0de3 fix: custom parser generator (#271)
- swap fs import
- fix rollup config
2019-02-12 16:14:47 -08:00
Adam Pash
9b0664bc91 feat: add content format output options (#256) 2019-02-07 16:48:13 -08:00
Adam Pash
d884c3470c release: 1.1.0 (#245) 2019-02-05 14:53:22 -08:00
Adam Pash
76d333f0be deps: upgrade (#218) 2019-01-23 09:54:42 -08:00
Adam Pash
c643666c88 dx: automate fixture updates (#197) 2019-01-15 15:41:18 -08:00
Adam Pash
fd6c9d4fa3 release: 1.0.13 (#183) 2018-10-12 15:01:42 -07:00
Adam Pash
7fcd9b62eb release: 1.0.12 (#173) 2017-04-10 16:10:52 -07:00
Jeremy Mack
5fcea1c5c3 fix: PARSING_NODE undefined (#172)
* fix: PARSING_NODE undefined

* chore: remove unused cleanup function/call
2017-04-10 15:55:21 -07:00
Adam Pash
a51cc81c27 release: 1.0.11 (#171) 2017-04-10 14:57:32 -07:00
Jeremy Mack
e92e798880 fix: viewport tags leaking to parent page (#170)
* fix: scrub meta viewport tags

They leak to the parent page when using the web version of Mercury
Parser.

* chore: build

* fix: keep DOM in memory to avoid conflicts
2017-04-10 14:35:23 -07:00
Adam Pash
86d6bd1dc1 release: 1.0.10 (#169) 2017-03-24 15:24:06 -07:00
Adam Pash
e56e8e24cd release: 1.0.9 (#167) 2017-03-23 13:39:46 -07:00
Adam Pash
321c087be6 release: 1.0.8 (#164) 2017-03-22 14:08:22 -07:00
Adam Pash
e267d57d78 release: 1.0.7 (#160) 2017-03-15 09:16:04 -07:00
Kevin Ngao
f2e3f055c2 Fixes an issue with encoding (#154)
* fix: fix an encoding issue at the fetch level
2017-03-10 17:40:31 -05:00
Kevin Ngao
afbef9bc39 Fix Encoding on Body (#143)
* fix: check encoding on body
2017-03-06 11:36:56 -05:00
Adam Pash
9d4c883d51 release: 1.0.6 (#142) 2017-02-09 08:58:49 -08:00
Adam Pash
601b0fac16 release: 1.0.5 (#136) 2017-02-01 15:39:19 -08:00
Adam Pash
31eb4f9222 Feat: LinkedIn parser (#123)
* feat: rebuild custom parser

* feat: linkedin custom parser
2017-01-26 10:11:10 -08:00
Adam Pash
dbc706410b release: 1.0.4 (#122) 2017-01-26 08:42:37 -08:00
Adam Pash
a710efd2d5 release: 1.0.3 (#62) 2016-12-09 12:15:40 -05:00
Adam Pash
332f85928f release: 1.0.2 (#54) 2016-12-06 14:51:01 -05:00
Adam Pash
15656cb3e1 Refactor: running tests more efficiently (#49)
Only running one parser per page we're testing rather than a parser per field we're testing.
2016-12-05 15:39:45 -05:00
Adam Pash
edcb7295d1 release: 1.0.1 (#48) 2016-12-02 16:14:07 -08:00
Adam Pash
e9a36d6ebd release: 1.0.0 so we can start doing proper releases (#39) 2016-11-30 17:49:50 -08:00
Janet
c4d72fb735 feat: add money.cnn custom parser (#26)
* feat: add money.cnn custom parser

* added timezone to money.cnn custom parser
2016-11-29 15:13:29 -08:00
Adam Pash
6343946dd8 Feat: custom timezones (#29)
* using moment-timezone to allow custom timezones

* added tz to tmz, even though it is still so-so
2016-11-29 14:46:46 -08:00
Adam Pash
a8face796a Fix extension bugs (#23)
* feat: cleaning supplemental elements in nytimes (visible in web only)

closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102

* wip

* fix: more generous date published bits

* feat: added washington post extractor (including figure transforms)

closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100

* feat: cleaning zoom lightbox from gizmodo/kinja

* lint fix
2016-11-28 16:58:21 -08:00
Adam Pash
3a2f32b0eb feat: added tmz custom parser (#22) 2016-11-28 15:10:28 -08:00
Adam Pash
7411922c55 feat: encoding response body based on content-type charset (#21)
Also some small code organization
2016-11-22 10:44:27 -08:00
Adam Pash
60a6861e18 Feat: browser support (#19)
Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.
2016-11-21 14:17:06 -08:00
Adam Pash
eaea57461a fix: servers returning bad headers were breaking request (#20)
Temporarily using a fork with a fix for this until request merges the necessary pull request.
2016-11-15 13:17:01 -08:00
Adam Pash
6e29848e9c feat: making yarn-friendly for package manager (#17)
* updated several commands; some fixes exposed by yarn upgrade

* removed unnecessary dependency
2016-10-28 11:10:42 -07:00
Adam Pash
048d654417 feat: parser auto-generates name; lint is more specific 2016-10-27 14:54:38 -07:00
Adam Pash
4d1d950807 updated generator templates for new style of import/export; also some adjustments for usability
2016-10-27 10:44:06 -07:00
Adam Pash
de5b120b79 feat: allowing extractors to support multiple domains 2016-10-27 09:20:53 -07:00
Adam Pash
d038a36544 feat: custom medium extractor 2016-10-27 08:47:25 -07:00
Adam Pash
b65b0c98b0 feat: supporting all GMG sites using DeadspinExtractor 2016-10-26 16:05:15 -07:00
Adam Pash
17317823de fix: bug that stopped proper attr cleaning in certain cases 2016-10-26 14:17:52 -07:00
Adam Pash
40768fa188 feat: support lazy loading video on deadspin 2016-10-26 11:53:42 -07:00
Adam Pash
c63f500433 fix: narrowed selector to fix blogspot title selector 2016-10-26 11:16:31 -07:00
Adam Pash
d3b11be473 feat: keeping youtube and vimeo iframe embeds (#14)
* feat: keeping youtube and vimeo iframe embeds

* fix: removing class from article correctly
2016-10-26 11:14:37 -07:00
Adam Pash
5c7f2cd28e fix: better selector for nytimes authors 2016-10-17 18:55:58 -07:00
Drew Bell
76db95e884 feat: Add custom extractor for Apartment Therapy 2016-10-17 10:35:22 -05:00
Drew Bell
a708ad3b4f feat: Add custom parser for broadwayworld.com 2016-10-13 16:22:33 -05:00
Adam Pash
896021227d feat: added deadspin custom parser 2016-10-13 13:46:36 -07:00