Commit Graph

410 Commits

Author SHA1 Message Date
Toufic Mouallem
144a797564 feat: Support passing custom headers in requests (#337) 2019-03-26 13:48:41 +02:00
Toufic Mouallem
3ed778b53e fix: Adapt CNBC extractor to article redesign (#336) 2019-03-25 15:43:40 -07:00
Toufic Mouallem
da9606a4cb docs: Add parsing custom HTML to README.md (#326) 2019-03-25 15:40:51 -07:00
Drew Bell
b3e2a0ffd1 feat: extract custom types with extend option (#313)
* feat: extract custom types with extend option

Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.

```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
})
```

* chore: use Reflect.ownKeys

* feat: add CLI options

* doc: add extend param to cli help

* refactor: extract selectExtendedTypes

* feat: only overwrite null extended results

* feat: add allowMultiple extraction option

* feat: accept extendList CLI args

* feat: allow attribute selectors in extends on CLI

* test: update extend tests

* fix: don't invoke cleaner for custom types

* feat: always return array if allowMultiple

* test: add test for array of single result

* refactor: extract extractHtml

* refactor: destructure allowMultiple

* fix: wrap multiple matches in $ for cheerio shim

* fix: find extended types before any other munging

* feat: absolutize all links

* fix: clean content more directly

* doc: Update CLI docs in README

* chore: update dist

* doc: Document extend in custom extractor README
2019-03-25 15:36:20 -07:00
Toufic Mouallem
136d6df798 feat: Return specific errors on failed parse attempts 2019-03-20 11:23:54 +02:00
Toufic Mouallem
a250f403f5 fix: Preserve whitespace in certain HTML elements (#333) 2019-03-19 09:43:29 -07:00
Adam Pash
2a3ade706d fix: run parser preview 2019-03-15 10:15:50 -07:00
Ben Ubois
a7e4c67d1d Extract content from GitHub repos. (#306)
* Extract content from GitHub repos.

* Add published and dek.

* Timezone fix.
2019-03-14 08:48:33 -07:00
Matthew Watkins
6e66887048 docs: add content formats to README.md (#318) 2019-03-12 08:37:38 -07:00
Toufic Mouallem
0940971069 fix: better handling for responsive images (#312) 2019-03-08 15:47:17 -08:00
Drew Bell
785a22245f feat: switch from forked request to postman-request (#319) 2019-03-08 14:46:45 -08:00
Toufic Mouallem
7844129fda feat: Add custom parser for Reddit (#307) 2019-03-08 14:37:24 -08:00
Drew Bell
13581cd899 feat: upgrade watchify to remove vulnerable hoek dep (#320) 2019-03-08 14:34:33 -08:00
Drew Bell
91fb0dfb46 fix: update parse signature in tests (#315) 2019-03-07 11:30:00 -08:00
Adam Pash
ffb25f34d7 docs: add usage gif (#308) 2019-03-05 11:37:56 -08:00
Toufic Mouallem
9714cb70c5 feat: Use Deadspin parser for all Kinja websites (#304) 2019-03-04 14:47:09 -08:00
Jordan Hotmann
83d1c2401b feat: add custom extractor for blisterreview.com (#299) 2019-03-01 16:48:26 -08:00
kik0220
d9a1e7b22b feat: add news.mynavi.jp custom parser (#287) 2019-03-01 16:45:32 -08:00
Olli Sulopuisto
44a7ec791d docs: typofix (#300) 2019-02-28 22:38:15 -08:00
Adam Pash
0a15a37f04 fix: ci artifact paths (#301) 2019-02-28 11:27:06 -08:00
Adam Pash
9698d9a0c4 dx: comment on custom parser pr fix (#278)
* dx: comment on custom parser pr fix

* fix path

* write json

* chore: rename comment script
2019-02-28 11:11:03 -08:00
Ben Ubois
ed14203e97 fix: return early if creating the resource failed. (#285) 2019-02-20 16:48:51 -08:00
greenkeeper[bot]
52dfdda553 Update mocha to the latest version 🚀 (#282)
* chore(package): update mocha to version 6.0.0

* chore(package): update lockfile yarn.lock
2019-02-19 13:31:40 -08:00
Adam Pash
b044cfa958 release: 2.0.0 (#275) 2019-02-13 15:46:45 -08:00
Adam Pash
2afd8c9fa8 fix: jquery doesn't like the case insensitive selector (#274) 2019-02-13 15:41:47 -08:00
Adam Pash
9bf88b0ba3 chore: refactor format output adjustments (#272)
I had previously done this in an overly complicated manner. This PR cleans
it up a bit.
2019-02-13 13:30:49 -08:00
David Brownman
867623ab33 chore: add files to package.json (#269) 2019-02-12 16:59:02 -08:00
Adam Pash
ab56ce0de3 fix: custom parser generator (#271)
- swap fs import
- fix rollup config
2019-02-12 16:14:47 -08:00
Ben Ubois
0e27448866 feat: Various Character Encoding Improvements (#270)
* Support HTML5 charset tag

In HTML5 `<meta charset="">` is shorthand for `<meta http-equiv="content-type" content="">`
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta

* Handle more character encoding declaration methods.
2019-02-12 15:15:19 -08:00
Madison Kanna
b3fa18b6d9 docs: delete extra semicolon (#266) 2019-02-11 15:44:00 -08:00
Adam Pash
e033835c72 fix: parse signature in cli (#259) 2019-02-07 17:03:42 -08:00
Adam Pash
32748ad4c5 dx: add .prettierignore (#258) 2019-02-07 16:59:43 -08:00
Adam Pash
2d0f10a888 dx: add .prettierignore (#257) 2019-02-07 16:50:45 -08:00
Adam Pash
9b0664bc91 feat: add content format output options (#256) 2019-02-07 16:48:13 -08:00
Adam Pash
a57f29eec3 release: 1.1.1 (#254)
see [changelog](./CHANGELOG.md) for changes.
2019-02-07 10:38:39 -08:00
George Haddad
b15948f3f4 chore: remove all-contributors-cli deps and script since no longer used (#253) 2019-02-07 08:19:12 -08:00
Adam Pash
02476f4336 docs: add instructions for cli to README (#251) 2019-02-06 09:46:13 -08:00
Adam Pash
b77a236dbe feat: handle cli errors/timeout (#250) 2019-02-06 09:34:22 -08:00
Keith Mancuso
44edcda53f docs: added gitter badge (#249) 2019-02-06 08:18:55 -08:00
Paul Ford
cfd9b59345 docs: add custom parsers to README
Added paragraph about custom parsers to README with links to relevant code and documentation.
2019-02-06 10:10:56 -05:00
Adam Pash
d0726a2d32 chore: remove appveyor yml and badge (#247) 2019-02-05 15:32:42 -08:00
Adam Pash
03c7040065 fix: ci config (#246)
CI config needs filters on all jobs referenced from jobs with filters
2019-02-05 15:13:58 -08:00
Adam Pash
d884c3470c release: 1.1.0 (#245) 2019-02-05 14:53:22 -08:00
Adam Pash
6844975c94 feat: add mercury-parser cli (#244) 2019-02-05 12:14:38 -08:00
greenkeeper[bot]
7bdbbc8ed8 deps: update dependencies to enable Greenkeeper 🌴 (#243) 2019-02-05 11:39:25 -08:00
Adam Pash
e38aff9c17 docs: add npm install instructions (#240) 2019-02-04 09:03:19 -08:00
Gina Trapani
dc3dff6584 docs: add hero to README (#239) 2019-02-01 18:01:39 -08:00
Adam Pash
15f7fa1e27 a more explicit .prettierrc 2019-02-01 14:11:08 -08:00
Adam Pash
c6f42c1278 docs: cleanup and update docs (#238) 2019-02-01 14:10:59 -08:00
Adam Pash
92de5ce4ed docs: remove contributors (github already has this covered) (#237) 2019-01-31 09:50:38 -08:00