
239 Commits (a7cd9027e2008116c2cd3b66642c47e007a4ecd6)

Author SHA1 Message Date
Wajeeh Zantout 7c8de71c52 fix: new yorker extractor (#414)
* fix: new yorker extractor

* fix: date_published selector

* fix: remove footer from content

* feat: add additional selector for title

* feat: support article with multiple authors
5 years ago
Wajeeh Zantout e66ad8b81c feat: add le monde extractor (#415) 5 years ago
kik0220 f81dc63617 feat: add rbbtoday.com custom parser (#411)
* feat: add rbbtoday.com custom parser

* fix: content test

* fix: dek and content
5 years ago
kik0220 5e1113b3a9 feat: add japan.zdnet.com custom parser (#410)
* feat: add japan.zdnet.com custom parser

* fix: author and date_published selector
5 years ago
kik0220 77e3bc00e2 feat: add wired.jp custom parser (#409)
* feat: add wired.jp custom parser

* fix: author test

* fix: date_published selector

* test: fix dek and content

* test: fix content (without clean dek)
5 years ago
kik0220 0b36c96de0 feat: add techlog.iij.ad.jp custom parser (#405)
* feat: add techlog.iij.ad.jp custom parser

* fix: date_published and content selector
5 years ago
kik0220 406bf1b1a9 feat: add weekly.ascii.jp custom parser (#401)
* feat: add weekly.ascii.jp custom parser

* fix: title and date_published selector
5 years ago
kik0220 216bfade00 feat: add www.ipa.go.jp custom parser (#408) 5 years ago
kik0220 3ae8f3bde3 feat: add www.oreilly.co.jp custom parser (#407) 5 years ago
kik0220 7396e81b72 feat: add sect.iij.ad.jp custom parser (#404) 5 years ago
kik0220 3f1d9030ee feat: add www.lifehacker.jp custom parser (#403) 5 years ago
kik0220 b077000c4a feat: add getnews.jp custom parser (#402) 5 years ago
kik0220 b5425c3e8a feat: add www.gizmodo.jp custom parser (#400) 5 years ago
kik0220 a38c727a0a feat: add deadline.com custom parser (#383)
* feat: add deadline.com custom parser

* fix: timezone

* fix: date_published selectors

* fix: title and author selector

* test: transform .embed-twitter

* fix: regenerate the fixture and fix content selector
6 years ago
kik0220 74a3c49a3c feat: add japan.cnet.com custom parser (#382)
* feat: add japan.cnet.com custom parser

* fix: remove transform
6 years ago
kik0220 7b07f88448 feat: add www.yomiuri.co.jp custom parser (#381) 6 years ago
kik0220 779c1154fb fix: add date_published selector in www.sanwa.co.jp extractor (#378) 6 years ago
kik0220 ea5b65f019 fix: add date_published selector in www.elecom.co.jp extractor (#377) 6 years ago
kik0220 7c0949e587 fix: add date_published selector in www.ossnews.jp extractor (#376) 6 years ago
kik0220 3e91ac55db fix: add date_published selector in jvndb.jvn.jp extractor (#375) 6 years ago
kik0220 8ca2894751 feat: add bookwalker.jp custom parser (#374) 6 years ago
kik0220 a5f06ce27a feat: add takagi-hiromitsu.jp custom parser (#364) 6 years ago
kik0220 b9c57dbc2f feat: add www.publickey1.jp custom parser (#365)
* feat: add www.publickey1.jp custom parser

* fix: date_published selector
6 years ago
kik0220 d7dbea8a95 feat: add www.itmedia.co.jp custom parser (#366)
* feat: add www.itmedia.co.jp custom parser

* feat: add nlab.itmedia.co.jp support

* fix: title selectors
6 years ago
kik0220 9218f80da6 feat: add www.moongift.jp custom parser (#367)
* feat: add www.moongift.jp custom parser

* fix: date_published selectors

* fix: pass test

* fix: add timezone
6 years ago
kik0220 4eb73dffb0 feat: add www.infoq.com custom parser (#368)
* feat: add www.infoq.com custom parser

* fix: date_published selector
6 years ago
kik0220 ce5cd2dd0d feat: add phpspot.org custom parser (#369)
* feat: add phpspot.org custom parser

* fix: date_published selector
6 years ago
kik0220 73be0c5a10 feat: add www.jnsa.org custom parser (#346)
* feat: add www.jnsa.org custom parser
6 years ago
Adam Pash eacd1ee97f feat: custom genius parser. (#284)
also adds ability to transform value returned by an attribute selector
6 years ago
kik0220 c389c966d7 feat: add jvndb.jvn.jp custom parser (#345) 6 years ago
kik0220 8493d05cb5 feat: add scan.netsecurity.ne.jp custom parser (#347) 6 years ago
kik0220 2a76c6c212 feat: add www.elecom.co.jp custom parser (#348) 6 years ago
kik0220 a9e010b718 feat: add www.sanwa.co.jp custom parser (#349) 6 years ago
kik0220 1639eae324 feat: add www.asahi.com custom parser (#350) 6 years ago
kik0220 21f7de70c1 feat: add buzzap.jp custom parser (#351) 6 years ago
kik0220 f3a7e393a3 feat: add www.ossnews.jp custom parser (#352) 6 years ago
kik0220 c309bdb373 feat: add otrs.com custom parser (#353) 6 years ago
Toufic Mouallem 3ed778b53e fix: Adapt CNBC extractor to article redesign (#336) 6 years ago
Drew Bell b3e2a0ffd1 feat: extract custom types with extend option (#313)
* feat: extract custom types with extend option

Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.

```
// Options, including `extend`, are passed as an object in the second argument.
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
})
```

* chore: use Reflect.ownKeys

* feat: add CLI options

* doc: add extend param to cli help

* refactor: extract selectExtendedTypes

* feat: only overwrite null extended results

* feat: add allowMultiple extraction option

* feat: accept extendList CLI args

* feat: allow attribute selectors in extends on CLI

* test: update extend tests

* fix: don't invoke cleaner for custom types

* feat: always return array if allowMultiple

* test: add test for array of single result

* refactor: extract extractHtml

* refactor: destructure allowMultiple

* fix: wrap multiple matches in $ for cheerio shim

* fix: find extended types before any other munging

* feat: absolutize all links

* fix: clean content more directly

* doc: Update CLI docs in README

* chore: update dist

* doc: Document extend in custom extractor README
6 years ago
Ben Ubois a7e4c67d1d Extract content from GitHub repos. (#306)
* Extract content from GitHub repos.

* Add published and dek.

* Timezone fix.
6 years ago
Toufic Mouallem 7844129fda feat: Add custom parser for Reddit (#307) 6 years ago
Drew Bell 91fb0dfb46 fix: update parse signature in tests (#315) 6 years ago
Toufic Mouallem 9714cb70c5 feat: Use Deadspin parser for all Kinja websites (#304) 6 years ago
Jordan Hotmann 83d1c2401b feat: add custom extractor for blisterreview.com (#299) 6 years ago
kik0220 d9a1e7b22b feat: add news.mynavi.jp custom parser (#287) 6 years ago
Olli Sulopuisto 44a7ec791d docs: typofix (#300) 6 years ago
Adam Pash 9bf88b0ba3 chore: refactor format output adjustments (#272)
I had previously done this in an overly complicated manner. This PR cleans
it up a bit.
6 years ago
Adam Pash 9b0664bc91 feat: add content format output options (#256) 6 years ago
Adam Pash c6f42c1278 docs: cleanup and update docs (#238) 6 years ago
George Haddad 5c0325f5a7 feat: hook up ci to publish to npm (#226)
* chore: add missing fields to package.json

* feat: add postlight org scope to package name

* feat: automate npm publish

* test: npm publish without filters

* fix: add docker image

* test: change directory

* test: add working directory

* fix: defaults syntax

* test: add workspace

* fix: attach workspace

* fix: use standard mercury email

* fix: use ISO time format and preserve original timezone offset

* fix: do not match time zone offset

* chore: move babel runtime-corejs2 to prod deps

* chore: uncomment config to deploy on git tag

* feat: publish to npm public

* adding browser-request

It doesn't seem to impact the build, but technically it should be there
so for good measure, why not...

* chore: roll version back to original state
6 years ago