550 Commits

Author SHA1 Message Date
kik0220
216bfade00 feat: add www.ipa.go.jp custom parser (#408) 2019-05-03 13:40:42 +03:00
kik0220
3ae8f3bde3 feat: add www.oreilly.co.jp custom parser (#407) 2019-05-03 13:30:48 +03:00
kik0220
7396e81b72 feat: add sect.iij.ad.jp custom parser (#404) 2019-05-03 13:19:06 +03:00
kik0220
3f1d9030ee feat: add www.lifehacker.jp custom parser (#403) 2019-05-03 13:14:53 +03:00
kik0220
b077000c4a feat: add getnews.jp custom parser (#402) 2019-05-03 13:10:55 +03:00
kik0220
b5425c3e8a feat: add www.gizmodo.jp custom parser (#400) 2019-05-03 13:06:51 +03:00
kik0220
a38c727a0a feat: add deadline.com custom parser (#383)
* feat: add deadline.com custom parser

* fix: timezone

* fix: date_published selectors

* fix: title and author selector

* test: transform .embed-twitter

* fix: regenerate the fixture and fix content selector
2019-04-24 15:29:02 +03:00
kik0220
74a3c49a3c feat: add japan.cnet.com custom parser (#382)
* feat: add japan.cnet.com custom parser

* fix: remove transform
2019-04-24 14:39:54 +03:00
kik0220
7b07f88448 feat: add www.yomiuri.co.jp custom parser (#381) 2019-04-24 11:00:56 +03:00
Toufic Mouallem
3f46859d14 fix: skip absolutizing invalid srcsets (#386)
* fix: skip absolutizing empty srcsets

* test: empty srcsets are handled properly
2019-04-24 10:18:57 +03:00
kik0220
779c1154fb fix: add date_published selector in www.sanwa.co.jp extractor (#378) 2019-04-16 13:46:24 +03:00
kik0220
ea5b65f019 fix: add date_published selector in www.elecom.co.jp extractor (#377) 2019-04-16 13:41:40 +03:00
kik0220
7c0949e587 fix: add date_published selector in www.ossnews.jp extractor (#376) 2019-04-16 13:36:42 +03:00
kik0220
3e91ac55db fix: add date_published selector in jvndb.jvn.jp extractor (#375) 2019-04-16 13:32:41 +03:00
kik0220
8ca2894751 feat: add bookwalker.jp custom parser (#374) 2019-04-15 11:06:10 +03:00
kik0220
a5f06ce27a feat: add takagi-hiromitsu.jp custom parser (#364) 2019-04-12 18:11:05 +03:00
kik0220
b9c57dbc2f feat: add www.publickey1.jp custom parser (#365)
* feat: add www.publickey1.jp custom parser

* fix: date_published selector
2019-04-12 18:00:51 +03:00
kik0220
d7dbea8a95 feat: add www.itmedia.co.jp custom parser (#366)
* feat: add www.itmedia.co.jp custom parser

* feat: add nlab.itmedia.co.jp support

* fix: title selectors
2019-04-12 17:51:16 +03:00
kik0220
9218f80da6 feat: add www.moongift.jp custom parser (#367)
* feat: add www.moongift.jp custom parser

* fix: date_published selectors

* fix: pass test

* fix: add timezone
2019-04-12 17:40:55 +03:00
kik0220
4eb73dffb0 feat: add www.infoq.com custom parser (#368)
* feat: add www.infoq.com custom parser

* fix: date_published selector
2019-04-12 17:30:46 +03:00
kik0220
ce5cd2dd0d feat: add phpspot.org custom parser (#369)
* feat: add phpspot.org custom parser

* fix: date_published selector
2019-04-12 17:18:47 +03:00
Adam Pash
ca47f9c7a7 release: 2.1.0 (#373) 2019-04-10 08:42:10 -07:00
Toufic Mouallem
3614e31abc fix: skip absolutizing empty hrefs (#372) 2019-04-10 08:19:15 -07:00
kik0220
73be0c5a10 feat: add www.jnsa.org custom parser (#346)
* feat: add www.jnsa.org custom parser
2019-04-09 16:51:25 +03:00
Adam Pash
eacd1ee97f feat: custom genius parser. (#284)
also adds ability to transform value returned by an attribute selector
2019-04-09 12:49:24 +03:00
kik0220
c389c966d7 feat: add jvndb.jvn.jp custom parser (#345) 2019-04-09 12:05:03 +03:00
kik0220
8493d05cb5 feat: add scan.netsecurity.ne.jp custom parser (#347) 2019-04-09 11:59:27 +03:00
kik0220
2a76c6c212 feat: add www.elecom.co.jp custom parser (#348) 2019-04-09 11:54:57 +03:00
kik0220
a9e010b718 feat: add www.sanwa.co.jp custom parser (#349) 2019-04-09 11:50:48 +03:00
kik0220
1639eae324 feat: add www.asahi.com custom parser (#350) 2019-04-09 11:42:14 +03:00
kik0220
21f7de70c1 feat: add buzzap.jp custom parser (#351) 2019-04-09 11:35:40 +03:00
kik0220
f3a7e393a3 feat: add www.ossnews.jp custom parser (#352) 2019-04-09 11:30:56 +03:00
kik0220
c309bdb373 feat: add otrs.com custom parser (#353) 2019-04-09 11:17:58 +03:00
Alexsander Akers
71c4d05037 Include "src/shims" for webpack builds for web (#302) 2019-04-05 15:47:31 -07:00
Frankie Simms
a3fe02678c chore: small CoC typofix (#358) 2019-04-05 15:46:27 -07:00
John Holdun
437f50a5c8 fix: Initialize Content-Type as empty string if not present (#359) 2019-04-05 15:45:58 -07:00
Frankie Simms
da9a836eab chore: remove unneeded import (#357) 2019-04-03 10:30:15 -07:00
Frankie Simms
bafa764000 chore: set up ciftr for failed test reports (#343) 2019-04-01 14:25:01 -07:00
Toufic Mouallem
262dda94b3 fix: explicitly reject non-200 status codes (#342) 2019-03-29 15:50:55 -07:00
Drew Bell
b6c82f2b16 doc: fix extend typo in README (#340) 2019-03-26 15:23:50 -07:00
Toufic Mouallem
144a797564 feat: Support passing custom headers in requests (#337) 2019-03-26 13:48:41 +02:00
Toufic Mouallem
3ed778b53e fix: Adapt CNBC extractor to article redesign (#336) 2019-03-25 15:43:40 -07:00
Toufic Mouallem
da9606a4cb docs: Add parsing custom HTML to README.md (#326) 2019-03-25 15:40:51 -07:00
Drew Bell
b3e2a0ffd1 feat: extract custom types with extend option (#313)
* feat: extract custom types with extend option

Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.

```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false }
  }
})
```

* chore: use Reflect.ownKeys

* feat: add CLI options

* doc: add extend param to cli help

* refactor: extract selectExtendedTypes

* feat: only overwrite null extended results

* feat: add allowMultiple extraction option

* feat: accept extendList CLI args

* feat: allow attribute selectors in extends on CLI

* test: update extend tests

* fix: don't invoke cleaner for custom types

* feat: always return array if allowMultiple

* test: add test for array of single result

* refactor: extract extractHtml

* refactor: destructure allowMultiple

* fix: wrap multiple matches in $ for cheerio shim

* fix: find extended types before any other munging

* feat: absolutize all links

* fix: clean content more directly

* doc: Update CLI docs in README

* chore: update dist

* doc: Document extend in custom extractor README
2019-03-25 15:36:20 -07:00
Toufic Mouallem
136d6df798 feat: Return specific errors on failed parse attempts 2019-03-20 11:23:54 +02:00
Toufic Mouallem
a250f403f5 fix: Preserve whitespace in certain HTML elements (#333) 2019-03-19 09:43:29 -07:00
Adam Pash
2a3ade706d fix: run parser preview 2019-03-15 10:15:50 -07:00
Ben Ubois
a7e4c67d1d Extract content from GitHub repos. (#306)
* Extract content from GitHub repos.

* Add published and dek.

* Timezone fix.
2019-03-14 08:48:33 -07:00
Matthew Watkins
6e66887048 docs: add content formats to README.md (#318) 2019-03-12 08:37:38 -07:00
Toufic Mouallem
0940971069 fix: better handling for responsive images (#312) 2019-03-08 15:47:17 -08:00