kik0220
77e3bc00e2
feat: add wired.jp custom parser ( #409 )
...
* feat: add wired.jp custom parser
* fix: author test
* fix: date_published selector
* test: fix dek and contest
* test: fix content (without clean dek)
5 years ago
kik0220
0b36c96de0
feat: add techlog.iij.ad.jp custom parser ( #405 )
...
* feat: add techlog.iij.ad.jp custom parser
* fix: date_published and content selector
5 years ago
kik0220
406bf1b1a9
feat: add weekly.ascii.jp custom parser ( #401 )
...
* feat: add weekly.ascii.jp custom parser
* fix: title and date_published selector
5 years ago
kik0220
216bfade00
feat: add www.ipa.go.jp custom parser ( #408 )
5 years ago
kik0220
3ae8f3bde3
feat: add www.oreilly.co.jp custom parser ( #407 )
5 years ago
kik0220
7396e81b72
feat: add sect.iij.ad.jp custom parser ( #404 )
5 years ago
kik0220
3f1d9030ee
feat: add www.lifehacker.jp custom parser ( #403 )
5 years ago
kik0220
b077000c4a
feat: add getnews.jp custom parser ( #402 )
5 years ago
kik0220
b5425c3e8a
feat: add www.gizmodo.jp custom parser ( #400 )
5 years ago
kik0220
a38c727a0a
feat: add deadline.com custom parser ( #383 )
...
* feat: add deadline.com custom parser
* fix: timezone
* fix: date_published selectors
* fix: title and author selector
* test: transform .embed-twitter
* fix: regenerate the fixture and fix content selector
5 years ago
kik0220
74a3c49a3c
feat: add japan.cnet.com custom parser ( #382 )
...
* feat: add japan.cnet.com custom parser
* fix: remove transform
5 years ago
kik0220
7b07f88448
feat: add www.yomiuri.co.jp custom parser ( #381 )
5 years ago
Toufic Mouallem
3f46859d14
fix: skip absolutizing invalid srcsets ( #386 )
...
* fix: skip absolutizing empty srcsets
* test: empty srcsets are handled properly
5 years ago
kik0220
779c1154fb
fix: add date_published selector in www.sanwa.co.jp extractor ( #378 )
5 years ago
kik0220
ea5b65f019
fix: add date_published selector in www.elecom.co.jp extractor ( #377 )
5 years ago
kik0220
7c0949e587
fix: add date_published selector in www.ossnews.jp extractor ( #376 )
5 years ago
kik0220
3e91ac55db
fix: add date_published selector in jvndb.jvn.jp extractor ( #375 )
5 years ago
kik0220
8ca2894751
feat: add bookwalker.jp custom parser ( #374 )
5 years ago
kik0220
a5f06ce27a
feat: add takagi-hiromitsu.jp custom parser ( #364 )
5 years ago
kik0220
b9c57dbc2f
feat: add www.publickey1.jp custom parser ( #365 )
...
* feat: add www.publickey1.jp custom parser
* fix: date_published selector
5 years ago
kik0220
d7dbea8a95
feat: add www.itmedia.co.jp custom parser ( #366 )
...
* feat: add www.itmedia.co.jp custom parser
* feat: add nlab.itmedia.co.jp support
* fix: title selectors
5 years ago
kik0220
9218f80da6
feat: add www.moongift.jp custom parser ( #367 )
...
* feat: add www.moongift.jp custom parser
* fix: date_published selectors
* fix: pass test
* fix: add timezone
5 years ago
kik0220
4eb73dffb0
feat: add www.infoq.com custom parser ( #368 )
...
* feat: add www.infoq.com custom parser
* fix: date_published selector
5 years ago
kik0220
ce5cd2dd0d
feat: add phpspot.org custom parser ( #369 )
...
* feat: add phpspot.org custom parser
* fix: date_published selector
5 years ago
Adam Pash
ca47f9c7a7
release: 2.1.0 ( #373 )
5 years ago
Toufic Mouallem
3614e31abc
fix: skip absolutizing empty hrefs ( #372 )
5 years ago
kik0220
73be0c5a10
feat: add www.jnsa.org custom parser ( #346 )
...
* feat: add www.jnsa.org custom parser
5 years ago
Adam Pash
eacd1ee97f
feat: custom genius parser. ( #284 )
...
also adds ability to transform value returned by an attribute selector
5 years ago
kik0220
c389c966d7
feat: add jvndb.jvn.jp custom parser ( #345 )
5 years ago
kik0220
8493d05cb5
feat: add scan.netsecurity.ne.jp custom parser ( #347 )
5 years ago
kik0220
2a76c6c212
feat: add www.elecom.co.jp custom parser ( #348 )
5 years ago
kik0220
a9e010b718
feat: add www.sanwa.co.jp custom parser ( #349 )
5 years ago
kik0220
1639eae324
feat: add www.asahi.com custom parser ( #350 )
5 years ago
kik0220
21f7de70c1
feat: add buzzap.jp custom parser ( #351 )
5 years ago
kik0220
f3a7e393a3
feat: add www.ossnews.jp custom parser ( #352 )
5 years ago
kik0220
c309bdb373
feat: add otrs.com custom parser ( #353 )
5 years ago
Alexsander Akers
71c4d05037
Include "src/shims" for webpack builds for web ( #302 )
5 years ago
Frankie Simms
a3fe02678c
chore: small CoC typofix ( #358 )
5 years ago
John Holdun
437f50a5c8
fix: Initialize Content-Type as empty string if not present ( #359 )
5 years ago
Frankie Simms
da9a836eab
chore: remove unneeded import ( #357 )
5 years ago
Frankie Simms
bafa764000
chore: set up ciftr for failed test reports ( #343 )
5 years ago
Toufic Mouallem
262dda94b3
fix: explicitly reject non-200 status codes ( #342 )
5 years ago
Drew Bell
b6c82f2b16
doc: fix extend typo in README ( #340 )
5 years ago
Toufic Mouallem
144a797564
feat: Support passing custom headers in requests ( #337 )
5 years ago
Toufic Mouallem
3ed778b53e
fix: Adapt CNBC extractor to article redesign ( #336 )
5 years ago
Toufic Mouallem
da9606a4cb
docs: Add parsing custom HTML to README.md ( #326 )
5 years ago
Drew Bell
b3e2a0ffd1
feat: extract custom types with extend option ( #313 )
...
* feat: extract custom types with extend option
Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.
```
// Options go in an object passed as the second argument.
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
});
```
* chore: use Reflect.ownKeys
* feat: add CLI options
* doc: add extend param to cli help
* refactor: extract selectExtendedTypes
* feat: only overwrite null extended results
* feat: add allowMultiple extraction option
* feat: accept extendList CLI args
* feat: allow attribute selectors in extends on CLI
* test: update extend tests
* fix: don't invoke cleaner for custom types
* feat: always return array if allowMultiple
* test: add test for array of single result
* refactor: extract extractHtml
* refactor: destructure allowMultiple
* fix: wrap multiple matches in $ for cheerio shim
* fix: find extended types before any other munging
* feat: absolutize all links
* fix: clean content more directly
* doc: Update CLI docs in README
* chore: update dist
* doc: Document extend in custom extractor README
5 years ago
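The bullets in the `extend` commit above mention an `allowMultiple` option and the rule "always return array if allowMultiple". A minimal sketch of how that rule could behave, not the library's actual code: `pickExtended` is a hypothetical helper, `matches` stands in for whatever values a selector found, and returning `null` when nothing matched is an assumption.

```javascript
// Hypothetical helper illustrating the allowMultiple rule from the commit
// bullets. This is NOT Mercury's implementation, only a sketch.
function pickExtended(matches, { allowMultiple = false } = {}) {
  // Assumption: no match is reported as null.
  if (matches.length === 0) return null;
  // With allowMultiple, always return an array, even for a single match.
  if (allowMultiple) return [...matches];
  // Without it, keep only the first match.
  return matches[0];
}

console.log(pickExtended(['2019-04-01'], { allowMultiple: true }));
console.log(pickExtended(['first', 'second']));
console.log(pickExtended([]));
```

The point of the "always return array" rule is that a caller who sets `allowMultiple` can iterate over the result without checking whether one match or several came back.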
Toufic Mouallem
136d6df798
feat: Return specific errors on failed parse attempts
5 years ago
Toufic Mouallem
a250f403f5
fix: Preserve whitespace in certain HTML elements ( #333 )
5 years ago
Adam Pash
2a3ade706d
fix: run parser preview
5 years ago