Michael Ashley
e12c916499
feat: ability to add custom extractors via api ( #484 )
...
* feat: ability to add custom extractors via api
* docs: updating readme
* fix: example.com was being used in another test
* fix: timezone was messing up date_published test
* fix: using a unique site for testing
* fix: updated custom extractor api
* docs: updating readme
* fix: removing unused fixture
* fix: updating test description
* feat: ability to add custom extractors via cli
2019-09-04 07:32:28 -07:00
Sven Wiegand
f95947fe88
Implemented custom extractor epaper.zeit.de ( #488 )
2019-08-28 07:15:14 -07:00
Michael Ashley
2422e4717d
fix: incorrect parsing on medium.com ( #477 )
...
* fix: medium extractor now pulls content
* fix: remove youtube caption if no preview available
* fix: remove youtube node if no image
* fix: removing dek from medium.com extractor
2019-08-28 07:04:27 -07:00
Michael Ashley
0686ee7956
fix: incorrect parsing on theatlantic.com ( #475 )
...
* fix: incorrect parsing on theatlantic.com
* chore: updating theatlantic.com tests & fixtures
* chore: removing script data from minified fixture
2019-08-20 09:58:24 -07:00
Michael Ashley
5e33263d25
chore: minifying biorxiv.org fixture ( #478 )
2019-08-20 09:46:15 -07:00
david0leong
911b0f87c8
Add custom extractor for biorxiv.org ( #467 )
...
* Add custom extractor for biorxiv.org
* Fix content selector
* Improve content selector
2019-08-19 13:46:03 -07:00
Ben Ubois
0942c37876
feat: custom parser for phoronix.com. ( #431 )
2019-06-26 09:55:13 -07:00
Michael P. Geraci
571a913745
feat: pitchfork extractor ( #439 )
...
* generate the custom extractor and get the first test to pass
* add the basic extractors (title, author, date, etc)
* select the score as well as the review text, and break the content test
* prepend the score to the content
* get the date from the datetime attribute
* mangle this test a little, but just a little (it does work properly)
* move from prepending the score to the review text to adding it as a custom field in the extractor
2019-06-26 09:02:17 -07:00
david0leong
694ea820aa
Custom Extractor for clinicaltrials.gov ( #305 )
...
* Add prototype of custom extractor for clinicaltrials.gov
* Add .DS_Store to gitignore
* Make tests for title, author and date_published selectors pass
* Make content selector test pass
* Fix date_published test
* Rebuild
* Remove .DS_Store from gitignore
* Improve extractor and test/fixture of clinicaltrials.gov
2019-05-27 09:25:51 +03:00
Wajeeh Zantout
7c8de71c52
fix: new yorker extractor ( #414 )
...
* fix: new yorker extractor
* fix: date_published selector
* fix: remove footer from content
* feat: add additional selector for title
* feat: support article with multiple authors
2019-05-15 11:00:50 +03:00
Wajeeh Zantout
e66ad8b81c
feat: add le monde extractor ( #415 )
2019-05-14 14:53:49 +03:00
kik0220
f81dc63617
feat: add rbbtoday.com custom parser ( #411 )
...
* feat: add rbbtoday.com custom parser
* fix: content test
* fix: dek and content
2019-05-08 14:04:03 +03:00
kik0220
5e1113b3a9
feat: add japan.zdnet.com custom parser ( #410 )
...
* feat: add japan.zdnet.com custom parser
* fix: author and date_published selector
2019-05-08 13:51:03 +03:00
kik0220
77e3bc00e2
feat: add wired.jp custom parser ( #409 )
...
* feat: add wired.jp custom parser
* fix: author test
* fix: date_published selector
* test: fix dek and content
* test: fix content (without clean dek)
2019-05-08 13:32:04 +03:00
kik0220
0b36c96de0
feat: add techlog.iij.ad.jp custom parser ( #405 )
...
* feat: add techlog.iij.ad.jp custom parser
* fix: date_published and content selector
2019-05-08 13:20:47 +03:00
kik0220
406bf1b1a9
feat: add weekly.ascii.jp custom parser ( #401 )
...
* feat: add weekly.ascii.jp custom parser
* fix: title and date_published selector
2019-05-08 13:10:42 +03:00
kik0220
216bfade00
feat: add www.ipa.go.jp custom parser ( #408 )
2019-05-03 13:40:42 +03:00
kik0220
3ae8f3bde3
feat: add www.oreilly.co.jp custom parser ( #407 )
2019-05-03 13:30:48 +03:00
kik0220
7396e81b72
feat: add sect.iij.ad.jp custom parser ( #404 )
2019-05-03 13:19:06 +03:00
kik0220
3f1d9030ee
feat: add www.lifehacker.jp custom parser ( #403 )
2019-05-03 13:14:53 +03:00
kik0220
b077000c4a
feat: add getnews.jp custom parser ( #402 )
2019-05-03 13:10:55 +03:00
kik0220
b5425c3e8a
feat: add www.gizmodo.jp custom parser ( #400 )
2019-05-03 13:06:51 +03:00
kik0220
a38c727a0a
feat: add deadline.com custom parser ( #383 )
...
* feat: add deadline.com custom parser
* fix: timezone
* fix: date_published selectors
* fix: title and author selector
* test: transform .embed-twitter
* fix: regenerate the fixture and fix content selector
2019-04-24 15:29:02 +03:00
kik0220
74a3c49a3c
feat: add japan.cnet.com custom parser ( #382 )
...
* feat: add japan.cnet.com custom parser
* fix: remove transform
2019-04-24 14:39:54 +03:00
kik0220
7b07f88448
feat: add www.yomiuri.co.jp custom parser ( #381 )
2019-04-24 11:00:56 +03:00
kik0220
8ca2894751
feat: add bookwalker.jp custom parser ( #374 )
2019-04-15 11:06:10 +03:00
kik0220
a5f06ce27a
feat: add takagi-hiromitsu.jp custom parser ( #364 )
2019-04-12 18:11:05 +03:00
kik0220
b9c57dbc2f
feat: add www.publickey1.jp custom parser ( #365 )
...
* feat: add www.publickey1.jp custom parser
* fix: date_published selector
2019-04-12 18:00:51 +03:00
kik0220
d7dbea8a95
feat: add www.itmedia.co.jp custom parser ( #366 )
...
* feat: add www.itmedia.co.jp custom parser
* feat: add nlab.itmedia.co.jp support
* fix: title selectors
2019-04-12 17:51:16 +03:00
kik0220
9218f80da6
feat: add www.moongift.jp custom parser ( #367 )
...
* feat: add www.moongift.jp custom parser
* fix: date_published selectors
* fix: pass test
* fix: add timezone
2019-04-12 17:40:55 +03:00
kik0220
4eb73dffb0
feat: add www.infoq.com custom parser ( #368 )
...
* feat: add www.infoq.com custom parser
* fix: date_published selector
2019-04-12 17:30:46 +03:00
kik0220
ce5cd2dd0d
feat: add phpspot.org custom parser ( #369 )
...
* feat: add phpspot.org custom parser
* fix: date_published selector
2019-04-12 17:18:47 +03:00
kik0220
73be0c5a10
feat: add www.jnsa.org custom parser ( #346 )
...
* feat: add www.jnsa.org custom parser
2019-04-09 16:51:25 +03:00
Adam Pash
eacd1ee97f
feat: custom genius parser. ( #284 )
...
also adds ability to transform value returned by an attribute selector
2019-04-09 12:49:24 +03:00
kik0220
c389c966d7
feat: add jvndb.jvn.jp custom parser ( #345 )
2019-04-09 12:05:03 +03:00
kik0220
8493d05cb5
feat: add scan.netsecurity.ne.jp custom parser ( #347 )
2019-04-09 11:59:27 +03:00
kik0220
2a76c6c212
feat: add www.elecom.co.jp custom parser ( #348 )
2019-04-09 11:54:57 +03:00
kik0220
a9e010b718
feat: add www.sanwa.co.jp custom parser ( #349 )
2019-04-09 11:50:48 +03:00
kik0220
1639eae324
feat: add www.asahi.com custom parser ( #350 )
2019-04-09 11:42:14 +03:00
kik0220
21f7de70c1
feat: add buzzap.jp custom parser ( #351 )
2019-04-09 11:35:40 +03:00
kik0220
f3a7e393a3
feat: add www.ossnews.jp custom parser ( #352 )
2019-04-09 11:30:56 +03:00
kik0220
c309bdb373
feat: add otrs.com custom parser ( #353 )
2019-04-09 11:17:58 +03:00
Toufic Mouallem
3ed778b53e
fix: Adapt CNBC extractor to article redesign ( #336 )
2019-03-25 15:43:40 -07:00
Ben Ubois
a7e4c67d1d
Extract content from GitHub repos. ( #306 )
...
* Extract content from GitHub repos.
* Add published and dek.
* Timezone fix.
2019-03-14 08:48:33 -07:00
Toufic Mouallem
7844129fda
feat: Add custom parser for Reddit ( #307 )
2019-03-08 14:37:24 -08:00
Jordan Hotmann
83d1c2401b
feat: add custom extractor for blisterreview.com ( #299 )
2019-03-01 16:48:26 -08:00
kik0220
d9a1e7b22b
feat: add news.mynavi.jp custom parser ( #287 )
2019-03-01 16:45:32 -08:00
Adam Pash
9698d9a0c4
dx: comment on custom parser pr fix ( #278 )
...
* dx: comment on custom parser pr fix
* fix path
* write json
* chore: rename comment script
2019-02-28 11:11:03 -08:00
Ben Ubois
ed14203e97
fix: return early if creating the resource failed. ( #285 )
2019-02-20 16:48:51 -08:00
Ben Ubois
0e27448866
feat: Various Character Encoding Improvements ( #270 )
...
* Support HTML5 charset tag
In HTML5 `<meta charset="">` is shorthand for `<meta http-equiv="content-type" content="">`
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta
* Handle more character encoding declaration methods.
2019-02-12 15:15:19 -08:00