Adam Pash
f9902cfa05
fix: extension bugs (#47)
* feat: lead image on atlantic stories now included
* feat: supporting buzzfeed "longform" template
* feat: cleaning .partner-box from the atlantic
8 years ago
Adam Pash
d0453efbf8
feat: improvements for New Yorker magazine articles (#45)
adds dek and date_published for magazine template
8 years ago
Janet
b415d1d37c
feat: aol custom extractor (#42)
* feat: aol custom parser
* removed work from other commits. merged with latest master
8 years ago
Silas Burton
c3d98a0d76
feat: cnn extractor (#34)
* wip: cnn custom extractor
* wip: cnn works except first paragraph
* final touches on cnn parser
* cleanup
8 years ago
Silas Burton
a0570f8e94
feat: extractor for the verge (#33)
* feat: extractor for the verge's standard article template
* feat: basic support for the verge feature template
* feat: allow multiple links to be previewed
* feat: content selector arrays
Content selector arrays allow custom parsers to match multiple elements
and include all of them in the result.
* feat: updated verge parser to use multimatch selectors
* lint fix
* cleanup test builds
8 years ago
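The #33 commit message explains content selector arrays in a single sentence. The TypeScript below is a minimal sketch of the idea under stated assumptions, not the project's code: the page is modeled as a hypothetical `Page` lookup from CSS selector to matched HTML (standing in for a real DOM query), and `selectContent` and the class names are invented for illustration. A plain string entry matches one element; an array entry is assumed to match only when every selector in it is found, in which case all matched elements go into the result.

```typescript
// Hypothetical stand-in for a parsed page: maps a CSS selector to the HTML
// of the element it matches. A real extractor would run a DOM query instead.
type Page = Map<string, string>;

// An entry is either one selector or an array of selectors that must all
// match (the "content selector array" idea from the commit message).
type SelectorEntry = string | string[];

// Try each entry in order and return the first one that fully matches.
// A string entry needs its one selector to match; an array entry needs every
// selector to match, and the matched HTML is concatenated in entry order.
function selectContent(page: Page, entries: SelectorEntry[]): string | null {
  for (const entry of entries) {
    const group = Array.isArray(entry) ? entry : [entry];
    const matches = group.map((selector) => page.get(selector));
    if (matches.every((html) => html !== undefined)) {
      return matches.join("");
    }
  }
  return null;
}

// Usage: the first array fails because ".entry-intro" is missing, so the
// second array wins and both the hero image and the body are kept.
const page: Page = new Map([
  [".entry-hero", "<figure>hero</figure>"],
  [".entry-body", "<p>body</p>"],
]);
console.log(
  selectContent(page, [
    [".entry-hero", ".entry-intro"],
    [".entry-hero", ".entry-body"],
    ".entry-body",
  ]),
); // prints <figure>hero</figure><p>body</p>
```

Ordering the entries from most to least specific lets a parser prefer a richer multi-element match and fall back to a single container when the page uses a simpler template.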
Silas Burton
be2e4b5c80
feat: huffington post extractor (#28)
* wip: huffpo custom extractor
* wip: some huffpo cleanup
8 years ago
Adam Pash
94198c0a65
feat: new republic custom extractor (#25)
* wip: new republic custom extractor
* feat: new republic article extractor
* feat: new republic minutes article extractor
8 years ago
Janet
c4d72fb735
feat: add money.cnn custom parser (#26)
* feat: add money.cnn custom parser
* added timezone to cnn custom parser
8 years ago
Adam Pash
a8face796a
fix: extension bugs (#23)
* feat: cleaning supplemental elements in nytimes (visible in web only)
closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102
* wip
* fix: more generous date published bits
* feat: added washington post extractor (including figure transforms)
closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100
* feat: cleaning zoom lightbox from gizmodo/kinja
* lint fix
8 years ago
Adam Pash
3a2f32b0eb
feat: added tmz custom parser (#22)
8 years ago
Adam Pash
7411922c55
feat: encoding response body based on content-type charset (#21)
Also some small code organization
8 years ago
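Commit #21 decodes the response body using the charset declared in the `Content-Type` header. A minimal TypeScript sketch of that step follows; `charsetFromContentType` and `decodeBody` are hypothetical names, and the sketch uses the standard `TextDecoder` API with a UTF-8 fallback, which may differ from how the project actually decodes bodies.

```typescript
// Read the charset parameter from a Content-Type header value, e.g.
// "text/html; charset=ISO-8859-1" -> "iso-8859-1". Defaults to "utf-8"
// when the header is missing or declares no charset.
function charsetFromContentType(contentType: string | undefined): string {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType ?? "");
  return match ? match[1].toLowerCase() : "utf-8";
}

// Decode raw body bytes with the declared charset. TextDecoder throws a
// RangeError for labels it does not recognize, so fall back to UTF-8.
function decodeBody(body: Uint8Array, contentType: string | undefined): string {
  const charset = charsetFromContentType(contentType);
  try {
    return new TextDecoder(charset).decode(body);
  } catch {
    return new TextDecoder("utf-8").decode(body);
  }
}

// Usage: the bytes below are "café" encoded as UTF-8.
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);
console.log(decodeBody(bytes, "text/html; charset=UTF-8")); // prints café
```

Decoding from raw bytes, rather than from a string the HTTP client has already decoded, is what makes it possible to honor a non-UTF-8 charset at all.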
Adam Pash
629eada1f7
feat: recording/playing back network requests with nock (#18)
* feat: recording/playing back network requests with nock
* lint fix
8 years ago
Adam Pash
d038a36544
feat: custom medium extractor
8 years ago
Adam Pash
40768fa188
feat: support lazy loading video on deadspin
8 years ago
Drew Bell
76db95e884
feat: Add custom extractor for Apartment Therapy
8 years ago
Drew Bell
a708ad3b4f
feat: Add custom parser for broadwayworld.com
8 years ago
Adam Pash
896021227d
feat: added deadspin custom parser
8 years ago
Toy Vano
e766494922
feat: added politico extractor
8 years ago
Toy Vano
fd1ac3f2b9
feat: added littlethings extractor
8 years ago
Toy Vano
1519eed3e5
feat: added wikia extractor
8 years ago
Toy Vano
9416ec73a4
feat: added incomplete buzzfeed extractor
8 years ago
Toy Vano
c6c35bd237
feat: added incomplete yahoo extractor
8 years ago
Toy Vano
320c740676
feat: added incomplete msn extractor
8 years ago
Toy Vano
7ecc696248
feat: added wired custom extractor
8 years ago
Adam Pash
173f885674
feat: custom parser + generator + detailed readme instructions
Squashed commit of the following:
commit 02563daa67712c3679258ebebac60dfa9568dffb
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 30 12:25:44 2016 -0400
updated readme, added newyorker parser for readme guide
commit 0ac613ef823efbffbf4cc9a89e5cb2489d1c4f6f
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 30 11:16:52 2016 -0400
feat: updated parser so the saved fixture absolutizes urls
commit 85c7a2660b21f95c2205ca4a4378a7570687fed0
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 30 10:15:26 2016 -0400
refactor: attribute selectors must be an array for custom extractors
commit f60f93d5d3d9b2f2d9ec6f28d27ae9dcf16ef01e
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 29 10:13:14 2016 -0400
fix: whitelisting srcset and alt attributes
commit e31cb1f4e8a9fc9c3d9b20ef9f40ca6c8d6ad51a
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 29 09:44:21 2016 -0400
some housekeeping for coverage tests
commit 39eafe420c776a1fe7f9fea634fb529a3ed75a71
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 28 17:52:08 2016 -0400
fix: word count for multi-page articles
commit b04e0066b52f190481b1b604c64e3d0b1226ff02
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 22 10:40:23 2016 -0400
major improvements to output
commit 3f3a880b63b47fe21953485da670b6e291ac60e5
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 21 17:27:53 2016 -0400
updated test command
commit 14503426557a870755453572221d95c92cff4bd2
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 21 16:00:30 2016 -0400
shortened generator command
commit 5ebd8343cd4b87b3f5787dab665bff0de96846e1
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 21 15:59:14 2016 -0400
feat: can disable fallback to generic parser (this will be useful for testing custom parsers)
8 years ago
Adam Pash
8f42e119e8
feat: generator for custom parsers and some documentation
Squashed commit of the following:
commit deaf9e60d031d9ee06e74b8c0895495b187032a5
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Sep 20 10:31:09 2016 -0400
chore: README for custom parsers
commit a8e8ad633e0d1576a52dbc90ce31b98fb2ec21ee
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 19 23:36:09 2016 -0400
draft of readme
commit 4f0f463f821465c282ce006378e5d55f8f41df5f
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 19 17:56:34 2016 -0400
custom extractor used to build basic parser for theatlantic
commit c5562a3cede41f56c4e723dcfa1181b49dcaae4d
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 19 17:20:13 2016 -0400
pre-commit to test custom parser generator
commit 7d50d5b7ab780b79fae38afcb87a7d1da5d139b2
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 19 17:19:55 2016 -0400
feat: added nytimes parser
commit 58b8d83a56927177984ddfdf70830bc4f328f200
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 19 17:17:28 2016 -0400
feat: can do fuzzy search or go straight to file
commit c99add753723a8e2ac64d51d7379ac8e23125526
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 19 10:52:26 2016 -0400
refactored export for custom extractors for easier renames
commit 22563413669651bb497f1bb2a92085b71f2ae324
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 16 17:36:13 2016 -0400
feat: custom extractor generation in place
commit 2285a29908a7f82a5de3c81f6b2b902ddec9bdaa
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 16 16:42:20 2016 -0400
good progress
8 years ago
Adam Pash
81ed4f00ed
feat: improve nymag.com extractor to grab deks from features
8 years ago
Adam Pash
62ae330db2
fix: bug in scoring and converting to paragraphs
8 years ago
Adam Pash
7ec0ed0d31
feat: nextPageUrl handles multi-page articles
Squashed commit of the following:
commit b5070c0967a7f1a0c0c449ba7ea40aebe8fe4bb8
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Sep 13 10:03:00 2016 -0400
root extractor includes next page url
commit 79be83127d5342d89eef33665586fabea227d6b3
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Sep 13 09:58:20 2016 -0400
small score adjustment
commit 0f00507dbff43401145a892e849311518edec68a
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 12 18:17:38 2016 -0400
feat: nextPageUrl generic parser up and running
commit be91c589fc0c6d6f9b573080a76c9b1ac7af710c
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 12 11:53:58 2016 -0400
feat: pageNumFromUrl extracts the pagenum of the current url
commit ad879d7aabedadfd051c01b42d841703bf4763fa
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 12 11:52:37 2016 -0400
feat: isWordpress checks if a page is generated by wordpress
8 years ago
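The squashed commit above adds `pageNumFromUrl` to extract the page number of the current URL. The TypeScript below is a hypothetical sketch of such a helper: the regular expression and the cutoff of 100 (to avoid mistaking larger numbers, such as IDs, for page numbers) are assumptions made for illustration, not the project's actual rules.

```typescript
// Hypothetical helper: pull a page number out of URLs such as
// "https://example.com/story?page=3" or "https://example.com/story/page/2".
// Returns null when no plausible page number is found.
function pageNumFromUrl(url: string): number | null {
  // "page", "pg" or "p", then "=" or "/", then one to three digits that are
  // followed by a non-digit or the end of the URL.
  const match = /(?:page|pg|p)[=/](\d{1,3})(?:\D|$)/i.exec(url);
  if (!match) return null;
  const pageNum = parseInt(match[1], 10);
  // Assumed cutoff: larger numbers are more likely IDs than page numbers.
  return pageNum < 100 ? pageNum : null;
}

console.log(pageNumFromUrl("https://example.com/story?page=3")); // prints 3
console.log(pageNumFromUrl("https://example.com/story")); // prints null
```

A next-page finder can use this to prefer candidate links whose page number is exactly one greater than the current page's.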
Adam Pash
74694ba8e2
debugging: cheerio isn't always consistent in setting scores
8 years ago
Adam Pash
45ef18ba37
fix: brought .html fixtures into project dir
8 years ago