376 Commits (a57f29eec37cb96b9cb79b32f60e984a47279bf3)
 

Author SHA1 Message Date
Janet 6337231697 feat: usmagazine extractor (#63) 8 years ago
Janet c06b19efe7 feat: people extractor (#70)
No major problems!
8 years ago
Janet 3cf2bb78c4 feat: vox custom parser (#67) 8 years ago
Adam Pash a710efd2d5 release: 1.0.3 (#62) 8 years ago
Janet 861c5f0dcb feat: bustle extractor (#60) 8 years ago
Adam Pash 06397a4360 feat: browser-friendly selector for medium (#61) 8 years ago
Adam Pash 3297ab079d feat: bloomberg extractor (#59)
Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means.

It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.
8 years ago
Janet e55e9da534 feat: sbnation extractor (#55) 8 years ago
Adam Pash 8070e4790b test: streamlined guardian tests w/new single-extraction (#58) 8 years ago
Adam Pash bdb751fb53 feat: more cleaning for wired (#56) 8 years ago
Janet e7e41bd242 feat: the guardian custom extractor (#41) 8 years ago
Adam Pash 332f85928f release: 1.0.2 (#54) 8 years ago
Adam Pash 81aa89f2c1 feat: youtube custom extractor (#53) 8 years ago
Adam Pash 2fb47640f2 Feat: detect platforms (#52)
Detectors for matching extractors for publishing platforms. Currently supporting Medium and Blogger.
8 years ago
Adam Pash 64c0fad2fd fix: preserve whitespace (#51)
No longer normalizing whitespace in html
8 years ago
Adam Pash 15656cb3e1 Refactor: running tests more efficiently (#49)
Only running one parser per page we're testing rather than a parser per field we're testing.
8 years ago
Adam Pash edcb7295d1 release: 1.0.1 (#48) 8 years ago
Adam Pash f9902cfa05 Fix: extension bugs (#47)
* feat: lead image on atlantic stories now included

* feat: supporting buzzfeed "longform" template

* feat: cleaning .parter-box from the atlantic
8 years ago
Adam Pash 16860f1d85 feat: improved nyt parser (#46)
NYT was one of the first, and its test was stale and it didn't have all
of its fields well defined.
8 years ago
Adam Pash d0453efbf8 feat: improvements for nyer magazine articles (#45)
adds dek and date_published for magazine template
8 years ago
Adam Pash 00f8965c1f fix: cleaning up deks (#44)
We've solidified what we consider a dek. This PR removes the dek selectors that do not fit that mold.
8 years ago
Janet b415d1d37c feat: aol custom extractor (#42)
* feat: aol custom parser

* removed work from other commits. merged with latest master
8 years ago
Matt 4cc3b68b5e feat: remove footer links (#40)
The links at the bottom of the stories feel a little spammy because of how we treat links vs. how they are displayed on the Times, so this change cleans them up.
8 years ago
Adam Pash e9a36d6ebd release: 1.0.0 so we can start doing proper releases (#39) 8 years ago
Adam Pash ff1963bdca feat: new cleaner for wapo (#38) 8 years ago
Adam Pash 0e6ccdf622 fix: browser cleanup (#35)
Cleaning up after the parser when it's done in the browser, before
returning result.
8 years ago
Adam Pash bd0694fbba feat: preview with optional rebuild (#36)
Now the preview script has an optional build step. Adding --no-rebuild
as an argument to the script will skip the rebuild step and just show a
preview of the parse as is with the current build.
8 years ago
Adam Pash 181b39b238 feat: ci speedup (#37)
minor speedup to see failing tests. linting happens first
8 years ago
Silas Burton c3d98a0d76 Feat cnn extractor (#34)
* wip: cnn custom extractor

* wip: cnn works except first paragraph

* final touches on cnn parser

* cleanup
8 years ago
Silas Burton a0570f8e94 feat: extractor for the verge (#33)
* feat: extractor for the verge's standard article template

* feat: basic support for the verge feature template

* feat: allow multiple links to be previewed

* feat: content selector arrays

Content selector arrays allow custom parsers to select multiple elements
to match and include in the result.

* feat: updated verge parser to use multimatch selectors

* lint fix

* cleanup test builds
8 years ago
Adam Pash 233ca11a33 fix: added timezone to new republic date (#32) 8 years ago
Adam Pash cfe7f34be4 fix: normalizing spaces for authors/dek/title (#31)
* fix: normalizing spaces for authors/dek/title
8 years ago
Adam Pash 9a23b24a89 feat: adjustment for huffpo. skipping overly aggressive default cleaners (#30) 8 years ago
Silas Burton be2e4b5c80 Feat: huffington post extractor (#28)
* wip: huffpo custom extractor

* wip: some huffpo cleanup
8 years ago
Adam Pash 94198c0a65 feat: new republic custom extractor (#25)
* wip: new republic custom extractor

* feat: new republic article extractor

* feat: new republic minutes article extractor
8 years ago
Janet c4d72fb735 feat: add money.cnn custom parser (#26)
* feat: add money.cnn custom parser

* added timezone to cnn custom parser
8 years ago
Adam Pash 6343946dd8 Feat: custom timezones (#29)
* using moment-timezone to allow custom timezones

* added tz to tmz, even though still so-so
8 years ago
Adam Pash 19e7345bfb feat: test builds are created for preview purposes so we aren't committing dist every time (#27) 8 years ago
Adam Pash a8face796a Fix extension bugs (#23)
* feat: cleaning supplemental elements in nytimes (visible in web only)

closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102

* wip

* fix: more generous date published bits

* feat: added washington post extractor (including figure transforms)

closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100

* feat: cleaning zoom lightbox from gizmodo/kinja

* lint fix
8 years ago
Adam Pash 3a2f32b0eb feat: added tmz custom parser (#22) 8 years ago
Adam Pash 783a9cfb2f fix: changed overly liberal regex for removing transparent images 8 years ago
Adam Pash 7411922c55 feat: encoding response body based on content-type charset (#21)
Also some small code organization
8 years ago
Adam Pash 88c125d022 chore: package upgrades 8 years ago
Adam Pash c30fb2e4c0 chore: updated readme 8 years ago
Adam Pash 60a6861e18 Feat: browser support (#19)
Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.
8 years ago
Adam Pash eaea57461a fix: servers returning bad headers was breaking request. temporarily (#20)
using fork with a fix for this until request merges the necessary pull request
8 years ago
Adam Pash 629eada1f7 feat: recording/playing back network requests with nock (#18)
* feat: recording/playing back network requests with nock

* lint fix
8 years ago
Adam Pash 6e29848e9c feat: making yarn-friendly for package manager (#17)
* updated several commands; some fixes exposed by yarn upgrade

* removed unnecessary dependency
8 years ago
Adam Pash e325d860fd Feat: improving ci (#16)
This commit also swaps in yarn for npm and tweaks circle ci a bit.

* appveyor.yml first go

* changing node

* ps

* narrow it down

* trying this

* fix airbnb module

* trying with yarn

* logging

* hybrid?

* trying yarn w/circle

* bump workers?

* build off?

* updating script

* tweaking script for appveyor

* bumping maxworkers

* cleaning up

* build step?

* yarn it

* added appveyor badge
8 years ago
Adam Pash 071218ab3c chore: added repo 8 years ago