66 Commits

Author SHA1 Message Date
Janet
4c48acba59 feat: fusion parser
Looks okay — image did not load in preview.
2017-02-02 10:54:49 -07:00
Janet
d292d8ef3a feat: ny daily news parser (#87)
* feat: ny daily news parser
2017-02-02 12:30:16 -05:00
Janet
385b9d76a3 feat: sciencefly extractor (#116)
* feat: sciencefly extractor, use loading image rather than 404'ing meta
2017-02-02 11:26:29 -05:00
Adam Pash
6bd6278a07 feat: custom parser for wh blog (#130) 2017-01-31 15:50:39 -08:00
Adam Pash
aa682d71e8 fix: medium bug (#129)
* fix: improved medium parser for images and multi-section content

* fix: duplicate video
2017-01-31 15:28:25 -08:00
Adam Pash
31eb4f9222 Feat: LinkedIn parser (#123)
* feat: rebuild custom parser

* feat: linkedin custom parser
2017-01-26 10:11:10 -08:00
Janet
7709d69379 feat: npr parser (#86)
* feat: npr parser

Lead image appears in preview, but the test fails for some reason.

AssertionError: null ==
'https://media.npr.org/assets/img/2016/12/15/gettyimages-540681598_wide-8b160732b96c083dc115134c3c019f3ac73586ba.jpg?s=1400'

Looks okay otherwise.

* feat: transformed figures/figcaptions, improved date_published and
addressed NPR's bad image metadata
2017-01-23 17:23:02 -08:00
Janet
8a82f2c0ab feat: recode parser (#85)
* feat: recode parser

Thumbs up, as far as I can tell.

Note: No image appeared in the preview.

* feat: pulling in lead image
2017-01-23 17:02:33 -08:00
Janet
ad29acd7b7 feat: fortune parser (#84)
* feat: fortune parser

For some reason, the dek doesn’t appear in the local version of the
article I selected. I tried parsing the meta tag containing
og:description but it’s not working, and the description is slightly
longer than the dek in the original article.

I’m not sure why, but for the lead image, the meta tag for og:image is
not parsing the image url.

:(

* feat: fortune redesigned, so re-did extractor

* fix: added timezone
2017-01-23 16:47:06 -08:00
Janet
c133ddf614 feat: qz parser (#81)
* feat: qz parser

I couldn’t figure out how to parse the date, but otherwise should be
fine. I added a clean for the div.article-aside element based on what I
saw in how the chrome extension worked.

* feat: updated content to grab top image

test: date is null :/
2017-01-23 16:08:07 -08:00
Janet
84312b6ef1 feat: dmagazine parser (#80)
* feat: dmagazine parser

I’m sorry to have failed you. :-( These are the issues I encountered:

1) author - does not have a unique selector to distinguish it from the
date, couldn’t parse it
2) date - no meta data in the head
3) no meta og:image in the head (my go to), so I couldn’t get the image
test to pass, but it appears to be parsing. The caption below it is the
same size as the body copy in the preview. I couldn’t figure out how to
“transform” it to caption size.

* feat: update date, image, and author selectors and corresponding tests

* feat: generalized content selector
2017-01-23 15:52:05 -08:00
Janet
e035f36361 feat: reuters parser (#78)
* feat: reuters parser

Date parses correctly but fails test because of format discrepancy.

Author tags are nested within the content, which is why the author
names are appearing twice. I wasn’t sure how to address this.

Additionally, the location appears twice, so I cleaned the location
tags from the content.

* test: fix date format

* transform .article-subtitle to h4; cleaning author but leaving location
2017-01-23 15:16:37 -08:00
Janet
dec49ab073 feat: mashable parser (#76)
* feat: mashable parser

As usual the date is giving me issues because of formatting
discrepancies:
AssertionError: '2016-12-13T22:33:06.000Z' == '2016-12-14T03:33:06.000Z'

Not sure how we wanna deal with Twitter card embeds that don’t show up?

Also, image credits did not show up in preview.

* test: fixed date format

* transforming .image-credit to figcaption
2017-01-23 15:00:18 -08:00
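Note on the date mismatch quoted in dec49ab073: the two timestamps in that AssertionError are exactly five hours apart. One plausible reading, which is an assumption and not something the commit confirms, is that the same wall-clock time was interpreted once as UTC and once with a -05:00 (US Eastern) offset. A minimal sketch of that arithmetic, using only the two values from the message:

```typescript
// Both values are copied from the AssertionError in the commit message.
const actual = Date.parse('2016-12-13T22:33:06.000Z');
const expected = Date.parse('2016-12-14T03:33:06.000Z');

// Distance between the two instants, in hours.
const hoursApart = (expected - actual) / (60 * 60 * 1000);

// Assumption for illustration: read the same wall-clock time with an
// explicit -05:00 offset and convert it back to UTC.
const withOffset = new Date('2016-12-13T22:33:06-05:00').toISOString();

console.log(hoursApart, withOffset);
```

If the offset reading is right, supplying a timezone for the extracted date (or asserting against the UTC value in the test) would reconcile the two sides.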
Janet
cddc1afb69 feat: chicago tribune parser (#75)
* feat: chicago tribune parser

Date is parsing but failing the test because:
AssertionError: '2016-12-13T21:45:00.000Z' == '2016-12-13T13:45:00-0800'

I tried to insert a line of code for Time Zone but I’m a n00b so I
don’t think I did it right.

No image showing up in the preview.

* fix: remove timezone from date_published extractor

* test: update unit tests to assert the correct value for date_published
2017-01-22 12:18:10 -05:00
Janet
aff651c2d8 feat: hellogiggles parser (#107)
Looks good to me!
2017-01-21 14:07:20 -05:00
Janet
11ad7b9a92 feat: thought catalog parser (#102)
Looks good!
2017-01-21 13:52:00 -05:00
Janet
aa43a6091c feat: cnbc parser (#96)
Should be good to go!
2017-01-21 13:25:23 -05:00
Janet
cd245f7980 feat: popsugar parser (#93)
I think this one is good to go!
2017-01-21 13:11:00 -05:00
Janet
a8ab7135e1 feat: observer parser (#91)
no problems
2017-01-21 12:47:26 -05:00
Janet
3bee7224cb feat: nbc news parser (#74) 2017-01-18 17:28:21 -08:00
Janet
88242dd233 feat: nj.com parser (#73) 2017-01-18 16:49:05 -08:00
Janet
1ac5670a54 feat: inquisitor parser (#72) 2017-01-18 16:34:22 -08:00
Janet
9e5b91ed8b feat: refinery29 parser (#71) 2016-12-21 21:57:13 -08:00
Janet
b78c58c43a feat: miami herald parser (#69) 2016-12-21 21:35:34 -08:00
Janet
aedf83edc6 feat: eonline parser (#68) 2016-12-21 21:24:14 -08:00
Janet
a20da5eb31 uproxx extractor (#66) 2016-12-21 21:05:10 -08:00
Janet
87c42b6358 feat: 247sports.com extractor (#64) 2016-12-21 20:52:23 -08:00
Janet
22e6c884fb feat: rolling stone extractor (#65) 2016-12-21 20:30:34 -08:00
Janet
6337231697 feat: usmagazine extractor (#63) 2016-12-21 20:06:47 -08:00
Janet
c06b19efe7 feat: people extractor (#70)
No major problems!
2016-12-21 19:46:48 -08:00
Janet
3cf2bb78c4 feat: vox custom parser (#67) 2016-12-15 17:48:15 -08:00
Janet
861c5f0dcb feat: bustle extractor (#60) 2016-12-08 15:32:08 -05:00
Adam Pash
3297ab079d feat: bloomberg extractor (#59)
Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means.

It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.
2016-12-07 14:39:00 -05:00
Janet
e55e9da534 feat: sbnation extractor (#55) 2016-12-07 14:25:57 -05:00
Adam Pash
81aa89f2c1 feat: youtube custom extractor (#53) 2016-12-06 12:36:51 -05:00
Adam Pash
f9902cfa05 Fix: extension bugs (#47)
* feat: lead image on atlantic stories now included

* feat: supporting buzzfeed "longform" template

* feat: cleaning .parter-box from the atlantic
2016-12-02 16:02:00 -08:00
Adam Pash
d0453efbf8 feat: improvements for nyer magazine articles (#45)
adds dek and date_published for magazine template
2016-12-02 15:30:09 -08:00
Janet
b415d1d37c feat: aol custom extractor (#42)
* feat: aol custom parser

* removed work from other commits. merged with latest master
2016-12-01 17:05:15 -08:00
Silas Burton
c3d98a0d76 Feat cnn extractor (#34)
* wip: cnn custom extractor

* wip: cnn works except first paragraph

* final touches on cnn parser

* cleanup
2016-11-30 14:55:04 -08:00
Silas Burton
a0570f8e94 feat: extractor for the verge (#33)
* feat: extractor for the verge's standard article template

* feat: basic support for the verge feature template

* feat: allow multiple links to be previewed

* feat: content selector arrays

Content selector arrays allow custom parsers to select multiple elements
to match and include in the result.

* feat: updated verge parser to use multimatch selectors

* lint fix

* cleanup test builds
2016-11-30 14:08:56 -08:00
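Note on "content selector arrays" in a0570f8e94: the message says a custom parser can list several selectors whose matches are all included in the result. The sketch below illustrates that idea only. `FakeDocument`, `SelectorEntry`, and `selectContent` are hypothetical names, the page is a plain object instead of a real DOM, and the rule that every selector in a group must match is an assumption, not the project's confirmed behavior.

```typescript
// Hypothetical stand-in for a parsed page: maps a selector to the HTML
// fragments it matches. A real parser would query a DOM instead.
type FakeDocument = Record<string, string[]>;

// An entry is either one selector or an array of selectors whose matches
// are all included together.
type SelectorEntry = string | string[];

function selectContent(doc: FakeDocument, selectors: SelectorEntry[]): string | null {
  for (const entry of selectors) {
    const group = Array.isArray(entry) ? entry : [entry];
    const matches = group.map((sel) => doc[sel] || []);
    // Assumption: a group only counts if every selector in it matches.
    if (matches.every((m) => m.length > 0)) {
      return matches.map((m) => m.join('')).join('');
    }
  }
  // No entry matched.
  return null;
}

const page: FakeDocument = {
  '.lead-image': ['<img src="hero.jpg">'],
  '.article-body': ['<p>First.</p>', '<p>Second.</p>'],
};

// The array entry pulls in the lead image and the body together.
const combined = selectContent(page, [['.lead-image', '.article-body'], '.fallback']);

// When one selector in the group is missing, the next entry is tried.
const fallback = selectContent(page, [['.missing', '.article-body'], '.article-body']);

console.log(combined, fallback);
```

Trying entries in order keeps a single-selector fallback available for templates that lack one of the grouped elements.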
Silas Burton
be2e4b5c80 Feat: huffington post extractor (#28)
* wip: huffpo custom extractor

* wip: some huffpo cleanup
2016-11-29 15:50:48 -08:00
Adam Pash
94198c0a65 feat: new republic custom extractor (#25)
* wip: new republic custom extractor

* feat: new republic article extractor

* feat: new republic minutes article extractor
2016-11-29 15:30:52 -08:00
Janet
c4d72fb735 feat: add money.cnn custom parser (#26)
* feat: add money.cnn custom parser

* added timezone to cnn custom parser
2016-11-29 15:13:29 -08:00
Adam Pash
a8face796a Fix extension bugs (#23)
* feat: cleaning supplemental elements in nytimes (visible in web only)

closes https://github.com/postlight/mercury-reader-chrome-extension/issues/102

* wip

* fix: more generous date published bits

* feat: added washington post extractor (including figure transforms)

closes https://github.com/postlight/mercury-reader-chrome-extension/issues/100

* feat: cleaning zoom lightbox from gizmodo/kinja

* lint fix
2016-11-28 16:58:21 -08:00
Adam Pash
3a2f32b0eb feat: added tmz custom parser (#22) 2016-11-28 15:10:28 -08:00
Adam Pash
7411922c55 feat: encoding response body based on content-type charset (#21)
Also some small code organization
2016-11-22 10:44:27 -08:00
Adam Pash
629eada1f7 feat: recording/playing back network requests with nock (#18)
* feat: recording/playing back network requests with nock

* lint fix
2016-10-28 14:54:12 -07:00
Adam Pash
d038a36544 feat: custom medium extractor 2016-10-27 08:47:25 -07:00
Adam Pash
40768fa188 feat: support lazy loading video on deadspin 2016-10-26 11:53:42 -07:00
Drew Bell
76db95e884 feat: Add custom extractor for Apartment Therapy 2016-10-17 10:35:22 -05:00