Commit Graph

296 Commits (4509b341e6930e35cab6ed957cae8e80b2e5b639)

Author SHA1 Message Date
Silas Burton 4509b341e6 feat: better cleanup of atlantic articles (#157) 8 years ago
Kevin Ngao f2e3f055c2 Fixes an issue with encoding (#154)
* fix: fixes an issue with encoding on the fetch level
8 years ago
Silas Burton 9b371e51ac Feat: gothamist extractor (#151)
* feat: gothamist extractor

* feat: add other gothamist network sites

* fix: try getting date another way

* fix: add gothamist timezone

* fix: generalize selectors

* fix: h1 is inside entry-header, needs to be specific because of another h1 on the page

* fix: general and specific selector
8 years ago
Kevin Ngao afbef9bc39 Fix Encoding on Body (#143)
* fix: check encoding on body
8 years ago
Adam Pash 9d4c883d51 release: 1.0.6 (#142) 8 years ago
Janet 93d2baf5cf feat: news.natgeo parser (#88)
* feat: natgeo parser

For some reason, the local copy of the article didn’t grab the author
name in it, so I couldn’t figure out how to parse it. The generic
parser took a name of an author of a paper mentioned in the article,
and thought that was the author name, which was funny.

I cleaned a large block quote that didn’t make sense as it was shown in
the preview, although I noticed that the Mercury chrome extension
didn’t even display it.

* fix: add date_published transform

* fix: date_published assertion

* disable: author assertion, generalize author selector

* rm: author assertion

* fix: image lead

* fix: guard against missing img url

* fix: generalize dek and title selectors
8 years ago
Janet 2279c2d486 feat: natgeo parser (#89)
* feat: natgeo parser

Same as the news.nationalgeographic.com parser - for some reason the
author name doesn’t appear to be getting pulled into the local copy of
the file.

* fix: content assertion

* fix: generalize author byline

* disable: author assertion

* rm: author assertion

* fix: image lead, handles image-group

* fix: guard against missing img url

* fix: generalize dek and title selectors
8 years ago
Adam Pash 08b5bb7ff1 feat: allow parser to define custom date formats (#141)
* feat: allow parser to define custom date formats

* feat: updating macrumors to test/verify format working correctly
8 years ago
Janet 11f466ccb3 feat: latimes parser (#92)
* feat: latimes parser
8 years ago
Kevin Ngao 26a8e4f75a feat: macrumors parser (#120)
* feat: add macrumors
8 years ago
Kevin Ngao b4fec6af98 feat: androidcentral parser (#119)
* feat: androidcentral parser
8 years ago
Janet beb0b89a4f feat: pagesix parser (#97)
* feat: pagesix parser
8 years ago
Janet f2160eb5b6 feat: si parser (#118)
* feat: si parser
8 years ago
Janet 2af0f6179a feat: rawstory parser (#109)
* feat: rawstory parser

Finished, with a little help from Frankie (thanks Frankie!)

* fix: date_published timezone
8 years ago
Janet 765032452d feat: thefederalistpapers parser (#101)
* feat: thefederalistpapers parser
8 years ago
Janet fb5eb2e104 feat: cnet parser (#104)
* feat: cnet parser

Date test fail - please take a look!

Also, image didn’t load in preview.

* fix: timezone

* fix: image lead
8 years ago
Janet 3c5fa28f10 feat: cbs sports parser (#98)
* feat: cbs sports parser
8 years ago
Janet 3cf2d0d3ef feat: msnbc parser (#100)
* feat: msnbc parser
8 years ago
Janet f9ab9eb885 feat: howtogeek extractor (#108)
* feat: howtogeek extractor

This one is a bit tricky - the author and date info appear in a comment
section at the bottom. Was able to parse the author but not the date
info. Halp

* howtogeek update

Thanks to @fdsimms I was able to parse the date, but not sure what to
test it against, so I left it blank.

* fix: date_published assertion, it was comparing against empty string

* fix: timezone

* amend: generalize author selector
8 years ago
Janet 258acdfd02 feat: opposing views parser (#103)
* feat: opposing views parser
8 years ago
Janet b63dd33579 feat: today parser (#106)
* feat: today parser

This looks fine — there are a couple of lines of “Related” but they are
within the body (and don’t have their own classes) so I couldn’t clean
them out.

* fix: fix content assertion
8 years ago
Janet c94eee7f92 feat: cinema blend parser (#105)
* feat: cinema blend parser

all systems go

* fix: timezone
8 years ago
Janet 64e3c205e8 feat: the political insider parser (#99)
* feat: the political insider parser with timezone
8 years ago
Janet 7b52d3d1fc feat: al.com parser (#110)
* feat: al.com parser

I think this is good but could you pls double check time zone on the
date? Thanks

* fix: date_published timezone
8 years ago
Janet 15df58496f feat: westernjournalism parser (#113)
* feat: westernjournalism parser

Adjacent sibling selector FTW!

Image not displaying in preview.

* feat: fix assertion, body does not include _Advertisement_ subtext
8 years ago
Janet ae12a1d701 feat: mental floss parser (#94)
* feat: mental floss parser
8 years ago
Janet bf29291395 feat: thepennyhoarder parser (#112)
* feat: thepennyhoarder parser

Looks good, although no image in preview!

* fix: adds selector for article lead image
8 years ago
Janet fadd198d04 feat: abcnewsgo parser (#90)
* feat: abcnewsgo parser
8 years ago
Adam Pash 25d9642ff9 feat: support cleaning and transforms for all fields (#138) 8 years ago
Janet 1054d854dd feat: america now parser (#114)
* feat: america now parser

Looks good but lead image did not display in preview.

* feat: adds selector for lead image
8 years ago
David A. Viramontes 7b3ad73282 Merge pull request #115 from postlight/feat-fusion-extractor
feat: fusion parser
8 years ago
dviramontes 93c8ba0e56 feat: adds selector for lead image 8 years ago
dviramontes f71fe7685d feat: adds video embed transform 8 years ago
dviramontes a77515d861 fix: author selector, less brittle 8 years ago
Janet 4c48acba59 feat: fusion parser
Looks okay — image did not load in preview.
8 years ago
David A. Viramontes fa71cacf5a Merge pull request #137 from postlight/feat-the-verge-polygon-supported-domain
feat: adds www.polygon.com to list of www.theverge.com supportedDomains
8 years ago
David A. Viramontes c679e493de Merge branch 'master' into feat-the-verge-polygon-supported-domain 8 years ago
Janet d292d8ef3a feat: ny daily news parser (#87)
* feat: ny daily news parser
8 years ago
dviramontes a53587acef feat: adds www.polygon.com to list of www.theverge.com supportedDomains 8 years ago
Janet 385b9d76a3 feat: sciencefly extractor (#116)
* feat: sciencefly extractor, use loading image rather than 404'ing meta
8 years ago
Adam Pash 601b0fac16 release: 1.0.5 (#136) 8 years ago
Adam Pash 6bd6278a07 feat: custom parser for wh blog (#130) 8 years ago
Adam Pash aa682d71e8 fix: medium bug (#129)
* fix: improved medium parser for images and multi-section content

* fix: duplicate video
8 years ago
Adam Pash 4e049de61a fix: i put a bad comment in .gitattributes (#125)
* marking html fixtures as "vendored"

* fix: bad comment
8 years ago
Adam Pash 8aa215c4c2 chore: marking html fixtures as "vendored" (#124) 8 years ago
Adam Pash 31eb4f9222 Feat: LinkedIn parser (#123)
* feat: rebuild custom parser

* feat: linkedin custom parser
8 years ago
Adam Pash dbc706410b release: 1.0.4 (#122) 8 years ago
Adam Pash 8662474d8a feat: changed user agent to latest chrome (#121)
* feat: changed user agent to latest chrome

* removed dead link
8 years ago
Janet 7709d69379 feat: npr parser (#86)
* feat: npr parser

Lead image appears in preview, but the test fails for some reason.

AssertionError: null == 'https://media.npr.org/assets/img/2016/12/15/gettyimages-540681598_wide-8b160732b96c083dc115134c3c019f3ac73586ba.jpg?s=1400'

Looks okay otherwise.

* feat: transformed figures/figcaptions, improved date_published and
addressed NPR's bad image metadata
8 years ago
Janet 8a82f2c0ab feat: recode parser (#85)
* feat: recode parser

Thumbs up, as far as I can tell.

Note: No image appeared in the preview.

* feat: pulling in lead image
8 years ago