* feat: npr parser
Lead image appears in preview, but the test fails for some reason.
AssertionError: null ==
'https://media.npr.org/assets/img/2016/12/15/gettyimages-540681598_wide-8b160732b96c083dc115134c3c019f3ac73586ba.jpg?s=1400'
Looks okay otherwise.
* feat: transformed figures/figcaptions, improved date_published and
addressed NPR's bad image metadata
* feat: fortune parser
For some reason, the dek doesn’t appear in the local version of the
article I selected. I tried parsing the meta tag containing
og:description but it’s not working, and the description is slightly
longer than the dek in the original article.
I’m not sure why, but for the lead image, the selector for the og:image
meta tag is not returning the image URL.
:(
* feat: fortune redesigned, so re-did extractor
* fix: added timezone
* feat: qz parser
I couldn’t figure out how to parse the date, but otherwise should be
fine. I added a clean for the div.article-aside element based on what I
saw in how the chrome extension worked.
* feat: updated content to grab top image
test: date is null :/
* feat: dmagazine parser
I’m sorry to have failed you. :-( These are the issues I encountered:
1) author - does not have a unique selector to distinguish it from the
date, couldn’t parse it
2) date - no metadata in the head
3) no meta og:image in the head (my go-to), so I couldn’t get the image
test to pass, but it appears to be parsing. The caption below it is the
same size as the body copy in the preview. I couldn’t figure out how to
“transform” it to caption size.
* feat: update date, image, and author selectors and corresponding tests
* feat: generalized content selector
* feat: reuters parser
Date parses correctly but fails test because of format discrepancy.
Author tags are nested within the content, which is why the author
names are appearing twice. I wasn’t sure how to address this.
Additionally, the location appears twice, so I cleaned the location
tags from the content.
* test: fix date format
* transform .article-subtitle to h4; cleaning author but leaving location
* feat: mashable parser
As usual the date is giving me issues because of formatting
discrepancies:
AssertionError: '2016-12-13T22:33:06.000Z' == '2016-12-14T03:33:06.000Z'
Not sure how we wanna deal with Twitter card embeds that don’t show up?
Also, image credits did not show up in preview.
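The five-hour gap in the assertion above is what one would expect if the same wall-clock time were read once as UTC and once with a UTC-5 offset (US Eastern in December). That reading is an assumption, not something the commit confirms. A minimal sketch using only the built-in Date:

```javascript
// Assumption: the article's timestamp is the wall-clock time 22:33:06 on
// 2016-12-13, published without an explicit offset.
const wallClock = '2016-12-13T22:33:06';

// Read as UTC, which is one plausible result when no time zone is supplied.
const asUtc = new Date(`${wallClock}Z`).toISOString();

// Read as UTC-5 (assumed offset; US Eastern standard time).
const asEastern = new Date(`${wallClock}-05:00`).toISOString();

// Difference between the two readings, in hours.
const gapHours = (Date.parse(asEastern) - Date.parse(asUtc)) / 36e5;
```

Under that assumption, the two values reproduce the two sides of the failing assertion, which suggests the fix is to tell the date parser which time zone the site publishes in.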
* test: fixed date format
* transforming .image-credit to figcaption
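The tag transforms mentioned in these commits map a selector to a replacement tag name. A rough sketch of the idea on a toy node list follows; `applyTransforms` and the node shape are hypothetical and do not reflect Mercury's internals, and matching is simplified to a single class name.

```javascript
// Transforms as an object keyed by selector (class selectors only here).
const transforms = {
  '.article-subtitle': 'h4',
  '.image-credit': 'figcaption',
};

// Hypothetical helper: returns copies of the nodes with the tag renamed
// wherever a node's class matches a transform key.
function applyTransforms(nodes, rules) {
  return nodes.map((node) => {
    const newTag = rules[`.${node.className}`];
    return newTag ? { ...node, tag: newTag } : node;
  });
}

// Toy input, made up for illustration.
const nodes = [
  { tag: 'div', className: 'image-credit', text: 'Photo: Example' },
  { tag: 'p', className: 'body', text: 'Story text.' },
];
const transformed = applyTransforms(nodes, transforms);
```

Here the first node becomes a `figcaption` and the second is left untouched.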
* feat: chicago tribune parser
Date is parsing but failing the test because:
AssertionError: '2016-12-13T21:45:00.000Z' == '2016-12-13T13:45:00-0800'
I tried to insert a line of code for Time Zone but I’m a n00b so I
don’t think I did it right.
No image showing up in the preview.
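The two strings in that assertion denote the same instant written in different formats, so a string comparison fails even though the date is right. A small sketch of normalizing both sides before comparing; `normalizeDate` is a hypothetical helper, not part of Mercury.

```javascript
// Hypothetical helper: normalize a date string to a UTC ISO 8601 string.
function normalizeDate(dateString) {
  // Insert a colon into a trailing "+hhmm" / "-hhmm" offset so the string
  // matches the ISO format that Date is specified to parse.
  const withColon = dateString.replace(/([+-]\d{2})(\d{2})$/, '$1:$2');
  return new Date(withColon).toISOString();
}

const parsed = '2016-12-13T21:45:00.000Z';
const expected = '2016-12-13T13:45:00-0800';

// True when both strings refer to the same moment in time.
const sameInstant = normalizeDate(parsed) === normalizeDate(expected);
```

The alternative, taken in the follow-up commits, is to write the expected value in the test in the same UTC format the parser emits.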
* fix: remove timezone from date_published extractor
* test: update unit tests to assert the correct value for date_published
Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means.
It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.
The links at the bottom of the stories feel a little spammy because of how we treat links versus how they are displayed on the Times. I would like to clean them up.
* feat: extractor for the verge's standard article template
* feat: basic support for the verge feature template
* feat: allow multiple links to be previewed
* feat: content selector arrays
Content selector arrays allow custom parsers to select multiple elements
to match and include in the result.
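As an illustration of the idea, here is a rough sketch of how a selector list that mixes plain selectors and selector arrays could be resolved. The function name, the `query` callback, the selector names, and the rule that every entry of an array must match are all assumptions made for this example, not Mercury's actual implementation.

```javascript
// Hypothetical sketch of multi-match content selection.
// `query` stands in for a DOM lookup and returns an array of matches.
function selectContent(selectors, query) {
  for (const selector of selectors) {
    if (Array.isArray(selector)) {
      // Assumption: a selector array only counts if every entry matches.
      const groups = selector.map((s) => query(s));
      if (groups.every((g) => g.length > 0)) {
        return groups.reduce((all, g) => all.concat(g), []);
      }
    } else {
      const matches = query(selector);
      if (matches.length > 0) return matches;
    }
  }
  return [];
}

// Toy "document": selector -> matched fragments. Selector names are made up.
const page = {
  '.hero-image': ['<img src="hero.jpg">'],
  '.entry-content': ['<p>Body</p>'],
};
const query = (selector) => page[selector] || [];

// The first entry is a selector array: both elements end up in the result.
const content = selectContent(
  [['.hero-image', '.entry-content'], '.fallback'],
  query
);
```

With a single-selector list, only one of the two elements could have been returned; the array form lets a parser keep a lead image that sits outside the article body.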
* feat: updated verge parser to use multimatch selectors
* lint fix
* cleanup test builds
Big undertaking to support Mercury in the browser. Builds are working and all tests are passing for both web and node builds. Most code is closely shared.
Squashed commit of the following:
commit 0ee7d51ce609ad23d2deca1af41e7b4e56681bd7
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 15:44:28 2016 -0700
feat: dek does not return if it's basically the same as the excerpt
commit 6ad27f994fff3652e04ffe7c81f1ae0b1647e941
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 14:35:54 2016 -0700
feat: added excerpt util
Squashed commit of the following:
commit 9638220124a325322d6cda7d16c645185d5fe827
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 11:02:29 2016 -0700
fix: removed eslint plugin that was adding unneeded async parens
commit ce2268c0f7c1b093c06f156730a0f1bc2aaba39c
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:47:36 2016 -0700
style: fix async in parens
commit 9591856915eddaf93170da1ce9225b8a378bdf55
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:37:11 2016 -0700
fix: remove parens around async
commit 6c56054717acc1f7e5499691780f8273f6d07bac
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:35:50 2016 -0700
fix msn fixture; adjusted yahoo test
commit 4fc117ad5fdc5528f29b0873d60a6a1709642f15
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:14:38 2016 -0700
removed dek and date_published tests; neither exist in littlethings
commit 401094b4abc52901255fd2461f5839624f11d8a3
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 10:08:44 2016 -0700
feat: updated buzzfeed for content extraction
commit 19548a5485f70ff9b65e3e725d2364d07734ac9c
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 09:54:30 2016 -0700
fix: generator should make transforms an object, not array
commit b92113f9f7c97aca9e6d3ce9243abac967d26b63
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:54:38 2016 -0700
feat: updated politico
commit c026591040f7671cb2a6dd5177a995e21d015482
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:48:52 2016 -0700
fix: typos
commit 14aa8fa4ce38ff1c2a212cd0225437ae3042c2c3
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:36:12 2016 -0700
fix: incorrect command in readme
commit fe260e6122877e2cb0130a1ecde0e503017057a3
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 08:31:11 2016 -0700
fix: removed dek test because there is no dek on wikia