Janet
cddc1afb69
feat: chicago tribune parser ( #75 )
* feat: chicago tribune parser
Date is parsing but failing the test because:
AssertionError: '2016-12-13T21:45:00.000Z' == '2016-12-13T13:45:00-0800'
I tried to add a line of code for the time zone, but I’m a n00b, so I
don’t think I did it right.
No image showing up in the preview.
* fix: remove timezone from date_published extractor
* test: update unit tests to assert the correct value for date_published
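The failing assertion above compares two strings that name the same instant: 13:45 at a -08:00 offset is 21:45 UTC. The minimal sketch below shows this by parsing both into timezone-aware datetimes and normalizing to UTC. The project itself is JavaScript, so this Python version only illustrates the idea; `parse` and `to_utc_iso` are hypothetical helpers, and the output format is assumed to mirror JavaScript's `Date.prototype.toISOString()`.

```python
from datetime import datetime, timezone

# Value produced by the extractor (UTC, milliseconds, trailing "Z").
extracted = "2016-12-13T21:45:00.000Z"
# Value the original test expected (local time with a -0800 offset).
expected = "2016-12-13T13:45:00-0800"


def parse(ts: str) -> datetime:
    """Parse an ISO-like timestamp into an aware datetime (hypothetical helper).

    On Python 3.7+, strptime's %z accepts both "Z" and "-0800".
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(ts, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {ts}")


def to_utc_iso(ts: str) -> str:
    """Normalize to UTC, formatted like toISOString(): YYYY-MM-DDTHH:MM:SS.sssZ."""
    dt = parse(ts).astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


# Aware datetimes compare by instant, so the two strings are equal once parsed.
assert parse(extracted) == parse(expected)
print(to_utc_iso(expected))  # → 2016-12-13T21:45:00.000Z
```

Comparing normalized instants rather than raw strings is what makes the test independent of the offset a site happens to publish.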
2017-01-22 12:18:10 -05:00
Janet
aff651c2d8
feat: hellogiggles parser ( #107 )
Looks good to me!
2017-01-21 14:07:20 -05:00
Janet
11ad7b9a92
feat: thought catalog parser ( #102 )
Looks good!
2017-01-21 13:52:00 -05:00
Janet
aa43a6091c
feat: cnbc parser ( #96 )
Should be good to go!
2017-01-21 13:25:23 -05:00
Janet
cd245f7980
feat: popsugar parser ( #93 )
I think this one is good to go!
2017-01-21 13:11:00 -05:00
Janet
a8ab7135e1
feat: observer parser ( #91 )
no problems
2017-01-21 12:47:26 -05:00
Janet
3bee7224cb
feat: nbc news parser ( #74 )
2017-01-18 17:28:21 -08:00
Janet
88242dd233
feat: nj.com parser ( #73 )
2017-01-18 16:49:05 -08:00
Janet
1ac5670a54
feat: inquisitor parser ( #72 )
2017-01-18 16:34:22 -08:00
Janet
9e5b91ed8b
feat: refinery29 parser ( #71 )
2016-12-21 21:57:13 -08:00
Janet
b78c58c43a
feat: miami herald parser ( #69 )
2016-12-21 21:35:34 -08:00
Janet
aedf83edc6
feat: eonline parser ( #68 )
2016-12-21 21:24:14 -08:00
Janet
a20da5eb31
uproxx extractor ( #66 )
2016-12-21 21:05:10 -08:00
Janet
87c42b6358
feat: 247sports.com extractor ( #64 )
2016-12-21 20:52:23 -08:00
Janet
22e6c884fb
feat: rolling stone extractor ( #65 )
2016-12-21 20:30:34 -08:00
Janet
6337231697
feat: usmagazine extractor ( #63 )
2016-12-21 20:06:47 -08:00
Janet
c06b19efe7
feat: people extractor ( #70 )
No major problems!
2016-12-21 19:46:48 -08:00
Janet
3cf2bb78c4
feat: vox custom parser ( #67 )
2016-12-15 17:48:15 -08:00
Adam Pash
a710efd2d5
release: 1.0.3 ( #62 )
2016-12-09 12:15:40 -05:00
Janet
861c5f0dcb
feat: bustle extractor ( #60 )
2016-12-08 15:32:08 -05:00
Adam Pash
06397a4360
feat: browser-friendly selector for medium ( #61 )
2016-12-07 17:58:29 -05:00
Adam Pash
3297ab079d
feat: bloomberg extractor ( #59 )
Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means.
It's also worth noting that SVGs don't make it through the parser very well, for several reasons. For example, many SVGs require custom CSS to render sensibly. I'm not sure this is something we can expect to address in the parser.
2016-12-07 14:39:00 -05:00
Janet
e55e9da534
feat: sbnation extractor ( #55 )
2016-12-07 14:25:57 -05:00
Adam Pash
8070e4790b
test: streamlined guardian tests w/new single-extraction ( #58 )
2016-12-07 13:17:25 -05:00
Adam Pash
bdb751fb53
feat: more cleaning for wired ( #56 )
2016-12-07 12:15:39 -05:00
Janet
e7e41bd242
feat: the guardian custom extractor ( #41 )
2016-12-07 12:05:18 -05:00
Adam Pash
332f85928f
release: 1.0.2 ( #54 )
2016-12-06 14:51:01 -05:00
Adam Pash
81aa89f2c1
feat: youtube custom extractor ( #53 )
2016-12-06 12:36:51 -05:00
Adam Pash
2fb47640f2
Feat: detect platforms ( #52 )
Detectors for matching extractors for publishing platforms. Currently supporting Medium and Blogger.
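A platform detector can be sketched as a lookup from a page-level marker to an extractor. The sketch below assumes, as a hypothetical example, that Medium pages carry `<meta name="al:ios:app_name" content="Medium">` and Blogger pages carry `<meta name="generator" content="blogger">`; the detector table, `detect_platform`, and the extractor names are illustrative only, not the project's actual (JavaScript) code.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class MetaCollector(HTMLParser):
    """Collect name -> content pairs from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = a.get("name")
            content = a.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Hypothetical detector table: (meta name, expected content, extractor name).
DETECTORS = [
    ("al:ios:app_name", "medium", "MediumExtractor"),
    ("generator", "blogger", "BloggerExtractor"),
]


def detect_platform(html: str) -> Optional[str]:
    """Return the extractor for the first matching platform marker, else None."""
    collector = MetaCollector()
    collector.feed(html)
    for name, expected, extractor in DETECTORS:
        if collector.meta.get(name, "").lower() == expected:
            return extractor
    return None


print(detect_platform('<head><meta name="generator" content="Blogger"></head>'))
```

Matching on the platform rather than the domain lets one extractor cover every site hosted on that platform; pages with no marker fall through to the generic extractor.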
2016-12-06 12:17:03 -05:00
Adam Pash
64c0fad2fd
fix: preserve whitespace ( #51 )
No longer normalizing whitespace in html
2016-12-06 11:31:50 -05:00
Adam Pash
15656cb3e1
Refactor: running tests more efficiently ( #49 )
Running the parser once per page under test, rather than once per field under test.
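The idea of parsing once per page and sharing the result across per-field assertions can be sketched with `unittest`'s `setUpClass`. Here `parse_page`, the fixture URL, and the field values are hypothetical stand-ins; the project's own tests are JavaScript, so this only illustrates the pattern.

```python
import io
import unittest

PARSE_CALLS = 0  # counts how often the expensive parse actually runs


def parse_page(url: str) -> dict:
    """Hypothetical stand-in for running the full parser on one fixture page."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    return {"title": "Example title", "author": "Example Author"}


class ExampleSiteTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Parse the fixture once; every per-field test reuses this result.
        cls.result = parse_page("https://example.com/article")

    def test_title(self):
        self.assertEqual(self.result["title"], "Example title")

    def test_author(self):
        self.assertEqual(self.result["author"], "Example Author")


def run_suite():
    """Run the field tests and report how many times the parser ran."""
    global PARSE_CALLS
    PARSE_CALLS = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleSiteTest)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result, PARSE_CALLS
```

With two field tests, `run_suite()` reports a single parse; adding more fields does not add parses.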
2016-12-05 15:39:45 -05:00
Adam Pash
edcb7295d1
release: 1.0.1 ( #48 )
2016-12-02 16:14:07 -08:00
Adam Pash
f9902cfa05
Fix: extension bugs ( #47 )
* feat: lead image on atlantic stories now included
* feat: supporting buzzfeed "longform" template
* feat: cleaning .parter-box from the atlantic
2016-12-02 16:02:00 -08:00
Adam Pash
16860f1d85
feat: improved nyt parser ( #46 )
NYT was one of the first extractors; its test was stale, and not all
of its fields were well defined.
2016-12-02 15:41:26 -08:00
Adam Pash
d0453efbf8
feat: improvements for nyer magazine articles ( #45 )
adds dek and date_published for magazine template
2016-12-02 15:30:09 -08:00
Adam Pash
00f8965c1f
fix: cleaning up deks ( #44 )
We've solidified what we consider a dek. This PR removes the dek selectors that do not fit that mold.
2016-12-02 15:17:49 -08:00
Janet
b415d1d37c
feat: aol custom extractor ( #42 )
* feat: aol custom parser
* removed work from other commits. merged with latest master
2016-12-01 17:05:15 -08:00
Matt
4cc3b68b5e
feat: remove footer links ( #40 )
The links at the bottom of the stories feel a little spammy because of how we treat links versus the way they are displayed on the Times; I'd like to clean them up.
2016-12-01 08:31:43 -08:00
Adam Pash
e9a36d6ebd
release: 1.0.0 so we can start doing proper releases ( #39 )
2016-11-30 17:49:50 -08:00
Adam Pash
ff1963bdca
feat: new cleaner for wapo ( #38 )
2016-11-30 17:01:53 -08:00
Adam Pash
0e6ccdf622
fix: browser cleanup ( #35 )
Cleaning up after the parser when it's done in the browser, before
returning result.
2016-11-30 16:49:18 -08:00
Adam Pash
bd0694fbba
feat: preview with optional rebuild ( #36 )
Now the preview script has an optional build step. Adding --no-rebuild
as an argument to the script will skip the rebuild step and just show a
preview of the parse as is with the current build.
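The optional build step can be sketched as a boolean flag that removes the build from the script's plan. This Python version uses `argparse` with `action="store_true"`; the step names and the `plan` helper are hypothetical, and the real preview script is part of the JavaScript project.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Preview parser output (sketch).")
    parser.add_argument("urls", nargs="+", help="article URL(s) to preview")
    parser.add_argument(
        "--no-rebuild",
        action="store_true",
        help="skip the build step and preview with the current build",
    )
    return parser.parse_args(argv)


def plan(argv: List[str]) -> List[str]:
    """Return the ordered steps the preview would run (hypothetical step names)."""
    args = parse_args(argv)
    steps = [] if args.no_rebuild else ["build"]
    steps += [f"preview {url}" for url in args.urls]
    return steps


print(plan(["--no-rebuild", "https://example.com/a"]))
```

Defaulting to a rebuild keeps previews accurate after source changes, while the flag saves time when only re-checking an existing build.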
2016-11-30 16:37:42 -08:00
Adam Pash
181b39b238
feat: ci speedup ( #37 )
minor speedup to see failing tests. linting happens first
2016-11-30 16:11:35 -08:00
Silas Burton
c3d98a0d76
Feat cnn extractor ( #34 )
* wip: cnn custom extractor
* wip: cnn works except first paragraph
* final touches on cnn parser
* cleanup
2016-11-30 14:55:04 -08:00
Silas Burton
a0570f8e94
feat: extractor for the verge ( #33 )
* feat: extractor for the verge's standard article template
* feat: basic support for the verge feature template
* feat: allow multiple links to be previewed
* feat: content selector arrays
Content selector arrays allow custom parsers to select multiple elements
to match and include in the result.
* feat: updated verge parser to use multimatch selectors
* lint fix
* cleanup test builds
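Content selector arrays can be sketched as follows, assuming each selector entry is either a single selector or a list of selectors: a single selector returns its one match, while a list matches only if every member matches, and the matched elements are combined in order. The project uses CSS selectors in JavaScript; this Python sketch substitutes the limited XPath paths supported by `xml.etree.ElementTree`, and `select_content` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Sequence, Union

Selector = Union[str, Sequence[str]]


def select_content(root: ET.Element, selectors: Sequence[Selector]) -> Optional[List[ET.Element]]:
    """Return the elements for the first selector entry that matches, else None."""
    for entry in selectors:
        if isinstance(entry, str):
            # Single selector: one matching element is enough.
            node = root.find(entry)
            if node is not None:
                return [node]
        else:
            # Selector array: every member must match; results are combined.
            nodes = [root.find(sel) for sel in entry]
            if all(n is not None for n in nodes):
                return nodes
    return None


PAGE = ET.fromstring(
    "<html><body>"
    "<div class='hero'><img src='lead.jpg'/></div>"
    "<div class='intro'>Intro</div>"
    "<div class='body'><p>Text</p></div>"
    "</body></html>"
)
selectors = [
    [".//div[@class='hero']", ".//div[@class='body']"],  # multi-match entry
    ".//div[@class='intro']",  # single-selector fallback
]
matched = select_content(PAGE, selectors)
print([el.get("class") for el in matched])  # → ['hero', 'body']
```

Requiring every member of an array to match avoids returning a half-assembled article when a template lacks one of the expected regions; the parser then falls back to the next entry.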
2016-11-30 14:08:56 -08:00
Adam Pash
233ca11a33
fix: added timezone to new republic date ( #32 )
2016-11-29 16:54:52 -08:00
Adam Pash
cfe7f34be4
fix: normalizing spaces for authors/dek/title ( #31 )
* fix: normalizing spaces for authors/dek/title
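Normalizing spaces for the author, dek, and title fields can be sketched as collapsing every run of whitespace (including newlines, tabs, and non-breaking spaces) to a single space and trimming the ends. The field names come from the commit subject; `normalize_spaces` and `clean_fields` are hypothetical helpers, and other fields such as content are left untouched.

```python
import re

NORMALIZED_FIELDS = ("author", "dek", "title")


def normalize_spaces(text: str) -> str:
    """Collapse whitespace runs into single spaces and trim the ends.

    For str patterns, \\s also matches Unicode whitespace such as U+00A0.
    """
    return re.sub(r"\s+", " ", text).strip()


def clean_fields(result: dict) -> dict:
    """Return a copy of result with only the short text fields normalized."""
    out = dict(result)
    for field in NORMALIZED_FIELDS:
        if isinstance(out.get(field), str):
            out[field] = normalize_spaces(out[field])
    return out


print(normalize_spaces("  By\n   Jane\u00a0Doe  "))  # → By Jane Doe
```

Restricting normalization to these short fields leaves whitespace in the article body alone.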
2016-11-29 16:43:46 -08:00
Adam Pash
9a23b24a89
feat: adjustment for huffpo. skipping overly aggressive default cleaners ( #30 )
2016-11-29 16:16:39 -08:00
Silas Burton
be2e4b5c80
Feat: huffington post extractor ( #28 )
* wip: huffpo custom extractor
* wip: some huffpo cleanup
2016-11-29 15:50:48 -08:00
Adam Pash
94198c0a65
feat: new republic custom extractor ( #25 )
* wip: new republic custom extractor
* feat: new republic article extractor
* feat: new republic minutes article extractor
2016-11-29 15:30:52 -08:00