Adam Pash
cbd0636dcf
chore: cleaned up python and other unneeded comments
2016-09-16 11:21:23 -04:00
Adam Pash
bf13b38a9b
feat: some basic error handling for bad urls
2016-09-15 17:41:29 -04:00
Adam Pash
9f0c075de4
Merge pull request #3 from postlight/fix-date-not-local
...
fix: some improvements to date parsing. punting on localization issues
2016-09-15 16:59:31 -04:00
Adam Pash
ffaf7db0f1
fix: some improvements to date parsing. punting on localization issues
2016-09-15 16:57:14 -04:00
Adam Pash
396313aeae
feat: added twitter custom extractor
...
Squashed commit of the following:
commit 8116f14364869b72a8afabfcb44b2ac154caed96
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 15 16:27:27 2016 -0400
feat: added twitter custom extractor
commit e478eb1b0bcdcb65fdd5fa64e37be92b6defd702
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 15 16:22:54 2016 -0400
fix: made custom extractors and cleaners adhere to underscore keys
2016-09-15 16:27:46 -04:00
Adam Pash
d60d396c98
feat: added text direction to response
2016-09-15 15:08:04 -04:00
Adam Pash
f0f216c7b9
feat: add option to allow custom extractors to skip default cleaners
2016-09-15 14:50:51 -04:00
Adam Pash
97a0728ecf
test: added sanity test for get-extractor
2016-09-15 14:33:06 -04:00
Adam Pash
7c375aded7
chore: cleanup
2016-09-15 14:29:14 -04:00
Adam Pash
4cdc4165d6
fix: encodeURI before fetching
2016-09-15 14:25:22 -04:00
Adam Pash
1343469b6c
fix: explicit/better decoding of gzipped content
2016-09-15 12:39:54 -04:00
Adam Pash
7638c15077
push new build for testing
2016-09-15 12:22:38 -04:00
Adam Pash
c338098f21
refactor: renamed child to sibling for clarity
2016-09-15 12:19:33 -04:00
Adam Pash
6263e505d5
fix: handling case where node.get(0) returns null
2016-09-15 12:17:25 -04:00
Adam Pash
2bf274114f
chore: disable camelcase for linting
2016-09-14 16:00:36 -04:00
Adam Pash
3b36a33e36
chore: change result keys to match python api
2016-09-14 15:53:02 -04:00
Adam Pash
cc060b794d
fix: wordcount calling excerpt
2016-09-14 15:12:05 -04:00
Adam Pash
7fc1f7f6bb
checking in dist
2016-09-14 15:10:03 -04:00
Adam Pash
c76435ce62
updated name in package.json
2016-09-14 15:06:54 -04:00
Adam Pash
f1cff0b435
chore: removed TODO.md
2016-09-14 15:00:56 -04:00
Adam Pash
daa9266182
feat: generic extractor for word count
...
Squashed commit of the following:
commit 0aba26ef9efba71a72c76fa351a9037e97fc1e9e
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 14 14:56:45 2016 -0400
fix: normalizeSpaces regex fix broke a test
commit 07d60c1c8c6599d6c94d92e5a70649c28d03d6ea
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 14 14:52:41 2016 -0400
feat: generic extractor for word count
2016-09-14 14:58:08 -04:00
Adam Pash
76df30e303
chore: cleanup
2016-09-14 14:28:45 -04:00
Adam Pash
b3481a2c45
feat: generic excerpt extraction
2016-09-14 14:13:59 -04:00
Adam Pash
457075889d
fix: selection should not be empty
2016-09-14 13:13:26 -04:00
Adam Pash
81ed4f00ed
feat: improve nymag.com extractor to grab deks from features
2016-09-14 13:12:40 -04:00
Adam Pash
21f444367f
feat: added page counts
2016-09-14 12:21:32 -04:00
Adam Pash
f3a5d0ecca
feat: added domain and url extractor (using same extractor)
...
commit 43ab423d575cd15cc55041fb3fe2f21ffdd7adff
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Sep 14 11:57:25 2016 -0400
2016-09-14 11:58:09 -04:00
Adam Pash
67296691c2
refactor: page collection
2016-09-14 11:12:28 -04:00
Adam Pash
b325a4acdd
chore: clean up junk tests
2016-09-14 10:36:34 -04:00
Adam Pash
547ee2b4ca
Merge pull request #1 from postlight/test-fix-fixture-locations
...
Fix Fixture Locations
2016-09-14 10:34:50 -04:00
Adam Pash
62ae330db2
fix: bug in scoring and converting to paragraphs
2016-09-14 10:15:36 -04:00
Adam Pash
3694c2d12c
chore: improve linter/babelrc
2016-09-14 10:14:19 -04:00
Jeremy Mack
7ca19d2e6f
test: fix fixture locations
2016-09-14 08:09:14 -05:00
Adam Pash
7e2a34945f
chore: refactored and linted
2016-09-13 15:22:27 -04:00
Adam Pash
9906bd36a4
chore: moved content scoring out of utils, removed no-longer-necessary utils
2016-09-13 10:25:47 -04:00
Adam Pash
7ec0ed0d31
feat: nextPageUrl handles multi-page articles
...
Squashed commit of the following:
commit b5070c0967a7f1a0c0c449ba7ea40aebe8fe4bb8
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Sep 13 10:03:00 2016 -0400
root extractor includes next page url
commit 79be83127d5342d89eef33665586fabea227d6b3
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Sep 13 09:58:20 2016 -0400
small score adjustment
commit 0f00507dbff43401145a892e849311518edec68a
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 12 18:17:38 2016 -0400
feat: nextPageUrl generic parser up and running
commit be91c589fc0c6d6f9b573080a76c9b1ac7af710c
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 12 11:53:58 2016 -0400
feat: pageNumFromUrl extracts the pagenum of the current url
commit ad879d7aabedadfd051c01b42d841703bf4763fa
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Sep 12 11:52:37 2016 -0400
feat: isWordpress checks if a page is generated by wordpress
2016-09-13 10:08:49 -04:00
Adam Pash
a89b9b785e
feat: small improvement to author selectors
2016-09-12 10:51:29 -04:00
Adam Pash
acaab70ee2
fix: scorePs parent scoring was overwriting child scoring
2016-09-12 10:17:15 -04:00
Adam Pash
8fe3bec6b6
fix: accepting cookies with request (required for sites like nytimes.com)
2016-09-12 10:08:49 -04:00
Adam Pash
74694ba8e2
debugging: cheerio isn't always consistent in setting scores
2016-09-09 17:16:11 -04:00
Adam Pash
47ac7e9803
refactor: limiting calls to $ function
...
Squashed commit of the following:
commit c72da261cb5319d1eef207bff63b3c9cd49018df
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 15:28:43 2016 -0400
refactor: limiting calls to $ function
commit eeae88247d844d5c6acbc529dbc3ce4d14e04191
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 15:14:33 2016 -0400
refactor: convertNodeTo; requires a cheerio object
2016-09-09 15:29:07 -04:00
Adam Pash
81e9e7a317
feat: whitelisting attrs to keep
2016-09-09 14:33:16 -04:00
Adam Pash
7b97559778
chore: remove logic for fetching meta tags with custom attrs (resource normalizes this now)
2016-09-09 13:56:06 -04:00
Adam Pash
c48e3485c0
chore: code reorganization
...
Squashed commit of the following:
commit 636296841d5cf5e685237fe70db7a15305d8e966
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 13:37:21 2016 -0400
final cleanup
commit 51f712b3074d41a1f2da91519289d4dd09719ad0
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 13:25:28 2016 -0400
Another big pass
commit 3860e6d872a9adb9290093fd9c8708dfcc773c28
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 12:49:52 2016 -0400
chore: started reorganizing
2016-09-09 13:44:58 -04:00
Adam Pash
f2729a5ee6
improved wiki extractor
2016-09-09 12:00:09 -04:00
Adam Pash
52e89a0229
fix: cleaning embed and object nodes
2016-09-09 11:59:55 -04:00
Adam Pash
edfb54c532
feat: links are rewritten to absolute in cleaner
...
Squashed commit of the following:
commit 9057d411a5458f80c316604559c469a239ef3a40
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Sep 9 11:42:19 2016 -0400
feat: links are rewritten to absolute in cleaner
2016-09-09 11:42:55 -04:00
Adam Pash
bdc2c0c1da
feat: can now fetch attrs in RootExtractor's select method
2016-09-09 10:25:12 -04:00
Adam Pash
33c7e0d1c9
feat: Improved dateString parsing to handle more; first trying to parse without cleaning
2016-09-09 09:59:56 -04:00
Adam Pash
91881df523
refactor: cleaners now run on custom extractors
...
Squashed commit of the following:
commit e4c7d1d149d1846f0d589b3653655b81b477c682
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 8 19:29:26 2016 -0400
refactor: cleaners now run on custom extractors
commit ca08d2482c54bf6a40f50758da9353f00987a4d7
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 8 14:42:19 2016 -0400
moved cleaners, refactored as necessary
commit ec2c5d36410b255c6d8ee264deca990c46709c3c
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 8 14:07:01 2016 -0400
moved datePublished cleaner
commit 5e55e397eecb3e88d64cd2aa2c6071c9cffed272
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 8 13:34:21 2016 -0400
moved dek cleaner
commit 2dfb0c44d7882336992fdc864792df6eac094c21
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 8 13:29:37 2016 -0400
moved lead-image-url
commit cef7a213b80ddd671249225622f1388f9e68896c
Author: Adam Pash <adam.pash@gmail.com>
Date: Thu Sep 8 13:26:20 2016 -0400
moved author
2016-09-08 19:31:45 -04:00