Commit Graph

302 Commits (1b28713cf5f45685b7a8326bee59fbf48b3174e3)

Author SHA1 Message Date
Adam Pash 396313aeae feat: added twitter custom extractor
Squashed commit of the following:

commit 8116f14364869b72a8afabfcb44b2ac154caed96
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 15 16:27:27 2016 -0400

    feat: added twitter custom extractor

commit e478eb1b0bcdcb65fdd5fa64e37be92b6defd702
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 15 16:22:54 2016 -0400

    fix: made custom extractors and cleaners adhere to underscore keys
8 years ago
Adam Pash d60d396c98 feat: added text direction to response 8 years ago
Adam Pash f0f216c7b9 feat: add option to allow custom extractors to skip default cleaners 8 years ago
Adam Pash 97a0728ecf test: added sanity test for get-extractor 8 years ago
Adam Pash 7c375aded7 chore: cleanup 8 years ago
Adam Pash 4cdc4165d6 fix: encodeURI before fetching 8 years ago
Adam Pash 1343469b6c fix: explicit/better decoding of gzipped content 8 years ago
Adam Pash 7638c15077 push new build for testing 8 years ago
Adam Pash c338098f21 refactor: renamed child to sibling for clarity 8 years ago
Adam Pash 6263e505d5 fix: handling case where node.get(0) returns null 8 years ago
Adam Pash 2bf274114f chore: disable camelcase for linting 8 years ago
Adam Pash 3b36a33e36 chore: change result keys to match python api 8 years ago
Adam Pash cc060b794d fix: wordcount calling excerpt 8 years ago
Adam Pash 7fc1f7f6bb checking in dist 8 years ago
Adam Pash c76435ce62 updated name in package.json 8 years ago
Adam Pash f1cff0b435 chore: removed TODO.md 8 years ago
Adam Pash daa9266182 feat: generic extractor for word count
Squashed commit of the following:

commit 0aba26ef9efba71a72c76fa351a9037e97fc1e9e
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 14 14:56:45 2016 -0400

    fix: normalizeSpaces regex fix broke a test

commit 07d60c1c8c6599d6c94d92e5a70649c28d03d6ea
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 14 14:52:41 2016 -0400

    feat: generic extractor for word count
8 years ago
Adam Pash 76df30e303 chore: cleanup 8 years ago
Adam Pash b3481a2c45 feat: generic excerpt extraction 8 years ago
Adam Pash 457075889d fix: selection should not be empty 8 years ago
Adam Pash 81ed4f00ed feat: improve nymag.com extractor to grab deks from features 8 years ago
Adam Pash 21f444367f feat: added page counts 8 years ago
Adam Pash f3a5d0ecca feat: added domain and url extractor (using same extractor)
commit 43ab423d575cd15cc55041fb3fe2f21ffdd7adff
Author: Adam Pash <adam.pash@gmail.com>
Date:   Wed Sep 14 11:57:25 2016 -0400
8 years ago
Adam Pash 67296691c2 refactor: page collection 8 years ago
Adam Pash b325a4acdd chore: clean up junk tests 8 years ago
Adam Pash 547ee2b4ca Merge pull request #1 from postlight/test-fix-fixture-locations
Fix Fixture Locations
8 years ago
Adam Pash 62ae330db2 fix: bug in scoring and converting to paragraphs 8 years ago
Adam Pash 3694c2d12c chore: improve linter/babelrc 8 years ago
Jeremy Mack 7ca19d2e6f test: fix fixture locations 8 years ago
Adam Pash 7e2a34945f chore: refactored and linted 8 years ago
Adam Pash 9906bd36a4 chore: moved content scoring out of utils, removed no-longer-necessary utils 8 years ago
Adam Pash 7ec0ed0d31 feat: nextPageUrl handles multi-page articles
Squashed commit of the following:

commit b5070c0967a7f1a0c0c449ba7ea40aebe8fe4bb8
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 13 10:03:00 2016 -0400

    root extractor includes next page url

commit 79be83127d5342d89eef33665586fabea227d6b3
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 13 09:58:20 2016 -0400

    small score adjustment

commit 0f00507dbff43401145a892e849311518edec68a
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Sep 12 18:17:38 2016 -0400

    feat: nextPageUrl generic parser up and running

commit be91c589fc0c6d6f9b573080a76c9b1ac7af710c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Sep 12 11:53:58 2016 -0400

    feat: pageNumFromUrl extracts the pagenum of the current url

commit ad879d7aabedadfd051c01b42d841703bf4763fa
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Sep 12 11:52:37 2016 -0400

    feat: isWordpress checks if a page is generated by wordpress
8 years ago
Adam Pash a89b9b785e feat: small improvement to author selectors 8 years ago
Adam Pash acaab70ee2 fix: scorePs parent scoring was overwriting child scoring 8 years ago
Adam Pash 8fe3bec6b6 fix: accepting cookies with request (required for sites like
nytimes.com)
8 years ago
Adam Pash 74694ba8e2 debugging: cheerio isn't always consistent in setting scores 8 years ago
Adam Pash 47ac7e9803 refactor: limiting calls to $ function
Squashed commit of the following:

commit c72da261cb5319d1eef207bff63b3c9cd49018df
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 15:28:43 2016 -0400

    refactor: limiting calls to $ function

commit eeae88247d844d5c6acbc529dbc3ce4d14e04191
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 15:14:33 2016 -0400

    refactor: convertNodeTo; requires a cheerio object
8 years ago
Adam Pash 81e9e7a317 feat: whitelisting attrs to keep 8 years ago
Adam Pash 7b97559778 chore: remove logic for fetching meta tags with custom attrs (resource
normalizes this now
8 years ago
Adam Pash c48e3485c0 chore: code reorganization
Squashed commit of the following:

commit 636296841d5cf5e685237fe70db7a15305d8e966
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 13:37:21 2016 -0400

    final cleanup

commit 51f712b3074d41a1f2da91519289d4dd09719ad0
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 13:25:28 2016 -0400

    Another big pass

commit 3860e6d872a9adb9290093fd9c8708dfcc773c28
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 12:49:52 2016 -0400

    chore: started reorganizing
8 years ago
Adam Pash f2729a5ee6 improved wiki extractor 8 years ago
Adam Pash 52e89a0229 fix: cleaning embed and object nodes 8 years ago
Adam Pash edfb54c532 feat: links are rewritten to absolute in cleaner
Squashed commit of the following:

commit 9057d411a5458f80c316604559c469a239ef3a40
Author: Adam Pash <adam.pash@gmail.com>
Date:   Fri Sep 9 11:42:19 2016 -0400

    feat: links are rewritten to absolute in cleaner
8 years ago
Adam Pash bdc2c0c1da feat: can now fetch attrs in RootExtractor's select method 8 years ago
Adam Pash 33c7e0d1c9 feat: Improved dateString parsing to handle more; first trying to parse without cleaning 8 years ago
Adam Pash 91881df523 refactor: cleaners now run on custom extractors
Squashed commit of the following:

commit e4c7d1d149d1846f0d589b3653655b81b477c682
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 19:29:26 2016 -0400

    refactor: cleaners now run on custom extractors

commit ca08d2482c54bf6a40f50758da9353f00987a4d7
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 14:42:19 2016 -0400

    moved cleaners, refactored as necessary

commit ec2c5d36410b255c6d8ee264deca990c46709c3c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 14:07:01 2016 -0400

    moved datePublished cleaner

commit 5e55e397eecb3e88d64cd2aa2c6071c9cffed272
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 13:34:21 2016 -0400

    moved dek cleaner

commit 2dfb0c44d7882336992fdc864792df6eac094c21
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 13:29:37 2016 -0400

    moved lead-image-url

commit cef7a213b80ddd671249225622f1388f9e68896c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Thu Sep 8 13:26:20 2016 -0400

    moved author
8 years ago
Adam Pash 603682239d feat: basic wikipedia custom extractor 8 years ago
Adam Pash 9665fe7209 feat: blogspot.com custom extractor 8 years ago
Adam Pash 6c6451b34b fix: duplicate key bug 8 years ago
Adam Pash 93ca688955 fix: dek and leadImg should not be html 8 years ago