Mercury Parser - Extracting content from chaos #parser #url #html #extractor
Adam Pash 7ec0ed0d31 feat: nextPageUrl handles multi-page articles
Squashed commit of the following:

commit b5070c0967a7f1a0c0c449ba7ea40aebe8fe4bb8
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 13 10:03:00 2016 -0400

    root extractor includes next page url

commit 79be83127d5342d89eef33665586fabea227d6b3
Author: Adam Pash <adam.pash@gmail.com>
Date:   Tue Sep 13 09:58:20 2016 -0400

    small score adjustment

commit 0f00507dbff43401145a892e849311518edec68a
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Sep 12 18:17:38 2016 -0400

    feat: nextPageUrl generic parser up and running

commit be91c589fc0c6d6f9b573080a76c9b1ac7af710c
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Sep 12 11:53:58 2016 -0400

    feat: pageNumFromUrl extracts the pagenum of the current url

commit ad879d7aabedadfd051c01b42d841703bf4763fa
Author: Adam Pash <adam.pash@gmail.com>
Date:   Mon Sep 12 11:52:37 2016 -0400

    feat: isWordpress checks if a page is generated by wordpress
2016-09-13 10:08:49 -04:00
Name              Last commit                                                         Date
fixtures          feat: nextPageUrl handles multi-page articles                       2016-09-13 10:08:49 -04:00
src               feat: nextPageUrl handles multi-page articles                       2016-09-13 10:08:49 -04:00
.babelrc          chore: code reorganization                                          2016-09-09 13:44:58 -04:00
.gitignore        feat: resource fetches content from a URL and prepares for parsing  2016-09-06 17:55:45 -04:00
NOTES.md          notes, cleanup                                                      2016-09-06 09:55:36 -04:00
package.json      feat: nextPageUrl handles multi-page articles                       2016-09-13 10:08:49 -04:00
read              feat: getExtractor returns generic extractor                        2016-09-07 13:56:57 -04:00
read.html         feat: basic wikipedia custom extractor                              2016-09-08 13:19:06 -04:00
rollup.config.js  feat: RootExtractor performs extraction using custom and generic    2016-09-08 11:00:29 -04:00
test-runner       feat: resource fetches content from a URL and prepares for parsing  2016-09-06 17:55:45 -04:00
TODO.md           feat: small improvement to author selectors                         2016-09-12 10:51:29 -04:00