Mercury Parser - Extracting content from chaos

CircleCI Build status Apache License MIT License

The Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.

Mercury Parser powers the Mercury AMP Converter and Mercury Reader, a Chrome extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.

How? Like this.

Installation

yarn add @postlight/mercury-parser

Usage

import Mercury from '@postlight/mercury-parser';

Mercury.parse(url).then(result => console.log(result));

// NOTE: When used in the browser, you can omit the URL argument
// and simply run `Mercury.parse()` to parse the current page.
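The same call can also be written with async/await. The sketch below is illustrative only: `safeParse` is a hypothetical helper, not part of Mercury Parser, and it assumes nothing beyond what is shown above, namely that `parse(url)` returns a promise. Whether and when that promise rejects is not documented here, so the `catch` branch is a precaution rather than a description of the library's behavior.

```javascript
// Hypothetical helper, not part of Mercury Parser.
// `parser` is any object with a parse(url) method that returns a promise,
// such as the Mercury import shown above.
async function safeParse(parser, url) {
  try {
    // Resolves to the parsed result object.
    return await parser.parse(url);
  } catch (err) {
    // Assumption: a failed parse may surface as a rejected promise.
    console.error(`Could not parse ${url}: ${err.message}`);
    return null;
  }
}

// Possible usage, with Mercury imported as shown above:
// safeParse(Mercury, 'https://en.wikipedia.org/wiki/Thunder_(mascot)')
//   .then(result => console.log(result));
```

Passing the parser in as an argument keeps the helper easy to exercise with a stub object in tests.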

The result looks like this:

{
  "title": "Thunder (mascot)",
  "content": "<div><div><p>This is the content of the page!</p></div></div>",
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://en.wikipedia.org/wiki/Thunder_(mascot)",
  "domain": "en.wikipedia.org",
  "excerpt": "Thunder Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

If Mercury is unable to find a field, that field will return null.
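Because missing fields come back as null, code that consumes a result usually needs fallbacks. Here is a minimal sketch of one way to do that. `summarize` and its placeholder strings are hypothetical, chosen for illustration; only the field names `title`, `author`, and `word_count` come from the example result above.

```javascript
// Hypothetical helper, not part of Mercury Parser: builds a one-line
// summary from a result object shaped like the example above.
function summarize(result) {
  // Fields Mercury could not find are null, so fall back to placeholders.
  const title = result.title ?? '(untitled)';
  const author = result.author ?? 'unknown author';
  const words = result.word_count ?? 0;
  return `${title} by ${author} (${words} words)`;
}
```

The `??` operator substitutes the fallback only for null or undefined, so a legitimate `word_count` of 0 or an empty string is left untouched.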

License

Licensed under either of the following, at your preference: the Apache License, Version 2.0 (LICENSE-APACHE) or the MIT license (LICENSE-MIT).

Contributing

Unless it is explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual licensed as above without any additional terms or conditions.

Contributors

Adam Pash 📝 💻 📖 💡
Toy Vano 💻
Drew Bell 💻
Jeremy Mack 💻
George Haddad 💻 📖
Toufic Mouallem 💻 📖
Wajeeh Zantout 💻 📖
Marc Esso 💻 📖
Jad Termsani 💻 📖
Ralph Jbeily 💻 📖
Alexi Akl 💻