![Mercury Parser](https://13c27d41k2ud2vkddp226w55-wpengine.netdna-ssl.com/wp-content/uploads/2018/02/7bacd-16qwcaegges3hkrw70doz4w.png)
# Mercury Parser - Extracting content from chaos
[![CircleCI](https://circleci.com/gh/postlight/mercury-parser.svg?style=svg&circle-token=3026c2b527d3767750e767872d08991aeb4f8f10)](https://circleci.com/gh/postlight/mercury-parser) [![Greenkeeper badge](https://badges.greenkeeper.io/postlight/mercury-parser.svg)](https://greenkeeper.io/) [![Apache License][license-apach-badge]][license-apach] [![MIT License][license-mit-badge]][license-mit]

[![Gitter chat](https://badges.gitter.im/postlight/mercury.png)](https://gitter.im/postlight/mercury)

[license-apach-badge]: https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=flat-square
[license-apach]: https://github.com/postlight/mercury-parser/blob/master/LICENSE-APACHE
[license-mit-badge]: https://img.shields.io/badge/License-MIT%202.0-blue.svg?style=flat-square
[license-mit]: https://github.com/postlight/mercury-parser/blob/master/LICENSE-MIT
The Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.

Mercury Parser powers the [Mercury AMP Converter](https://mercury.postlight.com/amp-converter/) and [Mercury Reader](https://mercury.postlight.com/reader/), a Chrome extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.

Mercury Parser allows you to easily create custom parsers using simple JavaScript and CSS selectors. This allows you to proactively manage parsing and migration edge cases. There are [many examples available](https://github.com/postlight/mercury-parser/tree/master/src/extractors/custom) along with [documentation](https://github.com/postlight/mercury-parser/blob/master/src/extractors/custom/README.md).
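
As a rough sketch of what such a custom parser can look like, the object below pairs a `domain` with lists of CSS selectors per field. It assumes the object shape described in the linked documentation; the site, the selectors, and the name `ExampleBlogExtractor` are hypothetical, and the documentation remains the authority on which keys are supported.

```javascript
// Hypothetical extractor for an imaginary site. The domain and selectors are
// placeholders for illustration, not taken from the Mercury Parser repository.
const ExampleBlogExtractor = {
  // Hostname this extractor is meant to handle
  domain: 'blog.example.com',

  // Each field lists candidate CSS selectors for locating its value
  title: {
    selectors: ['h1.post-title'],
  },

  author: {
    selectors: ['.byline a'],
  },

  content: {
    selectors: ['article .post-body'],
    // Selectors for elements to strip out of the extracted content
    clean: ['.ad', '.related-posts'],
  },
};

// Print the selectors configured for each field
for (const field of ['title', 'author', 'content']) {
  console.log(field, ExampleBlogExtractor[field].selectors);
}
```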
## How? Like this.
### Installation
```bash
# If you're using yarn
yarn add @postlight/mercury-parser

# If you're using npm
npm install @postlight/mercury-parser
```
### Usage
```javascript
import Mercury from '@postlight/mercury-parser';

// Example URL; it matches the sample result shown below.
const url = 'https://en.wikipedia.org/wiki/Thunder_(mascot)';

Mercury.parse(url).then(result => console.log(result));

// NOTE: When used in the browser, you can omit the URL argument
// and simply run `Mercury.parse()` to parse the current page.
```
The result looks like this:
```json
{
  "title": "Thunder (mascot)",
  "content": "<div><div><p>This is the content of the page!</div></div>",
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://en.wikipedia.org/wiki/Thunder_(mascot)",
  "domain": "en.wikipedia.org",
  "excerpt": "Thunder Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}
```
If Mercury is unable to find a field, that field will return `null`.
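
Because any field may come back as `null`, code that consumes a result usually wants fallbacks. The sketch below shows one way to do that with plain JavaScript; `summarizeResult` and its fallback strings are hypothetical and not part of Mercury Parser, and the sample object is trimmed from the result shown above.

```javascript
// Hypothetical helper (not part of Mercury Parser): builds a display-friendly
// summary from a parse result, substituting fallbacks for null fields.
function summarizeResult(result) {
  return {
    title: result.title ?? 'Untitled',
    author: result.author ?? 'Unknown author',
    // date_published is an ISO 8601 string when present
    published: result.date_published ? new Date(result.date_published) : null,
    image: result.lead_image_url ?? null,
    wordCount: result.word_count ?? 0,
  };
}

// Sample input trimmed from the result shown above
const sample = {
  title: 'Thunder (mascot)',
  author: 'Wikipedia Contributors',
  date_published: '2016-09-16T20:56:00.000Z',
  lead_image_url: null,
  word_count: 4677,
};

const summary = summarizeResult(sample);
console.log(summary.title, summary.published.getUTCFullYear(), summary.image);
```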
Mercury Parser also ships with a CLI, meaning you can use the Mercury Parser
from your command line like so:
```bash
# Install Mercury globally
yarn global add @postlight/mercury-parser
# or
npm -g install @postlight/mercury-parser
# Then
mercury-parser https://postlight.com/trackchanges/mercury-goes-open-source
```
## License
Licensed under either of the below, at your preference:
- Apache License, Version 2.0
([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license
([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
## Contributing
For details on how to contribute to Mercury, including how to write a custom content extractor for any site, see [CONTRIBUTING.md](./CONTRIBUTING.md).

Unless it is explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual licensed as above without any additional terms or conditions.