You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

4.8 KiB

Raw Blame History

readability-cli(1) -- get useful text from a web page

SYNOPSYS

readable [SOURCE] [options]...

DESCRIPTION

readability-cli takes any HTML page and strips out unnecessary bloat, leaving only the core text content. The resulting HTML may be suitable for terminal browsers, text readers, and other uses.

This package provides the readable command, which uses Mozilla's Readability library. The same library is used in Firefox's Reader View.

OPTIONS

The SOURCE can be a URL, a file, or '-' for standard input.

--help

Show help message, and exit.

-b, --base URL

Specify the document's URL. This affects relative links: they will not work if readability-cli does not know the base URL. You only need this option if you read HTML from a local file, or from standard input.

-i, --insane

Don't sanitize HTML.

-K, --insecure

Allow invalid SSL certificates.

-j, --json

Output all known properties of the document as JSON (see Properties subsection).

-l, --low-confidence MODE

What to do if Readability is uncertain about what the core content actually is. The possible modes are:
- keep - When unsure, don't touch the HTML, output as-is.
- force - Process the document even when unsure (may produce really bad output).
- exit - When unsure, exit with an error.
The default value is keep. If the --properties or --json options are set, the program will always run in exit mode.

-C, --keep-classes

Preserve CSS classes for input elements. By default, CSS classes are stripped, and the input is adapted for Firefox's Reader View.

-o, --output FILE

Output the result to FILE.

-p, --properties PROPERTIES...

Output specific properties of the document (see Properties subsection).

-x, --proxy URL

Use specified proxy (can also use HTTPS_PROXY environment variable).

-q, --quiet

Don't print extra information.

-s, --style

Specify .css file for stylesheet.

-A, --user-agent STRING

Set custom user agent string.

-V, --version

Print readability-cli and Node.js version, then exit.

--completion

Print script for shell completion, and exit. Provides Zsh completion if the current shell is zsh, otherwise provides Bash completion.

Properties

The --properties option accepts a list of values, separated by spaces. Suitable values are:

title - The title of the article.
html-title - The title of the article, wrapped in an <h1> tag.
excerpt - Article description, or short excerpt from the content.
byline - Data about the page's author.
length - Length of the article in characters.
dir - Text direction, is either "ltr" for left-to-right or "rtl" for right-to-left.
text-content - Output the article's main content as plain text.
html-content - Output the article's main content as an HTML body.

Properties are printed line by line, in the order specified by the user. Only "text-content" and "html-content" is printed as multiple lines.

EXIT STATUS

As usual, exit code 0 indicates success, and anything other than 0 is an error. readability-cli uses standard* error codes:

Error code	Meaning
64	Bad CLI arguments
65	Data format error: can't parse document using Readability.
66	No input
68	Unknown host name for URL
77	Permission denied: can't read file

* By "standard error codes" I mean "close to a standard". And by that I mean: I actually don't remember any command line tools which use this convention. You may find more info in sysexits(3), or maybe just sysexits.h.

ENVIRONMENT

readability-cli supports localization, using the environment variables LC_ALL, LC_MESSAGES, LANG and LANGUAGE, in that order. Only one language at a time is supported.

HTTPS_PROXY will set the HTTPS proxy, as previously stated, however the --proxy option overrides this. Lowercase https_proxy and http_proxy are also recognized.

EXAMPLE

Read HTML from a file and output the result to the console:

readable index.html

Fetch a random Wikipedia article, get its title and an excerpt:

readable https://en.wikipedia.org/wiki/Special:Random -p title,excerpt

Fetch a web page and read it in W3M:

readable https://www.nytimes.com/2020/01/18/technology/clearview-privacy-facial-recognition.html | w3m -T text/html

Download a web page using cURL, parse it and output as JSON:

curl https://github.com/mozilla/readability | readable --base=https://github.com/mozilla/readability --json

4.8 KiB Raw Blame History