Merge branch 'master' of https://github.com/bookieio/breadability into upstream-sync

Conflicts:
	CHANGELOG.rst
	README.rst
	breadability/document.py
	breadability/scoring.py
	breadability/scripts/client.py
	setup.py
	tests/test_articles/test_sweetshark/article.html
	tests/test_articles/test_sweetshark/test.py
pull/21/head
Mišo Belica 10 years ago
commit 687d2ecfdf

@ -1,19 +1,41 @@
.. :changelog:
Changelog for readability
Changelog for breadability
==========================
- Sibling node is appended only when sibling doesn't already exist.
- Treat images a little differently so they get more inclusion.
- Added User-Agent string into HTTP requests.
- Added property ``Article.main_text`` for getting text annotated with semantic HTML tags (<em>, <strong>, ...); a usage sketch follows below.
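
The new property is easiest to picture with a quick sketch (a minimal, hypothetical usage; the shape of the returned value is an assumption, not documented in this diff):

.. code-block:: python

    from breadability.readable import Article

    # Hypothetical usage of the ``Article.main_text`` property noted above;
    # the exact return structure is assumed, not shown in this diff.
    with open("article.html") as f:
        article = Article(f.read())

    # Unlike the plain readable output, ``main_text`` keeps semantic
    # annotations (<em>, <strong>, ...) attached to the extracted text.
    print(article.main_text)
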
0.1.17 (Jan 22nd 2014)
----------------------
- More log quieting down to INFO vs WARN
0.1.16 (Jan 22nd 2014)
----------------------
- Clean up logging output: don't log at WARNING when it's not a true warning
0.1.15 (Nov 29th 2013)
-----------------------
- Merged changes from breadability 0.1.14 with the fork https://github.com/miso-belica/readability.py and tweaked things to return to the name breadability.
- Fork: Added property ``Article.main_text`` for getting text annotated with
semantic HTML tags (<em>, <strong>, ...).
- Join node with 1 child of the same type. From
- Fork: Join node with 1 child of the same type. From
``<div><div>...</div></div>`` we get ``<div>...</div>``.
- Don't change <div> to <p> if it contains <p> elements.
- Renamed test generation helper 'readability_newtest' -> 'readability_test'.
- Renamed package to readability.
- Added support for Python >= 3.2.
- Py3k compatible package 'charade' is used instead of 'chardet'.
- Fork: Don't change <div> to <p> if it contains <p> elements.
- Fork: Renamed test generation helper 'readability_newtest' -> 'readability_test'.
- Fork: Renamed package to readability. (Renamed back)
- Fork: Added support for Python >= 3.2.
- Fork: Py3k compatible package 'charade' is used instead of 'chardet'.
0.1.14 (Nov 7th 2013)
----------------------
- Update sibling append to only happen when sibling doesn't already exist.
0.1.13 (Aug 31st 2013)
-----------------------
- Give images in content body a better chance of survival
- Add tests
0.1.12 (July 28th 2013)
-----------------------
- Add a user agent to requests.
0.1.11 (Dec 12th 2012)
-----------------------

@ -0,0 +1,4 @@
Rick Harding
nhnifong
Craig Maloney
Mišo Belica

@ -0,0 +1,72 @@
# Makefile to help automate tasks
WD := $(shell pwd)
PY := bin/python
PIP := bin/pip
PEP8 := bin/pep8
NOSE := bin/nosetests

# ###########
# Tests rule!
# ###########
.PHONY: test
test: venv develop $(NOSE)
	$(NOSE) -s tests

$(NOSE):
	$(PIP) install nose nose-selecttests pep8 pylint coverage

# #######
# INSTALL
# #######
.PHONY: all
all: venv deps develop

venv: bin/python
bin/python:
	virtualenv .

.PHONY: deps
deps: venv
	pip install -r requirements.txt

.PHONY: clean_venv
clean_venv:
	rm -rf bin include lib local man share

.PHONY: develop
develop: lib/python*/site-packages/breadability.egg-link
lib/python*/site-packages/breadability.egg-link:
	$(PY) setup.py develop

# ###########
# Development
# ###########
.PHONY: clean_all
clean_all: clean_venv
	if [ -d dist ]; then \
		rm -r dist; \
	fi

bin/flake8: venv
	bin/pip install flake8

lint: bin/flake8
	flake8 breadability

# ###########
# Deploy
# ###########
.PHONY: dist
dist:
	$(PY) setup.py sdist

.PHONY: upload
upload:
	$(PY) setup.py sdist upload

.PHONY: version_update
version_update:
	$(EDITOR) setup.py CHANGELOG.rst
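
With these targets, ``make test`` bootstraps the virtualenv, installs nose and friends, and runs the suite; ``make lint`` installs flake8 into the venv and lints the ``breadability`` package.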

@ -1,14 +1,14 @@
Readability.py - another readability Python port
==============================================
.. image:: https://api.travis-ci.org/miso-belica/readability.py.png?branch=master
:target: https://travis-ci.org/miso-belica/readability.py
breadability - another readability Python (v2.6-v3.3) port
===========================================================
.. image:: https://api.travis-ci.org/bookieio/breadability.png?branch=master
:target: https://travis-ci.org/bookieio/breadability.py
I've tried to work with the various forks of some ancient codebase that ported
`readability`_ to Python. The lack of tests, unused regexes, and commented-out
sections of code in other Python ports just drove me nuts.
I put forth an effort to bring in several of the better forks into one
codebase, but they've diverged so much that I just can't work with it.
code base, but they've diverged so much that I just can't work with it.
So what's any sane person to do? Re-port it with my own repo, add some tests,
infrastructure, and try to make this port better. OSS FTW (and yea, NIH FML,
@ -47,7 +47,7 @@ things from pip so that it can compile.
.. code-block:: bash
$ [sudo] apt-get install libxml2-dev libxslt-dev
$ [sudo] pip install git+git://github.com/miso-belica/readability.py.git
$ [sudo] pip install git+git://github.com/bookieio/breadability.git
Tests
-----
@ -63,7 +63,7 @@ Command line
.. code-block:: bash
$ readability http://wiki.python.org/moin/BeginnersGuide
$ breadability http://wiki.python.org/moin/BeginnersGuide
Options
```````
@ -85,7 +85,7 @@ Python API
from __future__ import print_function
from readability.readable import Article
from breadability.readable import Article
if __name__ == "__main__":
@ -103,7 +103,7 @@ hopefully things are setup in a way that those can/will be added.
Fortunately, I need this library for my tools:
- https://bmark.us
- http://readable.bmark.us
- http://r.bmark.us
so I really need this to be an active and improving project.
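
For reference, a fleshed-out version of the README's (truncated) Python API snippet might look like this; the input file name is an illustrative assumption:

.. code-block:: python

    from __future__ import print_function

    from breadability.readable import Article

    if __name__ == "__main__":
        # "article.html" is a placeholder; any fetched HTML string works.
        with open("article.html") as f:
            document = Article(f.read())
        # ``readable`` holds the cleaned article HTML, wrapped in
        # <div id="readabilityBody"> as the tests in this diff assert.
        print(document.readable)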

@ -0,0 +1,11 @@
# -*- coding: utf8 -*-
from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals
)

import pkg_resources

__version__ = pkg_resources.get_distribution("breadability").version

@ -19,9 +19,13 @@ string_types = (bytes, unicode,)
try:
    # Assert to hush pyflakes about the unused import. This is a _compat
    # module and we expect this to aid in other code importing urllib.
    import urllib2 as urllib
    assert urllib
except ImportError:
    import urllib.request as urllib
    assert urllib
def unicode_compatible(cls):
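
The point of the shim above is that consuming code imports a single name and gets ``urllib2`` on Python 2 or ``urllib.request`` on Python 3. A minimal sketch of the consuming side (the call site here is assumed, not taken from this diff):

.. code-block:: python

    # Both urllib2 (Python 2) and urllib.request (Python 3) expose
    # urlopen, so callers can stay version-agnostic.
    from breadability._compat import urllib

    response = urllib.urlopen("http://example.com")
    html = response.read()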

@ -8,17 +8,31 @@ import re
import logging
import charade
from lxml.etree import tounicode, XMLSyntaxError
from lxml.html import document_fromstring, HTMLParser
from ._compat import unicode, to_bytes, to_unicode, unicode_compatible
from lxml.etree import (
tounicode,
XMLSyntaxError,
)
from lxml.html import (
document_fromstring,
HTMLParser,
)
from ._compat import (
to_bytes,
to_unicode,
unicode,
unicode_compatible,
)
from .utils import cached_property
logger = logging.getLogger("readability")
logger = logging.getLogger("breadability")
TAG_MARK_PATTERN = re.compile(to_bytes(r"</?[^>]*>\s*"))
UTF8_PARSER = HTMLParser(encoding="utf8")
def determine_encoding(page):
encoding = "utf8"
text = TAG_MARK_PATTERN.sub(to_bytes(" "), page)
@ -43,7 +57,12 @@ def determine_encoding(page):
return encoding
BREAK_TAGS_PATTERN = re.compile(to_unicode(r"(?:<\s*[bh]r[^>]*>\s*)+"), re.IGNORECASE)
BREAK_TAGS_PATTERN = re.compile(
to_unicode(r"(?:<\s*[bh]r[^>]*>\s*)+"),
re.IGNORECASE
)
def convert_breaks_to_paragraphs(html):
"""
Converts <hr> tags and runs of multiple <br> tags into paragraphs.
@ -64,7 +83,6 @@ def _replace_break_tags(match):
return tags
UTF8_PARSER = HTMLParser(encoding="utf8")
def build_document(html_content, base_href=None):
"""Requires that the `html_content` not be None"""
assert html_content is not None
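
The behaviour of ``convert_breaks_to_paragraphs`` above is pinned down by the test expectations later in this diff; as a quick reference:

.. code-block:: python

    from breadability.document import convert_breaks_to_paragraphs

    # Runs of <br> tags (and <hr>) become paragraph boundaries; the
    # expected string is taken verbatim from the tests in this diff.
    returned = convert_breaks_to_paragraphs(
        "<div>HI<br><br>How are you?<br><br> \t \n <br>Fine\n I guess</div>")
    assert returned == "<div>HI</p><p>How are you?</p><p>Fine\n I guess</div>"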

@ -13,12 +13,17 @@ from lxml.html import fragment_fromstring, fromstring
from .document import OriginalDocument
from .annotated_text import AnnotatedTextHandler
from .scoring import (score_candidates, get_link_density, get_class_weight,
is_unlikely_node)
from .scoring import (
get_class_weight,
get_link_density,
is_unlikely_node,
score_candidates,
)
from .utils import cached_property, shrink_text
html_cleaner = Cleaner(scripts=True, javascript=True, comments=True,
html_cleaner = Cleaner(
scripts=True, javascript=True, comments=True,
style=True, links=True, meta=False, add_nofollow=False,
page_structure=False, processing_instructions=True,
embedded=False, frames=False, forms=False,
@ -44,7 +49,7 @@ NULL_DOCUMENT = """
</html>
"""
logger = logging.getLogger("readability")
logger = logging.getLogger("breadability")
def ok_embedded_video(node):
@ -129,7 +134,8 @@ def check_siblings(candidate_node, candidate_list):
content_bonus += candidate_node.content_score * 0.2
if sibling in candidate_list:
adjusted_score = candidate_list[sibling].content_score + content_bonus
adjusted_score = \
candidate_list[sibling].content_score + content_bonus
if adjusted_score >= sibling_target_score:
append = True
@ -146,7 +152,8 @@ def check_siblings(candidate_node, candidate_list):
append = True
if append:
logger.debug("Sibling appended: %s %r", sibling.tag, sibling.attrib)
logger.debug(
"Sibling appended: %s %r", sibling.tag, sibling.attrib)
if sibling.tag not in ("div", "p"):
# We have a node that isn't a common block level element, like
# a form or td tag. Turn it into a div so it doesn't get
@ -191,7 +198,8 @@ def clean_document(node):
if n.tag in ("div", "p"):
text_content = shrink_text(n.text_content())
if len(text_content) < 5 and not n.getchildren():
logger.debug("Dropping %s %r without content.", n.tag, n.attrib)
logger.debug(
"Dropping %s %r without content.", n.tag, n.attrib)
to_drop.append(n)
# finally try out the conditional cleaning of the target node
@ -206,7 +214,8 @@ def clean_document(node):
def drop_nodes_with_parents(nodes):
for node in nodes:
if node.getparent() is not None:
logger.debug("Droping node with parent %s %r", node.tag, node.attrib)
logger.debug(
"Droping node with parent %s %r", node.tag, node.attrib)
node.drop_tree()
@ -231,7 +240,8 @@ def clean_conditionally(node):
commas_count = node.text_content().count(',')
if commas_count < 10:
logger.debug("There are %d commas so we're processing more.", commas_count)
logger.debug(
"There are %d commas so we're processing more.", commas_count)
# If there are not very many commas, and the number of
# non-paragraph elements is more than paragraphs or other ominous
@ -267,7 +277,8 @@ def clean_conditionally(node):
logger.debug('Conditional drop: weight big but link heavy')
remove_node = True
elif (embed == 1 and content_length < 75) or embed > 1:
logger.debug('Conditional drop: embed w/o much content or many embed')
logger.debug(
'Conditional drop: embed w/o much content or many embed')
remove_node = True
if remove_node:
@ -305,10 +316,12 @@ def find_candidates(document):
for node in document.iter():
if is_unlikely_node(node):
logger.debug("We should drop unlikely: %s %r", node.tag, node.attrib)
logger.debug(
"We should drop unlikely: %s %r", node.tag, node.attrib)
should_remove.add(node)
elif is_bad_link(node):
logger.debug("We should drop bad link: %s %r", node.tag, node.attrib)
logger.debug(
"We should drop bad link: %s %r", node.tag, node.attrib)
should_remove.add(node)
elif node.tag in SCORABLE_TAGS:
nodes_to_score.add(node)
@ -399,11 +412,12 @@ class Article(object):
def _readable(self):
"""The readable parsed article"""
if not self.candidates:
logger.warning("No candidates found in document.")
logger.info("No candidates found in document.")
return self._handle_no_candidates()
# right now we return the highest scoring candidate content
best_candidates = sorted((c for c in self.candidates.values()),
best_candidates = sorted(
(c for c in self.candidates.values()),
key=attrgetter("content_score"), reverse=True)
printer = PrettyPrinter(indent=2)
@ -415,9 +429,11 @@ class Article(object):
updated_winner = check_siblings(winner, self.candidates)
updated_winner.node = prep_article(updated_winner.node)
if updated_winner.node is not None:
dom = build_base_document(updated_winner.node, self._return_fragment)
dom = build_base_document(
updated_winner.node, self._return_fragment)
else:
logger.warning('Had candidates but failed to find a cleaned winning DOM.')
logger.info(
'Had candidates but failed to find a cleaned winning DOM.')
dom = self._handle_no_candidates()
return self._remove_orphans(dom.get_element_by_id("readabilityBody"))
@ -437,9 +453,10 @@ class Article(object):
if self.dom is not None and len(self.dom):
dom = prep_article(self.dom)
dom = build_base_document(dom, self._return_fragment)
return self._remove_orphans(dom.get_element_by_id("readabilityBody"))
return self._remove_orphans(
dom.get_element_by_id("readabilityBody"))
else:
logger.warning("No document to use.")
logger.info("No document to use.")
return build_error_document(self._return_fragment)
@ -454,7 +471,8 @@ def leaf_div_elements_into_paragraphs(document):
for element in document.iter(tag="div"):
child_tags = tuple(n.tag for n in element.getchildren())
if "div" not in child_tags and "p" not in child_tags:
logger.debug("Changing leaf block element <%s> into <p>", element.tag)
logger.debug(
"Changing leaf block element <%s> into <p>", element.tag)
element.tag = "p"
return document
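
As the tests further down exercise, ``leaf_div_elements_into_paragraphs`` rewrites only <div> elements with no <div> or <p> children; for example:

.. code-block:: python

    from lxml.etree import tounicode
    from lxml.html import document_fromstring

    from breadability.readable import leaf_div_elements_into_paragraphs

    # Only the leaf <div> is converted to <p>; the outer <div> keeps its
    # tag. Input and output mirror the tests later in this diff.
    dom = document_fromstring(
        "<html><body><div>text<div>child</div>aftertext</div></body></html>")
    print(tounicode(leaf_div_elements_into_paragraphs(dom)))
    # <html><body><div>text<p>child</p>aftertext</div></body></html>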

@ -17,9 +17,9 @@ from .utils import normalize_whitespace
# A series of sets of attributes we check to help in determining if a node is
# a potential candidate or not.
CLS_UNLIKELY = re.compile(
"combx|comment|community|disqus|extra|foot|header|menu|remark|rss|shoutbox|"
"sidebar|sponsor|ad-break|agegate|pagination|pager|perma|popup|tweet|"
"twitter|social|breadcrumb",
"combx|comment|community|disqus|extra|foot|header|menu|remark|rss|"
"shoutbox|sidebar|sponsor|ad-break|agegate|pagination|pager|perma|popup|"
"tweet|twitter|social|breadcrumb",
re.IGNORECASE
)
CLS_MAYBE = re.compile(
@ -32,12 +32,12 @@ CLS_WEIGHT_POSITIVE = re.compile(
)
CLS_WEIGHT_NEGATIVE = re.compile(
"combx|comment|com-|contact|foot|footer|footnote|head|masthead|media|meta|"
"outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|"
"widget",
"outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|"
"tool|widget",
re.IGNORECASE
)
logger = logging.getLogger("readability")
logger = logging.getLogger("breadability")
def check_node_attributes(pattern, node, *attributes):
@ -146,8 +146,8 @@ def score_candidates(nodes):
for node in nodes:
logger.debug("* Scoring candidate %s %r", node.tag, node.attrib)
# if the node has no parent it knows of
# then it ends up creating a body & html tag to parent the html fragment
# if the node has no parent it knows of then it ends up creating a
# body & html tag to parent the html fragment
parent = node.getparent()
if parent is None:
logger.debug("Skipping candidate - parent node is 'None'.")
@ -161,7 +161,9 @@ def score_candidates(nodes):
# if paragraph is < `MIN_HIT_LENTH` characters don't even count it
inner_text = node.text_content().strip()
if len(inner_text) < MIN_HIT_LENTH:
logger.debug("Skipping candidate - inner text < %d characters.", MIN_HIT_LENTH)
logger.debug(
"Skipping candidate - inner text < %d characters.",
MIN_HIT_LENTH)
continue
# initialize readability data for the parent
@ -184,7 +186,8 @@ def score_candidates(nodes):
# subtract 0.5 points for each double quote within this paragraph
double_quotes_count = inner_text.count('"')
content_score += double_quotes_count * -0.5
logger.debug("Penalty points for %d double-quotes.", double_quotes_count)
logger.debug(
"Penalty points for %d double-quotes.", double_quotes_count)
# for every 100 characters in this paragraph, add another point
# up to 3 points
@ -193,12 +196,14 @@ def score_candidates(nodes):
logger.debug("Bonus points for length of text: %f", length_points)
# add the score to the parent
logger.debug("Bonus points for parent %s %r with score %f: %f",
logger.debug(
"Bonus points for parent %s %r with score %f: %f",
parent.tag, parent.attrib, candidates[parent].content_score,
content_score)
candidates[parent].content_score += content_score
# the grand node gets half
logger.debug("Bonus points for grand %s %r with score %f: %f",
logger.debug(
"Bonus points for grand %s %r with score %f: %f",
grand.tag, grand.attrib, candidates[grand].content_score,
content_score / 2.0)
candidates[grand].content_score += content_score / 2.0
@ -210,7 +215,8 @@ def score_candidates(nodes):
for candidate in candidates.values():
adjustment = 1.0 - get_link_density(candidate.node)
candidate.content_score *= adjustment
logger.debug("Link density adjustment for %s %r: %f",
logger.debug(
"Link density adjustment for %s %r: %f",
candidate.node.tag, candidate.node.attrib, adjustment)
return candidates
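
Two of the scoring rules quoted above are simple arithmetic; a worked sketch (not the full algorithm, and the sample text is made up):

.. code-block:: python

    # Each double quote costs 0.5 points; every 100 characters adds a
    # point, capped at 3. Other rules from score_candidates are omitted.
    inner_text = 'She said "yes" and then "no" again.' * 10

    content_score = 0.0
    content_score += inner_text.count('"') * -0.5      # double-quote penalty
    content_score += min(len(inner_text) / 100.0, 3)   # capped length bonus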

@ -4,9 +4,9 @@
A fast python port of arc90's readability tool
Usage:
readability [options] <resource>
readability --version
readability --help
breadability [options] <resource>
breadability --version
breadability --help
Arguments:
<resource> URL or file path to process in readable form.
@ -37,7 +37,10 @@ from ..readable import Article
HEADERS = {
"User-Agent": "Readability (Readable content parser; https://github.com/miso-belica/readability.py) Version/%s" % __version__,
"User-Agent": 'breadability/{version} ({url})'.format(
url="https://github.com/bookieio/breadability",
version=__version__
)
}
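
With ``__version__`` equal to, say, 0.1.17, this renders as ``breadability/0.1.17 (https://github.com/bookieio/breadability)``.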
@ -47,7 +50,7 @@ def parse_args():
def main():
args = parse_args()
logger = logging.getLogger("readability")
logger = logging.getLogger("breadability")
if args["--verbose"]:
logger.setLevel(logging.DEBUG)

@ -1,12 +1,12 @@
# -*- coding: utf8 -*-
"""
Helper to generate a new set of article test files for readability.
Helper to generate a new set of article test files for breadability.
Usage:
readability_test --name <name> <url>
readability_test --version
readability_test --help
breadability_test --name <name> <url>
breadability_test --version
breadability_test --help
Arguments:
<url> The url of content to fetch for the article.html
@ -39,7 +39,7 @@ from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from os.path import join, dirname
from readability.readable import Article
from breadability.readable import Article
from ...compat import unittest

@ -6,6 +6,9 @@ from __future__ import division, print_function, unicode_literals
import re
MULTIPLE_WHITESPACE_PATTERN = re.compile(r"\s+", re.UNICODE)
def is_blank(text):
"""
Returns ``True`` if string contains only whitespace characters
@ -18,7 +21,6 @@ def shrink_text(text):
return normalize_whitespace(text.strip())
MULTIPLE_WHITESPACE_PATTERN = re.compile(r"\s+", re.UNICODE)
def normalize_whitespace(text):
"""
Translates multiple whitespace into single space character.

@ -1,7 +0,0 @@
# -*- coding: utf8 -*-
from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
__version__ = "0.1.11"

@ -1,5 +1,8 @@
docopt>=0.6.1,<0.7
charade
lxml
coverage
docopt>=0.6.1,<0.7
lxml
nose
nose-selecttests
pep8
pylint

@ -1,4 +1,4 @@
[nosetests]
with-coverage=1
cover-package=readability
cover-package=breadability
cover-erase=1

@ -2,8 +2,8 @@ import sys
from os.path import abspath, dirname, join
from setuptools import setup, find_packages
from readability import __version__
VERSION = "0.1.17"
VERSION_SUFFIX = "%d.%d" % sys.version_info[:2]
CURRENT_DIRECTORY = abspath(dirname(__file__))
@ -28,24 +28,34 @@ tests_require = [
if sys.version_info < (2, 7):
install_requires.append("unittest2")
console_script_targets = [
"breadability = breadability.scripts.client:main",
"breadability-{0} = breadability.scripts.client:main",
"breadability_test = breadability.scripts.test_helper:main",
"breadability_test-{0} = breadability.scripts.test_helper:main",
]
console_script_targets = [
target.format(VERSION_SUFFIX) for target in console_script_targets
]
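
After formatting with ``VERSION_SUFFIX`` ("%d.%d" of the running interpreter), a Python 3.3 install therefore exposes ``breadability``, ``breadability-3.3``, ``breadability_test``, and ``breadability_test-3.3`` entry points.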
setup(
name="readability",
version=__version__,
name="breadability",
version=VERSION,
description="Port of Readability HTML parser in Python",
long_description=long_description,
keywords=[
"bookie",
"breadability",
"content",
"HTML",
"parsing",
"readability",
"readable",
"parsing",
"HTML",
"content",
],
author="Rick Harding",
author_email="rharding@mitechie.com",
maintainer="Michal Belica",
maintainer_email="miso.belica@gmail.com",
url="https://github.com/miso-belica/readability.py",
url="https://github.com/bookieio/breadability",
license="BSD",
classifiers=[
"Development Status :: 5 - Production/Stable",
@ -64,7 +74,6 @@ setup(
"Topic :: Software Development :: Pre-processors",
"Topic :: Text Processing :: Filters",
"Topic :: Text Processing :: Markup :: HTML",
],
packages=find_packages(),
include_package_data=True,
@ -73,11 +82,6 @@ setup(
tests_require=tests_require,
test_suite="tests.run_tests.run",
entry_points={
"console_scripts": [
"readability = readability.scripts.client:main",
"readability-%s = readability.scripts.client:main" % VERSION_SUFFIX,
"readability_test = readability.scripts.test_helper:main",
"readability_test-%s = readability.scripts.test_helper:main" % VERSION_SUFFIX,
]
"console_scripts": console_script_targets,
}
)

@ -12,7 +12,7 @@ from os.path import dirname, abspath
DEFAULT_PARAMS = [
"nosetests",
"--with-coverage",
"--cover-package=readability",
"--cover-package=breadability",
"--cover-erase",
]

@ -1,11 +1,15 @@
# -*- coding: utf8 -*-
from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from __future__ import (
absolute_import,
division,
print_function,
unicode_literals
)
from lxml.html import fragment_fromstring, document_fromstring
from readability.readable import Article
from readability.annotated_text import AnnotatedTextHandler
from breadability.readable import Article
from breadability.annotated_text import AnnotatedTextHandler
from .compat import unittest
from .utils import load_snippet, load_article

@ -5,7 +5,7 @@ from __future__ import division, print_function, unicode_literals
import os
from readability.readable import Article
from breadability.readable import Article
from ...compat import unittest

File diff suppressed because it is too large.

@ -0,0 +1,33 @@
import os

try:
    # Python < 2.7
    import unittest2 as unittest
except ImportError:
    import unittest

from breadability.readable import Article


class TestBusinessInsiderArticle(unittest.TestCase):
    """Test the scoring and parsing of the Blog Post"""

    def setUp(self):
        """Load up the article for us"""
        article_path = os.path.join(os.path.dirname(__file__), 'article.html')
        self.article = open(article_path).read()

    def tearDown(self):
        """Drop the article"""
        self.article = None

    def test_parses(self):
        """Verify we can parse the document."""
        doc = Article(self.article)
        self.assertTrue('id="readabilityBody"' in doc.readable)

    def test_images_preserved(self):
        """Verify the post's inline images survive the cleaning."""
        doc = Article(self.article)
        self.assertTrue('bharath-kumar-a-co-founder-at-pugmarksme-suggests-working-on-a-sunday-late-night.jpg' in doc.readable)
        self.assertTrue('bryan-guido-hassin-a-university-professor-and-startup-junkie-uses-airplane-days.jpg' in doc.readable)

@ -4,7 +4,7 @@ from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from os.path import join, dirname
from readability.readable import Article
from breadability.readable import Article
from ...compat import unittest

@ -4,8 +4,8 @@ from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from os.path import join, dirname
from readability.readable import Article
from readability._compat import unicode
from breadability.readable import Article
from breadability._compat import unicode
from ...compat import unittest

@ -1,14 +1,18 @@
# -*- coding: utf8 -*-
from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from __future__ import (
absolute_import,
division,
print_function,
unicode_literals
)
import os
from operator import attrgetter
from readability.readable import Article
from readability.readable import check_siblings
from readability.readable import prep_article
from breadability.readable import Article
from breadability.readable import check_siblings
from breadability.readable import prep_article
from ...compat import unittest
@ -57,7 +61,8 @@ class TestArticle(unittest.TestCase):
for node in doc._should_drop:
self.assertFalse(node == found.node)
by_score = sorted([c for c in doc.candidates.values()],
by_score = sorted(
[c for c in doc.candidates.values()],
key=attrgetter('content_score'), reverse=True)
self.assertTrue(by_score[0].node == found.node)

@ -4,11 +4,11 @@ from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from os.path import join, dirname
from readability.readable import Article
from breadability.readable import Article
from ...compat import unittest
class TestArticle(unittest.TestCase):
class TestSweetsharkBlog(unittest.TestCase):
"""
Test the scoring and parsing of the article from URL below:
http://sweetshark.livejournal.com/11564.html

@ -4,9 +4,12 @@ from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from collections import defaultdict
from readability._compat import to_unicode, to_bytes
from readability.document import (OriginalDocument, determine_encoding,
convert_breaks_to_paragraphs)
from breadability._compat import to_unicode, to_bytes
from breadability.document import (
convert_breaks_to_paragraphs,
determine_encoding,
OriginalDocument,
)
from .compat import unittest
from .utils import load_snippet
@ -18,14 +21,16 @@ class TestOriginalDocument(unittest.TestCase):
returned = convert_breaks_to_paragraphs(
"<div>HI<br><br>How are you?<br><br> \t \n <br>Fine\n I guess</div>")
self.assertEqual(returned,
self.assertEqual(
returned,
"<div>HI</p><p>How are you?</p><p>Fine\n I guess</div>")
def test_convert_hr_tags_to_paragraphs(self):
returned = convert_breaks_to_paragraphs(
"<div>HI<br><br>How are you?<hr/> \t \n <br>Fine\n I guess</div>")
self.assertEqual(returned,
self.assertEqual(
returned,
"<div>HI</p><p>How are you?</p><p>Fine\n I guess</div>")
def test_readin_min_document(self):
@ -79,7 +84,7 @@ class TestOriginalDocument(unittest.TestCase):
def test_encoding(self):
text = "ľščťžýáíéäúňôůě".encode("iso-8859-2")
encoding = determine_encoding(text)
determine_encoding(text)
def test_encoding_short(self):
text = "ľščťžýáíé".encode("iso-8859-2")

@ -6,14 +6,16 @@ from __future__ import division, print_function, unicode_literals
from lxml.etree import tounicode
from lxml.html import document_fromstring
from lxml.html import fragment_fromstring
from readability._compat import to_unicode
from readability.readable import Article
from readability.readable import get_class_weight
from readability.readable import get_link_density
from readability.readable import is_bad_link
from readability.readable import score_candidates
from readability.readable import leaf_div_elements_into_paragraphs
from readability.scoring import ScoredNode
from breadability._compat import to_unicode
from breadability.readable import (
Article,
get_class_weight,
get_link_density,
is_bad_link,
leaf_div_elements_into_paragraphs,
score_candidates,
)
from breadability.scoring import ScoredNode
from .compat import unittest
from .utils import load_snippet, load_article
@ -27,6 +29,14 @@ class TestReadableDocument(unittest.TestCase):
# We get back the document as a div tag currently by default.
self.assertEqual(doc.readable_dom.tag, 'div')
def test_title_loads(self):
"""Verify we can fetch the title of the parsed article"""
doc = Article(load_snippet('document_min.html'))
self.assertEqual(
doc._original_document.title,
'Min Document Title'
)
def test_doc_no_scripts_styles(self):
"""Step #1 remove all scripts from the document"""
doc = Article(load_snippet('document_scripts.html'))
@ -80,10 +90,11 @@ class TestCleaning(unittest.TestCase):
"""Verify we wipe out things from our unlikely list."""
doc = Article(load_snippet('test_readable_unlikely.html'))
readable = doc.readable_dom
must_not_appear = ['comment', 'community', 'disqus', 'extra', 'foot',
'header', 'menu', 'remark', 'rss', 'shoutbox', 'sidebar',
'sponsor', 'ad-break', 'agegate', 'pagination' '', 'pager',
'popup', 'tweet', 'twitter', 'imgBlogpostPermalink']
must_not_appear = [
'comment', 'community', 'disqus', 'extra', 'foot',
'header', 'menu', 'remark', 'rss', 'shoutbox', 'sidebar',
'sponsor', 'ad-break', 'agegate', 'pagination', 'pager',
'popup', 'tweet', 'twitter', 'imgBlogpostPermalink']
want_to_appear = ['and', 'article', 'body', 'column', 'main', 'shadow']
@ -128,17 +139,24 @@ class TestCleaning(unittest.TestCase):
self.assertEqual(
tounicode(
leaf_div_elements_into_paragraphs(test_doc2)),
to_unicode('<html><body><p>simple<a href="">link</a></p></body></html>')
to_unicode(
'<html><body><p>simple<a href="">link</a></p></body></html>')
)
def test_dont_transform_div_with_div(self):
"""Verify that only child <div> element is replaced by <p>."""
dom = document_fromstring(
"<html><body><div>text<div>child</div>aftertext</div></body></html>")
"<html><body><div>text<div>child</div>"
"aftertext</div></body></html>"
)
self.assertEqual(
tounicode(leaf_div_elements_into_paragraphs(dom)),
to_unicode("<html><body><div>text<p>child</p>aftertext</div></body></html>")
tounicode(
leaf_div_elements_into_paragraphs(dom)),
to_unicode(
"<html><body><div>text<p>child</p>"
"aftertext</div></body></html>"
)
)
def test_bad_links(self):

@ -8,14 +8,18 @@ import re
from operator import attrgetter
from lxml.html import document_fromstring
from lxml.html import fragment_fromstring
from readability.readable import Article
from readability.scoring import check_node_attributes
from readability.scoring import get_class_weight
from readability.scoring import ScoredNode
from readability.scoring import score_candidates
from readability.scoring import generate_hash_id
from readability.readable import get_link_density
from readability.readable import is_unlikely_node
from breadability.readable import Article
from breadability.scoring import (
check_node_attributes,
generate_hash_id,
get_class_weight,
score_candidates,
ScoredNode,
)
from breadability.readable import (
get_link_density,
is_unlikely_node,
)
from .compat import unittest
from .utils import load_snippet
@ -60,7 +64,8 @@ class TestCheckNodeAttr(unittest.TestCase):
test_node = fragment_fromstring('<div/>')
test_node.set('class', 'test2 comment')
self.assertTrue(check_node_attributes(test_pattern, test_node, 'class'))
self.assertTrue(
check_node_attributes(test_pattern, test_node, 'class'))
def test_has_id(self):
"""Verify that a node has an id in our set."""
@ -75,7 +80,8 @@ class TestCheckNodeAttr(unittest.TestCase):
test_pattern = re.compile('test1|test2', re.I)
test_node = fragment_fromstring('<div/>')
test_node.set('class', 'test4 comment')
self.assertFalse(check_node_attributes(test_pattern, test_node, 'class'))
self.assertFalse(
check_node_attributes(test_pattern, test_node, 'class'))
def test_lacks_id(self):
"""Verify that a node does not have an id in our set."""
@ -266,7 +272,8 @@ class TestScoreCandidates(unittest.TestCase):
div_nodes = dom.findall(".//div")
candidates = score_candidates(div_nodes)
ordered = sorted((c for c in candidates.values()), reverse=True,
ordered = sorted(
(c for c in candidates.values()), reverse=True,
key=attrgetter("content_score"))
self.assertEqual(ordered[0].node.tag, "div")
