breadability - another readability Python (v2.6-v3.3) port
==========================================================

.. image:: https://api.travis-ci.org/bookieio/breadability.png?branch=master
   :target: https://travis-ci.org/bookieio/breadability.py

I've tried to work with the various forks of some ancient codebase that ported
`readability`_ to Python. The lack of tests, unused regex's, and commented out
sections of code in other Python ports just drove me nuts.

I put forth an effort to bring in several of the better forks into one
code base, but they've diverged so much that I just can't work with it.

So what's any sane person to do? Re-port it with my own repo, add some tests,
infrastructure, and try to make this port better. OSS FTW (and yea, NIH FML,
but oh well I did try)

This is a pretty straight port of the JS here:

- http://code.google.com/p/arc90labs-readability/source/browse/trunk/js/readability.js#82
- http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/


Alternatives
------------

- https://github.com/codelucas/newspaper
- https://github.com/grangier/python-goose
- https://github.com/aidanf/BTE
- http://www.unixuser.org/~euske/python/webstemmer/#extract
- https://github.com/al3xandru/readability.py
- https://github.com/rcarmo/soup-strainer
- https://github.com/bcampbell/decruft
- https://github.com/gfxmonk/python-readability
- https://github.com/srid/readability
- https://github.com/dcramer/decruft
- https://github.com/reorx/readability
- https://github.com/mote/python-readability
- https://github.com/predatell/python-readability-lxml
- https://github.com/Harshavardhana/boilerpipy
- https://github.com/raptium/hitomi
- https://github.com/kingwkb/readability


Installation
------------

This does depend on lxml so you'll need some C headers in order to install
things from pip so that it can compile.

.. code-block:: bash

    $ [sudo] apt-get install libxml2-dev libxslt-dev
    $ [sudo] pip install git+git://github.com/bookieio/breadability.git


Tests
-----

.. code-block:: bash

    $ pytest tests


Usage
-----

Command line
~~~~~~~~~~~~

.. code-block:: bash

    $ breadability http://wiki.python.org/moin/BeginnersGuide

Options
```````

- **b** will write out the parsed content to a temp file and open it in a
  browser for viewing.
- **d** will write out debug scoring statements to help track why a node was
  chosen as the document and why some nodes were removed from the final
  product.
- **f** will override the default behaviour of getting an html fragment (