12 Commits

Author SHA1 Message Date
Éloi Rivard
e9acdd091b Use black to format the code 2020-01-30 17:32:43 +01:00
Éloi Rivard
0846955dd7 Fixed issue with self-closing tags. Fix #125 2019-12-29 17:13:15 +01:00
Yuri Baburov
494b19ed4e Merge branch 'master' into many_repeated_spaces_timeout 2018-09-26 22:10:14 +07:00
Linas Valiukas
63fbc36cb8 Close sample input file after reading it
Otherwise tests spit out:

    ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/pypt/Dropbox/etc-MediaCloud/python-readability/tests/samples/si-game.sample.html' mode='r' encoding='UTF-8'>
    return open(os.path.join(SAMPLES, filename)).read()
2018-09-26 08:53:27 +03:00
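
The kind of fix this commit describes can be sketched as below. This is a minimal illustration, not the project's actual test helper: the function name `load_sample` is hypothetical, and `SAMPLES` points at a temporary directory here rather than the real `tests/samples` folder. Reading inside a `with` block closes the file as soon as the read finishes, which is what avoids the `ResourceWarning` quoted above.

```python
import os
import tempfile

# Hypothetical stand-in for the tests' samples directory.
SAMPLES = tempfile.mkdtemp()


def load_sample(filename):
    """Read a sample file and close it before returning its contents."""
    path = os.path.join(SAMPLES, filename)
    # The context manager closes the file even if read() raises.
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Compared with `open(...).read()`, the handle no longer depends on garbage collection to be closed, so the warning is not emitted.
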
Linas Valiukas
747c46abce Trim many repeated spaces to make clean() faster
When Readability encounters long runs of repeated whitespace, the cleanup
regexes in clean() take a very long time to run, so trim the whitespace
to 255 characters.

Additionally, test the extracting performance with "timeout_decorator".
2018-09-26 08:26:08 +03:00
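
One way to cap whitespace runs as this commit describes is sketched below. It is an assumption-laden illustration, not the code from the commit: the names `MAX_WHITESPACE_RUN` and `trim_whitespace_runs` are hypothetical, and the sketch assumes the 255-character limit applies to each run of whitespace separately.

```python
import re

# Limit quoted in the commit message; the constant name is hypothetical.
MAX_WHITESPACE_RUN = 255

# Matches any whitespace run longer than the limit.
_LONG_WHITESPACE = re.compile(r"\s{%d,}" % (MAX_WHITESPACE_RUN + 1))


def trim_whitespace_runs(text):
    """Cut every over-long whitespace run down to MAX_WHITESPACE_RUN characters."""
    return _LONG_WHITESPACE.sub(
        lambda m: m.group(0)[:MAX_WHITESPACE_RUN], text
    )
```

Running such a pass before the heavier cleanup regexes bounds the input they have to scan, while runs at or below the limit pass through unchanged.
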
Yuri Baburov
0e50b53d05 Release version 0.7. Better HTML5 support and an important bugfix. 2018-05-07 17:53:53 +07:00
Mariusz Osiecki
bf9e7404fa Failure if best_elem is root (fix #58) 2015-05-06 09:34:55 +02:00
Yuri Baburov
dc648e7d0b Added a test for issue #48 but can't reproduce it -- seems to work fine. 2015-04-27 15:59:18 +06:00
Yuri Baburov
1fac7e685a Added a feature to allow more images per article (with a test) 2015-04-27 14:35:00 +06:00
Richard Harding
46f0302ebc rename the document_only flag to html_partial 2012-04-17 10:17:14 -04:00
Richard Harding
5a98e2c1b8 Correct appending and allow for document only
- Fix the appending of siblings to the correct nested element
- Add a document only flag so that you can get a dom tree you can nest
yourself without html/body tags.
2012-04-16 20:55:13 -04:00
Richard Harding
edccec5d3b Work on why we have an empty <body/> tag
- Seems to come because the sanitizer ends up with two nodes, not one. The
first is an empty body, the second is the article div.
- Fix up the tabs so we can work with the file. Needs lots of pep8 love.
- Implement an initial hack that at least gets it working atm.
- Start to add test cases, sample html files we can test against, etc.
2012-04-16 17:13:24 -04:00