Yuri Baburov
154658798b
Merge pull request #64 from martinth/master
...
Added python 3 support (Supported: python 2.6, 2.7, 3.3, 3.4).
Thanks a lot to @martinth
9 years ago
Yuri Baburov
83a7ce67c1
Merge pull request #68 from digitaldavenyc/python3
...
fix for setup, convert print to python 3 compatible format
9 years ago
Dave Padovano
1ac3e019bd
fix for setup, convert print to python 3 compatible format
9 years ago
Yuri Baburov
1aabdb3d27
Merge pull request #67 from horva/fix-logging-config
...
Move logging.basicConfig to main function
9 years ago
Marko Horvatic
f0ff9b2425
Move logging.basicConfig to main function
9 years ago
Yuri Baburov
e2bc1ea055
Improved #65 which has given warning, added cssselect lib, bumped to 0.5.1
10 years ago
Yuri Baburov
1cb17d919b
Merge pull request #65 from avalanchy/best_elem_is_root
...
Failure if best_elem is root (fix #58 )
Thanks a lot @avalanchy and @jnothman !
10 years ago
Mariusz Osiecki
bf9e7404fa
Failure if best_elem is root ( fix #58 )
10 years ago
Martin Thurau
386e48d29b
Fixes checking of declared encodings in get_encoding.
...
In Python 3, .decode() on bytes requires the encoding name to be a str, so we have to convert the extracted encoding before we can use it.
10 years ago
Martin Thurau
046d2c10c3
Fixes regex declaration in get_encoding.
...
Since get_encoding() is only called when the input is *not* already unicode, we need to declare the regexes as bytes so they continue to work in Python 3.
10 years ago
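A minimal sketch of the two get_encoding fixes above, not the project's actual code: the pattern, `declared_encodings` and `decode_page` are hypothetical names chosen for illustration. It assumes the input is raw bytes, so the pattern is declared as bytes, and each extracted name is converted to str before being passed to `.decode()`.

```python
import re

# Hypothetical pattern: finds a charset declaration in raw (bytes) HTML.
# A bytes pattern is required because the page itself is bytes.
CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.I)


def declared_encodings(page):
    """Return the declared encoding names as lower-cased str values."""
    # Matching bytes against bytes yields bytes groups.
    raw_names = CHARSET_RE.findall(page)
    # bytes.decode() needs a str encoding name on Python 3, so convert first.
    return [name.decode("ascii").lower() for name in raw_names]


def decode_page(page):
    """Try each declared encoding, falling back to lenient UTF-8."""
    for name in declared_encodings(page):
        try:
            return page.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or wrong declaration, try the next one
    return page.decode("utf-8", "replace")
```

Passing a bytes encoding name straight to `page.decode()` would raise a TypeError on Python 3, which is why the conversion happens inside `declared_encodings`.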
Martin Thurau
ce7ca26835
Adds compatibility `raise_with_traceback` method to support different `raise` syntax
...
Unfortunately, the Python 2 `raise` syntax is not supported in Python 3.3 or in every 3.4.x version, so we deal with that by using conditional imports and a compatibility layer.
10 years ago
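A rough sketch of what the Python 3 side of such a compatibility helper could look like. The signature, the `Unparseable` class and `parse_number` are assumptions made for illustration, not confirmed by the log. The three-argument Python 2 form (`raise Type, value, tb`) does not parse under Python 3, so that variant would have to live in a separate module that is only imported on Python 2.

```python
import sys


def raise_with_traceback(exc_type, traceback, *args, **kwargs):
    # Python 3 form: build the exception, attach the traceback, raise it.
    raise exc_type(*args, **kwargs).with_traceback(traceback)


class Unparseable(ValueError):
    """Hypothetical wrapper error used only for this example."""


def parse_number(text):
    try:
        return int(text)
    except ValueError as exc:
        # Re-raise as a different type while keeping the original traceback.
        raise_with_traceback(Unparseable, sys.exc_info()[2], str(exc))
```

Callers stay identical on both interpreter lines; only the module that provides `raise_with_traceback` is chosen at import time.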
Martin Thurau
3ac56329e2
Corrects some things where 2to3 did too much.
10 years ago
Martin Thurau
aa4132f57a
Adds Python 3.4 support.
...
Code now supports Python 2.6, 2.7 and 3.4. Python 3.3 isn't supported
because of some issues with the parser and the difference between old and
new `raise` syntax.
10 years ago
Martin Thurau
13cca1dd19
Adds tox configuration.
...
Adds tox.ini to support running the tests on multiple versions. Adds
requirements.txt to support dependency installation via pip.
10 years ago
Yuri Baburov
1d4ee9d421
Releasing as version 0.5
10 years ago
Yuri Baburov
987570bef0
Updated package links for Python 2.7 and Python 3 support
10 years ago
Yuri Baburov
dc648e7d0b
Added a test for issue #48 but can't reproduce it -- seems to work fine.
10 years ago
Yuri Baburov
c715426584
Releasing as version 0.4
10 years ago
Yuri Baburov
1fac7e685a
Added a feature to allow more images per article (with a test)
10 years ago
Yuri Baburov
c6796195a7
Fixed makefile testing.
10 years ago
Miguel Galves
d04d41b749
Insert text inside iframe for correct output
10 years ago
Miguel Galves
be2a1c4646
Let width and height attributes
10 years ago
Miguel Galves
f1759c1404
Allows iframes containing youtube or vimeo videos. People like them
10 years ago
Yuri Baburov
332ad810de
Bumped to 0.3.0.6
10 years ago
Yuri Baburov
e4bcbe57d7
Fixes #53
10 years ago
Yuri Baburov
aeb4f4c782
Merge pull request #59 from seomoz/mac_10_10
...
Fix mac version comparison in setup.py for 10.10
10 years ago
Matthew Peters
c8c2f8809c
Fix mac version comparison in setup.py for 10.10
10 years ago
Yuri Baburov
2d4cfdb2c8
Merge pull request #56 from nathanathan/patch-1
...
Defaulting to utf-8 when chardet returns None
10 years ago
Nathan Breit
75e2e0cb3a
Defaulting to utf-8 when chardet returns None
...
On articles like this one chardet returns None:
http://news.zing.vn/nhip-song-tre/thay-giao-gay-sot-tung-bo-luat-tinh-yeu/a291427.html
This causes exceptions later on when encoding.lower() is called
10 years ago
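The fallback described above can be sketched as follows. `pick_encoding` is a hypothetical helper, not the project's function. It takes the dict returned by `chardet.detect()`, whose "encoding" value may be None, so the example runs without chardet installed.

```python
def pick_encoding(detection, default="utf-8"):
    """Return a usable lower-cased encoding name from a chardet result."""
    # "encoding" can be None when chardet cannot guess; fall back to the
    # default so that the later .lower() call never sees None.
    encoding = detection.get("encoding") or default
    return encoding.lower()
```

With real input, the call would look like `pick_encoding(chardet.detect(page_bytes))`.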
Yuri Baburov
0c2f29ed0d
Version bump.
10 years ago
Yuri Baburov
638f73f6a2
Fix for #52 : <input type="hidden"> are not counted any more for "form removal" heuristic.
10 years ago
Yuri Baburov
2fab5ffa6b
Merge pull request #48 from mperdomo1/master
...
Added code to check declared encodings first
11 years ago
Mark Perdomo
3a43a3fe7e
Added code to check declared encodings first and check them
...
from kennethreitz/requests/utils.py. Also I added some superset
encodings I have found in Chinese pages that are mishandled by
chardet/character declarations.
11 years ago
Yuri Baburov
1a4d3697bc
Allow latest lxml on Mac OS X 10.9, see issue #39 for comments and setup instructions
11 years ago
Yuri Baburov
d8595b7103
Quickfix for #41
11 years ago
Yuri Baburov
318f25c577
Minor fix in encoding guessing. Claiming it v0.3.0.1
11 years ago
Yuri Baburov
08658d1d31
Released v 0.3, and uploaded to the pypi.
11 years ago
Yuri Baburov
4e3192f5ab
Merge pull request #29 from hush-hush/master
...
Make lxml clean tree available for user modifications
12 years ago
hush-hush
e2e78e4d55
Make lxml clean tree available for user modifications.
12 years ago
Yuri Baburov
c923995606
Merge pull request #27 from sunlightlabs/master
...
Simple guard for empty title elements. Thanks, dvogel!
12 years ago
Drew Vogel
fdba8d9e11
Added check on title.text to avoid a TypeError on None.
12 years ago
Yuri Baburov
9cd5fb6226
Bump to 0.2.6.1
12 years ago
Yuri Baburov
44915518d3
Merge pull request #24 from zacharydenton/master
...
Fix issue 22: all titles were blank.
12 years ago
Zach Denton
0843d9cdf2
Explicitly check if title is None. fixes #22
...
This fixes #22 which caused all titles to be blank.
12 years ago
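A small sketch of the title guards from the entries above, using the standard library's xml.etree.ElementTree in place of lxml so it runs without extra dependencies. `get_title` is a hypothetical name. Both checks compare against None explicitly: a missing element and an element with no text are different cases, and relying on an element's truth value is unreliable because it reflects whether the element has children, not whether it exists.

```python
import xml.etree.ElementTree as ET


def get_title(doc):
    """Return the stripped <title> text, or "" if it is missing or empty."""
    title = doc.find(".//title")
    # Explicit None checks: the element may be absent, or present with no text.
    if title is None or title.text is None:
        return ""
    return title.text.strip()
```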
Yuri Baburov
8aefc6175f
Updated README with 0.2.6 changes.
12 years ago
Yuri Baburov
20d5f3a73a
Bump to 0.2.6
12 years ago
Yuri Baburov
2e49e34e11
Merge pull request #20 from andreypopp/master
...
readability.htmls: some docs do not have title elem
13 years ago
Andrey Popp
95852d5c18
readability.htmls: some docs do not have title elem
13 years ago
Yuri Baburov
274b60cdb1
Merge pull request #19 from EvaSDK/master
...
Package that provides source code
13 years ago
Gilles Dartiguelongue
ea6afd3d49
Make sure code is actually distributed
13 years ago