Yuri Baburov | 21906f1c44 | Better setup.py, now we're "readability-lxml" in pypi. Thanks to Jerry Charumilind. | 13 years ago
Yuri Baburov | c2ec1d1c38 | Sorted out unicode issues, thanks to Lee Semel. | 13 years ago
Yuri Baburov | 45781a600f | Added command-line usage | 13 years ago
Yuri Baburov | 97ba2a0369 | Debug utilities. | 13 years ago
Lee Semel | f3d0a8d842 | Allow passing unicode objects | 13 years ago
Jerry Charumilind | ad38fac40a | Add chardet to installation requirements | 13 years ago
Jerry Charumilind | 8c1adc5141 | Expose Document in readability package | 13 years ago
Jerry Charumilind | bae87079e9 | Change to automatically find packages | 13 years ago
Jerry Charumilind | 5bf5192d03 | Add version number to track changes more easily | 13 years ago
Yuri Baburov | 7a1e063c22 | Updated setup.py to my fork, changed package name to lxml-readability | 13 years ago
Yuri Baburov | 43c34bacc1 | Renamed encodings to encoding to avoid conflicts with system module. | 14 years ago
Yuri Baburov | 096d4db6ce | Added usage | 14 years ago
Yuri Baburov | f55f16baa1 | Updated scoring algorithm to match readability.js v1.7.1 | 14 years ago
Yuri Baburov | 96f476181c | Improved title shortener method, and added it to the Document class. | 14 years ago
Yuri Baburov | f925e3ef05 | Corrected README | 14 years ago
Yuri Baburov | dada82099b | Moved to lxml (based on decruft version); better encoding recognition. | 14 years ago
gfxmonk | b5639a0822 | well that was quick; first fork added | 14 years ago
gfxmonk | 324e280e16 | added note to readme to make it clear that I'm not actively working on this library | 14 years ago
Tim Cuthbertson | 7ebbcc03d2 | made setup.py executable | 14 years ago
Sean Brant | a5d47a1129 | added setup.py | 14 years ago
gfxmonk | 2b6a2d3db4 | removing empty paragraphs is not very useful, and can break some (stupid) websites | 15 years ago
gfxmonk | 1d862a00c3 | fixed bug where only immediate text was being considered for weights, instead of all nested text | 15 years ago
gfxmonk | 0eacd959a4 | failsafe parsing and more logging | 15 years ago
gfxmonk | 87ad057706 | unicode, dammit! | 15 years ago
gfxmonk | a224c5b759 | minor | 15 years ago
gfxmonk | e42a39e1aa | modified readme | 15 years ago
gfxmonk | f73b5f05c4 | split out into content and summary methods | 15 years ago
gfxmonk | c952f421b7 | clean up content method and debug | 15 years ago
gfxmonk | c0ca60ee26 | use a more lenient parser | 15 years ago
gfxmonk | ad3d52ade4 | initial | 15 years ago