Author | Commit | Message | Age
Yuri Baburov | 96f476181c | Improved title shortener method, and added it to the Document class. | 14 years ago
Yuri Baburov | f925e3ef05 | Corrected README | 14 years ago
Yuri Baburov | dada82099b | Moved to lxml (based on decruft version); better encoding recognition. | 14 years ago
gfxmonk | b5639a0822 | well that was quick; first fork added | 14 years ago
gfxmonk | 324e280e16 | added note to readme to make it clear that I'm not actively working on this library | 14 years ago
Tim Cuthbertson | 7ebbcc03d2 | made setup.py executable | 14 years ago
Sean Brant | a5d47a1129 | added setup.py | 14 years ago
gfxmonk | 2b6a2d3db4 | removing empty paragraphs is not very useful, and can break some (stupid) websites | 15 years ago
gfxmonk | 1d862a00c3 | fixed bug where only immediate text was being considered for weights, instead of all nested text | 15 years ago
gfxmonk | 0eacd959a4 | failsafe parsing and more logging | 15 years ago
gfxmonk | 87ad057706 | unicode, dammit! | 15 years ago
gfxmonk | a224c5b759 | minor | 15 years ago
gfxmonk | e42a39e1aa | modified readme | 15 years ago
gfxmonk | f73b5f05c4 | split out into content and summary methods | 15 years ago
gfxmonk | c952f421b7 | clean up content method and debug | 15 years ago
gfxmonk | c0ca60ee26 | use a more leniant parser | 15 years ago
gfxmonk | ad3d52ade4 | initial | 15 years ago