Yuri Baburov | f55f16baa1 | Updated scoring algorithm to match readability.js v1.7.1 | 14 years ago
Yuri Baburov | 96f476181c | Improved title shortener method, and added it to the Document class. | 14 years ago
Yuri Baburov | dada82099b | Moved to lxml (based on decruft version); better encoding recognition. | 14 years ago
gfxmonk | 2b6a2d3db4 | removing empty paragraphs is not very useful, and can break some (stupid) websites | 15 years ago
gfxmonk | 1d862a00c3 | fixed bug where only immediate text was being considered for weights, instead of all nested text | 15 years ago
gfxmonk | 0eacd959a4 | failsafe parsing and more logging | 15 years ago
gfxmonk | 87ad057706 | unicode, dammit! | 15 years ago
gfxmonk | a224c5b759 | minor | 15 years ago
gfxmonk | f73b5f05c4 | split out into content and summary methods | 15 years ago
gfxmonk | c952f421b7 | clean up content method and debug | 15 years ago
gfxmonk | c0ca60ee26 | use a more lenient parser | 15 years ago
gfxmonk | ad3d52ade4 | initial | 15 years ago