Yuri Baburov | 096d4db6ce | Added usage | 2011-06-14 04:33:15 -07:00
Yuri Baburov | f55f16baa1 | Updated scoring algorithm to match readability.js v1.7.1 | 2011-06-01 12:16:32 +07:00
Yuri Baburov | 96f476181c | Improved title shortener method, and added it to the Document class. | 2011-05-11 19:58:27 +07:00
Yuri Baburov | f925e3ef05 | Corrected README | 2011-05-02 21:45:23 -07:00
Yuri Baburov | dada82099b | Moved to lxml (based on decruft version); better encoding recognition. | 2011-05-03 11:34:29 +07:00
gfxmonk | b5639a0822 | well that was quick; first fork added | 2011-01-20 23:03:30 +11:00
gfxmonk | 324e280e16 | added note to readme to make it clear that I'm not actively working on this library | 2011-01-20 22:28:01 +11:00
Tim Cuthbertson | 7ebbcc03d2 | made setup.py executable | 2010-09-16 22:01:13 +10:00
Sean Brant | a5d47a1129 | added setup.py | 2010-09-14 19:18:35 -05:00
gfxmonk | 2b6a2d3db4 | removing empty paragraphs is not very useful, and can break some (stupid) websites | 2010-05-01 00:08:23 +10:00
gfxmonk | 1d862a00c3 | fixed bug where only immediate text was being considered for weights, instead of all nested text | 2010-05-01 00:07:30 +10:00
gfxmonk | 0eacd959a4 | failsafe parsing and more logging | 2010-04-30 22:34:53 +10:00
gfxmonk | 87ad057706 | unicode, dammit! | 2010-04-26 23:22:54 +10:00
gfxmonk | a224c5b759 | minor | 2010-04-24 14:24:09 +10:00
gfxmonk | e42a39e1aa | modified readme | 2010-04-24 13:47:35 +10:00
gfxmonk | f73b5f05c4 | split out into content and summary methods | 2010-04-24 00:41:09 +10:00
gfxmonk | c952f421b7 | clean up content method and debug | 2010-04-23 23:28:51 +10:00
gfxmonk | c0ca60ee26 | use a more leniant parser | 2010-04-23 20:51:56 +10:00
gfxmonk | ad3d52ade4 | initial | 2010-04-22 21:55:00 +10:00