Commit Graph

122 Commits

Author | SHA1 | Message | Date
Jerry Charumilind | 5bf5192d03 | Add version number to track changes more easily | 2011-06-30 12:17:07 +08:00
Yuri Baburov | 7a1e063c22 | Updated setup.py to my fork, changed package name to lxml-readability | 2011-06-25 23:14:01 -07:00
Yuri Baburov | 43c34bacc1 | Renamed encodings to encoding to avoid conflicts with system module. | 2011-06-16 17:53:02 +07:00
Yuri Baburov | 096d4db6ce | Added usage | 2011-06-14 04:33:15 -07:00
Yuri Baburov | f55f16baa1 | Updated scoring algorithm to match readability.js v1.7.1 | 2011-06-01 12:16:32 +07:00
Yuri Baburov | 96f476181c | Improved title shortener method, and added it to the Document class. | 2011-05-11 19:58:27 +07:00
Yuri Baburov | f925e3ef05 | Corrected README | 2011-05-02 21:45:23 -07:00
Yuri Baburov | dada82099b | Moved to lxml (based on decruft version); better encoding recognition. | 2011-05-03 11:34:29 +07:00
gfxmonk | b5639a0822 | well that was quick; first fork added | 2011-01-20 23:03:30 +11:00
gfxmonk | 324e280e16 | added note to readme to make it clear that I'm not actively working on this library | 2011-01-20 22:28:01 +11:00
Tim Cuthbertson | 7ebbcc03d2 | made setup.py executable | 2010-09-16 22:01:13 +10:00
Sean Brant | a5d47a1129 | added setup.py | 2010-09-14 19:18:35 -05:00
gfxmonk | 2b6a2d3db4 | removing empty paragraphs is not very useful, and can break some (stupid) websites | 2010-05-01 00:08:23 +10:00
gfxmonk | 1d862a00c3 | fixed bug where only immediate text was being considered for weights, instead of all nested text | 2010-05-01 00:07:30 +10:00
gfxmonk | 0eacd959a4 | failsafe parsing and more logging | 2010-04-30 22:34:53 +10:00
gfxmonk | 87ad057706 | unicode, dammit! | 2010-04-26 23:22:54 +10:00
gfxmonk | a224c5b759 | minor | 2010-04-24 14:24:09 +10:00
gfxmonk | e42a39e1aa | modified readme | 2010-04-24 13:47:35 +10:00
gfxmonk | f73b5f05c4 | split out into content and summary methods | 2010-04-24 00:41:09 +10:00
gfxmonk | c952f421b7 | clean up content method and debug | 2010-04-23 23:28:51 +10:00
gfxmonk | c0ca60ee26 | use a more leniant parser | 2010-04-23 20:51:56 +10:00
gfxmonk | ad3d52ade4 | initial | 2010-04-22 21:55:00 +10:00