9 Commits

Author SHA1 Message Date
gfxmonk 2b6a2d3db4 removing empty paragraphs is not very useful, and can break some (stupid) websites 2010-05-01 00:08:23 +10:00
gfxmonk 1d862a00c3 fixed bug where only immediate text was being considered for weights, instead of all nested text 2010-05-01 00:07:30 +10:00
gfxmonk 0eacd959a4 failsafe parsing and more logging 2010-04-30 22:34:53 +10:00
gfxmonk 87ad057706 unicode, dammit! 2010-04-26 23:22:54 +10:00
gfxmonk a224c5b759 minor 2010-04-24 14:24:09 +10:00
gfxmonk f73b5f05c4 split out into content and summary methods 2010-04-24 00:41:09 +10:00
gfxmonk c952f421b7 clean up content method and debug 2010-04-23 23:28:51 +10:00
gfxmonk c0ca60ee26 use a more leniant parser 2010-04-23 20:51:56 +10:00
gfxmonk ad3d52ade4 initial 2010-04-22 21:55:00 +10:00