Author | Commit | Message | Date
gfxmonk | 2b6a2d3db4 | removing empty paragraphs is not very useful, and can break some (stupid) websites | 2010-05-01 00:08:23 +10:00
gfxmonk | 1d862a00c3 | fixed bug where only immediate text was being considered for weights, instead of all nested text | 2010-05-01 00:07:30 +10:00
gfxmonk | 0eacd959a4 | failsafe parsing and more logging | 2010-04-30 22:34:53 +10:00
gfxmonk | 87ad057706 | unicode, dammit! | 2010-04-26 23:22:54 +10:00
gfxmonk | a224c5b759 | minor | 2010-04-24 14:24:09 +10:00
gfxmonk | f73b5f05c4 | split out into content and summary methods | 2010-04-24 00:41:09 +10:00
gfxmonk | c952f421b7 | clean up content method and debug | 2010-04-23 23:28:51 +10:00
gfxmonk | c0ca60ee26 | use a more lenient parser | 2010-04-23 20:51:56 +10:00
gfxmonk | ad3d52ade4 | initial | 2010-04-22 21:55:00 +10:00