python-readability

Commit Graph

Author	SHA1	Message	Date
Richard Harding	e9a5cbfe7f	Remove pdb dummy	13 years ago
Richard Harding	f1a79fb8f8	Update to make sure we don't drop the html tag when ditching elements	13 years ago
Richard Harding	46f0302ebc	rename the document_only flag to html_partial	13 years ago
Richard Harding	a46dc14251	Try to pep8 all the things but give up when I got close.	13 years ago
Richard Harding	5a98e2c1b8	Correct appending and allow for document only - Fix the appending of siblings to the correct nested element - Add a document only flag so that you can get a dom tree you can nest yourself without html/body tags.	13 years ago
Richard Harding	edccec5d3b	Work on why we have an empty <body/> tag - Seems to come because the sanitizer ends up with two nodes, not one. The first is an empty body, the second is the article div. - Fix up the tabs so we can work with the file. Needs lots of pep8 love. - Implement an initial hack that at least gets it working atm. - Start to add test cases, sample html files we can test against, etc.	13 years ago
Jan Weiß	3cdc3d67af	Adding comment about oversight in transform_misused_divs_into_paragraphs().	13 years ago
Jan Weiß	960f885edf	Continue early in remove_unlikely_candidates() in case there is neither a class nor an id attribute.	13 years ago
Jan Weiß	6b3961cd30	Fixing gap in node_length coverage.	13 years ago
facundo	bb93ae1e5f	fixed a small issue on the Document score_paragraphs method	13 years ago
Yuri Baburov	11c4d95411	Fixed indentation, encoding issue and README bug. Thanks to Greg Jastrab. Bump version to 0.2.3	13 years ago
Yuri Baburov	61715dca0a	Bump to version 0.2	13 years ago
Yuri Baburov	c2ec1d1c38	Sorted out unicode issues, thanks to Lee Semel.	13 years ago
Yuri Baburov	97ba2a0369	Debug utilities.	13 years ago
Lee Semel	f3d0a8d842	Allow passing unicode objects	13 years ago
Jerry Charumilind	8c1adc5141	Expose Document in readability package	13 years ago
Yuri Baburov	43c34bacc1	Renamed encodings to encoding to avoid conflicts with system module.	14 years ago
Yuri Baburov	f55f16baa1	Updated scoring algorithm to match readability.js v1.7.1	14 years ago
Yuri Baburov	96f476181c	Improved title shortener method, and added it to the Document class.	14 years ago
Yuri Baburov	dada82099b	Moved to lxml (based on decruft version); better encoding recognition.	14 years ago
gfxmonk	2b6a2d3db4	removing empty paragraphs is not very useful, and can break some (stupid) websites	15 years ago
gfxmonk	1d862a00c3	fixed bug where only immediate text was being considered for weights, instead of all nested text	15 years ago
gfxmonk	0eacd959a4	failsafe parsing and more logging	15 years ago
gfxmonk	87ad057706	unicode, dammit!	15 years ago
gfxmonk	a224c5b759	minor	15 years ago
gfxmonk	f73b5f05c4	split out into content and summary methods	15 years ago
gfxmonk	c952f421b7	clean up content method and debug	15 years ago
gfxmonk	c0ca60ee26	use a more leniant parser	15 years ago
gfxmonk	ad3d52ade4	initial	15 years ago

29 Commits (274b60cdb11a22ea13715f873578fb983e3a7a23)