Yuri Baburov
1fac7e685a
Added a feature to allow more images per article (with a test)
9 years ago
Yuri Baburov
c6796195a7
Fixed makefile testing.
9 years ago
Miguel Galves
d04d41b749
Insert text inside iframe for correct output
9 years ago
Miguel Galves
be2a1c4646
Keep width and height attributes
9 years ago
Miguel Galves
f1759c1404
Allow iframes containing YouTube or Vimeo videos. People like them.
9 years ago
Yuri Baburov
332ad810de
Bumped to 0.3.0.6
9 years ago
Yuri Baburov
e4bcbe57d7
Fixes #53
9 years ago
Yuri Baburov
aeb4f4c782
Merge pull request #59 from seomoz/mac_10_10
Fix mac version comparison in setup.py for 10.10
10 years ago
Matthew Peters
c8c2f8809c
Fix mac version comparison in setup.py for 10.10
10 years ago
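The commit above does not say how the comparison failed. One common way such checks break at 10.10 is comparing version strings instead of numeric components, and the sketch below illustrates that assumption only. `parse_version` is a hypothetical helper, not code taken from setup.py.

```python
def parse_version(version):
    """Split a dotted version such as '10.10' into a tuple of ints.

    Hypothetical helper for illustration; not the project's code.
    """
    return tuple(int(part) for part in version.split("."))


# Assumed failure mode: strings compare character by character,
# so "10.10" sorts before "10.9".
string_says_older = "10.10" < "10.9"

# Integer tuples compare component by component, so 10 > 9 as intended.
tuple_says_newer = parse_version("10.10") > parse_version("10.9")
```

Comparing tuples of integers keeps the check correct for any number of components, for example `parse_version("10.9.5")`.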
Yuri Baburov
2d4cfdb2c8
Merge pull request #56 from nathanathan/patch-1
Defaulting to utf-8 when chardet returns None
10 years ago
Nathan Breit
75e2e0cb3a
Defaulting to utf-8 when chardet returns None
On articles like this one chardet returns None:
http://news.zing.vn/nhip-song-tre/thay-giao-gay-sot-tung-bo-luat-tinh-yeu/a291427.html
This causes exceptions later on when encoding.lower() is called
10 years ago
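The fallback described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `normalize_encoding` and its `default` parameter are hypothetical names, and the input is assumed to be a dict shaped like the result of `chardet.detect()`, whose `"encoding"` value may be `None`.

```python
def normalize_encoding(detected, default="utf-8"):
    """Return a lowercase encoding name, falling back to `default`.

    `detected` is assumed to look like the dict returned by chardet.detect(),
    e.g. {"encoding": None, "confidence": 0.0}. Hypothetical helper.
    """
    # Without the fallback, None.lower() would raise AttributeError.
    encoding = detected.get("encoding") or default
    return encoding.lower()


fallback = normalize_encoding({"encoding": None, "confidence": 0.0})
detected = normalize_encoding({"encoding": "GB2312", "confidence": 0.99})
```

Applying the fallback before calling `.lower()` means later code can treat the encoding as a plain string.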
Yuri Baburov
0c2f29ed0d
Version bump.
10 years ago
Yuri Baburov
638f73f6a2
Fix for #52: <input type="hidden"> elements are no longer counted by the "form removal" heuristic.
10 years ago
Yuri Baburov
2fab5ffa6b
Merge pull request #48 from mperdomo1/master
Added code to check declared encodings first
10 years ago
Mark Perdomo
3a43a3fe7e
Added code to check declared encodings first and check them
from kennethreitz/requests/utils.py. Also I added some superset
encodings I have found in Chinese pages that are mishandled by
chardet/character declarations.
10 years ago
Yuri Baburov
1a4d3697bc
Allow latest lxml on Mac OS X 10.9, see issue #39 for comments and setup instructions
10 years ago
Yuri Baburov
d8595b7103
Quickfix for #41
11 years ago
Yuri Baburov
318f25c577
Minor fix in encoding guessing. Claiming it v0.3.0.1
11 years ago
Yuri Baburov
08658d1d31
Released v0.3 and uploaded it to PyPI.
11 years ago
Yuri Baburov
4e3192f5ab
Merge pull request #29 from hush-hush/master
Make lxml clean tree available for user modifications
12 years ago
hush-hush
e2e78e4d55
Make lxml clean tree available for user modifications.
12 years ago
Yuri Baburov
c923995606
Merge pull request #27 from sunlightlabs/master
Simple guard for empty title elements. Thanks, dvogel!
12 years ago
Drew Vogel
fdba8d9e11
Added check on title.text to avoid a TypeError on None.
12 years ago
Yuri Baburov
9cd5fb6226
Bump to 0.2.6.1
12 years ago
Yuri Baburov
44915518d3
Merge pull request #24 from zacharydenton/master
Fix issue 22: all titles were blank.
12 years ago
Zach Denton
0843d9cdf2
Explicitly check if title is None. fixes #22
This fixes #22 which caused all titles to be blank.
12 years ago
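The explicit `None` check in the commit above, together with the later guard on `title.text`, can be sketched as below. This is an illustration under stated assumptions, not the project's code: it uses the standard library's `xml.etree.ElementTree` in place of lxml, and `get_title` and the `"[no-title]"` placeholder are hypothetical. One plausible reason for blank titles is that an element without children is falsy in these APIs, so a bare truthiness test can reject a valid `<title>`; the commit itself does not spell this out.

```python
import xml.etree.ElementTree as ET

NO_TITLE = "[no-title]"  # hypothetical placeholder value


def get_title(doc):
    """Return the stripped <title> text, or a placeholder if it is unusable."""
    title = doc.find(".//title")
    # Compare against None explicitly: a childless element is falsy,
    # so `if not title` would also reject a perfectly valid <title>.
    if title is None:
        return NO_TITLE
    # An empty <title/> has text == None; guard before calling str methods.
    if title.text is None or not title.text.strip():
        return NO_TITLE
    return title.text.strip()


with_title = get_title(ET.fromstring("<html><head><title> Hi </title></head></html>"))
empty_title = get_title(ET.fromstring("<html><head><title/></head></html>"))
missing_title = get_title(ET.fromstring("<html><head></head></html>"))
```

The three calls cover a normal title, an empty title element, and a document with no title element at all.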
Yuri Baburov
8aefc6175f
Updated README with 0.2.6 changes.
12 years ago
Yuri Baburov
20d5f3a73a
Bump to 0.2.6
12 years ago
Yuri Baburov
2e49e34e11
Merge pull request #20 from andreypopp/master
readability.htmls: some docs do not have title elem
12 years ago
Andrey Popp
95852d5c18
readability.htmls: some docs do not have title elem
12 years ago
Yuri Baburov
274b60cdb1
Merge pull request #19 from EvaSDK/master
Package that provides source code
12 years ago
Gilles Dartiguelongue
ea6afd3d49
Make sure code is actually distributed
12 years ago
Richard Harding
a19e766900
Update version so we can upload new tar.gz to pypi
12 years ago
Richard Harding
b9f6f6777f
Merge branch 'master' of github.com:buriy/python-readability
12 years ago
Richard Harding
873562cfba
Update setup.py for finding the package correctly
12 years ago
Richard Harding
e9a5cbfe7f
Remove pdb dummy
12 years ago
Richard Harding
f1a79fb8f8
Update to make sure we don't drop the html tag when ditching elements
12 years ago
Richard Harding
46f0302ebc
rename the document_only flag to html_partial
12 years ago
Rick Harding
6e8a1f5ce2
Merge pull request #18 from mitechie/add_makefile
Add makefile, update .gitignore for venv and potential test file output.
12 years ago
Richard Harding
b8fc399fac
Fix rebase issue in the Makefile
12 years ago
Richard Harding
82804b664d
Update .gitignore file for venv and nosetests.
12 years ago
Richard Harding
4376eedc13
Add makefile testing, building, uploading.
- Adds a makefile with helpers
- make all will set up a virtualenv and get deps
- make test will install test deps and run nosetests
- make version_update will open the setup.py for updating version string
- make upload will build and upload sdist to pypi
12 years ago
Yuri Baburov
7338e9ef63
Added test suite to setup.py
Bump to version 0.2.4
12 years ago
Yuri Baburov
a1ae4eaf72
Merge pull request #15 from mitechie/master
New option only_document of Document.summary(), fixed issue GH-13 with "<body/>", added some docs, tests, and code quality improvements. Thanks, Rick!
12 years ago
Richard Harding
8d3e39f04e
Update readme
12 years ago
Richard Harding
a46dc14251
Try to pep8 all the things but give up when I got close.
12 years ago
Richard Harding
5a98e2c1b8
Correct appending and allow for document only
- Fix the appending of siblings to the correct nested element
- Add a document only flag so that you can get a dom tree you can nest
yourself without html/body tags.
12 years ago
Richard Harding
edccec5d3b
Work on why we have an empty <body/> tag
- Seems to come because the sanitizer ends up with two nodes, not one. The
first is an empty body, the second is the article div.
- Fix up the tabs so we can work with the file. Needs lots of pep8 love.
- Implement an initial hack that at least gets it working atm.
- Start to add test cases, sample html files we can test against, etc.
12 years ago
Yuri Baburov
ab783b25b7
Merge pull request #11 from JanX2/master
Fixing gap in node_length coverage (length=80 was missed)
Continue early in remove_unlikely_candidates() in case there is neither a class nor an id attribute.
Adding comment about oversight in transform_misused_divs_into_paragraphs
12 years ago
Jan Weiß
3cdc3d67af
Adding comment about oversight in transform_misused_divs_into_paragraphs().
12 years ago