Commit Graph

221 Commits (0751fe0c97a0cd5a4a5e099ee7d17d2fc73d3596)
 

Author SHA1 Message Date
Richard Harding 05e13a4834 Update to only append sibling if we don't already have it 11 years ago
Richard Harding 952ea273c5 Update to version 0.1.13 11 years ago
Craig Maloney 9b9ec5b0e6 Treat images a little differently so they get more inclusion.
- When the body of the article contains screenshots/etc we want to try to keep
those images around.
- Added test for Business Insider article
- Adding sweetshark test from issue #1
- Add craig to the credits
11 years ago
Mišo Belica 471db19a43 Added BTE tool into similar tools to readme 11 years ago
Mišo Belica 43cc38dc7b Cleanup 11 years ago
Richard Harding 37c6c41d29 Update versions for 0.1.12 11 years ago
macmenot 4f2b744a3a Set urllib useragent string.
- Use a custom string to help with identifying traffic
- Update version to 0.1.12
- Small linting

Adjust the user agent string, lint
11 years ago
Mišo Belica 81ba7aec3c Create console scripts with python version suffix 12 years ago
Mišo Belica 51df29f05d Write readable content into temp file in binary mode 12 years ago
Mišo Belica 42530d4af7 Use py3k compatible urllib with own User-Agent header 12 years ago
Mišo Belica 9ed02047dd Added string representation for empty scored node 12 years ago
Mišo Belica 7630237b86 Added missing empty line 12 years ago
Mišo Belica c34bc53d9e Updated list of similar tools 12 years ago
Mišo Belica bf6cfef556 Renamed '_py3k.py' -> '_compat.py' 12 years ago
Mišo Belica bd084a8e28 Fixed named argument name 'fragment' 12 years ago
Mišo Belica 8f3ebf0950 Removed file with version number 12 years ago
Mišo Belica 8c775fee7f Added new test article 12 years ago
Mišo Belica c9afc38c49 Cleanups for function 'clean_document' 12 years ago
Mišo Belica 5c20673d45 Don't remove h1/h2 elements from readable article 12 years ago
Mišo Belica c9e087d077 Cleanups 12 years ago
Mišo Belica e0c87223ae Better log messages while scoring candidates 12 years ago
Mišo Belica df5cb8c8f6 Added scored nodes into candidates 12 years ago
Mišo Belica f858f0dbb0 1 pt for 100 inner text chars is computed as float 12 years ago
Mišo Belica 31b75c1cd8 Updated docstring for 'get_link_density' [ci skip] 12 years ago
Mišo Belica d054823958 Added simple test for parser of annotated text 12 years ago
Mišo Belica 05d2230015 Load articles/snippets as binary strings 12 years ago
Mišo Belica e6191fe0d1 Link density is computed with normalized whitespace
HTML code contains a lot of whitespace, and if there is
a large amount of indentation characters, link density
is small even if there are only links with useful
text.
12 years ago
Mišo Belica 671580ac2c Use groupby to group annotated texts 12 years ago
Mišo Belica c2a5b74230 Changed representation of annotated text 12 years ago
Mišo Belica e366721873 Convert <hr> tag into paragraphs 12 years ago
Mišo Belica e198b94ffb Added string utils for handling whitespace 12 years ago
Mišo Belica 3449a33d87 Test for changing multiple <br> into <p> 12 years ago
Mišo Belica 7bd7231e25 Renamed property of 'OriginalDocument': 'html' -> 'dom' 12 years ago
Mišo Belica 0e748a80a6 Cleaned class 'Article' 12 years ago
Mišo Belica 530b7d8f22 Drop unlikely candidates as soon as you can 12 years ago
Mišo Belica 69dd9ef4fd Changed 'readable_annotated_text' -> 'main_text' 12 years ago
Mišo Belica c47530bfe0 Updated changelog 12 years ago
Mišo Belica 0df3a95c1e Property of ``Article`` with annotated text 12 years ago
Mišo Belica 7337e2fb38 Join node with 1 child of the same type 12 years ago
Mišo Belica ade957cb47 Don't change <div> to <p> if it contains <p> elements 12 years ago
Mišo Belica 35dd10f546 Better logging messages 12 years ago
Mišo Belica f5939f4608 Skip unused tests instead of useless passing 12 years ago
Mišo Belica 6b87ac5e07 Use unicode literals from future, not 'to_string' 12 years ago
Mišo Belica c9e8e00b92 Refactored class ``OriginalDocument`` 12 years ago
Mišo Belica eb8a8c5248 Replaced deprecated method 'getiterator' by 'iter' 12 years ago
Mišo Belica 2159625626 Function 'callable' was restored in Python 3.2 12 years ago
Mišo Belica 76832530b4 I don't use Makefile 12 years ago
Mišo Belica 5abe69d917 Added new test article 12 years ago
Mišo Belica 5e41280f77 Updated helper for creating an article test 12 years ago
Mišo Belica 0178cfff5c Added compatibility file with unittest2 import 12 years ago