145 Commits (e6191fe0d1894e689cbd0894abe2b2b771d30737)
 

Author SHA1 Message Date
Mišo Belica e6191fe0d1 Link density is computed with normalized whitespace
HTML code contains a lot of whitespace, and if there is
a large amount of indentation characters, link density
is small even if there are only links with useful
text.
12 years ago
Mišo Belica 671580ac2c Use groupby to group annotated texts 12 years ago
Mišo Belica c2a5b74230 Changed representation of annotated text 12 years ago
Mišo Belica e366721873 Convert <hr> tag into paragraphs 12 years ago
Mišo Belica e198b94ffb Added string utils for handling whitespace 12 years ago
Mišo Belica 3449a33d87 Test for changing multiple <br> into <p> 12 years ago
Mišo Belica 7bd7231e25 Renamed property of 'OriginalDocument': 'html' -> 'dom' 12 years ago
Mišo Belica 0e748a80a6 Cleaned class 'Article' 12 years ago
Mišo Belica 530b7d8f22 Drop unlikely candidates as soon as you can 12 years ago
Mišo Belica 69dd9ef4fd Changed 'readable_annotated_text' -> 'main_text' 12 years ago
Mišo Belica c47530bfe0 Updated changelog 12 years ago
Mišo Belica 0df3a95c1e Property of ``Article`` with annotated text 12 years ago
Mišo Belica 7337e2fb38 Join node with 1 child of the same type 12 years ago
Mišo Belica ade957cb47 Don't change <div> to <p> if it contains <p> elements 12 years ago
Mišo Belica 35dd10f546 Better logging messages 12 years ago
Mišo Belica f5939f4608 Skip unused tests instead of useless passing 12 years ago
Mišo Belica 6b87ac5e07 Use unicode literals from future, not 'to_string' 12 years ago
Mišo Belica c9e8e00b92 Refactored class ``OriginalDocument`` 12 years ago
Mišo Belica eb8a8c5248 Replaced deprecated method 'getiterator' by 'iter' 12 years ago
Mišo Belica 2159625626 Function 'callable' has returned in Python 3.2 12 years ago
Mišo Belica 76832530b4 I don't use Makefile 12 years ago
Mišo Belica 5abe69d917 Added new test article 12 years ago
Mišo Belica 5e41280f77 Updated helper for creating an article test 12 years ago
Mišo Belica 0178cfff5c Added compatibility file with unittest2 import 12 years ago
Mišo Belica 26fe24789c Made packages from all tests 12 years ago
Mišo Belica ee483a7f91 Changed location of test HTML files 12 years ago
Mišo Belica 3b5b2b1522 Renamed to readability 12 years ago
Mišo Belica cf781bc595 Updated implementation of cached property
Cached values of properties are stored
in the instance's '__dict__'.
12 years ago
Mišo Belica 4e3227521e Less code - fewer bugs (I hope) 12 years ago
Mišo Belica 1a5970b238 Better names and positions for variables 12 years ago
Mišo Belica 930b6ced12 Fixed transformation of leaf <div> into <p> 12 years ago
Mišo Belica 314c999730 Drop useless tags by HTML cleaner 12 years ago
Mišo Belica 272fe480a3 Updated setup.py 12 years ago
Mišo Belica 9eacbd579c Updated LICENSE, AUTHORS, README 12 years ago
Mišo Belica 18b5c9b447 Refactored file 'scoring.py' 12 years ago
Mišo Belica dcb7c18fd5 Refactored file 'document.py'
Removed non-intuitive parts and dead code
not covered by tests. Better names for objects.
Better coverage by tests.
12 years ago
Mišo Belica 03ff0be266 Moved client script into 'breadability.scripts' 12 years ago
Mišo Belica c92f61fa53 Fixed docopt version 12 years ago
Mišo Belica ec88a4efe6 Use docopt as an argument parser 12 years ago
Mišo Belica 8470ef2b45 Purification of file readable.py 12 years ago
Mišo Belica b3b987440d Added test runner via nosetests 12 years ago
Mišo Belica 2e2e906da7 Purification of document.py 12 years ago
Mišo Belica 9f0fc2d433 Purification 12 years ago
Mišo Belica baaefeda3c Refactored computing of link density 12 years ago
Mišo Belica 3f71e1b7d4 Refactored checking of node's attribute 12 years ago
Mišo Belica 636a38d705 Refactored generating of hash ID 12 years ago
Mišo Belica 9a613317c0 Make package from tests 12 years ago
Mišo Belica cc00976533 Replace implementation of 'cached_property'
Parameter 'ttl' isn't needed.
12 years ago
Mišo Belica e3b6ee2fd6 Suppress warning "ResourceWarning: unclosed file" 12 years ago
Mišo Belica c69cd4b2ba Purification 12 years ago