Mišo Belica
e6191fe0d1
Link density is computed with normalized whitespace

HTML code contains a lot of whitespace, and if there is
a large number of indentation characters, the link density
is low even if the node contains only links with useful
text.
12 years ago
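A minimal sketch of the idea in the commit above, using the standard library's `xml.etree.ElementTree` as a stand-in for the project's DOM; `normalize_whitespace`, `text_length` and `link_density` are hypothetical names. Whitespace is collapsed before measuring, so indentation no longer inflates the total text length:

```python
import xml.etree.ElementTree as ET


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and strip the ends."""
    return " ".join(text.split())


def text_length(element):
    """Length of the element's text after whitespace normalization."""
    return len(normalize_whitespace("".join(element.itertext())))


def link_density(element):
    """Share of the element's normalized text that sits inside <a> tags."""
    total = text_length(element)
    if total == 0:
        return 0.0
    links = sum(text_length(link) for link in element.iter("a"))
    return links / total
```

For a `<div>` that holds only an indented link, this yields 1.0, whereas counting raw characters would report a much lower value because of the newlines and spaces around the link.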
Mišo Belica
671580ac2c
Use groupby to group annotated texts
12 years ago
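The commit does not show the data shape, so this sketch assumes annotated text is a flat list of `(text, annotations)` pairs, with annotations as a hashable tuple; `group_annotated` is a hypothetical name. `itertools.groupby` merges runs of consecutive fragments that share the same annotations:

```python
from itertools import groupby


def group_annotated(fragments):
    """Merge consecutive (text, annotations) pairs with equal annotations.

    Assumed (hypothetical) input shape: [("Hello", ("p",)), ("world", ("p",))].
    """
    grouped = []
    # groupby only groups *adjacent* items, which preserves document order.
    for annotations, items in groupby(fragments, key=lambda pair: pair[1]):
        text = " ".join(fragment for fragment, _ in items)
        grouped.append((text, annotations))
    return grouped
```

Because `groupby` only merges neighbours, a `("p",)` fragment that appears again after a heading starts a new group instead of being pulled back into the first paragraph.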
Mišo Belica
c2a5b74230
Changed representation of annotated text
12 years ago
Mišo Belica
e366721873
Convert <hr> tag into paragraphs
12 years ago
Mišo Belica
e198b94ffb
Added string utils for handling whitespace
12 years ago
Mišo Belica
3449a33d87
Test for changing multiple <br> into <p>
12 years ago
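The commit above only names a test, so here is a string-level sketch of the transformation it checks: runs of two or more `<br>` tags become paragraph boundaries. `brs_to_paragraphs` is a hypothetical helper; a real implementation would more likely work on the parsed DOM than on raw markup:

```python
import re

# Two or more <br> tags in a row, optionally separated by whitespace.
BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def brs_to_paragraphs(html):
    """Split a fragment at runs of <br> tags and wrap each chunk in <p>."""
    chunks = (chunk.strip() for chunk in BR_RUN.split(html))
    # Empty chunks (e.g. from a leading <br><br>) are dropped.
    return "".join("<p>%s</p>" % chunk for chunk in chunks if chunk)
```

A single `<br>` is left alone, since one line break inside a paragraph is usually intentional.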
Mišo Belica
7bd7231e25
Renamed property of 'OriginalDocument': 'html' -> 'dom'
12 years ago
Mišo Belica
0e748a80a6
Cleaned class 'Article'
12 years ago
Mišo Belica
530b7d8f22
Drop unlikely candidates as soon as you can
12 years ago
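A sketch of dropping unlikely candidates before any scoring happens, in the style of the original arc90 Readability heuristics. Both patterns are shortened, hypothetical stand-ins rather than the project's actual lists, and `xml.etree.ElementTree` stands in for the project's DOM:

```python
import re
import xml.etree.ElementTree as ET

# Shortened, hypothetical patterns; the project's real lists are longer.
UNLIKELY = re.compile(r"comment|footer|header|menu|sidebar|sponsor|share", re.I)
MAYBE = re.compile(r"article|body|content|main", re.I)


def drop_unlikely_candidates(root):
    """Remove elements whose class/id marks them as unlikely content.

    Doing this before scoring means removed subtrees are never scored.
    Note that ElementTree's remove() also discards the element's tail text.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for element in list(root.iter()):
        if element is root:
            continue
        signature = "%s %s" % (element.get("class", ""), element.get("id", ""))
        if UNLIKELY.search(signature) and not MAYBE.search(signature):
            parents[element].remove(element)
    return root
```

The second pattern rescues elements such as `class="article" id="comments"`, where an unlikely word appears next to a strong content hint.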
Mišo Belica
69dd9ef4fd
Changed 'readable_annotated_text' -> 'main_text'
12 years ago
Mišo Belica
c47530bfe0
Updated changelog
12 years ago
Mišo Belica
0df3a95c1e
Property of ``Article`` with annotated text
12 years ago
Mišo Belica
7337e2fb38
Join node with 1 child of the same type
12 years ago
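A sketch of collapsing a node whose only child has the same tag, e.g. `<div><div>...</div></div>` into a single `<div>`. `join_single_child` is a hypothetical name; keeping the outer node's attributes and dropping the inner node's is an assumption about the intended behaviour:

```python
import xml.etree.ElementTree as ET


def join_single_child(node):
    """Collapse <x><x>...</x></x> into <x>...</x>, repeatedly if nested."""
    while (
        len(node) == 1
        and node[0].tag == node.tag
        and not (node.text or "").strip()
        and not (node[0].tail or "").strip()
    ):
        child = node[0]
        # Adopt the child's text and children; the child's attributes are lost.
        node.text = child.text
        node[:] = list(child)
    return node
```

The whitespace checks ensure that no visible text sitting next to the inner node is thrown away.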
Mišo Belica
ade957cb47
Don't change <div> to <p> if it contains <p> elements
12 years ago
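A sketch of the rule in the commit above: a `<div>` becomes a `<p>` only when nothing block-level, such as a `<p>`, sits inside it. `BLOCK_TAGS` and `divs_to_paragraphs` are hypothetical, and the exact set of tags treated as block-level is an assumption:

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags that keep a <div> from being turned into a <p>.
BLOCK_TAGS = {"p", "div", "table", "ul", "ol", "pre", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


def divs_to_paragraphs(root):
    """Retag <div> elements that contain no block-level descendants as <p>."""
    # Snapshot the divs first so retagging does not disturb the iteration.
    for div in list(root.iter("div")):
        has_block = any(
            descendant.tag in BLOCK_TAGS
            for descendant in div.iter()
            if descendant is not div
        )
        if not has_block:
            div.tag = "p"
    return root
```

Only leaf-like divs are converted; a div wrapping other divs or paragraphs keeps its tag, which avoids producing invalid nested `<p>` elements.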
Mišo Belica
35dd10f546
Better logging messages
12 years ago
Mišo Belica
f5939f4608
Skip unused tests instead of letting them pass uselessly
12 years ago
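With the standard `unittest` module, a placeholder test can be marked as skipped instead of passing with an empty body, so the report shows it as pending. The class and method names here are hypothetical:

```python
import unittest


class TestArticle(unittest.TestCase):
    @unittest.skip("not implemented yet")
    def test_unimplemented_feature(self):
        # Never runs; the runner records it as skipped, not passed.
        self.fail("placeholder")

    def test_normalized_whitespace(self):
        self.assertEqual(" ".join("a  b".split()), "a b")
```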
Mišo Belica
6b87ac5e07
Use unicode_literals from __future__ instead of 'to_string'
12 years ago
Mišo Belica
c9e8e00b92
Refactored class ``OriginalDocument``
12 years ago
Mišo Belica
eb8a8c5248
Replaced deprecated method 'getiterator' by 'iter'
12 years ago
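For reference, here is the replacement in the standard library's `xml.etree.ElementTree`, where `Element.getiterator()` was deprecated and then removed in Python 3.9, while `Element.iter()` takes the same optional tag filter. The commit does not say which XML library the project used, so treat this as an illustration:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><a href='/one'>One</a><p><a href='/two'>Two</a></p></div>")

# Old, removed spelling:  root.getiterator("a")
# Supported spelling: iter() walks the subtree in document order.
hrefs = [link.get("href") for link in root.iter("a")]
print(hrefs)  # ['/one', '/two']
```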
Mišo Belica
2159625626
Function 'callable' returned in Python 3.2
12 years ago
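The builtin `callable()` was removed in Python 3.0 and restored in 3.2. A common compatibility shim, sketched here rather than taken from the project, falls back to checking for `__call__` on 3.0 and 3.1:

```python
try:
    callable = callable  # builtin exists on Python 2.x and 3.2+
except NameError:  # only Python 3.0 and 3.1 lack it
    def callable(obj):
        """Fallback: treat an object as callable if it has __call__."""
        return hasattr(obj, "__call__")
```

Once only Python 3.2+ is supported, the shim can be deleted and the builtin used directly.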
Mišo Belica
76832530b4
I don't use a Makefile
12 years ago
Mišo Belica
5abe69d917
Added new test article
12 years ago
Mišo Belica
5e41280f77
Updated helper for creating an article test
12 years ago
Mišo Belica
0178cfff5c
Added compatibility file with unittest2 import
12 years ago
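A typical shape for such a compatibility module, assuming its job is to prefer the `unittest2` backport (which brought newer `unittest` features to older interpreters) and fall back to the standard library. The commit does not give the module's file name:

```python
# Prefer the unittest2 backport when it is installed (useful on old Pythons),
# otherwise use the standard library module under the same name.
try:
    import unittest2 as unittest
except ImportError:
    import unittest

__all__ = ["unittest"]
```

Test modules then import `unittest` from this module, so each test file does not need its own try/except.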
Mišo Belica
26fe24789c
Made packages from all tests
12 years ago
Mišo Belica
ee483a7f91
Changed location of test HTML files
12 years ago
Mišo Belica
3b5b2b1522
Renamed to readability
12 years ago
Mišo Belica
cf781bc595
Updated implementation of cached property

Cached values of properties are stored
in the instance's '__dict__'.
12 years ago
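A sketch of a cached property that stores its value in the instance's `__dict__`, as the commit describes. Because it is a non-data descriptor (it defines only `__get__`), the cached entry in `__dict__` shadows it on later lookups, so the getter runs once per instance. Python 3.8's `functools.cached_property` uses the same approach; this class is an illustration, not the project's code:

```python
class cached_property(object):
    """Compute a property once per instance, then read it from __dict__."""

    def __init__(self, getter):
        self._getter = getter
        self.__name__ = getter.__name__
        self.__doc__ = getter.__doc__

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self._getter(instance)
        # Store under the same name; instance attributes take precedence
        # over non-data descriptors, so __get__ is not called again.
        instance.__dict__[self.__name__] = value
        return value
```

Deleting the attribute (`del obj.name`) drops the cached value, so the next access recomputes it.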
Mišo Belica
4e3227521e
Less code, fewer bugs (I hope)
12 years ago
Mišo Belica
1a5970b238
Better names and positions for variables
12 years ago
Mišo Belica
930b6ced12
Fixed transformation of leaf <div> into <p>
12 years ago
Mišo Belica
314c999730
Drop useless tags with the HTML cleaner
12 years ago
Mišo Belica
272fe480a3
Updated setup.py
12 years ago
Mišo Belica
9eacbd579c
Updated LICENSE, AUTHORS, README
12 years ago
Mišo Belica
18b5c9b447
Refactored file 'scoring.py'
12 years ago
Mišo Belica
dcb7c18fd5
Refactored file 'document.py'

Removed non-intuitive parts and dead code
not covered by tests. Better names for objects.
Better test coverage.
12 years ago
Mišo Belica
03ff0be266
Moved client script into 'breadability.scripts'
12 years ago
Mišo Belica
c92f61fa53
Fixed docopt version
12 years ago
Mišo Belica
ec88a4efe6
Use docopt as an argument parser
12 years ago
Mišo Belica
8470ef2b45
Purification of file readable.py
12 years ago
Mišo Belica
b3b987440d
Added test runner via nosetests
12 years ago
Mišo Belica
2e2e906da7
Purification of document.py
12 years ago
Mišo Belica
9f0fc2d433
Purification
12 years ago
Mišo Belica
baaefeda3c
Refactored computation of link density
12 years ago
Mišo Belica
3f71e1b7d4
Refactored checking of node's attribute
12 years ago
Mišo Belica
636a38d705
Refactored generation of hash ID
12 years ago
Mišo Belica
9a613317c0
Made a package from tests
12 years ago
Mišo Belica
cc00976533
Replace implementation of 'cached_property'

Parameter 'ttl' isn't needed.
12 years ago
Mišo Belica
e3b6ee2fd6
Suppress warning "ResourceWarning: unclosed file"
12 years ago
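Python 3 emits `ResourceWarning: unclosed file` when a file object is garbage-collected while still open. The commit does not say how the warning was suppressed; closing files deterministically with a `with` block is one common remedy, sketched here with a hypothetical `read_file` helper:

```python
import io


def read_file(path, encoding="utf-8"):
    """Read a whole text file and close it before returning."""
    # The with-block closes the handle even if read() raises, so no
    # ResourceWarning is emitted when the file object is collected.
    with io.open(path, encoding=encoding) as handle:
        return handle.read()
```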
Mišo Belica
c69cd4b2ba
Purification
12 years ago