Commit Graph

39 Commits (d6317cd2cedd0eaef88387e21e1164bf925b50ac)

Author SHA1 Message Date
Richard Harding d6317cd2ce Sync up with the fork 11 years ago
Mišo Belica bf6cfef556 Renamed '_py3k.py' -> '_compat.py' 12 years ago
Mišo Belica 8c775fee7f Added new test article 12 years ago
Mišo Belica 5c20673d45 Don't remove h1/h2 elements from readable article 12 years ago
Mišo Belica df5cb8c8f6 Added scored nodes into candidates 12 years ago
Mišo Belica f858f0dbb0 1 pt for 100 inner text chars is computed as float 12 years ago
Mišo Belica d054823958 Added simple test for parser of annotated text 12 years ago
Mišo Belica 05d2230015 Load articles/snippets as binary strings 12 years ago
Mišo Belica e6191fe0d1 Link density is computed with normalized whitespace
HTML code contains a lot of whitespace, and if there is
a large amount of indentation characters, link density
is small even if there are only links with useful
text.
12 years ago
Mišo Belica c2a5b74230 Changed representation of annotated text 12 years ago
Mišo Belica e366721873 Convert <hr> tag into paragraphs 12 years ago
Mišo Belica 3449a33d87 Test for changing multiple <br> into <p> 12 years ago
Mišo Belica 7bd7231e25 Renamed property of 'OriginalDocument': 'html' -> 'dom' 12 years ago
Mišo Belica 69dd9ef4fd Changed 'readable_annotated_text' -> 'main_text' 12 years ago
Mišo Belica 0df3a95c1e Property of ``Article`` with annotated text 12 years ago
Mišo Belica f5939f4608 Skip unused tests instead of useless passing 12 years ago
Mišo Belica 6b87ac5e07 Use unicode literals from future, not 'to_string' 12 years ago
Mišo Belica eb8a8c5248 Replaced deprecated method 'getiterator' by 'iter' 12 years ago
Mišo Belica 5abe69d917 Added new test article 12 years ago
Mišo Belica 0178cfff5c Added compatibility file with unittest2 import 12 years ago
Mišo Belica 26fe24789c Made packages from all tests 12 years ago
Mišo Belica ee483a7f91 Changed location of test HTML files 12 years ago
Mišo Belica 3b5b2b1522 Renamed to readability 12 years ago
Mišo Belica 1a5970b238 Better names and positions for variables 12 years ago
Mišo Belica 930b6ced12 Fixed transformation of leaf <div> into <p> 12 years ago
Mišo Belica 18b5c9b447 Refactored file 'scoring.py' 12 years ago
Mišo Belica dcb7c18fd5 Refactored file 'document.py'
Removed non-intuitive parts and dead code
not covered by tests. Better names for objects.
Better coverage by tests.
12 years ago
Mišo Belica b3b987440d Added test runner via nosetests 12 years ago
Mišo Belica 3f71e1b7d4 Refactored checking of node's attribute 12 years ago
Mišo Belica 636a38d705 Refactored generating of hash ID 12 years ago
Mišo Belica 9a613317c0 Make package from tests 12 years ago
Mišo Belica cc00976533 Replace implementation of 'cached_property'
Parameter 'ttl' isn't needed.
12 years ago
Mišo Belica e3b6ee2fd6 Suppress warning "ResourceWarning: unclosed file" 12 years ago
Mišo Belica 101950478e Simplify logging 12 years ago
Mišo Belica 3322681166 Use 'charade' for detecting encoding 12 years ago
Mišo Belica 544220e9a3 Replaced u"" literal with function 'to_unicode'
Literal u"" is not supported by Python v3.2.
12 years ago
Mišo Belica 94f6b0a84e Tests pass for both Python v2.7 and v3.3 12 years ago
Mišo Belica 912bb50b76 Skip failing test that I don't know how to fix 12 years ago
Mišo Belica c4dbe24a65 New repository structure 12 years ago
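
Note on commit e6191fe0d1: the reasoning in that message (indentation inflates the text length, so link density comes out too small) can be illustrated with a minimal sketch. This is not the repository's code. The function names `normalize_whitespace` and `link_density`, the `normalize` flag, and the sample strings are all hypothetical, and the sketch works on plain strings rather than on a parsed DOM. It assumes Python 3.

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def link_density(full_text, link_texts, normalize=True):
    """Return the fraction of a node's text characters that sit inside links.

    Hypothetical helper: `full_text` is all text of a node and `link_texts`
    holds the text of each link inside it.
    """
    if normalize:
        full_text = normalize_whitespace(full_text)
        link_texts = [normalize_whitespace(t) for t in link_texts]
    total = len(full_text)
    if total == 0:
        return 0.0
    return sum(len(t) for t in link_texts) / total


# A node that holds nothing but two links, written with heavy indentation.
links = ["Home", "About us"]
indented = "\n        Home\n        About us\n    "

raw = link_density(indented, links, normalize=False)
normalized = link_density(indented, links)
print(raw, normalized)
```

Without normalization the indentation characters count as text, so a block made only of links looks like mostly non-link content. With normalization the same block scores close to 1.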