Richard Harding
d6317cd2ce
Sync up with the fork
11 years ago
Mišo Belica
bf6cfef556
Renamed '_py3k.py' -> '_compat.py'
12 years ago
Mišo Belica
8c775fee7f
Added new test article
12 years ago
Mišo Belica
5c20673d45
Don't remove h1/h2 elements from readable article
12 years ago
Mišo Belica
df5cb8c8f6
Added scored nodes into candidates
12 years ago
Mišo Belica
f858f0dbb0
1 pt for 100 inner text chars is computed as float
12 years ago
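The commit above awards one point per 100 characters of inner text and keeps the result as a float. A minimal sketch of why that matters, assuming the score is simply `len(text) / 100` (the function names here are hypothetical, not the project's API): under Python 2, `/` on two integers truncates, so 250 characters would score 2 instead of 2.5.

```python
def length_points(inner_text):
    """Hypothetical helper: one point per 100 characters of inner text, as a float."""
    # True division keeps the fractional part: 250 chars -> 2.5 points.
    return len(inner_text) / 100.0


def length_points_truncated(inner_text):
    """Hypothetical helper showing the integer-division behaviour a float avoids."""
    # Floor division drops the remainder: 250 chars -> 2 points.
    return len(inner_text) // 100


print(length_points("x" * 250))            # 2.5
print(length_points_truncated("x" * 250))  # 2
```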
Mišo Belica
d054823958
Added simple test for parser of annotated text
12 years ago
Mišo Belica
05d2230015
Load articles/snippets as binary strings
12 years ago
Mišo Belica
e6191fe0d1
Link density is computed with normalized whitespace
HTML code contains a lot of whitespace, and if there is
a large amount of indentation, link density
is small even if there are only links with useful
text.
12 years ago
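The reasoning in the commit body above can be illustrated with a small sketch. It assumes link density is the length of text inside `<a>` elements divided by the total text length of a node, and uses the standard library's `xml.etree.ElementTree` on a well-formed snippet instead of the project's own parser; `normalize_whitespace`, `text_length` and `link_density` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

_WHITESPACE = re.compile(r"\s+")


def normalize_whitespace(text):
    """Collapse runs of whitespace (indentation, newlines) into one space."""
    return _WHITESPACE.sub(" ", text).strip()


def text_length(node, normalize=True):
    # itertext() yields the node's text and its descendants' text/tails,
    # but not the node's own tail.
    text = "".join(node.itertext())
    return len(normalize_whitespace(text) if normalize else text)


def link_density(node, normalize=True):
    """Share of the node's text length that sits inside <a> elements."""
    total = text_length(node, normalize)
    if total == 0:
        return 0.0
    links = sum(text_length(a, normalize) for a in node.iter("a"))
    return links / total


menu = ET.fromstring("""<div>
        <a href="/">Home</a>
        <a href="/about">About</a>
    </div>""")

print(link_density(menu))                   # 0.9
print(link_density(menu, normalize=False))  # much lower: indentation inflates the total
```

The menu consists only of links, yet without normalization the indentation makes it look like mostly plain text.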
Mišo Belica
c2a5b74230
Changed representation of annotated text
12 years ago
Mišo Belica
e366721873
Convert <hr> tag into paragraphs
12 years ago
Mišo Belica
3449a33d87
Test for changing multiple <br> into <p>
12 years ago
Mišo Belica
7bd7231e25
Renamed property of 'OriginalDocument': 'html' -> 'dom'
12 years ago
Mišo Belica
69dd9ef4fd
Changed 'readable_annotated_text' -> 'main_text'
12 years ago
Mišo Belica
0df3a95c1e
Property of ``Article`` with annotated text
12 years ago
Mišo Belica
f5939f4608
Skip unused tests instead of letting them pass uselessly
12 years ago
Mišo Belica
6b87ac5e07
Use unicode literals from future, not 'to_string'
12 years ago
Mišo Belica
eb8a8c5248
Replaced deprecated method 'getiterator' by 'iter'
12 years ago
Mišo Belica
5abe69d917
Added new test article
12 years ago
Mišo Belica
0178cfff5c
Added compatibility file with unittest2 import
12 years ago
Mišo Belica
26fe24789c
Made packages from all tests
12 years ago
Mišo Belica
ee483a7f91
Changed location of test HTML files
12 years ago
Mišo Belica
3b5b2b1522
Renamed to readability
12 years ago
Mišo Belica
1a5970b238
Better names and positions for variables
12 years ago
Mišo Belica
930b6ced12
Fixed transformation of leaf <div> into <p>
12 years ago
Mišo Belica
18b5c9b447
Refactored file 'scoring.py'
12 years ago
Mišo Belica
dcb7c18fd5
Refactored file 'document.py'
Removed non-intuitive parts and dead code
not covered by tests. Better names for objects.
Better coverage by tests.
12 years ago
Mišo Belica
b3b987440d
Added test runner via nosetests
12 years ago
Mišo Belica
3f71e1b7d4
Refactored checking of node's attribute
12 years ago
Mišo Belica
636a38d705
Refactored generation of hash ID
12 years ago
Mišo Belica
9a613317c0
Make package from tests
12 years ago
Mišo Belica
cc00976533
Replace implementation of 'cached_property'
Parameter 'ttl' isn't needed.
12 years ago
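A `cached_property` without a `ttl` parameter can be a plain non-data descriptor. This is a minimal sketch of the common pattern, not necessarily the implementation this commit introduced; `ExampleDocument` is a hypothetical class used only for illustration. Since Python 3.8, `functools.cached_property` provides the same behaviour.

```python
class cached_property(object):
    """Compute an attribute once per instance, then reuse the stored value."""

    def __init__(self, getter):
        self.getter = getter
        self.__name__ = getter.__name__
        self.__doc__ = getter.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        value = self.getter(instance)
        # This descriptor defines no __set__, so the value stored in the
        # instance dict shadows it on every later attribute lookup.
        instance.__dict__[self.__name__] = value
        return value


class ExampleDocument(object):
    calls = 0

    @cached_property
    def dom(self):
        ExampleDocument.calls += 1
        return "<parsed tree>"


doc = ExampleDocument()
doc.dom
doc.dom
print(ExampleDocument.calls)  # 1
```

Without a time-to-live, the cached value stays until the attribute is deleted (`del doc.dom`), after which the next access recomputes it.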
Mišo Belica
e3b6ee2fd6
Suppress warning "ResourceWarning: unclosed file"
12 years ago
Mišo Belica
101950478e
Simplify logging
12 years ago
Mišo Belica
3322681166
Use 'charade' for detecting encoding
12 years ago
Mišo Belica
544220e9a3
Replaced u"" literal with function 'to_unicode'
Literal u"" is not supported by Python v3.2.
12 years ago
Mišo Belica
94f6b0a84e
Tests pass for both Python v2.7 and v3.3
12 years ago
Mišo Belica
912bb50b76
Skip failing test that I don't know how to fix
12 years ago
Mišo Belica
c4dbe24a65
New repository structure
12 years ago