Mišo Belica
05d2230015
Load articles/snippets as binary strings
2013-03-26 19:55:50 +01:00
Mišo Belica
e6191fe0d1
Link density is computed with normalized whitespace
...
HTML code contains a lot of whitespace, and if there is
a large amount of indentation characters, link density
is small even if there are only links with useful
text.
2013-03-26 19:55:18 +01:00
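The effect described in the commit above can be illustrated with a minimal sketch. The names `normalize_whitespace` and `link_density` are hypothetical helpers written for this example, not the project's actual code, and the sketch assumes Python 3 division.

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def link_density(total_text, link_texts):
    """Ratio of link characters to all characters, after normalization."""
    total_length = len(normalize_whitespace(total_text))
    if total_length == 0:
        return 0.0
    links_length = sum(len(normalize_whitespace(t)) for t in link_texts)
    return links_length / total_length


# Text of a heavily indented node that contains nothing but two links.
indented = "\n        first link\n        \n        second link\n    "
links = ["first link", "second link"]

# Without normalization the indentation dominates the character count.
raw_density = sum(len(t) for t in links) / len(indented)

# With normalization the node is recognized as consisting almost only of links.
density = link_density(indented, links)
```

Here `raw_density` stays well below one half, while `density` is close to 1, which is what a node made only of links should score.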
Mišo Belica
671580ac2c
Use groupby to group annotated texts
2013-03-25 16:32:52 +01:00
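As a rough illustration of the grouping idea, the sketch below merges consecutive text fragments that share the same annotation using `itertools.groupby`. The `(text, annotation)` pair layout is an assumption made for this example, not the representation used by the project.

```python
from itertools import groupby

# Hypothetical annotated text: (fragment, annotation) pairs.
annotated = [
    ("Hello ", None),
    ("world", ("b",)),
    ("!", ("b",)),
    (" Bye", None),
]

# groupby only merges *adjacent* items with an equal key, so fragments
# keep their document order and non-adjacent groups stay separate.
merged = [
    ("".join(text for text, _ in group), annotation)
    for annotation, group in groupby(annotated, key=lambda pair: pair[1])
]
```

Because `groupby` does not sort, the two `None` fragments remain separate entries on either side of the bold run.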
Mišo Belica
c2a5b74230
Changed representation of annotated text
2013-03-25 14:26:03 +01:00
Mišo Belica
e366721873
Convert <hr> tag into paragraphs
2013-03-25 13:57:33 +01:00
Mišo Belica
e198b94ffb
Added string utils for handling whitespace
2013-03-25 13:41:43 +01:00
Mišo Belica
3449a33d87
Test for changing multiple <br> into <p>
2013-03-23 17:04:30 +01:00
Mišo Belica
7bd7231e25
Renamed property of 'OriginalDocument': 'html' -> 'dom'
2013-03-23 17:03:54 +01:00
Mišo Belica
0e748a80a6
Cleaned class 'Article'
2013-03-23 16:07:42 +01:00
Mišo Belica
530b7d8f22
Drop unlikely candidates as soon as you can
2013-03-23 16:02:43 +01:00
Mišo Belica
69dd9ef4fd
Changed 'readable_annotated_text' -> 'main_text'
2013-03-23 15:47:14 +01:00
Mišo Belica
c47530bfe0
Updated changelog
2013-03-21 19:53:07 +01:00
Mišo Belica
0df3a95c1e
Property of `Article` with annotated text
2013-03-21 19:43:22 +01:00
Mišo Belica
7337e2fb38
Join node with 1 child of the same type
2013-03-21 19:42:18 +01:00
Mišo Belica
ade957cb47
Don't change <div> to <p> if it contains <p> elements
2013-03-21 19:41:00 +01:00
Mišo Belica
35dd10f546
Better logging messages
2013-03-21 19:38:54 +01:00
Mišo Belica
f5939f4608
Skip unused tests instead of useless passing
2013-03-21 19:36:04 +01:00
Mišo Belica
6b87ac5e07
Use unicode literals from future, not 'to_string'
2013-03-19 23:49:07 +01:00
Mišo Belica
c9e8e00b92
Refactored class `OriginalDocument`
2013-03-19 23:48:14 +01:00
Mišo Belica
eb8a8c5248
Replaced deprecated method 'getiterator' by 'iter'
2013-03-19 16:06:49 +01:00
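A short sketch of the replacement named in the commit above. It uses the standard library's `xml.etree.ElementTree` so that it is self-contained; which tree library the project itself uses is an assumption left open here.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><p>one</p><p>two</p><span>x</span></div>")

# 'iter' walks the element and all of its descendants in document order;
# an optional tag name restricts the walk to matching elements.
paragraphs = [node.text for node in root.iter("p")]
all_tags = [node.tag for node in root.iter()]
```

Code that previously called `root.getiterator("p")` can call `root.iter("p")` with the same arguments.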
Mišo Belica
2159625626
Function 'callable' returned in Python 3.2
2013-03-19 15:33:49 +01:00
Mišo Belica
76832530b4
I don't use a Makefile
2013-03-19 01:28:30 +01:00
Mišo Belica
5abe69d917
Added new test article
2013-03-19 01:13:46 +01:00
Mišo Belica
5e41280f77
Updated helper for creating an article test
2013-03-19 00:31:44 +01:00
Mišo Belica
0178cfff5c
Added compatibility file with unittest2 import
2013-03-18 22:01:11 +01:00
Mišo Belica
26fe24789c
Made packages from all tests
2013-03-18 21:45:33 +01:00
Mišo Belica
ee483a7f91
Changed location of test HTML files
2013-03-18 21:40:19 +01:00
Mišo Belica
3b5b2b1522
Renamed to readability
2013-03-18 21:25:09 +01:00
Mišo Belica
cf781bc595
Updated implementation of cached property
...
Cached values of properties are stored
in the instance's '__dict__'.
2013-03-17 00:57:28 +01:00
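One common way to store cached values in the instance's `__dict__` is a non-data descriptor. The sketch below shows that technique under the assumption that it matches the idea of the commit above; it is not the project's actual implementation, and the `Document` class is hypothetical.

```python
class cached_property(object):
    """Compute the value once, then store it in the instance's __dict__."""

    def __init__(self, getter):
        self._getter = getter
        self.__name__ = getter.__name__
        self.__doc__ = getter.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self._getter(instance)
        # This class defines no __set__, so the entry written here
        # shadows the descriptor on every later attribute lookup.
        instance.__dict__[self.__name__] = value
        return value


class Document(object):  # hypothetical class used only for the demo
    def __init__(self):
        self.calls = 0

    @cached_property
    def dom(self):
        self.calls += 1
        return ["parsed", "tree"]


doc = Document()
first = doc.dom   # runs the getter and caches the result
second = doc.dom  # served straight from doc.__dict__
```

Deleting the cached entry with `del doc.__dict__["dom"]` makes the next access compute the value again.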
Mišo Belica
4e3227521e
Less code - fewer bugs (I hope)
2013-03-15 01:40:41 +01:00
Mišo Belica
1a5970b238
Better names and positions for variables
2013-03-15 00:52:56 +01:00
Mišo Belica
930b6ced12
Fixed transformation of leaf <div> into <p>
2013-03-15 00:48:13 +01:00
Mišo Belica
314c999730
Drop useless tags by HTML cleaner
2013-03-15 00:23:41 +01:00
Mišo Belica
272fe480a3
Updated setup.py
2013-03-15 00:10:55 +01:00
Mišo Belica
9eacbd579c
Updated LICENSE, AUTHORS, README
2013-03-15 00:10:41 +01:00
Mišo Belica
18b5c9b447
Refactored file 'scoring.py'
2013-03-11 23:06:21 +01:00
Mišo Belica
dcb7c18fd5
Refactored file 'document.py'
...
Removed non-intuitive parts and dead code
not covered by tests. Better names for objects.
Better coverage by tests.
2013-03-11 22:10:26 +01:00
Mišo Belica
03ff0be266
Moved client script into 'breadability.scripts'
2013-03-11 21:18:04 +01:00
Mišo Belica
c92f61fa53
Fixed docopt version
2013-03-11 12:43:17 +01:00
Mišo Belica
ec88a4efe6
Use docopt as an argument parser
2013-03-11 12:37:15 +01:00
Mišo Belica
8470ef2b45
Purification of file readable.py
2013-03-09 13:15:05 +01:00
Mišo Belica
b3b987440d
Added test runner via nosetests
2013-03-09 13:05:16 +01:00
Mišo Belica
2e2e906da7
Purification of document.py
2013-03-09 00:05:49 +01:00
Mišo Belica
9f0fc2d433
Purification
2013-03-08 23:48:35 +01:00
Mišo Belica
baaefeda3c
Refactored computing of link density
2013-03-08 23:23:30 +01:00
Mišo Belica
3f71e1b7d4
Refactored checking of node's attribute
2013-03-08 23:19:24 +01:00
Mišo Belica
636a38d705
Refactored generating of hash ID
2013-03-08 23:06:57 +01:00
Mišo Belica
9a613317c0
Make package from tests
2013-03-08 23:05:14 +01:00
Mišo Belica
cc00976533
Replace implementation of 'cached_property'
...
Parameter 'ttl' isn't needed.
2013-03-08 19:29:15 +01:00
Mišo Belica
e3b6ee2fd6
Suppress warning "ResourceWarning: unclosed file"
2013-03-08 17:46:18 +01:00