Commit Graph

125 Commits

Author SHA1 Message Date
Richard Harding
316c550709 Add python 2.6 to the travis ci 2012-12-12 20:00:23 -05:00
Richard Harding
fee5c37b39 Add argparse as an install req for py <2.7 2012-12-12 19:58:27 -05:00
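
argparse joined the Python 2 standard library in 2.7, so older interpreters need it installed as a separate package. A minimal sketch of how a setup script might express that condition follows. The helper name `install_requires` and its `base` argument are hypothetical and are not taken from this repository.

```python
import sys


def install_requires(base, version_info=None):
    """Return the base requirements, plus argparse on Python older than 2.7.

    `base` is whatever the package already depends on (hypothetical here).
    """
    if version_info is None:
        version_info = sys.version_info
    requires = list(base)
    # argparse only ships with the 2.x standard library from 2.7 onwards.
    if tuple(version_info[:2]) < (2, 7):
        requires.append("argparse")
    return requires
```

The returned list could then be passed as the `install_requires` argument to `setup()`.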
Richard Harding
3dea2f349b Update ignore file 2012-10-29 11:00:06 +01:00
Nathan Nifong
920094c81a Add a penalty for double quote chars in paragraphs.
- They are far more common in random commented code and proprietary metadata
  that keeps slipping by the filter as actual content.
- Downgraded the score value of commas for the same reason.
- Prep for 0.1.10 release with these changes.

Add credits and tweak the " and , scoring

Update version and update the scoring code
2012-09-13 19:52:48 -04:00
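
The scoring change in the commit above can be pictured roughly as follows. This is a sketch only: the function name and both weights are made up for illustration, and the values actually used in the 0.1.10 scoring code are not shown in this log.

```python
# Hypothetical weights, chosen only to illustrate the idea.
COMMA_WEIGHT = 0.5    # commas still count toward the score, but for less
QUOTE_PENALTY = 0.5   # each double quote now subtracts from the score


def punctuation_score(paragraph):
    """Score a paragraph's punctuation: reward commas, penalize double quotes."""
    commas = paragraph.count(",")
    quotes = paragraph.count('"')
    return commas * COMMA_WEIGHT - quotes * QUOTE_PENALTY
```

Under these assumed weights, commented-out code or metadata with many quoted strings scores lower than prose with the same number of commas.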
Richard Harding
60da675da5 Reprocess without candidate in case of errors using one
- Fixes #10
2012-08-27 17:31:14 -04:00
Richard Harding
3984e04668 Add better handling around xml parsing issues
- Fixes #9 with empty/non parsable docs
- Fixes #8 and removes kwargs for the decode statements.
- Fixes #7 by checking if the node has a parent before dropping.
2012-08-27 15:31:28 -04:00
Richard Harding
fe9364295f prep for 0.1.7 release 2012-07-21 21:37:12 -04:00
Richard Harding
ae355e9f2f Update kwarg for older python 2012-07-21 21:36:03 -04:00
Richard Harding
0de17a7b81 Update readme 2012-06-21 15:55:09 -04:00
Richard Harding
e592f5322e Prep for 0.1.6 2012-06-17 10:49:13 -04:00
Richard Harding
bf35e3410e Do some link filtering to drop stupid permalinks from the content. 2012-06-17 10:47:11 -04:00
Richard Harding
9cf19d9970 Prep for 0.1.5 2012-06-16 21:17:37 -04:00
Richard Harding
ff37f3169f Add checks to links to remove really bad links from the scripting site 2012-06-16 21:16:29 -04:00
Richard Harding
5157b4570d Prep for the 0.1.4 release 2012-06-16 20:59:49 -04:00
Richard Harding
5704eb4c15 Start process of adding a newtest script for generating test cases
- Adds new breadability_newtest tool for generating test cases.
- Add fixes for the scripting.com test failure.
2012-06-16 20:57:01 -04:00
Richard Harding
3b00d33ad3 Prep for 0.1.3 release 2012-06-15 21:07:06 -04:00
Richard Harding
c2f935bf51 Remove code we didn't need 2012-06-15 21:03:50 -04:00
Richard Harding
326fbfe107 Fix the processing and clean up the antipope article 2012-06-15 21:00:03 -04:00
Richard Harding
3ae64f165e Update and merge 2012-06-15 20:15:37 -04:00
Richard Harding
edca1c74ba Add in test files for antipope blog post 2012-05-28 17:09:23 -04:00
Richard Harding
d3c83b7255 Update scoring and tests for the antipope article 2012-05-28 17:08:45 -04:00
Richard Harding
3f70a49a22 Update to fix client, add head to the css downgrade weights 2012-05-28 16:25:45 -04:00
Richard Harding
46ede7ccfb Prep for 0.1.2 release 2012-05-28 15:55:20 -04:00
Richard Harding
811921775c Started to do some testing, but really not happy with it 2012-05-23 21:18:07 -04:00
Richard Harding
7c220535df Complete upstream merge 2012-05-22 06:59:46 -04:00
Greg Jastrab
c8c53b304b Bonus per 100 chars logic was incorrect
Number of characters was being mod'd by 100 instead of divided,
so a paragraph with a character length of 103 would have
incorrectly gotten 3 bonus points added to the content score.

Add Greg to credits
2012-05-22 06:56:31 -04:00
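
The bug described in the commit above comes down to using the remainder operator where integer division was intended. A small sketch of the difference, with hypothetical function names that are not taken from the project's code:

```python
def length_bonus_buggy(text):
    # Modulo returns the remainder after dividing by 100,
    # so a 103-character paragraph yields 3.
    return len(text) % 100


def length_bonus_fixed(text):
    # Integer division counts the complete 100-character blocks,
    # so a 103-character paragraph yields 1.
    return len(text) // 100
```

With the modulo version, a 199-character paragraph would earn far more bonus points than a 200-character one, which is the opposite of rewarding length.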
Richard Harding
be77f99be1 Add doc and candidates properties to the article 2012-05-16 21:34:01 -04:00
Richard Harding
2e3f416e3b Garden 2012-05-16 20:29:42 -04:00
Richard Harding
e83a753b82 Garden and lint 2012-05-16 20:20:36 -04:00
Richard Harding
6d380712c5 Start process of testing full candidate scoring 2012-05-12 14:00:55 -04:00
Richard Harding
ae9208374b Add some ScoredNode tests as well 2012-05-12 13:56:23 -04:00
Richard Harding
e57f8f02ce Adding tests for the id/css weights and link density 2012-05-12 13:39:26 -04:00
Richard Harding
90a02569ca Prep for 0.1.1 release 2012-05-11 21:18:47 -04:00
Richard Harding
e168484126 Garden readme 2012-05-11 21:17:07 -04:00
Richard Harding
645838c66c Update readme with ci and other important links 2012-05-11 21:15:53 -04:00
Richard Harding
1553eda145 Fix typo in travis config 2012-05-11 21:10:12 -04:00
Richard Harding
ad3685d4f4 Start to add items to get travis ci builds working 2012-05-11 21:05:21 -04:00
Richard Harding
56f29a8585 Mark true so we can start sending tests to travisci 2012-05-11 20:59:00 -04:00
Richard Harding
32350fc3a1 Create LNODE and update bugs in parsing
- Add concept of a LNODE logger that outputs information about scoring, node,
    and generates a hash_id for the node content so we can track it.
- Add `-d` flag to the cmd line client to output the LNODE logging
- Update reading in of http content in the client to be unicode
- Wrap stdout with a unicode happy stream so we can pipe unicode to less/grep,
    etc
- Add html article to the scorable tags we work with
- Make sure we drop iframe along with noscript
- Fix scoring bugs around length points
- Add the hash_id as a scored node @property
2012-05-11 20:53:49 -04:00
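
One common way to get a "unicode happy" output stream, as mentioned in the commit above, is to wrap a byte stream in a UTF-8 writer from the standard `codecs` module. The sketch below assumes that approach; the helper name `utf8_writer` is hypothetical and the commit's actual code may differ.

```python
import codecs


def utf8_writer(byte_stream):
    """Wrap a byte stream so that unicode text written to it is UTF-8 encoded."""
    return codecs.getwriter("utf-8")(byte_stream)
```

On Python 2 the wrapper can be applied to `sys.stdout` directly, which lets non-ASCII output be piped to tools such as less or grep without encoding errors.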
Richard Harding
f1623fc3e3 Redo the candidate logging to help us locate the best candidate 2012-05-09 19:56:41 -04:00
Richard Harding
278d695614 Update readme for the new cmd line flags 2012-05-08 19:39:02 -04:00
Richard Harding
6b92dd2f83 Add -f and -b flags to client
- added a -f flag that will override only getting a <div> fragment back and
return a fully constructed document
- added a -b flag to not just parse, but write to temp file and open in a
browser, great for testing
- Updated the Article to support the fragment=False so that you can get back a
fully wrapped <html> document with a header (especially with utf-8 content
type set yay)
2012-05-08 19:33:50 -04:00
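
The difference between a `<div>` fragment and a fully wrapped document, as described in the commit above, can be sketched like this. The function `wrap_fragment` and the exact markup are assumptions for illustration; they do not reproduce how the Article class builds its output.

```python
def wrap_fragment(fragment_html):
    """Wrap an extracted <div> fragment in a complete HTML document.

    The head declares a utf-8 content type so browsers decode it correctly.
    """
    head = (
        "<head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        "</head>"
    )
    return "<html>" + head + "<body>" + fragment_html + "</body></html>"
```

A document built this way can be written to a temporary file and opened in a browser, which is the workflow the -b flag is described as supporting.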
Richard Harding
8b77675ab2 Fix up some tests since we should have run them before tagging 0.1...need to get into build server 2012-05-06 21:06:51 -04:00
Richard Harding
745598dff9 Update news file with initial release 2012-05-06 20:47:24 -04:00
Richard Harding
279788c003 Update the readme for install info 2012-05-06 20:45:44 -04:00
Richard Harding
9e6835bd92 Work on tweaking out parser algorithm to help find the right candidate: fixes #2 2012-05-06 20:34:42 -04:00
Richard Harding
b78ea49c5a Update readme so people don't misunderstand 2012-05-06 19:57:03 -04:00
Richard Harding
454e283850 Add link to readability 2012-05-06 19:55:04 -04:00
Richard Harding
d52d99f6b0 More readme tweaks 2012-05-06 19:53:59 -04:00
Richard Harding
773361efd9 Update readme with some real content 2012-05-06 19:52:59 -04:00