59 Commits

Author SHA1 Message Date
Richard Harding
c2f935bf51 Remove code we didn't need 2012-06-15 21:03:50 -04:00
Richard Harding
326fbfe107 Fix the processing and clean up the antipope article 2012-06-15 21:00:03 -04:00
Richard Harding
3ae64f165e Update and merge 2012-06-15 20:15:37 -04:00
Richard Harding
edca1c74ba Add in test files for antipope blog post 2012-05-28 17:09:23 -04:00
Richard Harding
d3c83b7255 Update scoring and tests for the antipope article 2012-05-28 17:08:45 -04:00
Richard Harding
3f70a49a22 Update to fix client, add head to the css downgrade weights 2012-05-28 16:25:45 -04:00
Richard Harding
46ede7ccfb Prep for 0.1.2 release 2012-05-28 15:55:20 -04:00
Richard Harding
811921775c Started to do some testing, but really not happy with it 2012-05-23 21:18:07 -04:00
Richard Harding
7c220535df Complete upstream merge 2012-05-22 06:59:46 -04:00
Greg Jastrab
c8c53b304b Bonus per 100 chars logic was incorrect
Number of characters was being mod'd by 100 instead of divided,
so a paragraph with a character length of 103 would have
incorrectly gotten 3 bonus points added to the content score.

Add Greg to credits
2012-05-22 06:56:31 -04:00
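The operator mix-up described in c8c53b304b can be shown with a minimal sketch. The function names here are hypothetical and are not taken from the project's source; only the 103-character example comes from the commit message, and any cap on the bonus is left out.

```python
def length_bonus_buggy(text):
    """Pre-fix behaviour as the commit describes it: remainder, not quotient."""
    return len(text) % 100


def length_bonus_fixed(text):
    """One bonus point per full 100 characters of text."""
    return len(text) // 100


paragraph = "x" * 103
print(length_bonus_buggy(paragraph))  # 3
print(length_bonus_fixed(paragraph))  # 1
```

With the modulo version a 199-character paragraph would have scored 99 bonus points, which is why the bug skewed candidate selection so badly.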
Richard Harding
be77f99be1 Add doc and candidates properties to the article 2012-05-16 21:34:01 -04:00
Richard Harding
2e3f416e3b Garden 2012-05-16 20:29:42 -04:00
Richard Harding
e83a753b82 Garden and lint 2012-05-16 20:20:36 -04:00
Richard Harding
6d380712c5 Start process of testing full candidate scoring 2012-05-12 14:00:55 -04:00
Richard Harding
ae9208374b Add some ScoredNode tests as well 2012-05-12 13:56:23 -04:00
Richard Harding
e57f8f02ce Adding tests for the id/css weights and link density 2012-05-12 13:39:26 -04:00
Richard Harding
90a02569ca Prep for 0.1.1 release 2012-05-11 21:18:47 -04:00
Richard Harding
e168484126 Garden readme 2012-05-11 21:17:07 -04:00
Richard Harding
645838c66c Update readme with ci and other important links 2012-05-11 21:15:53 -04:00
Richard Harding
1553eda145 Fix typo in travis config 2012-05-11 21:10:12 -04:00
Richard Harding
ad3685d4f4 Start to add items to get travis ci builds working 2012-05-11 21:05:21 -04:00
Richard Harding
56f29a8585 Mark true so we can start sending tests to travisci 2012-05-11 20:59:00 -04:00
Richard Harding
32350fc3a1 Create LNODE and update bugs in parsing
- Add concept of a LNODE logger that outputs information about scoring, node,
    and generates a hash_id for the node content so we can track it.
- Add `-d` flag to the cmd line client to output the LNODE logging
- Update reading in of http content in the client to be unicode
- Wrap stdout with a unicode happy stream so we can pipe unicode to less/grep,
    etc
- Add html article to the scorable tags we work with
- Make sure we drop iframe along with noscript
- Fix scoring bugs around length points
- Add the hash_id as a scored node @property
2012-05-11 20:53:49 -04:00
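The idea of a content-derived hash_id in 32350fc3a1 might look roughly like the sketch below. This is an assumption for illustration only: the commit does not say which hash function or id length is used, so the choice of SHA-256 and an 8-character prefix is hypothetical.

```python
import hashlib


def hash_id(node_text, length=8):
    """Return a short, stable identifier derived from a node's text content.

    Hypothetical helper: the digest algorithm and the length are assumptions.
    """
    digest = hashlib.sha256(node_text.encode("utf-8")).hexdigest()
    return digest[:length]


# The same content always maps to the same id, so a node can be followed
# through the scoring log output.
print(hash_id("<p>Some paragraph text</p>"))
```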
Richard Harding
f1623fc3e3 Redo the candidate logging to help us locate the best candidate 2012-05-09 19:56:41 -04:00
Richard Harding
278d695614 Update readme for the new cmd line flags 2012-05-08 19:39:02 -04:00
Richard Harding
6b92dd2f83 Add -f and -b flags to client
- added a -f flag that will override only getting a <div> fragment back and
return a fully constructed document
- added a -b flag to not just parse, but write to temp file and open in a
browser, great for testing
- Updated the Article to support the fragment=False so that you can get back a
fully wrapped <html> document with a header (especially with utf-8 content
type set yay)
2012-05-08 19:33:50 -04:00
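What the -f and -b flags in 6b92dd2f83 do could be approximated as follows. This is a sketch under stated assumptions, not the client's actual code: the helper names wrap_fragment, write_temp_html, and open_in_browser are hypothetical, and only standard-library calls are used.

```python
import pathlib
import tempfile
import webbrowser


def wrap_fragment(fragment):
    """Wrap a <div> fragment in a full HTML document with a UTF-8 content type."""
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        "</head><body>" + fragment + "</body></html>"
    )


def write_temp_html(document):
    """Write the document to a temporary .html file and return its path."""
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(document)
        return handle.name


def open_in_browser(document):
    """Write the document to disk and open it in the default browser."""
    path = write_temp_html(document)
    webbrowser.open(pathlib.Path(path).as_uri())
    return path
```

Calling open_in_browser(wrap_fragment("<div>parsed content</div>")) would mirror combining both flags: build the full document, then view it.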
Richard Harding
8b77675ab2 Fix up some tests since we should have run them before tagging 0.1...need to get into build server 2012-05-06 21:06:51 -04:00
Richard Harding
745598dff9 Update news file with initial release 2012-05-06 20:47:24 -04:00
Richard Harding
279788c003 Update the readme for install info 2012-05-06 20:45:44 -04:00
Richard Harding
9e6835bd92 Work on tweaking out parser algorithm to help find the right candidate: fixes #2 2012-05-06 20:34:42 -04:00
Richard Harding
b78ea49c5a Update readme so people don't misunderstand 2012-05-06 19:57:03 -04:00
Richard Harding
454e283850 Add link to readability 2012-05-06 19:55:04 -04:00
Richard Harding
d52d99f6b0 More readme tweaks 2012-05-06 19:53:59 -04:00
Richard Harding
773361efd9 Update readme with some real content 2012-05-06 19:52:59 -04:00
Richard Harding
7d2eec8f52 Add the conditional node checking during node cleaning 2012-05-06 19:41:30 -04:00
Richard Harding
14bbe701eb Add some more debugging to support tracing wtf we did and why 2012-05-06 13:46:01 -04:00
Richard Harding
00ba7e5164 Start to add debugging process for the library/client 2012-05-06 09:04:15 -04:00
Richard Harding
e7873d3d92 Profile and adjust for performance, add bugfix to parse out mitechie blog post 2012-05-06 00:38:47 -04:00
Richard Harding
6b16b7b21f Start to add scoring file specific tests 2012-05-05 23:26:30 -04:00
Richard Harding
ab79d9632b Some refactoring starts to help us org tests/code 2012-05-05 21:31:36 -04:00
Richard Harding
ccac04e567 Add some cleaning/post processing of our target
- Starting to look decent
- Still need to port their cleanConditionally but going to have to think on
that
- Removes spare paragraphs, does some other cleaning tweaks
2012-05-05 20:52:15 -04:00
Richard Harding
19a38a2cea Add support for sibling detection, need to figure out how to test it well still 2012-05-05 14:41:12 -04:00
Richard Harding
4455ec226d Fix logic in the changing of body -> div 2012-05-05 13:09:45 -04:00
Richard Harding
5c1765a6ef Update cmd line client/interface, update doc builders
- For now we're always getting a div back from the parser
- Update the client code; not all flags are enabled, but passing a url
works
2012-05-05 13:08:24 -04:00
Richard Harding
5b3ef916ef Update to add link density scoring adjustments, prep for sibling checks 2012-05-05 08:07:13 -04:00
Richard Harding
e843940549 Garden 2012-05-04 22:54:30 -04:00
Richard Harding
8e96cb7844 Update tests for scoring, returning div/html doc depending on the found content 2012-05-04 22:46:37 -04:00
Richard Harding
60ab4a96b0 Fix tests to pass again 2012-05-04 17:18:30 -04:00
Richard Harding
8f28e7c947 Add processing of content per the algorithm with some base tests 2012-05-04 16:07:52 -04:00
Richard Harding
7960264c3b Make sure we return body with our css class on it 2012-05-04 13:54:58 -04:00