Richard Harding
c2f935bf51
Remove code we didn't need
2012-06-15 21:03:50 -04:00
Richard Harding
326fbfe107
Fix the processing and clean up the antipope article
2012-06-15 21:00:03 -04:00
Richard Harding
3ae64f165e
Update and merge
2012-06-15 20:15:37 -04:00
Richard Harding
edca1c74ba
Add in test files for antipope blog post
2012-05-28 17:09:23 -04:00
Richard Harding
d3c83b7255
Update scoring and tests for the antipope article
2012-05-28 17:08:45 -04:00
Richard Harding
3f70a49a22
Update to fix client, add head to the css downgrade weights
2012-05-28 16:25:45 -04:00
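The "css downgrade weights" above adjust a node's score based on hints in its class and id attributes, as in the original Readability algorithm. The sketch below is a minimal illustration, assuming Readability's +/-25 weight per attribute. The pattern lists are made up for this example; the log only says that "head" was added to the downgrade side.

```python
import re

# Hypothetical pattern lists, for illustration only. "head" sits in the
# negative list to mirror the commit above.
POSITIVE = re.compile(r"article|body|content|entry|main|post|text", re.I)
NEGATIVE = re.compile(r"comment|foot|head|menu|sidebar|sponsor|widget", re.I)


def attr_weight(value):
    """Weight one class or id string: -25 for a negative hint, +25 for a positive one."""
    weight = 0
    if not value:
        return weight
    if NEGATIVE.search(value):
        weight -= 25
    if POSITIVE.search(value):
        weight += 25
    return weight


def node_weight(class_attr, id_attr):
    """Combine the class and id weights into one score adjustment."""
    return attr_weight(class_attr) + attr_weight(id_attr)
```

With "head" in the negative list, a node such as `<div class="site-header">` loses 25 points before content scoring even starts.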
Richard Harding
46ede7ccfb
Prep for 0.1.2 release
2012-05-28 15:55:20 -04:00
Richard Harding
811921775c
Started to do some testing, but really not happy with it
2012-05-23 21:18:07 -04:00
Richard Harding
7c220535df
Complete upstream merge
2012-05-22 06:59:46 -04:00
Greg Jastrab
c8c53b304b
Bonus per 100 chars logic was incorrect
Number of characters was being mod'd by 100 instead of divided,
so a paragraph with a character length of 103 would have
incorrectly gotten 3 bonus points added to the content score.
Add Greg to credits
2012-05-22 06:56:31 -04:00
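The bug described above is easy to reproduce. This sketch contrasts the modulo version with integer division, assuming the bonus is capped at 3 points as in the original Readability algorithm (the cap is not stated in this log).

```python
def length_bonus_buggy(text):
    # The bug: % keeps the remainder, so 103 chars gives 3 points.
    return min(len(text) % 100, 3)


def length_bonus(text):
    # The fix: // counts full 100-char blocks, so 103 chars gives 1 point.
    # The cap of 3 is an assumption taken from Readability.
    return min(len(text) // 100, 3)


paragraph = "x" * 103
print(length_bonus_buggy(paragraph))  # 3
print(length_bonus(paragraph))        # 1
```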
Richard Harding
be77f99be1
Add doc and candidates properties to the article
2012-05-16 21:34:01 -04:00
Richard Harding
2e3f416e3b
Garden
2012-05-16 20:29:42 -04:00
Richard Harding
e83a753b82
Garden and lint
2012-05-16 20:20:36 -04:00
Richard Harding
6d380712c5
Start process of testing full candidate scoring
2012-05-12 14:00:55 -04:00
Richard Harding
ae9208374b
Add some ScoredNode tests as well
2012-05-12 13:56:23 -04:00
Richard Harding
e57f8f02ce
Adding tests for the id/css weights and link density
2012-05-12 13:39:26 -04:00
Richard Harding
90a02569ca
Prep for 0.1.1 release
2012-05-11 21:18:47 -04:00
Richard Harding
e168484126
Garden readme
2012-05-11 21:17:07 -04:00
Richard Harding
645838c66c
Update readme with ci and other important links
2012-05-11 21:15:53 -04:00
Richard Harding
1553eda145
Fix typo in travis config
2012-05-11 21:10:12 -04:00
Richard Harding
ad3685d4f4
Start to add items to get travis ci builds working
2012-05-11 21:05:21 -04:00
Richard Harding
56f29a8585
Mark true so we can start sending tests to travisci
2012-05-11 20:59:00 -04:00
Richard Harding
32350fc3a1
Create LNODE and fix bugs in parsing
- Add the concept of an LNODE logger that outputs information about a node and
its scoring, and generates a hash_id for the node content so we can track it.
- Add a `-d` flag to the cmd line client to output the LNODE logging
- Update the client to read in http content as unicode
- Wrap stdout with a unicode-happy stream so we can pipe unicode to less/grep,
etc.
- Add the html <article> tag to the scorable tags we work with
- Make sure we drop iframe along with noscript
- Fix scoring bugs around length points
- Add the hash_id as a scored node @property
2012-05-11 20:53:49 -04:00
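The hash_id idea from this commit can be sketched as a property that hashes a node's text, so the same content always gets the same id in the LNODE output. `ScoredNodeSketch` is a simplified stand-in, not the project's ScoredNode; md5, the 8-character prefix, and the log line format are all assumptions.

```python
import hashlib


class ScoredNodeSketch:
    """Simplified stand-in for ScoredNode (hypothetical, for illustration)."""

    def __init__(self, text, content_score=0.0):
        self.text = text
        self.content_score = content_score

    @property
    def hash_id(self):
        # Same text -> same id, so one node can be followed through the
        # debug output. md5 and the 8-char prefix are assumptions.
        return hashlib.md5(self.text.encode("utf-8")).hexdigest()[:8]

    def lnode_line(self):
        # One line of LNODE-style debug output (format is hypothetical).
        return "LNODE {0} score={1:.2f}".format(self.hash_id, self.content_score)
```

Because the id depends only on content, you can grep the `-d` output for one hash_id to follow a single node through scoring.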
Richard Harding
f1623fc3e3
Redo the candidate logging to help us locate the best candidate
2012-05-09 19:56:41 -04:00
Richard Harding
278d695614
Update readme for the new cmd line flags
2012-05-08 19:39:02 -04:00
Richard Harding
6b92dd2f83
Add -f and -b flags to client
- Added a -f flag that returns a fully constructed document instead of only a
<div> fragment
- Added a -b flag to not just parse, but write the result to a temp file and
open it in a browser, great for testing
- Updated the Article to support fragment=False so that you can get back a
fully wrapped <html> document with a header (with the utf-8 content type set,
yay)
2012-05-08 19:33:50 -04:00
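The `-b` behaviour (write the result to a temp file, then open it in a browser) can be sketched with the standard library alone. `show_in_browser` and its `opener` parameter are hypothetical names; the client's actual implementation is not shown here.

```python
import tempfile
import webbrowser
from pathlib import Path


def show_in_browser(html, opener=webbrowser.open):
    """Write an HTML string to a temp file and open it in a browser.

    `opener` is injectable so this can run without launching a real browser.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so the browser can still read it.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".html", encoding="utf-8", delete=False
    ) as tmp:
        tmp.write(html)
        path = tmp.name
    # as_uri() builds a proper file:// URL on both POSIX and Windows.
    opener(Path(path).as_uri())
    return path
```

Writing the file as utf-8 pairs naturally with the fragment=False output, whose header declares a utf-8 content type.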
Richard Harding
8b77675ab2
Fix up some tests; we should have run them before tagging 0.1. Need to get this onto a build server
2012-05-06 21:06:51 -04:00
Richard Harding
745598dff9
Update news file with initial release
2012-05-06 20:47:24 -04:00
Richard Harding
279788c003
Update the readme for install info
2012-05-06 20:45:44 -04:00
Richard Harding
9e6835bd92
Work on tweaking our parser algorithm to help find the right candidate: fixes #2
2012-05-06 20:34:42 -04:00
Richard Harding
b78ea49c5a
Update readme so people don't misunderstand
2012-05-06 19:57:03 -04:00
Richard Harding
454e283850
Add link to readability
2012-05-06 19:55:04 -04:00
Richard Harding
d52d99f6b0
More readme tweaks
2012-05-06 19:53:59 -04:00
Richard Harding
773361efd9
Update readme with some real content
2012-05-06 19:52:59 -04:00
Richard Harding
7d2eec8f52
Add the conditional node checking during node cleaning
2012-05-06 19:41:30 -04:00
Richard Harding
14bbe701eb
Add some more debugging to support tracing wtf we did and why
2012-05-06 13:46:01 -04:00
Richard Harding
00ba7e5164
Start to add debugging process for the library/client
2012-05-06 09:04:15 -04:00
Richard Harding
e7873d3d92
Profile and adjust for performance, add bugfix to parse out mitechie blog post
2012-05-06 00:38:47 -04:00
Richard Harding
6b16b7b21f
Start to add scoring file specific tests
2012-05-05 23:26:30 -04:00
Richard Harding
ab79d9632b
Start some refactoring to help us organize tests/code
2012-05-05 21:31:36 -04:00
Richard Harding
ccac04e567
Add some cleaning/post processing of our target
- Starting to look decent
- Still need to port their cleanConditionally but going to have to think on
that
- Removes spare paragraphs, does some other cleaning tweaks
2012-05-05 20:52:15 -04:00
Richard Harding
19a38a2cea
Add support for sibling detection, need to figure out how to test it well still
2012-05-05 14:41:12 -04:00
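Sibling detection decides which neighbours of the top candidate also belong to the article. This sketch uses the threshold rule from the original Readability algorithm: at least 10 points, or 20% of the top score, plus a bonus for siblings that share the top candidate's class. It is an assumption that this project uses the same numbers.

```python
def sibling_threshold(top_score):
    # Readability's rule: at least 10, or 20% of the top candidate's score.
    return max(10.0, top_score * 0.2)


def keep_sibling(sibling_score, top_score, same_class):
    # Siblings sharing the top candidate's class get a bonus of 20% of
    # the top score (again following Readability, assumed here).
    bonus = top_score * 0.2 if same_class else 0.0
    return sibling_score + bonus >= sibling_threshold(top_score)
```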
Richard Harding
4455ec226d
Fix logic in the changing of body -> div
2012-05-05 13:09:45 -04:00
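Converting `<body>` to `<div>` (with a marker css class, as a later commit in this log mentions) amounts to renaming the element's tag while keeping its children and attributes. This sketch uses the standard library's ElementTree rather than lxml, and the class name "readable" is hypothetical.

```python
import xml.etree.ElementTree as ET


def body_to_div(body, css_class="readable"):
    """Rename <body> to <div> in place and add a marker class.

    "readable" is a hypothetical class name used only for illustration.
    """
    body.tag = "div"
    classes = body.get("class", "").split()
    if css_class not in classes:
        classes.append(css_class)
    body.set("class", " ".join(classes))
    return body


doc = ET.fromstring('<body class="x"><p>hi</p></body>')
print(ET.tostring(body_to_div(doc), encoding="unicode"))
```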
Richard Harding
5c1765a6ef
Update cmd line client/interface, update doc builders
- For now we're always getting a div back from the parser
- Update the client code, not all flags are enabled, but basic passing a url
works
2012-05-05 13:08:24 -04:00
Richard Harding
5b3ef916ef
Update to add link density scoring adjustments, prep for sibling checks
2012-05-05 08:07:13 -04:00
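Link density is the share of a node's text that sits inside links. Readability scales a candidate's score by `1 - link_density`, so navigation-heavy blocks sink. The sketch below assumes the same adjustment here and uses ElementTree in place of lxml.

```python
import xml.etree.ElementTree as ET


def text_length(node):
    # Length of all text inside the node, including descendants.
    return len("".join(node.itertext()))


def link_density(node):
    total = text_length(node)
    if total == 0:
        return 0.0
    link_chars = sum(text_length(a) for a in node.iter("a"))
    return link_chars / total


def adjust_score(score, node):
    # Readability-style adjustment (assumed): mostly-link nodes lose most of their score.
    return score * (1 - link_density(node))


div = ET.fromstring("<div>hello <a>world</a></div>")
print(link_density(div))  # 5 of 11 characters are link text
```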
Richard Harding
e843940549
Garden
2012-05-04 22:54:30 -04:00
Richard Harding
8e96cb7844
Update tests for scoring, returning div/html doc depending on the found content
2012-05-04 22:46:37 -04:00
Richard Harding
60ab4a96b0
Fix tests to pass again
2012-05-04 17:18:30 -04:00
Richard Harding
8f28e7c947
Add processing of content per the algorithm with some base tests
2012-05-04 16:07:52 -04:00
Richard Harding
7960264c3b
Make sure we return body with our css class on it
2012-05-04 13:54:58 -04:00