- They are far more common in commented-out code and proprietary metadata,
which keep slipping past the filter as actual content.
- Downgraded the score value of commas for the same reason.
- Prep for 0.1.10 release with these changes.
Add credits and tweak the " and , scoring
Update version and update the scoring code
- Fixes #9 with empty/non-parsable docs
- Fixes #8 and removes kwargs for the decode statements.
- Fixes #7 by checking if the node has a parent before dropping.
Number of characters was being mod'd by 100 instead of divided,
so a paragraph with a character length of 103 would have
incorrectly gotten 3 bonus points added to the content score.
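
A minimal sketch of the mod-versus-divide mistake described above. The function names are hypothetical, and the fix is assumed to be plain integer division; the real scorer may also cap the bonus.

```python
def length_bonus_buggy(text):
    # Bug: modulo keeps only the remainder, so 103 characters yields 3.
    return len(text) % 100


def length_bonus_fixed(text):
    # Assumed fix: integer division awards one point per full 100 characters.
    return len(text) // 100


paragraph = "x" * 103
print(length_bonus_buggy(paragraph))  # 3
print(length_bonus_fixed(paragraph))  # 1
```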
Add Greg to credits
- Add concept of an LNODE logger that outputs information about scoring and the
node, and generates a hash_id for the node content so we can track it.
- Add `-d` flag to the command-line client to output the LNODE logging
- Update the client to read HTTP content as unicode
- Wrap stdout with a unicode-friendly stream so we can pipe unicode to less,
grep, etc.
- Add the HTML <article> tag to the scorable tags we work with
- Make sure we drop iframe along with noscript
- Fix scoring bugs around length points
- Add the hash_id as a scored node @property
- added a -f flag that returns a fully constructed document instead of only a
<div> fragment
- added a -b flag that, besides parsing, writes the result to a temp file and
opens it in a browser; great for testing
- Updated Article to support fragment=False so that you can get back a fully
wrapped <html> document with a header (notably with the utf-8 content type
set, yay)
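
A rough sketch of what wrapping a fragment into a full document might look like. `wrap_fragment` is a hypothetical helper written for illustration, not the library's actual implementation; it only shows the idea of adding an <html> shell with a utf-8 content type around a <div> fragment.

```python
from html import escape


def wrap_fragment(fragment_html, title=""):
    """Hypothetical helper: wrap a <div> fragment in a full HTML document."""
    return (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        "<title>{title}</title>\n"
        "</head>\n"
        "<body>\n{body}\n</body>\n"
        "</html>"
    ).format(title=escape(title), body=fragment_html)


print(wrap_fragment("<div>readable content</div>", title="Example"))
```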