Bookie Weekly Update: April 22nd 2012
Another week, another few lines of code, and yay for two weeks in a row!
Bookie
Not a ton here, just some CSS updates and a fix to the backup script so it pulls the INI file correctly.
Bookie Parser
I spent some time cleaning up the CSS. I did some research on the most readable fonts for screens and, surprisingly, it seems that sans serif wins on digital displays. So I updated the CSS and, combined with some work on the main Bookie CSS files, the readable pages look a bit nicer. I’ve still got some more cleanup to do, but it reads better now.
I also fixed the generated HTML so it no longer has the empty body tag. It was due to the way the readable parsing library was giving me a full HTML document of content. See the readability_lxml section below for some bigger updates.
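To make the empty body tag problem concrete: if the parsing library hands back a complete document and that gets dropped into a page template, you end up with nested wrapper tags. Here is a rough sketch of one way to pull out just the inner content. The function name `body_fragment` is made up for illustration and this is not the actual fix in bookie_parser; a regex is also a blunt tool for HTML, so treat it as a sketch only.

```python
import re

# Hypothetical helper, not part of bookie_parser or readability_lxml.
# Greedy match so we capture everything up to the last closing body tag.
_BODY_RE = re.compile(r"<body[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def body_fragment(document):
    """Return only the markup inside <body>, or the input if there is no body."""
    match = _BODY_RE.search(document)
    if match is None:
        # Already a fragment, so hand it back untouched (minus whitespace).
        return document.strip()
    return match.group(1).strip()
```

The fragment can then be placed inside the page's own body without adding a second set of html and body tags.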
Finally, I added a form on the main page so you can try it out on a url just by entering it. So if you’re just curious what it does, go try it out!
Bookie Api
Just added a ping command. It should help new users make sure their configuration is correct. It’s also a nice start toward API commands that aren’t admin specific. A little bit of cleanup aside from that, but nothing major.
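For the curious, the idea behind a ping command is roughly the following. This is a minimal sketch under my own assumptions, not Bookie's actual API code: the function name `ping`, the `known_keys` mapping, and the response fields are all hypothetical.

```python
def ping(username, api_key, known_keys):
    """Check that a client is configured with a valid username and API key.

    known_keys is a hypothetical dict of username -> api key standing in
    for whatever user store the real application has.
    """
    if not username or not api_key:
        return {"success": False, "message": "Missing username or api key"}

    expected = known_keys.get(username)
    if expected is None or expected != api_key:
        return {"success": False, "message": "Invalid username or api key"}

    # Nothing else to do: reaching this point means the config works.
    return {"success": True, "message": "Looks good"}
```

A client can call this once after setup and show the message to the user, which catches a bad URL or key before any bookmarks are sent.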
readability_lxml
Currently, Bookie uses a library called decruft for parsing HTML pages for the actual important article content. The bookie_parser project is using a different fork of that called readability_lxml. The author is a bit open to merging changes in and actually says she’s in ‘maintenance mode’. Since this is an important feature and I want a really decent library for it, I started hacking on it, and that is where my week of hacking went.
First I updated it to allow me to get back only a partial HTML document vs an entire <html> doc. I then fixed some bugs and started cleaning up the code (adding tests, making the command line client all nice and argparse’y, etc.). In the process I noticed that there’s a big branch on GitHub that adds a ton of things like multiple page document support and such. I’ve started trying to pull that branch in on top of my work and the original author’s code. It’s a LOT of git cherry-pick and really a pain since I want to clean up the code as I go. Unfortunately, this just means that Git gets confused on future merges since the code’s changed between commits. Ugh!
I’m about halfway done though and I hope this will leave us with one solid library to do this parsing. I’m hoping to kind of take over stewardship of the library as I complete this work. It should hopefully make Bookie and bookie_parser all the more awesome.
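As an illustration of what an argparse-based command line client could look like, here is a minimal sketch. The program name and the flags are made up for this example and are not the actual readability_lxml options.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical article-extraction client."""
    parser = argparse.ArgumentParser(
        prog="readable",  # hypothetical program name
        description="Extract the main article content from a page.",
    )
    parser.add_argument(
        "url",
        help="URL or local file path of the page to parse",
    )
    parser.add_argument(
        "-p", "--partial",
        action="store_true",
        help="output an HTML fragment instead of a full <html> document",
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="print debugging details while parsing",
    )
    return parser
```

Calling `build_parser().parse_args()` in a script's entry point gives you `--help` output and argument validation for free, which is most of what makes a client feel argparse'y.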
The coming week
I’m giving a talk on the YUI JavaScript library at Penguicon. This means my hacking time will be a bit less since I’ve got a presentation to prepare for. Next week’s status report might be a bit light and boring, but hey, maybe I’ll scrounge up some more beta users of Bookie while at the conference.
Filed under: Bookie
Tags: api, bookie, github, parser, penguicon, readability_lxml