2011-07-10

Reproducible blogging

As this is a fact-based blog, the posts here very often contain diagrams and data tables. To enable you to reproduce the results and insights, I include the computations as computer code.

Most blog posts I write are markdown text combined (or woven) with computer code written in the R language. I created a small package, mdtools, that puts the tools together and smooths the workflow.

This post gives a short introduction to the mdtools package: how to install it, the first post, caveats, and future directions.

Installation

The mdtools package is not yet on CRAN, but it is available from a self-hosted repository. You install it with

> install.packages("mdtools", repos = c(getOption("repos"), 
+ "http://userpage.fu-berlin.de/~kweinert/R"))

For Windows users, only an R 2.13 binary is available; users of other R versions need to add the type = "source" argument.
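As a minimal sketch, the source install could look like the following. It only adds type = "source" to the command above; the helper names mdtools_repos and install_mdtools_from_source are made up for illustration and are not part of mdtools. Nothing is downloaded until you call the second function.

```r
# Hypothetical helpers, not part of mdtools.

# Repository list from the post: the configured repos plus the self-hosted one.
mdtools_repos <- function() {
  c(getOption("repos"), "http://userpage.fu-berlin.de/~kweinert/R")
}

# Same install.packages() call as above, with type = "source" added.
install_mdtools_from_source <- function() {
  install.packages("mdtools", repos = mdtools_repos(), type = "source")
}
```

Building from source assumes that the usual build tools for R packages are installed on your machine.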

To transform the code fragments into data tables and diagrams, we use the ascii package by David Hajage, which will get installed as a dependency. ascii itself customizes and enhances Friedrich Leisch’s Sweave tool.

To convert markdown text to HTML we use Pandoc, a universal document converter by John MacFarlane. You need to install it yourself and make sure it is available on the path. I use version 1.8.

In order to access Blogger and Picasa, you need Python (version >= 2.5) with Google’s gdata Python client installed. The Python binary needs to be on the search path.
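A quick way to check from within R whether the external programs are found is base R's Sys.which(). This is a minimal sketch: the helper name tools_on_path is made up for illustration, and the program names "pandoc" and "python" are assumptions about how the binaries are called on your system. It does not check whether the gdata client is installed.

```r
# Hypothetical helper, not part of mdtools.
# Returns a named logical vector: TRUE for each program found on the search path.
tools_on_path <- function(tools = c("pandoc", "python")) {
  found <- Sys.which(tools)
  # Sys.which() returns "" for programs it cannot find (NA on some platforms)
  setNames(!is.na(found) & nzchar(found), tools)
}

print(tools_on_path())
```

If an entry is FALSE, add the directory containing that program to your PATH and restart R.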

It is assumed that you have an account at Blogger and Picasa, that you have created a blog titled “myblog”, and that you have an album at Picasa with the same name.

First post

Here is a small sample post, copied from Wikipedia. I do not go into details on the format; you will need to study the Sweave and Pandoc manuals.

I give these texts the extension .Pnw, so let’s assume the text is saved as myfirstpost.Pnw. Now, to put this document on blogger, you just load the package, instantiate a blogger object and pass the object and the name of the text file to the function pnw_to_blogger:

> library(mdtools)
> b <- blogger(username = "name@gmail.com", password = "scr",
+ blog = "myblog")
> pnw_to_blogger("myfirstpost.Pnw", b)

With these commands, the article gets posted.

Caveats, future directions

Please note that

  • Pandoc currently does not support keywords or tags on the post. You have to set them in Blogger’s web interface.
  • Before posting, it is checked whether a blog post with the same title was posted in the last 6 months. If so, that post is replaced, i.e. overwritten by the new post.
  • Currently, the ascii package does not support captions on figures.
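The replacement rule in the second item can be pictured with a small sketch. This is not the mdtools implementation: the function name find_post_to_replace, the data frame layout (columns title and published), and the reading of "6 months" as calendar months are all assumptions made for illustration.

```r
# Hypothetical sketch of the "same title within the last 6 months" rule.
# `posts` is an assumed data frame with columns `title` (character)
# and `published` (Date). Returns the row to overwrite, or NA if none.
find_post_to_replace <- function(posts, title, today = Sys.Date()) {
  # Step back six calendar months from `today`
  cutoff <- seq(today, by = "-6 months", length.out = 2)[2]
  hit <- which(posts$title == title & posts$published >= cutoff)
  if (length(hit) == 0) return(NA_integer_)
  # If several recent posts share the title, pick the most recent one
  hit[which.max(as.numeric(posts$published[hit]))]
}

# Made-up example data
posts <- data.frame(
  title = c("Reproducible blogging", "Old post", "Reproducible blogging"),
  published = as.Date(c("2011-06-01", "2011-05-01", "2010-01-15")),
  stringsAsFactors = FALSE
)

find_post_to_replace(posts, "Reproducible blogging", today = as.Date("2011-07-10"))
```

With this made-up data, only the first row qualifies: the third row has the same title but is older than six months. The practical consequence is that reusing a recent title overwrites the earlier post, so pick a new title if you want to keep both.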

Since Pandoc is a universal document converter, I plan to add converters to PDF and EPUB. Also, I am playing with a simple tcltk editor for markdown documents. Lastly, it does not seem that difficult to add support for WordPress blogs.

2 comments:

Ana Nelson said...

WordPress integration should be straightforward, if you want to copy from Python XMLRPC code you can look here: https://bitbucket.org/ananelson/dexy/src/18b865fc832e/handlers/word_press_handler.py (most of the code relates to uploading generated images, new_post and update_post are relevant for the content).

Karsten W. said...

Thank you for the pointer -- it looks good, I will try it next weekend. dexy looks promising, too.