2012-06-24

Querying DBpedia from R

DBpedia is an extract of structured information from Wikipedia. The structured data can be retrieved with SPARQL, an SQL-like query language for RDF. There is already an R package for such queries, also named SPARQL.
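To make the mechanics concrete, here is a minimal sketch of what such a query looks like on the wire, using only base R. It assumes the public endpoint at https://dbpedia.org/sparql and the SPARQL protocol's `query` GET parameter. The `format` parameter is a Virtuoso extension that DBpedia's endpoint accepts, and the helper `sparql_url()` is hypothetical, not part of the SPARQL or datamart packages. The actual fetch is commented out because it needs network access.

```r
# Hypothetical helper: build a SPARQL-protocol GET URL for an endpoint.
# Assumption: the endpoint honours the Virtuoso-specific "format" parameter.
sparql_url <- function(endpoint, query, format = "text/csv") {
  paste0(endpoint,
         "?query=", utils::URLencode(query, reserved = TRUE),
         "&format=", utils::URLencode(format, reserved = TRUE))
}

endpoint <- "https://dbpedia.org/sparql"

# A small example query: ten cities and their population.
query <- "
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:population ?population .
}
LIMIT 10"

url <- sparql_url(endpoint, query)

# Uncomment to run against the live endpoint (requires network access):
# result <- read.csv(url)
# head(result)
```

Percent-encoding with `reserved = TRUE` matters here because SPARQL queries contain characters such as `?`, `/`, `<` and newlines that would otherwise break the URL.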

My datamart package contains an S4 class, Dbpedia, that aims to support the creation of predefined, parameterized queries. Here is an example that retrieves data on the German federal states:

> library(datamart) # version 0.5 or later
> dbp <- dbpedia()

# see a list of predefined queries
> queries(dbp)
[1] "Nuts1"  "PlzAgs"

# lists Federal States
> head(query(dbp, "Nuts1"))[, c("name", "nuts", "gdp")]
                    name nuts    gdp
1                Hamburg  DE6  94.43
2      Baden-Württemberg  DE1 376.28
3 Mecklenburg-Vorpommern  DE8  35.78
4        Rheinland-Pfalz  DEB 107.63
5              Thüringen  DEG  49.87
6                 Berlin  DE3  101.4
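The idea of a catalogue of predefined, parameterized queries can be illustrated with a few lines of base R and S4. The sketch below is hypothetical: the class `QueryCatalog`, the generics `catalogQueries()` and `buildQuery()`, and the `%s` placeholder convention are made up for illustration and are not how datamart's Dbpedia class is implemented. It only builds the query text; sending it to an endpoint is left out.

```r
library(methods)

# Hypothetical class: a named list of SPARQL templates with %s placeholders.
setClass("QueryCatalog", representation(templates = "list"))

# List the names of the available predefined queries.
setGeneric("catalogQueries", function(x) standardGeneric("catalogQueries"))
setMethod("catalogQueries", "QueryCatalog", function(x) names(x@templates))

# Fill a named template with the supplied parameters.
setGeneric("buildQuery", function(x, name, ...) standardGeneric("buildQuery"))
setMethod("buildQuery", "QueryCatalog", function(x, name, ...) {
  tmpl <- x@templates[[name]]
  if (is.null(tmpl)) stop("unknown query: ", name)
  sprintf(tmpl, ...)
})

catalog <- new("QueryCatalog", templates = list(
  PopulationOf = "PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?population WHERE {
  <http://dbpedia.org/resource/%s> dbo:population ?population .
}"
))
```

Calling `catalogQueries(catalog)` lists the available query names, and `buildQuery(catalog, "PopulationOf", "Berlin")` returns the filled-in SPARQL text. Adding a query then amounts to adding one template to the list.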

It is straightforward to extend the Dbpedia class with further queries. More challenging, in my opinion, is figuring out useful queries. Some examples can be found on Bob DuCharme's blog, in Jos van den Oever's article at kde.org, in a mailing-list discussion and a tutorial at the W3C, on Kingsley Idehen's blog and on the DBpedia wiki.
