2010-03-13

Rosetta language popularity

Rosetta Code is a community wiki that shows how to solve various programming tasks in different programming languages. It thus serves as a dictionary between programming languages, but also as a cookbook of recipes for a specific language.

One programming task that was unsolved for R until today was to rank languages by popularity. I worked on it using the RJSONIO package from Omegahat and the MediaWiki API. Here I explain the code step by step.

First, let us look up the languages defined at Rosetta Code. The wiki has a category for solutions by programming language, whose members are the individual language categories, so we query that one.


> library(RJSONIO)
> langUrl <- "http://rosettacode.org/mw/api.php?action=query&format=json&cmtitle=Category:Solutions_by_Programming_Language&list=categorymembers&cmlimit=500"
> languages <- fromJSON(langUrl)$query$categorymembers
> languages <- sapply(languages, function(x) sub("Category:", "", x$title))
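
To see what the last line does without touching the network, here is a small sketch on a hand-built list shaped like the parsed response. The page ids and titles are invented for illustration, and I am assuming that each category member is a list with a title field, which is what the code above relies on.

```r
# Hand-built stand-in for fromJSON(langUrl)$query$categorymembers
# (page ids and titles are invented for illustration)
members <- list(
  list(pageid = 1, ns = 14, title = "Category:R"),
  list(pageid = 2, ns = 14, title = "Category:UNIX Shell")
)

# Strip the "Category:" prefix from every title
langs <- sapply(members, function(x) sub("Category:", "", x$title, fixed = TRUE))
print(langs)
```

The result is a plain character vector of language names, which is what the next step iterates over.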

Each programming language also has a category for its users, named after the language with the suffix " User". We iterate over all languages and count the members of each user category.


> user <- function (lang) {
+   # Base query; the category title "<language> User" is appended below
+   userBaseUrl <- "http://rosettacode.org/mw/api.php?action=query&format=json&list=categorymembers&cmlimit=500&cmtitle=Category:"
+   userUrl <- paste(userBaseUrl, URLencode(paste(lang, " User", sep="")), sep="")
+   # Number of members returned for the user category
+   length(fromJSON(userUrl)$query$categorymembers)
+ }
> users <- sapply(languages, user)
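
One detail worth checking when the URL is built this way: by default URLencode() leaves reserved characters such as + alone, and a + in a query string is commonly decoded as a space. I have not verified how the wiki handles this, so treat it as an assumption, but it could affect language names like C++. A quick offline comparison of the two encodings:

```r
# Category title for the users of C++
title <- paste("C++", " User", sep = "")

# Default: reserved characters such as "+" are left untouched
plain <- URLencode(title)

# reserved = TRUE also percent-encodes "+", so it cannot be read as a space
strict <- URLencode(title, reserved = TRUE)

print(c(plain = plain, strict = strict))
```

If the counts for such languages look suspicious, passing reserved = TRUE in user() would be the first thing I would try.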

Now we can print out the top 15 languages:


> head(sort(users, decreasing=TRUE),15)
         C        C++       Java     Python JavaScript       Perl UNIX Shell
        55         55         37         32         27         27         22
    Pascal      BASIC        PHP        SQL    Haskell        AWK    C sharp
        20         19         19         18         17         16         16
      Ruby
        14
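
The ranking step can also be tried without network access. The sketch below replaces user() with a stand-in that returns invented counts (fakeCounts and fakeUser are hypothetical names, and the numbers are made up), just to show how sapply(), sort() and head() fit together.

```r
# Stand-in for user(): looks up invented counts instead of calling the API
fakeCounts <- c("R" = 3, "C" = 7, "Haskell" = 5)
fakeUser <- function(lang) fakeCounts[[lang]]

langs <- c("R", "C", "Haskell")

# sapply() over a character vector uses its elements as names of the result
counts <- sapply(langs, fakeUser)

# Sort by count, largest first, and keep the top two
top <- head(sort(counts, decreasing = TRUE), 2)
print(top)
```

Because sapply() names the result after the language, the sorted vector prints with the language names as labels, as in the output above.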

It is very straightforward to work with the MediaWiki API, and it offers many other features. It would be nice to have an S3 class that does all the URL encoding. There is already a project called wikirobot on R-Forge, but I have not looked into it yet.
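
As a rough idea of what such an S3 class could look like, here is a minimal sketch. The class name mwQuery and its methods are hypothetical and not taken from any existing package; the sketch only builds the request URL and never contacts the wiki.

```r
# Hypothetical constructor: stores the API endpoint and the query parameters
mwQuery <- function(endpoint, ...) {
  structure(list(endpoint = endpoint, params = list(...)), class = "mwQuery")
}

# Build the request URL, percent-encoding every parameter value
as.character.mwQuery <- function(x, ...) {
  enc <- vapply(x$params,
                function(v) URLencode(as.character(v), reserved = TRUE),
                character(1))
  paste0(x$endpoint, "?", paste(names(enc), enc, sep = "=", collapse = "&"))
}

# Show the URL that would be requested
print.mwQuery <- function(x, ...) {
  cat("<mwQuery> ", as.character(x), "\n", sep = "")
  invisible(x)
}

q <- mwQuery("http://rosettacode.org/mw/api.php",
             action = "query", format = "json",
             list = "categorymembers", cmlimit = 500,
             cmtitle = "Category:C++ User")
url <- as.character(q)
print(q)
```

The resulting string could be passed to fromJSON() in the same way as the hand-built URLs above, with the encoding kept in one place.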