Interactive visualization: from R to D3 using rCharts

Data Driven Documents, or D3 for short, is an incredible JavaScript library for creating interactive data visualizations on the web. Earlier this year, for example, we illustrated the power of D3 by interactively linking maps and charts in this visualization.

D3, however, can be challenging to work with, especially if you don't have experience with JavaScript. Luckily, for those who work with R, the package rCharts, created by Ramnath Vaidyanathan, makes moving your R visualizations to the web much easier. The package is designed to help users to create interactive JavaScript visualizations directly from R. One of the nice features of rCharts is that it does not limit itself to a single JavaScript library but rather provides the capacity to create charts using a wide range of libraries including Polychart, Morris, NVD3, xCharts, HighCharts, Rickshaw and even maps using Leaflet. In this example, though, we will focus on D3.

An example using rCharts: spatial packages in R through time

In order to test out the package we're going to develop a line plot showing the growth in the number of spatial data-related packages in R. Spatial data-related packages will be defined as those listed on the Analysis of Spatial Data Task View on the Comprehensive R Archive Network (CRAN) and maintained by Roger Bivand.

1. Install rCharts from GitHub

If you haven't already installed rCharts, unfortunately, it's not quite as simple as typing install.packages("rCharts"). Instead you will need to install it from GitHub using a helper function from the devtools package created by Hadley Wickham. To install it you can use the following code:

library(devtools)
# current versions of devtools expect the repository as "username/repo"
install_github('ramnathv/rCharts')

2. Use the XML package to grab the list of spatial packages

Now you're ready to begin coding. We will load the XML package to help us grab the list of spatial packages from CRAN and then do a little formatting.

require(XML)
require(rCharts)

# URL of the task view (thanks Roger Bivand)
url<-"http://cran.r-project.org/web/views/Spatial.html"

# grab list of packages
spat<-readHTMLList(url, stringsAsFactors = FALSE)[[2]]

# For cleanliness drop the word '(core)' from the package name
spat<-gsub(" \\(core\\)", "", spat)
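To see what the cleanup step does, here is a small sketch on a hand-made vector. The sample names are made up for illustration and are not the actual scraped list; only base R is assumed.

```r
# Hypothetical sample of scraped entries (the real list comes from the task view page)
sample_pkgs <- c("sp (core)", "akima", "rgdal (core)")

# Remove the literal suffix " (core)"; the parentheses are escaped
# because they are regular expression metacharacters
clean_pkgs <- gsub(" \\(core\\)", "", sample_pkgs)

print(clean_pkgs)
```

Entries without the suffix pass through unchanged, so the same call is safe to run on the whole list.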

3. Create a function to get package dates

For each package we want to know the most recent date it was published as well as the very first publish date. To do this we will cycle through the package names, and for each one we will get the most recent date from the main package page and the earliest date from the package's archive page. If a package does not have an archive (meaning it's a new package) then we will use its current publish date as the first date. Here is the function:


getPkgInfo<-function(pkg){
  url<-paste("http://cran.r-project.org/web/packages/", pkg, "/index.html", sep="")

  # get table of details and extract most current date
  dtl<-readHTMLTable(url, stringsAsFactors = FALSE)[[1]]
  curdate<-dtl[which(dtl[,1]=="Published:"),2]

  # get first archive date; try() keeps the function going
  # when a package has no archive page
  url<-paste('http://cran.r-project.org/src/contrib/Archive/', pkg, '/', sep="")
  packages<-try(readHTMLTable(url, stringsAsFactors = FALSE)[[1]][-1,])

  # if there is no archive, the first date is the current publish date
  if(inherits(packages, 'try-error')){
    message("no archive found for ", pkg)
    firstdate<-as.Date(curdate)

  # otherwise get first archive date
  }else{
    packages<-packages[-1,]
    firstdate<-as.Date(packages[1,"Last modified"],"%d-%b-%Y %H:%M")
  }
  return(data.frame(pkg=pkg, firstdate=firstdate, curdate=curdate))
}
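The fallback logic in the function above can be tried in isolation, without hitting CRAN. This is a minimal sketch: with_fallback, no_archive and has_archive are hypothetical names invented for the illustration, and the dates are just sample values.

```r
# Hypothetical helper: run `fetch` and return `fallback` if it raises an error
with_fallback <- function(fetch, fallback) {
  result <- try(fetch(), silent = TRUE)  # silent = TRUE suppresses the error printout
  if (inherits(result, "try-error")) fallback else result
}

# Stand-in for a package with no archive page: the fetch fails
no_archive <- function() stop("archive page not found")

# Stand-in for a package with an archive: the fetch returns a date
has_archive <- function() as.Date("1998-08-20")

current <- as.Date("2013-09-16")
first_new <- with_fallback(no_archive, current)   # falls back to the current date
first_old <- with_fallback(has_archive, current)  # keeps the archive date

print(first_new)
print(first_old)
```

Checking with inherits() rather than comparing class() to a string also behaves sensibly when an object has more than one class.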

4. Use the function to assemble the table

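We will run the function on all package names using lapply and then bind the results together into one table using do.call with rbind. We then sort the table by first publish date and add a running count of packages, which will be the y-axis of the chart.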


pkgdates<-do.call("rbind", lapply(spat, getPkgInfo))

pkgdates<-pkgdates[order(pkgdates$firstdate),]
pkgdates$numpkg<-1:nrow(pkgdates)
pkgdates$datechar<-as.character(pkgdates$firstdate)
head(pkgdates)
##          pkg  firstdate    curdate numpkg   datechar
## 8      akima 1998-08-20 2013-09-16      1 1998-08-20
## 124  tripack 1998-08-20 2013-09-16      2 1998-08-20
## 10       ash 1999-04-21 2013-02-11      3 1999-04-21
## 92  sgeostat 1999-04-21 2013-03-01      4 1999-04-21
## 65      nlme 1999-11-23 2014-03-31      5 1999-11-23
## 115  splancs 2000-11-05 2013-09-01      6 2000-11-05
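The assembly steps can be sketched offline with toy data. Here toy_info is a hypothetical stand-in for getPkgInfo, and the package names and dates are invented for the example.

```r
# Hypothetical stand-in for getPkgInfo(): one-row data frame per package
toy_info <- function(pkg) {
  dates <- c(b = "2001-05-01", a = "1998-08-20", c = "1999-04-21")
  data.frame(pkg = pkg, firstdate = as.Date(dates[[pkg]]),
             stringsAsFactors = FALSE)
}

# lapply returns a list of data frames; do.call passes them all to rbind at once
toy <- do.call("rbind", lapply(c("b", "a", "c"), toy_info))

# Sort by first publish date, then number the rows to get a running total
toy <- toy[order(toy$firstdate), ]
toy$numpkg <- seq_len(nrow(toy))

print(toy)
```

Because the rows are sorted before they are numbered, numpkg on any row is the count of packages first published on or before that date.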

5. Create the chart using rCharts

Now that we have the table we want to use, creating a D3 chart using rCharts is straightforward. Note that rCharts uses NVD3.js, a library designed to build re-usable charts for D3, to create these plots. The function for creating a D3 chart is nPlot and you use it like this:


d3chart<-nPlot(numpkg~firstdate, data=pkgdates, type="lineChart")

6. Customize your chart

I found that the default tooltip and x-axis format for dates were not what I wanted, so I used the following code to improve on the defaults. I did find the rCharts documentation to be lacking in places, and finding the right approach to altering the tooltips and the axes took some digging. Hopefully the documentation will be improved in the near future. In the meantime, the rCharts readthedocs is a good place to start. Here is the code to modify the tooltip and x-axis:


d3chart$chart(tooltipContent = "#! function(key, x, y, e){ 
  return 'Package: ' + e.point.pkg
} !#")


d3chart$xAxis(
  tickFormat = "#!function(d) {return d3.time.format('%b %Y')(new Date( d * 86400000 ));}!#"
)
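The factor 86400000 in the tick format deserves a note. As I understand it, the dates reach the browser as R's underlying numbers, which count days since 1970-01-01, while JavaScript's Date constructor expects milliseconds since that same date. The sketch below checks the arithmetic in base R; the variable names are just for illustration.

```r
# R stores a Date as the number of days since 1970-01-01
d <- as.Date("1998-08-20")
days <- as.numeric(d)

# JavaScript's Date constructor expects milliseconds since the same epoch
ms_per_day <- 24 * 60 * 60 * 1000
ms <- days * ms_per_day

# Round trip back to an R Date to confirm the conversion
back <- as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"),
                tz = "UTC")

print(ms_per_day)
print(back)
```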

7. Finally, you're ready to save and publish your result

rCharts has a number of ways to save and publish your charts. To simply save to the current working directory you can use the following code:

d3chart$save('d3chart.html', cdn=TRUE)

This saves an HTML file in your working directory. Note that, by default, the paths to dependencies point to files on your computer. If you want the page to load these dependencies from a hosted source (a content delivery network, or CDN), you need to include the argument cdn=TRUE, as in the code above.

A very nice additional piece of functionality is the ability to publish your chart as a gist on GitHub. Simply run the following code: it will ask you for your username and password and add the gist to GitHub. You can view your gist at http://rcharts.github.io/viewer/XXXXX where XXXXX is the gist id.

d3chart$publish('rCharts D3 Spatial Packages', host = 'gist')

Here is the final result. Hover over the line to identify the packages published on each date.
