Below we list the tools and innovations from the RStudio Conference in Austin, Texas that we found the most useful and exciting.
As with last year’s conference, the event was well organized, and the community of R users was enthusiastic and professionally diverse, which made for a fun and interesting conference. And, as after last year’s conference in San Diego, we returned to the office with a wealth of new ideas, strategies and tools. This post is intended as a very quick summary.
If all you care about is the best tacos or beer in Austin, you can jump to the bottom of the post.
Naturally, we could only attend a small portion of the talks, which means we missed some great tools. If you think something important is missing, write us a note or leave a comment. Links to slides and GitHub repositories for all talks are collected in this resource created by Karl Broman.
Discoveries in no particular order:
▶ RStudio job launcher
Beginning with version 1.2 of RStudio Server Pro you can use the GUI to launch your long-running R jobs in another R process either locally or using other services such as dockerized R sessions in Kubernetes. A great new option if you have jobs that take a long time to run. Nice talk by Darby Hadley.
▶ D3 visualizations in RStudio!
D3 is one of our favorite JavaScript data visualization libraries. Using a combination of the r2d3 package and new capabilities in RStudio 1.2, you can create and run D3 visualizations (in JavaScript) directly in RStudio. The blog post about this from RStudio is here.
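As a minimal sketch of the workflow, the example below writes a small D3 bar chart script to a temporary file and renders it with r2d3(). The data values and the script itself are illustrative assumptions, not material from the talk; in practice the JavaScript would usually live in its own .js file in your project.

```r
library(r2d3)

# Write a small D3 script to a temporary file. Inside the script, r2d3
# supplies the variables `svg`, `data`, `width` and `height`.
script_path <- tempfile(fileext = ".js")
writeLines(c(
  "var barHeight = Math.floor(height / data.length);",
  "svg.selectAll('rect')",
  "  .data(data)",
  "  .enter().append('rect')",
  "    .attr('width', function(d) { return d * width; })",
  "    .attr('height', barHeight)",
  "    .attr('y', function(d, i) { return i * barHeight; })",
  "    .attr('fill', 'steelblue');"
), script_path)

# Build the visualization as an htmlwidget from made-up proportions.
widget <- r2d3(data = c(0.3, 0.6, 0.8, 0.95, 0.40, 0.20), script = script_path)

# Printing the widget (for example by typing `widget` at the console)
# displays it in the RStudio Viewer pane.
```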
▶ Exciting new packages for working with time series data
Earo Wang gave a rich and informative talk on new packages for working with time series data, including the new tsibble package for creating “tidy” tables that contain time series data. The package comes with functions for filling gaps, aggregating over time and more. More details on this and the other time series packages can be found here.
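To make the gap-filling and aggregation ideas concrete, here is a small sketch using tsibble with a made-up daily series (the dates and values are assumptions for illustration only). One day is deliberately missing so that fill_gaps() has something to do.

```r
library(tsibble)
library(dplyr)

# A toy daily series with 2019-01-03 missing.
readings <- tsibble(
  date  = as.Date("2019-01-01") + c(0, 1, 3, 4),
  value = c(10, 12, 9, 11),
  index = date
)

# Turn the implicit gap into an explicit row with an NA value.
filled <- fill_gaps(readings)

# Aggregate the original readings to weekly totals.
weekly <- readings %>%
  index_by(week = yearweek(date)) %>%
  summarise(total = sum(value))
```

After fill_gaps() the missing day appears as its own row, which makes the gap visible to later steps such as imputation or plotting.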
▶ New package for working with spatio-temporal data
Edzer Pebesma, author of one of our favorite packages (sf), outlined the exciting new stars package for working with spatio-temporal data. The package handles spatio-temporal data elegantly in a tidyverse-friendly way and can work with very large datasets by leaving parts of the data where they live, for example in remote cloud storage. More at the GitHub page.
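The sketch below shows the difference between reading a raster fully into memory and creating a lazy proxy object with stars. It uses a small GeoTIFF that ships with the package as a stand-in; treating a local file as if it were a large remote dataset is an assumption made purely for illustration.

```r
library(stars)

# A small Landsat sample file bundled with the stars package.
tif <- system.file("tif/L7_ETMs.tif", package = "stars")

# Read all pixel values into memory.
in_memory <- read_stars(tif)

# Create a proxy instead: only metadata is read now, and pixel values
# are fetched later when they are actually needed.
lazy <- read_stars(tif, proxy = TRUE)

# Dimensions (x, y, band) are available without loading the pixels.
dim(lazy)
```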
▶ Big Data R pipeline with TensorFlow
Heather & Jacqueline Nolis gave a great talk on a real-world big data pipeline using R and TensorFlow. RStudio has put a lot of resources behind tools for running deep learning in R, the keras package in particular. (As an aside, one of the best technical books I’ve ever read, seriously, is Deep Learning with R by François Chollet and RStudio’s J.J. Allaire.) Python leads R by a wide margin in machine learning big data pipelines, so it was interesting to hear the details behind this set of tools, designed to run a “business chat” with customers.
The pipeline uses R’s plumber for the API, Docker containers, and neural networks built with keras and TensorFlow. You can find their slides here and additional details here.
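As a rough illustration of the API layer only, here is a sketch of a plumber router with a single POST endpoint. This is not the code from the talk: the score_message() function and the /score path are hypothetical, and the function body is a placeholder where a trained keras model would normally be called. It assumes plumber 1.0 or later for pr(), pr_post() and pr_run().

```r
library(plumber)

# Hypothetical handler: a real pipeline would pass the message to a
# trained keras model here instead of counting characters.
score_message <- function(message = "") {
  list(message = message, n_chars = nchar(message))
}

# Build a router and register the handler at POST /score.
api <- pr_post(pr(), "/score", score_message)

# To serve the API locally (blocks the R session while running):
# pr_run(api, port = 8000)
```

In a containerized setup, the pr_run() call would typically be the command the Docker container executes on start-up.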
▶ Yihui Xie and pagedown
From my perspective, Yihui Xie is one of the most charming and entertaining speakers at the RStudio conferences. This year was no exception. Yihui demonstrated a new tool for creating a wide range of documents including elegant PDFs, business cards, presentations etc. The tool itself looks great. You can visit the GitHub repository for more details. But I also recommend watching the stream of his talk for his creative use of animated GIFs and funny tangents. Slides are here.
As a slight aside, he mentioned a new function (not yet working) for printing HTML documents in Chrome from within R, chrome_print(). This will come in handy for making nice PDFs from HTML documents, like those created with the xaringan package, without leaving the RStudio IDE.
▶ rlang is powerful (but only needed in limited situations)
We have a running debate in this office about when we should be using the tools available in the rlang package for “tidy eval”. The overhead can add a significant amount of time to a project and it’s hard to determine when to invest the time.
In her talk on the topic, Jenny Bryan discussed when to use rlang. Her take-home message was that you rarely need rlang, primarily in cases where you need to compute on expressions and manipulate environments.
She also mentioned that for newer R users time would be better spent learning other aspects of the language including:
- How to write good functions
- Tools specific to your field (maps, time series etc)
- Lists, list-columns, nesting, unnesting
- Functional programming with purrr
- Scoped dplyr verbs like mutate_at()
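To illustrate the distinction, the sketch below contrasts a task that needs no tidy eval (a scoped verb applied to several columns) with one that does (a wrapper function that accepts bare column names). The grouped_mean() helper is a hypothetical example written for this post, not something shown in the talk.

```r
library(dplyr)
library(rlang)

# No tidy eval needed: a scoped verb applies one function to several columns.
scaled <- mtcars %>%
  mutate_at(vars(mpg, wt), ~ .x / max(.x))

# Tidy eval needed: a wrapper that takes bare column names as arguments.
# enquo() captures each argument as a quosure and !! unquotes it inside
# the dplyr verbs.
grouped_mean <- function(data, group_var, value_var) {
  group_var <- enquo(group_var)
  value_var <- enquo(value_var)
  data %>%
    group_by(!!group_var) %>%
    summarise(mean_value = mean(!!value_var))
}

by_cyl <- grouped_mean(mtcars, cyl, mpg)
```

If you are not writing wrappers like grouped_mean(), the first pattern covers a lot of everyday work.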
▶ Add the git branch to your R prompt
This was one of the tiny tricks I learned accidentally. During Hadley Wickham’s talk on his new package, vctrs, I noticed he had the git branch listed at the R prompt. Apparently this is made possible by the prompt package from Gábor Csárdi.
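A minimal sketch of how this might be set up, assuming the prompt package is installed and that you want the git-aware prompt in every interactive session. Placing it in ~/.Rprofile is our assumption about a convenient location, not something stated in the talk.

```r
# In ~/.Rprofile: show the current git branch at the R prompt.
# The requireNamespace() guard keeps start-up from failing on machines
# where the prompt package is not installed.
if (interactive() && requireNamespace("prompt", quietly = TRUE)) {
  prompt::set_prompt(prompt::prompt_git)
}
```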
▶ New testing tools for Shiny
Joe Cheng, CTO at RStudio, gave a great talk on Shiny in production and introduced several new tools as well as old suggestions for profiling and preparing your shiny apps for production. These include:
- The shinytest package for automated UI testing for Shiny
- The shinyloadtest package for, as you might guess, load testing shiny applications
- The profvis package (which has been around awhile) for identifying slow processes in your apps
- Plot caching in your shiny apps: this was an exciting development for apps that prepare and show the same plots over and over. A great addition and easy to implement.
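As a sketch of the plot caching idea from the list above, the small app below swaps renderPlot() for renderCachedPlot(), which is available in Shiny 1.2.0 and later. The input name, the dataset and the choice of cache key are illustrative assumptions. The cache key expression tells Shiny which inputs determine the plot, so a repeated request with the same key is served from the cache rather than redrawn.

```r
library(shiny)

ui <- fluidPage(
  sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
  plotOutput("waiting_hist")
)

server <- function(input, output, session) {
  # The plot is redrawn only for values of input$bins not seen before;
  # otherwise the cached image is reused.
  output$waiting_hist <- renderCachedPlot({
    hist(faithful$waiting, breaks = input$bins,
         main = "Old Faithful waiting times", xlab = "Minutes")
  }, cacheKeyExpr = { input$bins })
}

app <- shinyApp(ui, server)

# To launch the app interactively:
# runApp(app)
```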
You can find the Shiny in Production book here.
Barret Schloerke at RStudio gave a talk on the Shiny reactlog — a network graph for tracing the reactivity of a shiny application which is super-useful for reviewing the relationships between reactives and observers.
▶ Function for visualizing billions of dots
The db_compute_raster() function from the dbplot package was mentioned quickly in Edgar Ruiz’s talk, but it was new to me and potentially very useful: a practical way to visualize billions of dots with the computations themselves performed in the database (rather than in R). More detail.
▶ Nice resource on databases with R
I didn’t know that RStudio maintains a resource with tips and best practices for working with a wide range of databases. Thanks to Edgar Ruiz.
▶ Use feather files for shiny apps
In his talk on Shiny in production, Joe Cheng mentioned, somewhat off-handedly, that Shiny apps should use feather files for super-fast reading of data. We generally use RDS files, and I’m looking forward to testing how they compare to feather files for reading speed.
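A comparison like this could be sketched as follows, assuming the feather package is installed. The data frame is synthetic and its size is arbitrary, so the timings will vary by machine and say little about real app data; treat it as a starting template rather than a benchmark result.

```r
library(feather)

# Synthetic data standing in for whatever a Shiny app would load.
n <- 1e5
df <- data.frame(
  id    = seq_len(n),
  value = rnorm(n),
  group = sample(letters, n, replace = TRUE),
  stringsAsFactors = FALSE
)

rds_path     <- tempfile(fileext = ".rds")
feather_path <- tempfile(fileext = ".feather")

# Write the same data in both formats.
saveRDS(df, rds_path)
write_feather(df, feather_path)

# Time a single read of each file.
rds_time     <- system.time(from_rds <- readRDS(rds_path))
feather_time <- system.time(from_feather <- read_feather(feather_path))

# Elapsed seconds for each format.
c(rds = rds_time[["elapsed"]], feather = feather_time[["elapsed"]])
```

For a fairer comparison you would repeat each read several times and test with data shaped like your own.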
▶ New tools for Spark including data streams and XGBoost
Javier Luraschi gave an impressive talk on new tools coming to R/RStudio for working with Spark. He demonstrated working with streaming data using Apache Kafka, Spark and Shiny (with Twitter data) and discussed the work in progress for executing boosting models in Spark from R.
▶ Rayshader for hillshaded maps
Tyler Morgan-Wall delivered a much-anticipated talk on rayshader, his tool for hillshading and animating topographical maps. The tool makes it easy to convert, for example, digital elevation models to 3-D renderings, and he includes the parlor trick of adding support for 3-D printing.
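Here is a small sketch of the basic hillshading workflow using R's built-in volcano elevation matrix as a stand-in for a real digital elevation model. The texture, zscale and shadow settings are arbitrary choices for illustration, and the 3-D and 3-D printing steps are left as comments because they need an interactive graphics device.

```r
library(rayshader)

# Built-in elevation matrix used here in place of a real DEM.
elevation <- volcano

# Color the terrain by slope direction, then darken areas in cast shadow.
shaded <- sphere_shade(elevation, texture = "desert")
shaded <- add_shadow(shaded, ray_shade(elevation, zscale = 3), max_darken = 0.5)

# Draw the 2-D hillshaded map:
# plot_map(shaded)

# Render in 3-D and export the open scene for 3-D printing:
# plot_3d(shaded, elevation, zscale = 3)
# save_3dprint("volcano.stl")
```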
▶ Automated deployment and scaling of containerized applications
There was no individual talk on the Kubernetes system but it came up in several talks and it’s clear that it is being used widely for automating the deployment and scaling of containerized R applications.
▶ Conference Big Picture
RStudio puts on a great conference with an excellent lineup of speakers. The event is well organized, provides impressively good food and allows for plenty of time to talk with co-participants. The social events feature a nice opportunity to mingle (complete with margaritas and Manhattans) and the event feels welcoming to a wide range of practitioners. Despite a crowd of 1700 there were very few lines which is impressive.
On the Friday, I co-hosted a Birds of a Feather session (with Alan Dipert) focused on spatial/maps in R, and we had excellent turnout, with perhaps 50 or so people joining us during breakfast. We discussed sf, rayshader, mapview, tmap, leaflet, stars and many others. (Thanks to Curtis Kephart for organizing!)
In terms of keynotes, Tareef, president at RStudio, did a nice introduction and overview. He emphasized RStudio’s commitment to open source tools and pointed out that 50% of engineers’ time is devoted to open source tool development. The breadth and depth of tools that RStudio supports is amazing and R developers around the globe benefit on a daily basis.
Joe Cheng focused on Shiny in production. Joe emphasized tools to help evaluate loads and identify pain points. Joe puts a lot of care into crafting his slides and his messages and it comes across in a smooth delivery that keeps the audience engaged and results in an entertaining and informative talk.
On Friday, Felienne did a fascinating talk focused on coding education. She argued that few, if any, studies have investigated how people learn coding. A significant emphasis in current coding education is on problem solving with very little formal guidance. In contrast, there is a ton of research on how people learn to read in general. Using this literature as a foundation, she argues that teaching with “explicit direct instruction”, similar to how reading itself is taught, is preferable to a more exploratory approach. Her talk was thought provoking and novel. And, as a bonus, all of her slides were hand-drawn and far more interesting than a traditional slide deck.
Finally, David Robinson argued in his keynote that public work contributions are a “critical part of a data science career.” Using his own career as an example (he got his first data science job as a direct result of a Stack Overflow post), David did a great job of outlining how tweeting, blogging, contributing to open source projects and other avenues for public contributions build experience and exposure that will help further data science careers. Nice work David!
▶ The non-work
Beyond the conference material itself, I had a good time exploring Austin in the little time available. Overall, I think I did pretty well. The highlights include:
Granny’s tacos food truck: Ranchero taco and Abuela taco — both amazing.
Lazarus Brewing: Tasty IPA
Lime Scooter: Perhaps not the best option after a tasty IPA but a nice way to get around.
Bykowski Tailor & Garb: fascinating clothes store started by the guy with the beard. The clothes are handmade, western-style and rustic, and some of them appear on HBO and AMC shows. Complete with gun case by the cash register.
Mort Subite: A ton of Belgian beers to choose from
▶ Post-conference
I made it back from Austin in time to enjoy the snow!