The tidycensus
and tmap
R packages make an incredible duo for working with and visualizing US Census data. The tidycensus
package, authored by Kyle Walker, streamlines geographic and tabular data downloads while the tmap
package, written by Martijn Tennekes, vastly simplifies creating maps with multiple layers, accepts many different spatial object types and makes it easy to add scale bars, north arrows and other cartographic details.
- Part 1: Using
tidycensus
to collect US Census data - Part 2: Creating beautiful maps with
tmap
- Super easy mapping
- Projecting data on-the-fly
- Simplify polygons or lines
- Working with colors and cuts
- Customizing layout features and adding attributes to the map
- Adding additional layers to the map
- Adding multiple maps to a page
- Taking a deeper dive with leaflet
- Working with rasters
- Summary:
tmap
is awesome!
Note that this post is designed to allow you to skip ahead to the tmap section if mapping is what you’re interested in.
This post is is intended to mimic a real-world task that requires both data collection and data visualization of geographic data and is broken into in two parts. Part 1 focuses on the collection of Census data using tidycensus
. Referring to the data in Part 1, Part 2 outlines the map-making process using tmap
. If you’d prefer to skip to Part 2 you can download the Census data here.
This post assumes a basic understanding of data manipulation with dplyr
and a basic understanding of working with spatial objects using the sf
package. If you need a refresher on working with spatial data in R we recommend the following:
- Our DataCamp course on working with sf and raster objects in R
- Our previous blog post on mapping and analyzing raster data in R
- Our previous blog post on reading spatial files into R
We are currently using version 0.8.1 of tidycensus
and version 2.1-1 of tmap
. Below are all of the packages we will be using in this post.
library(ggplot2) # For plotting
library(tidycensus) # For downloading Census data
library(tmap) # For creating tmap
library(tmaptools) # For reading and processing spatial data related to tmap
library(dplyr) # For data wrangling
library(sf) # For reading, writing and working with spatial objects
Part 1: Using tidycensus
to collect US Census data
Before we begin mapping we’ll grab some data from the US Census using the tidycensus
package. Similar to other packages that access the Census and American Community Survey (ACS) you’ll need to acquire an API key. Follow this link to get your Census API key.
As a reminder, if you’d prefer to skip to Part 2 downloading the Census data here will allow you to follow along without completing Part 1.
Download Census data
Health Insurance Coverage Status by Sex and Age
In this post we’ll explore county-level health insurance coverage status in the US, Table B27001 from ACS, for 2012 and 2016. The tidycensus
package has a convenience function, get_acs()
, for allowing the user to download data either as a table or as an sf
object with geometry. For downloading sf
objects use the argument geometry = TRUE
(FALSE
is the default).
In order to download data for the full US leave the argument for state
equal to NULL. Also since we’ll be mapping our data we’ll use the argument shift_geo = TRUE
. This will adjust the positions of Alaska and Hawaii and is very useful for creating thematic maps of the full 50 United States in one view.
Download data for years 2012 and 2016. Notice that we’re grabbing the spatial data for 2016 but not for 2012.
dat12 <- get_acs("county", table = "B27001", year = 2012,
output = "tidy", state = NULL, geometry = FALSE) %>%
rename(`2012` = estimate) %>%
select(-NAME, -moe)
dat16 <- get_acs("county", table = "B27001", year = 2016,
output = "tidy", state = NULL, geometry = TRUE, shift_geo = TRUE) %>%
rename(`2016` = estimate) %>%
select(-moe)
As an aside you can access just the spatial data with “shifted” geography using the following:
# County data
data("county_laea", package = "tidycensus")
# State data
data("state_laea", package = "tidycensus")
Process Census data
For much of this post we’re exploring changes in health insurance coverage between 2012 and 2016 by county in the US. In particular we’re looking at the uninsured young adult population ages 18-34. Our downloaded tables are broken into population by sex, age and health insurance status so we’ll need to do some data wrangling to calculate our final estimates.
Join the 2012 and 2016 data
Use a left_join
to merge the tables to keep all records from the 2016 data and only those from the 2012 data that match. Note that we are making the assumption that the county boundaries have not changed from 2012 to 2016 which will mostly be the case.
The joined table, dat
, will be used for the tabular data processing so we can safely drop the geometry with the function st_geometry()
from the sf
package. Once we’ve completed the tabular data processing, later in Part 1, we will join the tabular data back with the geometry.
dat <- left_join(dat16, dat12, by = c("GEOID", "variable"))
st_geometry(dat) <- NULL # This drops the geometry and leaves a table
head(dat)
## # A tibble: 6 x 5
## GEOID NAME variable `2016` `2012`
## <chr> <chr> <chr> <dbl> <dbl>
## 1 01001 Autauga County, Alabama B27001_001 54387 53571
## 2 01001 Autauga County, Alabama B27001_002 26392 25794
## 3 01001 Autauga County, Alabama B27001_003 2003 2213
## 4 01001 Autauga County, Alabama B27001_004 1982 2209
## 5 01001 Autauga County, Alabama B27001_005 21.0 4.00
## 6 01001 Autauga County, Alabama B27001_006 5119 5174
Assign health insurance categories
Although we want to calculate the percentage of population ages 18-34 without health insurance you’ll notice from the snapshot below that neither “total population ages 18-34” nor “total population without health insurance 18-34” are actual variables. Instead we need to tally males and females and age categories 18-24 and 25-34. We only show a portion of the table below, you can find the full table at Factfinder.
In the table above you can see that the male population ages 18-24, Males 18 to 24 years, is the 9th variable from the top. This corresponds with variable == "B27001_009"
in our table. Similarly Males 18 to 24 years with no health insurance coverage is the 11th variable from the top and corresponds with variable == "B27001_011"
in our table. The lines for the female population follow a similar pattern.
So what we need in the end:
-
- Total male population would be: B27001_009 + B27001_012
- Total female population would be B27001_037 + B27001_040 (not shown in table above)
- Total non-insured male pop: B27001_011 + B27001_014
- Total non-insured female pop: B27001_039 + B27001_042
Pull out the variables needed to calculate the total population ages 18 to 34 and the population without health insurance ages 18-34. We’ll assign these as pop1834
and pop1834ni
respectively. Remove all other variables.
We use the dplyr
function case_when()
instead of a series of if/then statements. Essentially this function allows us to say “if the variable value is B27001_009, B27001_012, B27001_037 or B27001_040 then assign the value of “pop1834” and so on.
dat <- mutate(dat,
cat = case_when(
variable %in% paste0("B27001_0",
c("09","12","37","40")) ~ "pop1834",
variable %in% paste0("B27001_0",
c("11","14","39","42")) ~ "pop1834ni")) %>%
filter(!is.na(cat))
Summarize the data by county-year-category
In order to calculate our estimates we’ll stack the data, separated by year, using tidyr::gather
then group based on GEOID
, NAME
, year
and our new cat
field so that we have counts for total population and population without insurance for each county for each year. Then we use spread()
to push the total population and population without insurance into their own variables. The final table should include a county-year summation for variables pop1834
and pop1834ni
.
# Create long version
dat <- tidyr::gather(dat, year, estimate, c(`2012`, `2016`))
# Group the data by our new categories and sum
dat <- group_by(dat, GEOID, NAME, year, cat) %>%
summarize(estimate = sum(estimate)) %>%
ungroup() %>%
tidyr::spread(cat, estimate)
head(dat)
## # A tibble: 6 x 5
## GEOID NAME year pop1834 pop1834ni
## <chr> <chr> <chr> <dbl> <dbl>
## 1 01001 Autauga County, Alabama 2012 10966 2348
## 2 01001 Autauga County, Alabama 2016 11246 1968
## 3 01003 Baldwin County, Alabama 2012 34353 11012
## 4 01003 Baldwin County, Alabama 2016 37178 9639
## 5 01005 Barbour County, Alabama 2012 5040 1918
## 6 01005 Barbour County, Alabama 2016 4996 1504
Calculate the final estimates
For our final numbers we want % of uninsured population ages 18-34 for both 2012 and 2016. In addition we’ll calculate the difference in % uninsured between years (i.e. 2016 - 2012
).
dat <- mutate(dat, est = (pop1834ni/pop1834) * 100) %>%
select(-c(pop1834, pop1834ni)) %>%
tidyr::spread(year, est) %>%
mutate(diff = `2016`-`2012`)
head(dat)
## # A tibble: 6 x 5
## GEOID NAME `2012` `2016` diff
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 01001 Autauga County, Alabama 21.4 17.5 - 3.91
## 2 01003 Baldwin County, Alabama 32.1 25.9 - 6.13
## 3 01005 Barbour County, Alabama 38.1 30.1 - 7.95
## 4 01007 Bibb County, Alabama 33.0 16.7 -16.3
## 5 01009 Blount County, Alabama 27.3 20.4 - 6.88
## 6 01011 Bullock County, Alabama 20.4 34.5 14.1
Initial visualization of the health insurance data using ggplot
Distributions by year
Take a quick look at the data using ggplot
. Compare the county distributions as well as the median % uninsured by year using facet_wrap
.
datlong <- select(dat, -diff) %>%
tidyr::gather(year, estimate, c(`2012`, `2016`)) %>%
group_by(year) %>%
mutate(med = round(median(estimate, na.rm = TRUE), 1))
ggplot(datlong, aes(estimate)) +
geom_histogram(fill = "firebrick2",
color = "white", bins = 60) +
xlab("Uninsured adults ages 18-34 by county (%)") +
theme(plot.title = element_text(hjust = 0.5)) +
facet_wrap(~year, ncol = 1) +
geom_vline(aes(xintercept = med,
group = year), lty = "dashed") +
geom_text(aes(label = paste("Median = ", med), x = med, y = 55))
Counties with greatest change (+/-) in % uninsured
We’re curious to know which counties experienced the largest increase or decrease in the % of uninsured. Use the function dplyr::top_n
to get the first and last 10 records from the diff
field.
d10 <- top_n(dat, 10, diff) %>%
mutate(type = "Insured population decreased",
difftemp = diff)
i10 <- top_n(dat, -10, diff) %>%
mutate(type = "Insured population increased",
difftemp = abs(diff))
id10 <- bind_rows(list(i10, d10)) %>%
arrange(desc(difftemp))
ggplot(id10) +
geom_col(aes(x = forcats::fct_reorder(NAME, difftemp),
y = difftemp, fill = type)) +
coord_flip() +
scale_fill_manual(values = c("firebrick2", "cyan4")) +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "bottom",
legend.title = element_blank()) +
ggtitle("Counties with the greatest change (+/-) in
insured population, ages 18-34, 2012-2016") +
ylab("Difference in % insured (2016 - 2012)") +
xlab("")
Create a final geographic file for use with tmap
You may remember that our initial file, dat16
, included the actual geography while the file we just created is tabular-only. In order to map the variables we just created we can join dat
with the spatial object dat16
. Because of the way tidycensus
formats the data (stacked) we’ll remove all variables except for one and join the objects by GEOID
and NAME
.
# dat16 is our original geographic object and dat is the tabular data
shp <- dat16 %>%
filter(variable == "B27001_001") # much faster than using distinct()
select(GEOID, NAME) %>%
left_join(dat, by = c("GEOID", "NAME")) %>%
arrange(GEOID) %>%
rename(uninsured_2012 = `2012`,
uninsured_2016 = `2016`,
uninsured_diff = diff)
# Remove the Aleutians West from shp for display purposes.
# NOTE: this isn't necessary since I'm using the shift_geo
# argument in the get_acs function. However if you're not
# using shift_geo or joining to a different spatial layer
# for the full US you may want to consider removing this
# record for display purposes.
shp <- filter(shp, GEOID != "02016")
Part 2: Creating beautiful maps with tmap
One of the best features of tmap
is how easy it is to create maps. The tmap
syntax is modeled after ggplot2
whereby the initial command specifies the shape object and data input (tm_shape()
) and is followed by the map layer (e.g., tm_polygons()
or tm_lines()
). Layers can be stacked similar to ggplot2
using the +
symbol. In addition, attribute elements can be added to the map and maps can be faceted similar to using facets in ggplot2
.
Moving forward we encourage you to explore the tmap
documentation as this package is extremely function-rich and we are only skimming the surface of what’s available in this powerful package.
If you skipped Part 1 (otherwise you can skip this step)
Download the Census data here and create shp
for use with tmap
.
# Create shp file
shp <- st_read("path_to_downloaded_file/acs_2012_2016_county_us_B27001.shp",
stringsAsFactors = FALSE) %>%
rename(uninsured_2012 = un_2012,
uninsured_2016 = un_2016,
uninsured_diff = unnsrd_) %>%
mutate(STFIPS = stringr::str_sub(GEOID, 1, 2))
Super easy mapping
With tmap
there is not a lot of data preparation that needs to happen before mapping. With very little code you can create a simple map.
A) The easiest possible map, just the geography
The “shifted” geography we downloaded from tidycensus
is a huge help. The shift_geo
argument creates subsets of the continental US, Alaska and Hawaii. The author of tmap
has a helpful write-up here with guidance for displaying subsets.
# Define the shape and the layer elements
tm_shape(shp) +
tm_polygons()
B) Add a variable to your map
Below is a map of the 2012 data using all of the tmap
defaults – easy but not all that pretty! The default classification does not work well with this data and in some places it’s difficult to see the counties due to the heavy gray outlines.
tm_shape(shp) +
tm_polygons("uninsured_2012")
C) Change the shapes
The package provides a lot of flexibility in terms of different approaches to mapping your data. Here we use bubbles in place of polygons and note that no additional data processing is required.
tm_shape(shp) +
tm_bubbles("uninsured_2012")
D) Include multiple layers
Here we demonstrate including more than one layer. We initially use sf
code to create a simple table showing the location of the Empire State Building and add this to our map of counties.
# Create a simple geographic object with one point
dat <- data.frame(c("Empire State Building"),
lat = c(40.748595),
long = c(-73.985718))
sites <- sf::st_as_sf(dat, coords = c("long", "lat"),
crs = 4326,
agr = "identity")
tm_shape(shp) +
tm_polygons() +
tm_shape(sites) +
tm_dots(size = 2)
Projecting data on-the-fly
Probably one of the nicest features of tmap
is the ability to project data on-the-fly. There’s no need to re-project or transform your data prior to mapping. Simply use the projection
argument in the tm_shape
function to assign one of the following:
- coordinate reference system (CRS), examples include:
- 2163: U.S. National Atlas Equal Area
- 3857: Mercator
- proj4string, examples include:
- +proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs: NAD83/New York Long Island
- +proj=utm +zone=12 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs: NAD83/UTM zone 12N
- shortcut/name of projection, examples include:
- eck4: Eckert IV
- wintri: Winkel-Tripel
Here’s the Winkel-Tripel projection. Interesting!
tm_shape(shp, projection = "wintri") +
tm_polygons()
Simplify polygons or lines
The simplify_shape
function is incredibly useful, particularly if you work with detailed linework or polygons which can be cumbersome to work with and time consuming to map. The function is from the tmaptools
package and uses the argument fact
, the simplification factor, to adjust the number of coordinates in the given spatial object. If you do a lot of simplication you can also review the st_simplify()
function in sf
and, for the greatest control, the functions in the package rmapshaper
.
The default geographies downloaded using tidycensus
are the Cartographic Boundary Files. For counties the resolution level is 500k. Although much more simplified than the original Tiger Line files, these simplified versions are still fairly large. We could simplify these even more by running the following code:
# Simplify shapes with tmap function
shpsimp <- simplify_shape(shp, fact = 0.05)
Working with colors and cuts
Built-in colors and cuts
The tmap
package makes it very easy to color and classify our data using the style
and palette
arguments.
- The
style
argument refers to the classification method that should be used to “bin” the data. - The
palette
argument refers to the color palette you wish to assign to the data. The type of palette (i.e. sequential, qualitativeand diverging) is automatically determined by the data but can be easily overwritten.
NOTE: The author wrote a useful Shiny app for choosing palettes from RColorBrewer
. This can be accessed by typing tmaptools::palette_explorer()
into your console. To run this you’ll need to make sure you have the shiny
and shinyjs
packages installed. You can also run display.brewer.all()
in your console to view the Color Brewer palettes.
A few of the style
options include:
- quantile
- jenks
- pretty
- equal
- sd
A few of the palette
options include:
- BuPu
- OrRd
- PuBuGn
- YlOrRd
Below is an example map showing the BuPu
color scheme with quantile
classification.
var <- "uninsured_2012"
tm_shape(shp, projection = 2163) +
tm_polygons(var,
style = "quantile",
palette = "BuPu") +
tm_legend(legend.position = c("left", "bottom"))
User-defined classification
For controlling the cut points in our data we’ll drop the style
argument and use breaks
. Notice too in this example that we changed the color of the county outlines and added a little transparency so that they’re not as overwhelming.
cuts <- c(0, 10, 20, 30, 40, 100)
tm_shape(shp, projection = 2163) +
tm_polygons(var,
breaks = cuts,
palette = "BuPu",
border.col = "white",
border.alpha = 0.5) +
tm_legend(legend.position = c("left", "bottom"))
Additional color options
Example 1: Apply type of palette instead of palette scheme
If you don’t know exactly which color scheme to use but you’d like to apply a sequential palette you can use palette = "seq"
. This will apply colors from the first sequential set of colors in the RColorBrewer
color schemes.
tm_shape(shp, projection = 2163) +
tm_polygons(var,
breaks = cuts,
palette = "seq",
border.col = "white",
border.alpha = 0.5) +
tm_legend(legend.position = c("left", "bottom"))
Example 2: Reverse the color scheme
We can also reverse the color schemes with a simple -
. Here we reverse the BuPu
color scheme – which doesn’t make any sense in this context!
tm_shape(shp, projection = 2163) +
tm_polygons(var,
breaks = cuts,
palette = "-BuPu",
border.col = "white",
border.alpha = 0.5) +
tm_legend(legend.position = c("left", "bottom"))
Example 3: Choose custom colors
If you want to assign colors outside of RColorBrewer
simply create a vector of HEX codes and apply them to the palette
argument.
mycols <- c("#f0f4c3", "#dce775", "#cddc39", "#afb42b", "#827717")
tm_shape(shp, projection = 2163) +
tm_polygons(var,
breaks = cuts,
palette = mycols,
border.col = "white",
border.alpha = 0.5) +
tm_legend(legend.position = c("left", "bottom"))
Customizing layout features and adding attributes to the map
Many, many options exist for customizing your map. We’ll highlight the ones that we end up using the most.
Add titles to the map
Add a main map title and a legend title.
- The main title is controlled by the
title
argument intm_layout
. - The legend title is controlled by the
title
argument in the layer (in this example tm_polygons).
mymap <- tm_shape(shp, projection = 2163) +
tm_polygons(var,
breaks = cuts,
palette = "BuPu",
border.col = "white",
border.alpha = 0.5,
title = "Uninsured (%)") +
tm_legend(legend.position = c("left", "bottom")) +
tm_layout(title = "Uninsured adults ages 18-34 by county, 2012",
title.size = 1.1,
title.position = c("center", "top"))
mymap
Increase the map margins
The default value for the inner margins (margins inside the frame) is a little small (0.02). Make room for adding other elements to the map. The order of inner.margins
inputs is bottom, left, top and right. Values can be between 0 and 1.
mymap + tm_layout(inner.margins = c(0.06, 0.10, 0.10, 0.08))
Add a scalebar and north arrow: the defaults
The default location for both the scalebar and north arrow is the bottom-right corner. Not so pretty.
mymap +
tm_scale_bar() +
tm_compass()
Add a scalebar and north arrow: customized
We want the scalebar to show units in miles, not kilometers. To do this we’ll need to add the unit
argument to the tm_shape
function (not the tm_compass
function).
# Add unit argument to tm_shape
tm_shape(shp, projection = 2163, unit = "mi")
# Customize scale bar and north arrow
mymap + tm_scale_bar(color.dark = "gray60",
position = c("right", "bottom")) +
tm_compass(type = "4star", size = 2.5, fontsize = 0.5,
color.dark = "gray60", text.color = "gray60",
position = c("left", "top"))
Adding additional layers to the map
We want to add state boundaries to the map. We can do this using 1 of 2 methods:
- Read in another spatial object and add it as a layer.
- Use the function
aggregate_map
fromlibrary(tmaptools)
to aggregate our county layer by state FIPS code and add it as a layer.
Below is an example of the second option.
# Create a state FIPS field
shp <- mutate(shp, STFIPS = stringr::str_sub(GEOID, 1, 2))
# Aggregate the county layer by state
states <- shp %>%
aggregate_map(by = "STFIPS")
# Add the new state layer
mymap + tm_shape(states) +
tm_borders(col = "black")
Voila!
Adding multiple maps to a page
There are a several options for adding multiple maps to the same page. The method you decide to use depends on the format of your data and what it is you’re trying to accomplish.
Using tmap_arrange
The function tmap_arrange
allows users to add objects to the page and separately control the look and order of the objects. For example if we want the maps to have different colors for the state boundaries we might create separate map objects and use the tmap_arrange
function to plot them.
First write a function that includes all of our previously described elements but allows us to tweak the features that we want.
createMap <- function(.data, varname, statecol, maptitle){
tm_shape(.data, projection = 2163, unit = "mi") +
tm_polygons(varname,
breaks = cuts,
palette = "BuPu",
border.col = "white",
border.alpha = 0.5,
title = "Uninsured (%)") +
tm_legend(legend.position = c("left", "bottom")) +
tm_layout(title = maptitle,
title.size = 1.1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08),
frame = FALSE) +
tm_scale_bar(color.dark = "gray60",
position = c("right", "bottom")) +
tm_compass(type = "4star", size = 2.5, fontsize = 0.5,
color.dark = "gray60", text.color = "gray60",
position = c("left", "top")) +
tm_shape(states) +
tm_borders(col = statecol)
}
m1 <- createMap(shp,
varname = "uninsured_2012",
statecol = "green",
maptitle = "Here is title 1")
m2 <- createMap(shp,
varname = "uninsured_2016",
statecol = "yellow",
maptitle = "Here is title 2")
tmap_arrange(m1, m2, ncol = 1)
Using the aesthetic argument with “wide” data
Another way to access the facet functionality is to name the variables in the aesthetic argument of the layer. This works if your data is “wide”, that is if you have separate columns for variables, similar to what we have in our shp
object.
Below we identify the variables for mapping. Since we’ll be mapping 2 variables we can also assign 2 map titles.
var2 <- c("uninsured_2012", "uninsured_2016")
title2 <- c("Uninsured adults ages 18-34 by county, 2012",
"Uninsured adults ages 18-34 by county, 2016")
createMap(shp,
varname = var2,
statecol = "black",
maptitle = title2)
Using facets with “long” data
Faceting in tmap
is very similar to faceting in ggplot
, that is, small multiples will be created based on a given variable.
Below we use the tm_facet
function and the by
argument. In order for this to work with our data we’ll need to create a “long” version. Use the gather
function from tidyr
. As you can see below we now have 2 records for each county, the 2012 and 2016 estimates.
shplong <- select(shp, GEOID, NAME, uninsured_2012, uninsured_2016) %>%
tidyr::gather(year, estimate, c(uninsured_2012, uninsured_2016)) %>%
arrange(GEOID)
glimpse(shplong)
## Observations: 6,282
## Variables: 5
## $ GEOID <chr> "01001", "01001", "01003", "01003", "01005", "01005",...
## $ NAME <chr> "Autauga County, Alabama", "Autauga County, Alabama",...
## $ year <chr> "uninsured_2012", "uninsured_2016", "uninsured_2012",...
## $ estimate <dbl> 21.41164, 17.49956, 32.05542, 25.92662, 38.05556, 30....
## $ geometry <sf_geometry [m]> MULTIPOLYGON (((1269841 -13..., MULTIPOLY...
mymap <- tm_shape(shplong, projection = 2163) +
tm_polygons("estimate",
breaks = cuts,
palette = "BuPu", border.col = "white",
border.alpha = 0.3,
title = "Uninsured (%)") +
tm_facets(by = "year", free.coords = TRUE, ncol = 1) +
tm_shape(states) +
tm_borders(col = "black")
Taking a deeper dive with leaflet
If you remember the plot we made in this section you’ll notice that Texas had the most counties in each category (greatest change (+/-) in insured population, 2012-2016):
- Insured population increased: 4 counties in Texas were in the top 10
- Insured population decreased: 3 counties in Texas were in the top 10
Perhaps this isn’t surprising given that Texas has the most counties in the country…by far. There are 254 counties in TX followed by 159 counties in GA. In any case let’s take a closer look at Texas to see how the other counties stack up.
Prepare the Texas data
Filter to Texas and calculate several fields that will be used in the leaflet map. Instead of a chloropleth map we’ll use the function tm_bubbles
to map our data.
# Filter to Texas and create some additional
# fields for mapping. Since we'll be using
tx <- filter(shp, STFIPS == "48") %>%
mutate(NAME = stringr::str_remove(NAME, ", Texas"),
`Insured Adults Ages 18-34, 2012-2016` = case_when(
uninsured_diff < 0 ~ "Insured population increased",
uninsured_diff > 0 ~ "Insured population decreased",
uninsured_diff == 0 ~ "Insured population stayed the same"),
diff2 = round(abs(uninsured_diff), 1),
popup = ifelse(uninsured_diff != 0,
paste0(`Insured Adults Ages 18-34, 2012-2016`, " by ", diff2, "%"),
`Insured Adults Ages 18-34, 2012-2016`),
diffrad = as.numeric(cut(diff2, c(0, 5, 10, 20, 30, 45),
right = FALSE)))
# Remove some unnecessary fields
tx <- select(tx, -c(uninsured_2012, uninsured_2016, uninsured_diff, STFIPS))
Map the Texas data using tmap::tmap_leaflet()
The leaflet function in tmap
is super easy. Create a tmap
as you normally would and send it to the leaflet widget using the function tmap_leaflet
.
# Basemap
carto <- "https://cartodb-basemaps-{s}.global.ssl.fastly.net/light_all/{z}/{x}/{y}{r}.png"
# Create a "normal" tmap except we'll add
# the basemap and the `popup.vars` argument.
# The symbol size of the bubbles will be
# based on the data so use our calculated
# field `diffrad` which will apply sizes
# 1 through 5. Sizes can be further adjusted
# using the `scale` argument.
mymap <- tm_basemap(carto) +
tm_shape(tx) +
tm_borders(col = "azure2") +
tm_bubbles("diffrad",
col = "Insured Adults Ages 18-34, 2012-2016",
border.col = "white",
scale = 1.5,
style = "fixed",
palette = c("coral2", "aquamarine3", "gray"),
popup.vars = c("County: " = "NAME", "Change: " = "popup"))
tmap_leaflet(mymap)
Working with rasters
The syntax for adding raster elements in tmap
is similar to vectors. Below we construct a map using global land cover data provided by the tmap
package. The land cover data is a raster brick which contains several layers (land cover, tree cover and elevation).
Map the land cover layer and use the argument bbox
to assign the bounding box of the US counties to that of the raster. Keep in mind that our shp
object has the shifted geography so we’ll need to create another version without Alaska or Hawaii.
# Load the raster data
data(land)
# Remove AK and HI
shp_rmAKHI <- filter(shp, !STFIPS %in% c("02", "15"))
# Create a color palette for land cover
# This was taken directly from the tmap documentation
pal20 <- c("#003200", "#3C9600", "#006E00", "#556E19", "#00C800",
"#8CBE8C", "#467864", "#B4E664", "#9BC832", "#EBFF64", "#F06432",
"#9132E6", "#E664E6", "#9B82E6", "#B4FEF0", "#646464", "#C8C8C8",
"#FF0000", "#FFFFFF", "#5ADCDC")
# Map the raster data and assign the bounding box of
# the county layer. Add the county layer on top.
tm_shape(land, bbox = shp_rmAKHI) +
tm_raster("cover", palette = pal20, alpha = 0.8) +
tm_shape(shp_rmAKHI) +
tm_borders(alpha = 0.4, col = "black") +
tm_layout(inner.margins = c(0.06, 0.10, 0.10, 0.08)) +
tm_layout(legend.position = c("left","bottom"))
Summary: tmap
is awesome!
The tmap
package has streamlined the process such that pretty map-making in R can be achieved with ease! In addition to that tmap
includes some very useful functions for aggregating, clipping and simplifying data. The list is long and definitely worth checking out 🙂
This is arguably my favorite mapping post. Curious, however, is their advantage to the mutate step for the `cat` variable in `id10` [factor(1:length(NAME)…]? Seems like this results in a lot of wizardry to `geom_rect` and `scales_etc` for getting positing correct versus the `forcats::fct_recorder` approach. Something like: `ggplot(id10) + geom_col(x = fct_reorder(NAME, difftemp), y = difftemp, fill = type))` but not sure if there are drawbacks?
Thanks Jason for your comment – this is a great suggestion! Using forcats::fct_reorder() is indeed a better approach. The code has been updated 🙂
Thanks!
Hollie
Hi Hollie!
Great blog!
In your example, would it have been possible to label each US state with two letters (shown within the shape of each state), e.g. for on the map. I tried to use tm_text(“abbreviation_state”, size = 1, col = “black”) but did not succeed.
Thanks!
Hi Max –
Thanks for your comment! The state layer we’re using for the map in the blog post does not have a field for state abbreviations so you’ll need to add one. Here’s some code to get you started:
1. Download the table of Census states: https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html
2. In R read in the downloaded shapefile
stabbr <- st_read("path_to_file/cb_2016_us_state_500k.shp", stringsAsFactors = FALSE) %>%
select(GEOID, STUSPS) %>%
st_set_geometry(NULL)
3. Find the code where we create the state shapefile and do the following:
# This is already in the blog post
states <- shp %>%
aggregate_map(by = "STFIPS")
# Run this code which will join the `stabbr` table to the `states` shp
states <- mutate(states, GEOID = unique(shp$STFIPS)) %>%
left_join(stabbr, by = "GEOID")
4. Now you should be able to use the `tm_text` function to add the state abbreviations
mymap + tm_shape(states) +
tm_borders(col = "black") +
tm_text("STUSPS", size = 0.7)
Hi Hollie! I need your help! After I write the shp file and try to run it through tm_shape, I get this error:
Error in get_projection(mshp_raw, output = “crs”) :
shp is neither a sf, sp, nor a raster object
This is for a separate project so I cannot simply use the downloadable shp file. Do you know how to make the shp dataframe we wrote into a .shp file?
Thanks!
We’d need to know more about your process. Can you put it in a stackoverflow post and send the link. Your process should be something like:
x < - st_read("my_shape.shp"); tm_shape(x) + tm_polygons()
Really great, thanks–tmap is awesome! I’ve been really frustrated with my previous experience mapping SF objects with other R options. ggplot2 works well for SP objects, but I have struggled getting custom mapping for SF objects.