The purrr package is a functional programming superstar that provides useful tools for iterating through lists and vectors, generalizing code and removing programming redundancies. The purrr tools work in combination with functions, lists and vectors and result in code that is consistent and concise.
In this post we focus primarily on the map family of functions which, at their simplest, offer an alternative to the apply functions and to for loops. For more complex uses, map functions can be used to tidily manipulate multi-dimensional datasets and apply statistical models, for example.
This post explores several of the map family workhorses: map, pmap and imap. It assumes a basic familiarity with the map functions and their variants (e.g. map_lgl, map_chr, etc). If you need a refresher we recommend this documentation. For advanced users looking to speed up their purrr calculations we recommend our previous blog post on parallelizing your computations using the furrr package.
Load the packages
library(purrr) # Functional programming
library(dplyr) # Data wrangling
library(tidyr) # Tidy-ing data
library(stringr) # String operations
library(repurrrsive) # Game of Thrones data
library(tidygraph) # Convert data into node/edge format
library(ggraph) # Network graphing
Get the data
For our example, we’re using the “Game of Thrones” dataset from the repurrrsive package.
dat <- got_chars
The dataset is a list of 30 lists containing various information on Game of Thrones characters. The first list looks something like this:
glimpse(dat[[1]])
## List of 18
## $ url : chr "https://www.anapioficeandfire.com/api/characters/1022"
## $ id : int 1022
## $ name : chr "Theon Greyjoy"
## $ gender : chr "Male"
## $ culture : chr "Ironborn"
## $ born : chr "In 278 AC or 279 AC, at Pyke"
## $ died : chr ""
## $ alive : logi TRUE
## $ titles : chr [1:3] "Prince of Winterfell" "Captain of Sea Bitch" "Lord of the Iron Islands (by law of the green lands)"
## $ aliases : chr [1:4] "Prince of Fools" "Theon Turncloak" "Reek" "Theon Kinslayer"
## $ father : chr ""
## $ mother : chr ""
## $ spouse : chr ""
## $ allegiances: chr "House Greyjoy of Pyke"
## $ books : chr [1:3] "A Game of Thrones" "A Storm of Swords" "A Feast for Crows"
## $ povBooks : chr [1:2] "A Clash of Kings" "A Dance with Dragons"
## $ tvSeries : chr [1:6] "Season 1" "Season 2" "Season 3" "Season 4" ...
## $ playedBy : chr "Alfie Allen"
The map function
The map function iteratively applies a function or formula to each element of a list or vector. The operation is similar to a for loop but with fewer keystrokes and cleaner code. The result of applying map will be the same length as the input. Since purrr functions are type-stable there is little guesswork in knowing which type of output will be returned. For example map_chr() returns character vectors, map_dbl() returns double vectors, etc.
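To make the comparison with a for loop concrete, here is a minimal sketch on a small made-up list. The houses object is a hypothetical toy input, not part of the Game of Thrones data, and toupper() and nchar() are used only because their results are easy to check by eye.

```r
library(purrr)

# Hypothetical toy input, not taken from got_chars
houses <- list("Stark", "Lannister", "Greyjoy")

# for loop: pre-allocate the output, then fill it element by element
out_loop <- character(length(houses))
for (i in seq_along(houses)) {
  out_loop[i] <- toupper(houses[[i]])
}

# map() always returns a list with the same length as the input
out_list <- map(houses, toupper)

# map_chr() returns a character vector (and errors if a result is not a string)
out_chr <- map_chr(houses, toupper)

# map_int() returns an integer vector
out_int <- map_int(houses, nchar)
```

The loop and map_chr() produce the same character vector; the typed variant simply states the expected output type up front.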
Example 1: Extract a single element from a list
Extracting an element from a list can be done a number of ways. The following code chunks will produce the same results.
# Method 1: using the name of the list element, similar to dat[[1]][["name"]], dat[[2]][["name"]], etc
map(dat, "name")
# Method 2: using the `pluck` function, with "name" passed on to pluck() as its accessor
map(dat, pluck, "name")
# Method 3: using the index of the list element
map(dat, 3)
## [[1]]
## [1] "Theon Greyjoy"
##
## [[2]]
## [1] "Tyrion Lannister"
##
## [[3]]
## [1] "Victarion Greyjoy"
##
## [[4]]
## [1] "Will"
##
## [[5]]
## [1] "Areo Hotah"
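The extraction shortcuts can be checked against each other on a small stand-in list. This is only a sketch: chars is a hypothetical two-record list shaped loosely like got_chars, so the position used for the name field is 2 here rather than 3.

```r
library(purrr)

# Hypothetical stand-in for two records of got_chars
chars <- list(
  list(id = 1L, name = "Theon Greyjoy", alive = TRUE),
  list(id = 2L, name = "Will", alive = FALSE)
)

by_name  <- map(chars, "name")                   # extract by element name
by_index <- map(chars, 2)                        # extract by position
by_pluck <- map(chars, pluck, "name")            # "name" is passed on to pluck()
by_fn    <- map(chars, function(x) x[["name"]])  # explicit function

# The typed variant flattens the result into a character vector
name_vec <- map_chr(chars, "name")
```

All four map() calls return the same list; map_chr() is usually the more convenient choice when every extracted element is a single string.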
Example 2a: Create a dataframe from a list (easier)
Create a dataframe from several of the list items. This method only works if each element you request (in this case name, gender and culture) has a length of 1.
This is equivalent to dat[[1]][c("name", "gender", "culture")], dat[[2]][c("name", "gender", "culture")] and so on. The _dfr suffix tells map to combine the results into a data.frame by row.
# The `[` is the function here -- essentially telling it to apply [] to each list
# and the name, gender and culture are the arguments passed to []
map_dfr(dat,`[`, c("name", "gender", "culture"))
## # A tibble: 30 x 3
## name gender culture
## <chr> <chr> <chr>
## 1 Theon Greyjoy Male Ironborn
## 2 Tyrion Lannister Male ""
## 3 Victarion Greyjoy Male Ironborn
## 4 Will Male ""
## 5 Areo Hotah Male Norvoshi
## 6 Chett Male ""
## # ... with 24 more rows
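Passing `[` as a function can look odd at first, so here is a small sketch showing that it is the same as subsetting inside an ordinary function. The chars list and the keep_cols vector are hypothetical, and dplyr::bind_rows() is used here as one way to stack the pieces by row; it is an illustration, not necessarily what map_dfr does internally.

```r
library(purrr)
library(dplyr)

# Hypothetical records with only length-1 fields
chars <- list(
  list(name = "Theon Greyjoy", gender = "Male", culture = "Ironborn"),
  list(name = "Areo Hotah", gender = "Male", culture = "Norvoshi")
)

keep_cols <- c("name", "culture")

# `[` called as an ordinary function: `[`(x, i) is the same as x[i]
via_bracket <- map(chars, `[`, keep_cols)

# The same thing spelled out with an anonymous function
via_lambda <- map(chars, function(x) x[keep_cols])

# Stack the per-character pieces into one table, one row per character
chars_df <- bind_rows(via_bracket)
```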
Example 2b: Create a dataframe from a list (harder)
If you try running the code above but add “aliases” and “allegiances” you should get the following error: Error: Argument 4 must be length 1, not 4.
x <- map_dfr(dat,`[`, c("name", "gender", "culture", "aliases", "allegiances"))
## Error: Argument 4 must be length 1, not 4
Take a closer look at the list elements “aliases” and “allegiances”. You’ll notice that some inputs are character vectors of length > 1.
glimpse(map(dat, "aliases"))
## List of 6
## $ : chr [1:4] "Prince of Fools" "Theon Turncloak" "Reek" "Theon Kinslayer"
## $ : chr [1:11] "The Imp" "Halfman" "The boyman" "Giant of Lannister" ...
## $ : chr "The Iron Captain"
## $ : chr ""
## $ : chr ""
## $ : chr ""
In order to include these items in our dataframe we’ll need to create a list-column using map.
dat_m <- dat %>% {
tibble(
name = map_chr(., "name"),
gender = map_chr(., "gender"),
culture = map_chr(., "culture"),
aliases = map(., "aliases"),
allegiances = map(., "allegiances")
)}
## # A tibble: 30 x 5
## name gender culture aliases allegiances
## <chr> <chr> <chr> <list> <list>
## 1 Theon Greyjoy Male Ironborn <chr [4]> <chr [1]>
## 2 Tyrion Lannister Male "" <chr [11]> <chr [1]>
## 3 Victarion Greyjoy Male Ironborn <chr [1]> <chr [1]>
## 4 Will Male "" <chr [1]> <NULL>
## 5 Areo Hotah Male Norvoshi <chr [1]> <chr [1]>
## # ... with 25 more rows
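As a smaller illustration of the same idea, the sketch below builds a list-column from a hypothetical two-record list and then maps over that column to compute one number per row. The chars object and the n_aliases column are made up for the example.

```r
library(purrr)
library(tibble)

# Hypothetical records: `aliases` has a different length for each character
chars <- list(
  list(name = "Theon Greyjoy", aliases = c("Reek", "Theon Turncloak")),
  list(name = "Will", aliases = "")
)

toy <- tibble(
  name      = map_chr(chars, "name"),   # atomic column: one string per row
  aliases   = map(chars, "aliases"),    # list-column: one vector per row
  n_aliases = map_int(aliases, length)  # computed from the list-column
)
```

Note that an empty string still counts as one element, which is why the second row reports a length of 1.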
Example 3: Apply a custom function to a list
Write a function that outputs a statement indicating whether a character is alive or dead. Note that we’re using map_chr which will output a character vector instead of a list.
Also note that with the final season of Game of Thrones we already know that not all of these are true anymore!
dead_or_alive <- function(x){
ifelse(x[["alive"]], paste(x[["name"]], "is alive!"),
paste(x[["name"]], "is dead :("))
}
map_chr(dat, dead_or_alive)
## [1] "Theon Greyjoy is alive!" "Tyrion Lannister is alive!"
## [3] "Victarion Greyjoy is alive!" "Will is dead :("
## [5] "Areo Hotah is alive!" "Chett is dead :("
Example 4: Bonus, apply a custom function and create a network graph
Create a plot of all of Jon Snow’s aliases. First we’ll use compact to remove any elements of each character’s record that are NULL or have length zero. For example, if you look at Cersei’s aliases you’ll see an empty list. Using compact will remove this empty list.
# Empty list for Cersei Lannister's `aliases`
dat[[19]]$aliases
## list()
g <- map(dat, compact)
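Here is a minimal sketch of what compact does to a single record. The record object is hypothetical; it mixes a NULL, two zero-length elements and an empty string to show which of them are dropped.

```r
library(purrr)

# Hypothetical record with a mix of filled and empty fields
record <- list(
  name    = "Cersei Lannister",
  aliases = list(),        # zero-length list: dropped
  father  = NULL,          # NULL: dropped
  titles  = character(0),  # zero-length vector: dropped
  died    = ""             # empty string has length 1: kept
)

cleaned <- compact(record)
kept_fields <- names(cleaned)
```

Only name and died survive, which is worth remembering: an empty string is not an empty element as far as compact is concerned.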
Write a function that pulls the name and aliases elements if aliases exists. It creates a tibble and then uses tidygraph::as_tbl_graph to convert the data into the node and edge format needed for the plot.
also_known_as <- function(x){
if ("aliases" %in% names(x)){
g <- tibble(
from = x$name,
to = x$aliases)
g <- as_tbl_graph(g)
}
}
g <- map(g, also_known_as)
The tbl_graph format should look something like this:
g[[1]]
## # A tbl_graph: 5 nodes and 4 edges
## #
## # A rooted tree
## #
## # Node Data: 5 x 1 (active)
## name
## <chr>
## 1 Theon Greyjoy
## 2 Prince of Fools
## 3 Theon Turncloak
## 4 Reek
## 5 Theon Kinslayer
## #
## # Edge Data: 4 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 3
## 3 1 4
## # ... with 1 more row
Finally plot the data using ggraph. Jon Snow’s index within our g object is 23, so we’ll use g[[23]] for his data. Below are all of Jon Snow’s aliases!
ggraph(g[[23]], layout = "graphopt") +
geom_edge_link() +
geom_node_label(aes(label = name),
label.padding = unit(1, "lines"),
label.size = 0) +
theme_graph()
The pmap function
The pmap function can be used on an arbitrary number of inputs and is great for doing row-wise iterations on a dataframe.
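Before moving to the Game of Thrones table, here is a minimal sketch of pmap on three parallel inputs. The inputs list and the describe function are hypothetical. Because a data frame is a list of equal-length columns, the same call works on the data frame version of the inputs.

```r
library(purrr)

# Hypothetical inputs: three parallel vectors of equal length
inputs <- list(
  name   = c("Arya", "Tyrion"),
  house  = c("Stark", "Lannister"),
  season = c(1L, 2L)
)

# Names in the list are matched to the function's argument names
describe <- function(name, house, season) {
  paste0(name, " of House ", house, " (season ", season, ")")
}

from_list <- pmap_chr(inputs, describe)

# A data frame is a list of columns, so each row becomes one function call
from_df <- pmap_chr(as.data.frame(inputs), describe)
```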
Example 1: Apply a function to each row
Using the pmap function we can apply a row-wise operation to our dataset. As a reminder here’s what our dat_m object looks like:
## # A tibble: 30 x 5
## name gender culture aliases allegiances
## <chr> <chr> <chr> <list> <list>
## 1 Theon Greyjoy Male Ironborn <chr [4]> <chr [1]>
## 2 Tyrion Lannister Male "" <chr [11]> <chr [1]>
## 3 Victarion Greyjoy Male Ironborn <chr [1]> <chr [1]>
## 4 Will Male "" <chr [1]> <NULL>
## 5 Areo Hotah Male Norvoshi <chr [1]> <chr [1]>
## # ... with 25 more rows
If we use pmap and apply the paste function but do not specify column names, the result will be all columns pasted together by row. Note that the aliases column for Theon and Tyrion includes 4 and 11 entries respectively, and each alias is pasted with gender, culture and allegiance, so you end up with 4 and 11 strings.
pmap(dat_m, paste)
## [[1]]
## [1] "Theon Greyjoy Male Ironborn Prince of Fools House Greyjoy of Pyke"
## [2] "Theon Greyjoy Male Ironborn Theon Turncloak House Greyjoy of Pyke"
## [3] "Theon Greyjoy Male Ironborn Reek House Greyjoy of Pyke"
## [4] "Theon Greyjoy Male Ironborn Theon Kinslayer House Greyjoy of Pyke"
##
## [[2]]
## [1] "Tyrion Lannister Male The Imp House Lannister of Casterly Rock"
## [2] "Tyrion Lannister Male Halfman House Lannister of Casterly Rock"
## [3] "Tyrion Lannister Male The boyman House Lannister of Casterly Rock"
## [4] "Tyrion Lannister Male Giant of Lannister House Lannister of Casterly Rock"
## [5] "Tyrion Lannister Male Lord Tywin's Doom House Lannister of Casterly Rock"
## [6] "Tyrion Lannister Male Lord Tywin's Bane House Lannister of Casterly Rock"
## [7] "Tyrion Lannister Male Yollo House Lannister of Casterly Rock"
## [8] "Tyrion Lannister Male Hugor Hill House Lannister of Casterly Rock"
## [9] "Tyrion Lannister Male No-Nose House Lannister of Casterly Rock"
## [10] "Tyrion Lannister Male Freak House Lannister of Casterly Rock"
## [11] "Tyrion Lannister Male Dwarf House Lannister of Casterly Rock"
Example 2: Apply a function to each row using column names
Apply another function to the table but this time specify the column names.
Initial prep before using pmap
To make this more interesting let’s filter to any character with a Stark or Lannister allegiance. We can do this by extracting “Lannister” or “Stark” from the list of allegiances. Here’s an example of what the allegiances column looks like:
dat_m$allegiances[16:18]
## [[1]]
## [1] "House Stark of Winterfell"
##
## [[2]]
## [1] "House Baratheon of Storm's End" "House Stark of Winterfell"
## [3] "House Tarth of Evenfall Hall"
##
## [[3]]
## [1] "House Stark of Winterfell" "House Tully of Riverrun"
We will use str_extract() from the stringr package to pull out the words “Lannister” or “Stark”. Note that ~ and .x are formula shorthand: ~str_extract(.x, "Lannister|Stark") is equivalent to function(x) str_extract(x, "Lannister|Stark"). You can see that it returns NA wherever neither “Lannister” nor “Stark” appears.
dat_p <- mutate(dat_m,
stark_or_lannister = map(allegiances, ~str_extract(.x, "Lannister|Stark")))
dat_p$stark_or_lannister[16:18]
## [[1]]
## [1] "Stark"
##
## [[2]]
## [1] NA "Stark" NA
##
## [[3]]
## [1] "Stark" NA
We can drop the NA values with the handy discard() function.
dat_p <- mutate(dat_p, stark_or_lannister = map(stark_or_lannister, ~discard(.x, is.na)))
dat_p$stark_or_lannister[16:18]
## [[1]]
## [1] "Stark"
##
## [[2]]
## [1] "Stark"
##
## [[3]]
## [1] "Stark"
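The extract-then-discard step can be tried on a single vector before applying it to a whole list-column. In this sketch the allegiances vector is a hypothetical two-element example, and keep() is shown alongside discard() as its complement.

```r
library(purrr)
library(stringr)

# Hypothetical allegiances for one character
allegiances <- c("House Baratheon of Storm's End", "House Stark of Winterfell")

# str_extract() returns NA where the pattern does not match
matches <- str_extract(allegiances, "Lannister|Stark")

# discard() drops the elements for which the predicate is TRUE
houses <- discard(matches, is.na)

# keep() is the complement; the formula ~ !is.na(.x) is shorthand
# for function(x) !is.na(x)
same_houses <- keep(matches, ~ !is.na(.x))
```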
Finally, we can filter to those that have a Lannister or Stark allegiance and then use unnest() to convert the stark_or_lannister column to a traditional character column. Careful though: if a character had both a Lannister and a Stark allegiance, the stark_or_lannister column would hold two entries and unnest() would create two rows for that character.
dat_p <- filter(dat_p, stark_or_lannister %in% c("Lannister", "Stark")) %>%
unnest(stark_or_lannister)
## # A tibble: 11 x 6
## name gender culture aliases allegiances stark_or_lannist~
## <chr> <chr> <chr> <list> <list> <chr>
## 1 Tyrion Lannist~ Male "" <chr [11]> <chr [1]> Lannister
## 2 Arya Stark Female Northmen <chr [16]> <chr [1]> Stark
## 3 Brandon Stark Male Northmen <chr [3]> <chr [1]> Stark
## 4 Brienne of Tar~ Female "" <chr [3]> <chr [3]> Stark
## 5 Catelyn Stark Female Rivermen <chr [5]> <chr [2]> Stark
## 6 Cersei Lannist~ Female Westerman <NULL> <chr [1]> Lannister
## 7 Eddard Stark Male Northmen <chr [3]> <chr [1]> Stark
## 8 Jaime Lannist~ Male Westerla~ <chr [4]> <chr [1]> Lannister
## 9 Jon Snow Male Northmen <chr [8]> <chr [1]> Stark
## 10 Kevan Lannist~ Male "" <chr [1]> <chr [1]> Lannister
## 11 Sansa Stark Female Northmen <chr [3]> <chr [2]> Stark
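The row duplication that unnest() can cause is easy to see on a toy table. This is a sketch only: the toy tibble and the character named "Hypothetical Turncoat" are invented to give one row a two-element list entry.

```r
library(tibble)
library(tidyr)

# Hypothetical table: the second row has two entries in its list-column
toy <- tibble(
  name  = c("Jon Snow", "Hypothetical Turncoat"),
  house = list("Stark", c("Lannister", "Stark"))
)

# unnest() produces one output row per element of the list-column
long <- unnest(toy, house)
```

The result has three rows, with the second character repeated once per house, so it is worth checking the lengths of a list-column before unnesting if one row per character is expected.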
Use pmap
Write a function called whom_can_you_trust that outputs a string using three columns from the table. Note the ... (ellipsis) in the function signature: pmap passes every column of the table to the function, so the ellipsis is needed to absorb the columns we do not name. If we were to remove the ellipsis we would get an error message.
whom_can_you_trust <- function(name, allegiances, stark_or_lannister, ...) {
y <- glue::glue("{name} has an allegiance to the {stark_or_lannister} family")
ifelse(length(allegiances) > 1,
glue::glue("{y} but also has {length(allegiances)-1} other allegiance(s)."),
glue::glue("{y} and no other allegiances."))
}
dat_p %>% pmap_chr(whom_can_you_trust)
## [1] "Tyrion Lannister has an allegiance to the Lannister family and no other allegiances."
## [2] "Arya Stark has an allegiance to the Stark family and no other allegiances."
## [3] "Brandon Stark has an allegiance to the Stark family and no other allegiances."
## [4] "Brienne of Tarth has an allegiance to the Stark family but also has 2 other allegiance(s)."
## [5] "Catelyn Stark has an allegiance to the Stark family but also has 1 other allegiance(s)."
## [6] "Cersei Lannister has an allegiance to the Lannister family and no other allegiances."
## [7] "Eddard Stark has an allegiance to the Stark family and no other allegiances."
## [8] "Jaime Lannister has an allegiance to the Lannister family and no other allegiances."
## [9] "Jon Snow has an allegiance to the Stark family and no other allegiances."
## [10] "Kevan Lannister has an allegiance to the Lannister family and no other allegiances."
## [11] "Sansa Stark has an allegiance to the Stark family but also has 1 other allegiance(s)."
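The role of the ellipsis can be demonstrated with two nearly identical functions. In this sketch the toy data frame, greet_strict and greet_loose are all hypothetical, and the string returned by the error handler is our own placeholder rather than the actual error text.

```r
library(purrr)

# Hypothetical table with one more column than the functions name
toy <- data.frame(
  name  = c("Arya Stark", "Tyrion Lannister"),
  house = c("Stark", "Lannister"),
  alive = c(TRUE, TRUE)
)

# No ellipsis: the extra `alive` column has nowhere to go
greet_strict <- function(name, house) paste(name, "of House", house)

# With ellipsis: unused columns are silently absorbed
greet_loose <- function(name, house, ...) paste(name, "of House", house)

strict_result <- tryCatch(
  pmap_chr(toy, greet_strict),
  error = function(e) "failed"  # placeholder, not the real message
)
loose_result <- pmap_chr(toy, greet_loose)
```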
The imap function
The imap
function can generally be thought of as indexed map. The function uses 2 arguments: the first is the value, the second is the position or the name.
- For objects that do not have names it is shorthand for map2(x, seq_along(x), ...)
- For objects that have names it is shorthand for map2(x, names(x), ...)
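Both cases can be checked directly by comparing imap with the corresponding map2 call. The label function and the two small vectors below are hypothetical examples.

```r
library(purrr)

# Hypothetical helper: first argument is the value, second the index or name
label <- function(value, index) paste0(index, ": ", value)

unnamed <- c("Jon Snow", "Asha Greyjoy")
named   <- c(wolf = "Jon Snow", kraken = "Asha Greyjoy")

# Unnamed input: the second argument is the position
unnamed_imap <- imap_chr(unnamed, label)
unnamed_map2 <- map2_chr(unnamed, seq_along(unnamed), label)

# Named input: the second argument is the name
named_imap <- imap_chr(named, label)
named_map2 <- map2_chr(named, names(named), label)
```

In the named case the output also keeps the input names, so unname() is handy if only the values are wanted.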
Example 1: Apply imap to character vector
Create a character vector of 10 names. Using imap, output the name (.x) and the index of the name (.y).
v <- sample(dat_m$name, 10)
imap_chr(v, ~ paste0(.y, ": ", .x))
## [1] "1: Jon Snow" "2: Asha Greyjoy"
## [3] "3: Daenerys Targaryen" "4: Eddard Stark"
## [5] "5: Brienne of Tarth" "6: Melisandre"
## [7] "7: Kevan Lannister" "8: Davos Seaworth"
## [9] "9: Victarion Greyjoy" "10: Sansa Stark"
Example 2: Apply imap to list with names
For this example we create a new list called dat_i. The new list has named elements and, for demonstration purposes, no longer contains the name field.
dat_i <- dat_m %>% split(.$name) %>%
map(., ~select(.x, -name))
## $`Aeron Greyjoy`
## # A tibble: 1 x 4
## gender culture aliases allegiances
## <chr> <chr> <list> <list>
## 1 Male Ironborn <chr [2]> <chr [1]>
##
## $`Areo Hotah`
## # A tibble: 1 x 4
## gender culture aliases allegiances
## <chr> <chr> <list> <list>
## 1 Male Norvoshi <chr [1]> <chr [1]>
##
## $`Arianne Martell`
## # A tibble: 1 x 4
## gender culture aliases allegiances
## <chr> <chr> <list> <list>
## 1 Female Dornish <chr [1]> <chr [1]>
Using imap and mutate we can apply the name of the list element (.y) as a new column in our table (.x).
dat_i <- imap(dat_i, ~mutate(.x, name = .y))
## $`Aeron Greyjoy`
## # A tibble: 1 x 5
## gender culture aliases allegiances name
## <chr> <chr> <list> <list> <chr>
## 1 Male Ironborn <chr [2]> <chr [1]> Aeron Greyjoy
##
## $`Areo Hotah`
## # A tibble: 1 x 5
## gender culture aliases allegiances name
## <chr> <chr> <list> <list> <chr>
## 1 Male Norvoshi <chr [1]> <chr [1]> Areo Hotah
##
## $`Arianne Martell`
## # A tibble: 1 x 5
## gender culture aliases allegiances name
## <chr> <chr> <list> <list> <chr>
## 1 Female Dornish <chr [1]> <chr [1]> Arianne Martell
Summary
The purrr family of functions is an excellent choice for streamlining your code and removing programming redundancies. In this post we highlighted three of our favorite purrr functions: map, pmap and imap, plus some bonus functions like discard and compact. For a complete list of purrr functions check out the purrr cheat sheet. Happy coding!