R has become a go-to tool for spatial analysis in many settings. You can read and edit spatial data, conduct geoprocessing and spatial analysis and create static and interactive maps. Of course, the first step in spatial analysis with R is often reading in your spatial data and this step can be confusing and frustrating.
The super-powerful grandfather of functions for reading vector-based spatial data is readOGR from the package rgdal. You can use this function to read in dozens of different formats but the syntax can be odd and, importantly, is different for different input types. As an example, you might be surprised to learn that one of these lines of code for reading a shapefile will work fine and one will fail. Can you guess which is which?
readOGR(dsn="x:/junk", layer="myfile")  # works fine
readOGR(dsn="x:/junk/", layer="myfile") # fail
That trailing forward slash is toxic and I'll bet that this single idiosyncrasy has prevented more than one researcher from conducting spatial analysis in R. Navigating these oddities is daunting, but once you learn the syntax for the limited number of file types you likely work with, you'll be able to comfortably move on to the fun of spatial analysis in R. This post focuses on four formats that are among the most common: shapefiles, MapInfo files, GeoJSON and GPS (with a GPX extension).
Note that this post is limited to reading and writing vector data. For raster data I use the raster package rather than rgdal and I find that these functions (such as raster and stack) are more straightforward and work smoothly. An example of using the raster function can be found in our post on analyzing raster data in R.
Files we're using
You can download the files I use in the post as a ZIP here. For the blog post I'm storing all files in my directory at X:/spatial_files. These include:
- Heritage_Trees_pdx: An ESRI Shapefile (made up of four files) of heritage trees from Portland, Oregon. Original source is here.
- providence_park: a GeoJSON file (just one file) that I manually created using geojson.io.
- census_place_pdx: A MapInfo file (made up of four files) of Portland as a Census “Place”. Downloaded with the tigris package and exported to MapInfo.
- portland_track: A GPS file (just one file) that I manually created with this site.
The rgdal package has been around for more than a decade and provides bindings to the incredible Geospatial Data Abstraction Library (GDAL) for reading, writing and converting between spatial formats. You need to install the rgdal package before you can run any of the code in this post. Keep in mind that there are several specialty packages for reading or writing various formats (e.g., plotKML) that you might consider using and that I use occasionally. But, if possible, I prefer to use one function, one package for reading spatial files and so this post focuses on rgdal.
To read or write a specific file type you need to make sure that you have the drivers installed. For the four file types I cover below, these should all be installed by default but you should double check using the ogrDrivers function. Here are the drivers on my machine. Yours may be slightly different but you likely have the ones needed for this post.
library(rgdal)
ogrDrivers()$name
##  "AeronavFAA" "ARCGEN" "AVCBin" "AVCE00"
##  "BNA" "CSV" "DGN" "DXF"
##  "EDIGEO" "ESRI Shapefile" "Geoconcept" "GeoJSON"
##  "Geomedia" "GeoRSS" "GML" "GMT"
##  "GPSBabel" "GPSTrackMaker" "GPX" "HTF"
##  "Idrisi" "KML" "MapInfo File" "Memory"
##  "MSSQLSpatial" "ODBC" "ODS" "OpenAir"
##  "OpenFileGDB" "PCIDSK" "PDF" "PDS"
##  "PGDump" "PGeo" "REC" "S57"
##  "SDTS" "SEGUKOOA" "SEGY" "SUA"
##  "SVG" "SXF" "TIGER" "UK .NTF"
##  "VRT" "Walk" "WAsP" "XLSX"
##  "XPlane"
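Rather than scanning the list by eye, you could check for the four drivers directly. This is a minimal sketch: missing_drivers is a hypothetical helper (not part of rgdal), and the available vector is a hard-coded subset of the list above that stands in for the real ogrDrivers()$name so the snippet runs without rgdal.

```r
# Drivers for the four formats covered in this post
needed <- c("ESRI Shapefile", "GeoJSON", "MapInfo File", "GPX")

# Hypothetical helper: return the entries of `needed` that are
# absent from the vector of available driver names
missing_drivers <- function(available, needed) {
  setdiff(needed, available)
}

# With rgdal loaded you would pass the real list instead:
#   missing_drivers(ogrDrivers()$name, needed)
# Here a hard-coded subset of the list above stands in for it.
available <- c("CSV", "ESRI Shapefile", "GeoJSON", "GPX", "KML", "MapInfo File")
missing_drivers(available, needed)
## character(0)
```

An empty result means all four drivers are present; any names returned are the formats you would not be able to read or write.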
A note about the data source name (dsn)
Although there are specific examples below, it's worth noting that for file types that require a dsn argument without a layer name you need to remember to leave off the trailing forward slash. For files in your current working directory you can use ".". You can also use relative paths. Examples here:
# These are valid dsn arguments for reading a shapefile
# that I have in x:/junk and in x:/
readOGR(dsn="x:/junk", layer="myfile") # full path to x:/junk
readOGR(dsn=".", layer="myfile")       # current working directory X:/
readOGR(dsn="junk", layer="myfile")    # relative path X:/junk
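If your directory paths come from somewhere you don't control (a config file, a file chooser), one option is to strip the trailing slash before the path reaches readOGR. The sketch below uses only base R. clean_dsn is a hypothetical helper name, and it does not treat a bare drive root such as "x:/" specially, so use it on directory paths like the ones above.

```r
# Hypothetical helper: strip trailing forward slashes or
# backslashes from a directory path used as dsn
clean_dsn <- function(dsn) {
  sub("[/\\\\]+$", "", dsn)
}

clean_dsn("x:/junk/")
## [1] "x:/junk"
clean_dsn("x:/junk")
## [1] "x:/junk"
```

You would then call readOGR(dsn=clean_dsn(mydir), layer="myfile"), where mydir is whatever path you were handed.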
Reading and writing shapefiles with rgdal
With shapefiles the dsn argument requires a directory path (without the filename and without the trailing forward slash) and the layer argument requires a layer name without the suffix.
# No trailing slash in dsn and no suffix in layer! The shapefile is at
# X:/spatial_files/Heritage_Trees_pdx.shp and the current working directory
# is X:/
trees <- readOGR(dsn="spatial_files", layer="Heritage_Trees_pdx")
## OGR data source with driver: ESRI Shapefile
## Source: "spatial_files", layer: "Heritage_Trees_pdx"
## with 291 features
## It has 14 fields
bubble(trees['HEIGHT'], col=rgb(0.5,0.5,1,0.5))
The syntax for writing a shapefile is similar:
# write to current directory: x:/trees2.shp
writeOGR(trees, dsn=".", layer="trees2", driver="ESRI Shapefile")
Reading and writing GeoJSON with rgdal
GeoJSON is an increasingly common format. For testing purposes, it's fun to create and save a layer using the geojson.io site. But reading GeoJSON into R can trip you up if you're not careful. For GeoJSON the dsn argument requires the path with both the layer name and the suffix, and the layer argument should be set to OGRGeoJSON no matter what your actual layer is called! Repeat: no matter what your layer's actual name is, you use OGRGeoJSON.
There are other packages for reading and writing GeoJSON, like the geojsonio package, that I recommend you look at. For me, despite the oddities of readOGR, I like using one package, one function if possible.
park <- readOGR(dsn="spatial_files/providence_park.geojson", layer="OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "spatial_files/providence_park.geojson", layer: "OGRGeoJSON"
## with 1 features
## It has 0 fields
Since the trees and the park have different coordinate systems I need to reproject the park before plotting them together:
park <- spTransform(park, CRS(proj4string(trees)))
plot(trees, pch=16, cex=0.5)
# lwd=10 is not advisable but at least you can see the park
plot(park, add=TRUE, lwd=10, border="forestgreen")
For some reason, when writing to GeoJSON you cannot use a period in the middle of the name. So I write a name without a period and then use file.rename to add the period. You can see a conversation about this here. Not ideal but it works.
# writing file x:/myfile.geojson but I need to write a file called
# myfilegeojson and then rename it with the period
writeOGR(park, dsn="myfilegeojson", layer="", driver="GeoJSON")
file.rename("myfilegeojson", "myfile.geojson")
Reading and writing MapInfo files with rgdal
With MapInfo the dsn argument requires a full path and suffix and the layer argument requires the layer name without a suffix.
# The MapInfo file is x:/spatial_files/census_place_pdx.tab
place <- readOGR(dsn="spatial_files/census_place_pdx.tab", layer="census_place_pdx")
## OGR data source with driver: MapInfo File
## Source: "spatial_files/census_place_pdx.tab", layer: "census_place_pdx"
## with 1 features
## It has 16 fields
# project place to match the trees
place <- spTransform(place, CRS(proj4string(trees)))
plot(place, col="cadetblue", border="grey")
plot(trees, add=TRUE, col="firebrick", pch=16)
Writing a MapInfo file is very similar to writing a shapefile:
# write to current directory: x:/place2.tab
writeOGR(place, dsn=".", layer="place2", driver="MapInfo File")
Reading and writing GPS (GPX extension) files with rgdal
GPS files (with a gpx extension) can be read in as either tracks (lines) or waypoints (points). In either case the data source name will be the path with the layer name and suffix. The layer argument, though, no matter what your actual layer name is, will be either "tracks" or "track_points" depending on which you want to read in.
path <- readOGR(dsn = "spatial_files/portland_track.gpx", layer="tracks")
## OGR data source with driver: GPX
## Source: "spatial_files/portland_track.gpx", layer: "tracks"
## with 1 features
## It has 12 fields
waypoints <- readOGR(dsn = "spatial_files/portland_track.gpx", layer="track_points")
## OGR data source with driver: GPX
## Source: "spatial_files/portland_track.gpx", layer: "track_points"
## with 35 features
## It has 24 fields
plot(path, col="coral4", lwd=2)
plot(waypoints, add=TRUE, pch=16, col="chartreuse4")
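If you'd rather not memorize which layer names a file exposes, rgdal's ogrListLayers function can report them for a given dsn. The wrapper below is a hedged sketch: list_layers is a hypothetical helper that returns NULL when the file is missing or rgdal is not installed, so it is safe to run anywhere.

```r
# Hypothetical helper: list the layers in a data source, or return
# NULL when the file is missing or rgdal is not installed
list_layers <- function(dsn) {
  if (!file.exists(dsn) || !requireNamespace("rgdal", quietly = TRUE)) {
    return(NULL)
  }
  rgdal::ogrListLayers(dsn)
}

# Ask the GPX file which layer names it will accept
list_layers("spatial_files/portland_track.gpx")
```

The same call should work for the GeoJSON and MapInfo files, which is a handy way to confirm the layer argument before calling readOGR.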
For writing a GPX file you provide the full path and layer name under dsn and then tracks or track_points for layer:
writeOGR(path, dsn="mypath.gpx", layer="tracks", driver="GPX")
rgdal is not pretty but it's a powerful and important tool for reading vector data. Knowing the quirks and creating a cheat sheet for yourself will save a lot of hand wringing and allow you to start having fun with spatial analysis in R.
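One way to turn the rules from this post into a cheat sheet is a small function that builds the dsn and layer arguments from a file path. This is only a sketch of the conventions as described above (for the rgdal version listed in the postscript). ogr_args and its gpx_layer parameter are hypothetical names, and other rgdal or GDAL versions may name the GeoJSON layer differently.

```r
# Hypothetical helper: build the dsn and layer arguments for readOGR
# from a file path, following the rules described in this post
ogr_args <- function(path, gpx_layer = "tracks") {
  ext  <- tolower(tools::file_ext(path))
  name <- tools::file_path_sans_ext(basename(path))
  switch(ext,
    # shapefile: directory (no trailing slash) plus layer without suffix
    shp     = list(dsn = dirname(path), layer = name),
    # GeoJSON: full file name plus the fixed layer name
    geojson = list(dsn = path, layer = "OGRGeoJSON"),
    # MapInfo: full file name plus layer without suffix
    tab     = list(dsn = path, layer = name),
    # GPX: full file name plus "tracks" or "track_points"
    gpx     = list(dsn = path, layer = gpx_layer),
    stop("No rule for extension: ", ext)
  )
}

ogr_args("spatial_files/Heritage_Trees_pdx.shp")
## $dsn
## [1] "spatial_files"
##
## $layer
## [1] "Heritage_Trees_pdx"
```

With rgdal loaded, the result can be passed straight through, for example trees <- do.call(readOGR, ogr_args("spatial_files/Heritage_Trees_pdx.shp")).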
Postscript, R and package details
I'm going to start ending blog posts with sessionInfo() so that readers can know what R and package versions were used.
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
##  LC_COLLATE=English_United States.1252
##  LC_CTYPE=English_United States.1252
##  LC_MONETARY=English_United States.1252
##  LC_NUMERIC=C
##  LC_TIME=English_United States.1252
##
## attached base packages:
##  stats graphics grDevices utils datasets methods base
##
## other attached packages:
##  rgdal_1.1-1 sp_1.2-1 knitr_1.10.5 RWordPress_0.2-3
##
## loaded via a namespace (and not attached):
##  lattice_0.20-33 XML_3.98-1.2 digest_0.6.8 bitops_1.0-6
##  grid_3.2.3 formatR_1.2 magrittr_1.5 evaluate_0.7
##  stringi_1.0-1 rmarkdown_0.7 tools_3.2.3 stringr_1.0.0
##  RCurl_1.95-4.6 yaml_2.1.13 htmltools_0.2.6 XMLRPC_0.3-0