Tips for reading spatial files into R with rgdal

R has become a go-to tool for spatial analysis in many settings. You can read and edit spatial data, conduct geoprocessing and spatial analysis, and create static and interactive maps. Of course, the first step in spatial analysis with R is often reading in your spatial data, and this step can be confusing and frustrating.

The super-powerful grandfather of functions for reading vector-based spatial data is readOGR from the package rgdal. You can use this function to read in dozens of different formats, but the syntax can be odd and, importantly, differs between input types. As an example, you might be surprised to learn that one of these lines of code for reading a shapefile will work fine and one will fail. Can you guess which is which?

readOGR(dsn="x:/junk", layer="myfile")  # works fine
readOGR(dsn="x:/junk/", layer="myfile") # fail

That trailing forward slash is toxic, and I'll bet that this single idiosyncrasy has prevented more than one researcher from conducting spatial analysis in R. Navigating these oddities is daunting, but once you learn the syntax for the limited number of file types you likely work with, you’ll be able to comfortably move on to the fun of spatial analysis in R. This post focuses on four formats that are among the most common: shapefiles, MapInfo files, GeoJSON and GPS (with a GPX extension).

Note that this post is limited to reading and writing vector data. For raster data I use the raster package rather than readGDAL from rgdal, and I find that its functions (raster, brick and stack) are more straightforward and work smoothly. An example of using the raster function can be found in our post on analyzing raster data in R.

Setup

Files we're using

You can download the files I use in the post as a ZIP here. For the blog post I'm storing all files in my directory at X:/spatial_files. These include:

  1. Heritage_Trees_pdx: An ESRI Shapefile (made up of four files) of heritage trees from Portland, Oregon. Original source is here.
  2. providence_park: A GeoJSON file (just one file) that I manually created using geojson.io.
  3. census_place_pdx: A MapInfo file (made up of four files) of Portland as a Census “Place”. Downloaded with the tigris package and exported to MapInfo.
  4. portland_track: A GPS file (just one file) that I manually created with this site.

The rgdal package

The rgdal package has been around for more than a decade and provides bindings to the incredible Geospatial Data Abstraction Library (GDAL) for reading, writing and converting between spatial formats. You need to install the rgdal package before you can run any of the code in this post. Keep in mind that there are several specialty packages for reading or writing various formats (e.g., geojsonio, plotKML) that you might consider using and that I use occasionally. But, if possible, I prefer to use one function, one package for reading spatial files, and so this post focuses on readOGR.

Vector drivers

To read or write a specific file type you need to make sure that you have the drivers installed. For the four file types I cover below, these should all be installed by default, but you should double check using the ogrDrivers function. Here are the drivers on my machine. Yours may be slightly different, but you likely have ESRI Shapefile, MapInfo File, GeoJSON and GPX:

library(rgdal)
ogrDrivers()$name
##  [1] "AeronavFAA"     "ARCGEN"         "AVCBin"         "AVCE00"        
##  [5] "BNA"            "CSV"            "DGN"            "DXF"           
##  [9] "EDIGEO"         "ESRI Shapefile" "Geoconcept"     "GeoJSON"       
## [13] "Geomedia"       "GeoRSS"         "GML"            "GMT"           
## [17] "GPSBabel"       "GPSTrackMaker"  "GPX"            "HTF"           
## [21] "Idrisi"         "KML"            "MapInfo File"   "Memory"        
## [25] "MSSQLSpatial"   "ODBC"           "ODS"            "OpenAir"       
## [29] "OpenFileGDB"    "PCIDSK"         "PDF"            "PDS"           
## [33] "PGDump"         "PGeo"           "REC"            "S57"           
## [37] "SDTS"           "SEGUKOOA"       "SEGY"           "SUA"           
## [41] "SVG"            "SXF"            "TIGER"          "UK .NTF"       
## [45] "VRT"            "Walk"           "WAsP"           "XLSX"          
## [49] "XPlane"
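
If you'd rather not scan that list by eye, a small check along these lines can flag anything missing. This is only a sketch: missing_drivers is a hypothetical helper name (not part of rgdal), and it accepts any character vector of driver names, so the example uses a short made-up vector as a stand-in for ogrDrivers()$name.

```r
# Hypothetical helper (not part of rgdal): return the drivers from `needed`
# that do not appear in `available`.
missing_drivers <- function(available,
                            needed = c("ESRI Shapefile", "MapInfo File",
                                       "GeoJSON", "GPX")) {
  setdiff(needed, available)
}

# Made-up stand-in for ogrDrivers()$name so the example runs without rgdal
example_drivers <- c("CSV", "ESRI Shapefile", "GeoJSON", "GPX", "KML")
missing_drivers(example_drivers)
## [1] "MapInfo File"
```

With rgdal loaded you would call missing_drivers(ogrDrivers()$name) instead; an empty result means all four drivers used in this post are available.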

A note about the data source name (dsn) argument

Although there are specific examples below, it's worth noting that for file types where the dsn argument is a directory (without the layer name) you need to remember to leave off the trailing forward slash. For files in your current working directory you can use ".", and you can also use relative paths. Examples here:

# These are valid dsn arguments for reading a shapefile
# that I have in x:/junk and in x:/
readOGR(dsn="x:/junk", layer="myfile") # full path to x:/junk
readOGR(dsn=".", layer="myfile") # current working directory X:/
readOGR(dsn="junk", layer="myfile") # relative path X:/junk
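
Since the trailing slash is so easy to add by accident, one option is to clean the path before handing it to readOGR. The helper below is a sketch using only base R; strip_trailing_slash is a hypothetical name of my own, and it deliberately leaves roots such as "/" or "x:/" alone, because removing the slash there would change what the path means.

```r
# Hypothetical helper: drop trailing slashes from directory paths so they are
# safe to use as a dsn, but keep filesystem roots ("/" or "x:/") unchanged.
strip_trailing_slash <- function(path) {
  cleaned <- sub("[/\\\\]+$", "", path)
  is_root <- cleaned == "" | grepl("^[A-Za-z]:$", cleaned)
  ifelse(is_root, path, cleaned)
}

strip_trailing_slash(c("x:/junk/", "x:/junk", "x:/"))
## [1] "x:/junk" "x:/junk" "x:/"
```

You could then write readOGR(dsn=strip_trailing_slash("x:/junk/"), layer="myfile") and not worry about which form of the path you were given.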

Reading and writing shapefiles with rgdal

With shapefiles the dsn argument requires a directory path (without the filename and without the trailing forward slash) and the layer argument requires a layer name without the suffix.

# no trailing slash in dsn and no suffix in layer! The shapefile is at
# X:/spatial_files/Heritage_Trees_pdx.shp and the current working directory
# is X:/
trees <- readOGR(dsn="spatial_files", layer="Heritage_Trees_pdx")
## OGR data source with driver: ESRI Shapefile 
## Source: "spatial_files", layer: "Heritage_Trees_pdx"
## with 291 features
## It has 14 fields
bubble(trees['HEIGHT'], col=rgb(0.5,0.5,1,0.5))
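
Because a shapefile is really several files that share a name, a read can also fail when one of the pieces is missing from the directory. Here is a rough base R check, assuming the four files are the usual .shp, .shx, .dbf and .prj (the list above does not say which four they are); shapefile_parts is a hypothetical helper name.

```r
# Hypothetical helper: report which component files of a shapefile exist.
# `dsn` is the directory and `layer` is the name without a suffix, mirroring
# the arguments used with readOGR. The default extensions are an assumption.
shapefile_parts <- function(dsn, layer,
                            exts = c("shp", "shx", "dbf", "prj")) {
  paths <- file.path(dsn, paste0(layer, ".", exts))
  stats::setNames(file.exists(paths), exts)
}

# With the files from this post the call would be:
# shapefile_parts("spatial_files", "Heritage_Trees_pdx")
```

The result is a named logical vector, so any FALSE entry points to a file worth tracking down before blaming the readOGR syntax.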

The syntax for writing a shapefile is similar:

# write to current directory: x:/trees2.shp
writeOGR(trees, dsn=".", layer="trees2", driver="ESRI Shapefile")

Reading and writing GeoJSON with rgdal

GeoJSON is an increasingly common format. For testing purposes, it's fun to create and save a layer using the geojson.io site. But reading GeoJSON into R can be challenging if you're not careful. For GeoJSON the dsn argument requires the file name including the suffix, and the layer argument should be set to OGRGeoJSON no matter what your actual layer is called! Repeat: no matter what your layer's actual name is, you use layer="OGRGeoJSON".

There are other packages for reading and writing GeoJSON, like the geojsonio package, that I recommend you look at. For me, despite the oddities of readOGR, I like using one package, one function if possible.

park <- readOGR(dsn="spatial_files/providence_park.geojson", layer="OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "spatial_files/providence_park.geojson", layer: "OGRGeoJSON"
## with 1 features
## It has 0 fields
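
If layer="OGRGeoJSON" is not accepted on your machine, it can help to ask rgdal which layer names it actually sees in the file. The sketch below wraps ogrListLayers (an rgdal function) in a hypothetical helper, layer_names, that returns an empty vector when rgdal is not installed or the file does not exist. My understanding is that newer GDAL builds may name the layer after the file rather than OGRGeoJSON, so treat the name as something to check rather than a constant.

```r
# Hypothetical helper: list the layer names rgdal can see in a data source.
# Returns character(0) if rgdal is unavailable or the file is missing.
layer_names <- function(dsn) {
  if (!requireNamespace("rgdal", quietly = TRUE) || !file.exists(dsn)) {
    return(character(0))
  }
  # ogrListLayers returns the names plus attributes; keep just the names
  as.character(rgdal::ogrListLayers(dsn))
}

# With the files from this post the call would be:
# layer_names("spatial_files/providence_park.geojson")
```

The same check works for the other single-file formats in this post, which is handy when you are unsure what to pass as layer.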

Since the trees and the park have different coordinate systems, I need to reproject the park before plotting them together:

park <- spTransform(park, CRS(proj4string(trees)))
plot(trees, pch=16, cex=0.5)
# lwd=10 is not advisable but at least you can see the park
plot(park, add=TRUE, lwd=10, border="forestgreen")

For some reason, when writing to GeoJSON you cannot use a period in the middle of the name. So I write a name without a period and then use file.rename to add the period. You can see a conversation about this here. Not ideal but it works.

# writing file x:/myfile.geojson but I need to write a file called
# myfilegeojson and then rename it with the period
writeOGR(park, dsn="myfilegeojson", layer="", driver="GeoJSON")
file.rename("myfilegeojson", "myfile.geojson")

Reading and writing MapInfo files with rgdal

With MapInfo the dsn argument requires the path to the file, including the suffix, and the layer argument requires the layer name without a suffix.

# The MapInfo file is x:/spatial_files/census_place_pdx.tab
place <- readOGR(dsn="spatial_files/census_place_pdx.tab", layer="census_place_pdx")
## OGR data source with driver: MapInfo File 
## Source: "spatial_files/census_place_pdx.tab", layer: "census_place_pdx"
## with 1 features
## It has 16 fields
# project place to match the trees
place <- spTransform(place, CRS(proj4string(trees)))
plot(place, col="cadetblue", border="grey")
plot(trees, add=TRUE, col="firebrick", pch=16)

Writing a MapInfo file is very similar to writing a shapefile:

# write to current directory: x:/place2.tab
writeOGR(place, dsn=".", layer="place2", driver="MapInfo File")

Reading and writing GPS (GPX extension) files with rgdal

GPS files (with a gpx extension) can be read in as either tracks (lines) or track points (points). In either case the data source name will be the path including the file name and suffix. The layer argument, though, no matter what your actual layer name is, will be either "tracks" or "track_points" depending on which you want to read in.

path <- readOGR(dsn = "spatial_files/portland_track.gpx", layer="tracks")
## OGR data source with driver: GPX 
## Source: "spatial_files/portland_track.gpx", layer: "tracks"
## with 1 features
## It has 12 fields
waypoints <- readOGR(dsn = "spatial_files/portland_track.gpx", layer="track_points")
## OGR data source with driver: GPX 
## Source: "spatial_files/portland_track.gpx", layer: "track_points"
## with 35 features
## It has 24 fields
plot(path, col="coral4", lwd=2)
plot(waypoints, add=TRUE, pch=16, col="chartreuse4")

For writing a GPX file you provide the path, including the file name and suffix, under dsn and then tracks or track_points for layer:

writeOGR(path, dsn="mypath.gpx", layer="tracks", driver="GPX")

Conclusion

Working with rgdal is not pretty, but it's a powerful and important tool for reading vector data. Knowing the quirks and creating a cheat sheet for yourself will save a lot of hand-wringing and allow you to start having fun with spatial analysis in R.
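
One way to keep that cheat sheet is to write it down as code. The function below is a sketch that encodes the dsn and layer rules described in this post for the four formats, picking the rule from the file extension. The name ogr_read_args and its gpx_layer argument are my own inventions, and the "OGRGeoJSON" layer name follows the behavior described above, which may differ on other GDAL versions.

```r
# Hypothetical cheat sheet: turn a file path into the dsn and layer arguments
# that readOGR expects for each of the four formats covered in this post.
ogr_read_args <- function(path, gpx_layer = "tracks") {
  ext  <- tolower(tools::file_ext(path))
  stem <- tools::file_path_sans_ext(basename(path))
  switch(ext,
    # shapefile: dsn is the directory (no trailing slash), layer has no suffix
    shp     = list(dsn = dirname(path), layer = stem),
    # MapInfo: dsn is the file path with suffix, layer has no suffix
    tab     = list(dsn = path, layer = stem),
    # GeoJSON: dsn is the file path with suffix, layer is fixed
    geojson = list(dsn = path, layer = "OGRGeoJSON"),
    # GPX: dsn is the file path with suffix, layer is "tracks" or "track_points"
    gpx     = list(dsn = path, layer = gpx_layer),
    stop("unsupported extension: ", ext)
  )
}

ogr_read_args("spatial_files/Heritage_Trees_pdx.shp")
ogr_read_args("spatial_files/portland_track.gpx", gpx_layer = "track_points")
```

With rgdal loaded, the result can be passed straight through, for example do.call(readOGR, ogr_read_args("spatial_files/census_place_pdx.tab")).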

Postscript, R and package details

I'm going to start ending blog posts with sessionInfo() so that readers can know what R and package versions were used.

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rgdal_1.1-1      sp_1.2-1         knitr_1.10.5     RWordPress_0.2-3
## 
## loaded via a namespace (and not attached):
##  [1] lattice_0.20-33 XML_3.98-1.2    digest_0.6.8    bitops_1.0-6   
##  [5] grid_3.2.3      formatR_1.2     magrittr_1.5    evaluate_0.7   
##  [9] stringi_1.0-1   rmarkdown_0.7   tools_3.2.3     stringr_1.0.0  
## [13] RCurl_1.95-4.6  yaml_2.1.13     htmltools_0.2.6 XMLRPC_0.3-0

2 responses

  1. Hello! In case you need a free online tool to easily convert kml to gpx and vice versa, I suggest trying http://gpx2kml.com/ . It doesn’t need installation, just upload files and the tool will generate its results.

  2. Great post! How can you read a GeoJSON file from HDFS?

    I tried something like:

    readOGR(dsn="hdfs:///user/skywalker/test.geojson", layer="OGRGeoJSON")

    but I receive the following error message

    Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :

    GDAL Error 3: Cannot open file ‘hdfs:///user/skywalker/test.geojson’

    Thanks
