R has become a go-to tool for spatial analysis in many settings. You can read and edit spatial data, conduct geoprocessing and spatial analysis, and create static and interactive maps. Of course, the first step in spatial analysis with R is often reading in your spatial data, and this step can be confusing and frustrating.
The super-powerful grandfather of functions for reading vector-based spatial data is readOGR from the package rgdal. You can use this function to read in dozens of different formats, but the syntax can be odd and, importantly, is different for different input types. As an example, you might be surprised to learn that of the two lines of code below for reading a shapefile, one will work fine and one will fail. Can you guess which is which?
readOGR(dsn="x:/junk", layer="myfile") # works fine
readOGR(dsn="x:/junk/", layer="myfile") # fail
That trailing slash is toxic, and I'll bet that this single idiosyncrasy has prevented more than one researcher from conducting spatial analysis in R. Navigating these oddities is daunting, but once you learn the syntax for the limited number of file types you likely work with, you'll be able to comfortably move on to the fun of spatial analysis in R. This post focuses on four formats that are among the most common: shapefiles, MapInfo files, GeoJSON and GPS (with a GPX extension).
Note that this post is limited to reading and writing vector data. For raster data I use the raster package rather than readGDAL from rgdal, and I find that its functions (raster, brick and stack) are more straightforward and work smoothly. An example of using the raster function can be found in our post on analyzing raster data in R.
Setup
Files we're using
You can download the files I use in the post as a ZIP here. For the blog post I'm storing all files in my directory at X:/spatial_files. These include:
- Heritage_Trees_pdx: An ESRI Shapefile (made up of four files) of heritage trees from Portland, Oregon. Original source is here.
- providence_park: a GeoJSON file (just one file) that I manually created using geojson.io.
- census_place_pdx: A MapInfo file (made up of four files) of Portland as a Census “Place”. Downloaded with the tigris package and exported to MapInfo.
- portland_track: A GPS file (just one file) that I manually created with this site.
The rgdal package
The rgdal package has been around for more than a decade and provides bindings to the incredible Geospatial Data Abstraction Library (GDAL) for reading, writing and converting between spatial formats. You need to install the rgdal package before you can run any of the code in this post. Keep in mind that there are several specialty packages for reading or writing various formats (e.g., geojsonio, plotKML) that you might consider and that I use occasionally. But, if possible, I prefer to use one function, one package for reading spatial files, and so this post focuses on readOGR.
Vector drivers
To read or write a specific file type you need to make sure that you have the drivers installed. For the four file types I cover below, these should all be installed by default, but you should double check using the ogrDrivers function. Here are the drivers on my machine. Yours may be slightly different, but you likely have ESRI Shapefile, MapInfo File, GeoJSON and GPX:
library(rgdal)
ogrDrivers()$name
## [1] "AeronavFAA" "ARCGEN" "AVCBin" "AVCE00"
## [5] "BNA" "CSV" "DGN" "DXF"
## [9] "EDIGEO" "ESRI Shapefile" "Geoconcept" "GeoJSON"
## [13] "Geomedia" "GeoRSS" "GML" "GMT"
## [17] "GPSBabel" "GPSTrackMaker" "GPX" "HTF"
## [21] "Idrisi" "KML" "MapInfo File" "Memory"
## [25] "MSSQLSpatial" "ODBC" "ODS" "OpenAir"
## [29] "OpenFileGDB" "PCIDSK" "PDF" "PDS"
## [33] "PGDump" "PGeo" "REC" "S57"
## [37] "SDTS" "SEGUKOOA" "SEGY" "SUA"
## [41] "SVG" "SXF" "TIGER" "UK .NTF"
## [45] "VRT" "Walk" "WAsP" "XLSX"
## [49] "XPlane"
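Rather than scanning the full list by eye, you could check for just the four drivers this post uses. The function below, missing_drivers, is a hypothetical helper (not part of rgdal), a minimal sketch built on base R's setdiff. With rgdal loaded you would pass it ogrDrivers()$name; here a made-up vector of driver names stands in so the example runs without rgdal.

```r
# Hypothetical helper (not part of rgdal): return the drivers this post
# relies on that are missing from a character vector of available names.
missing_drivers <- function(available,
                            required = c("ESRI Shapefile", "MapInfo File",
                                         "GeoJSON", "GPX")) {
  setdiff(required, available)
}

# With rgdal loaded you would use: missing_drivers(ogrDrivers()$name)
# A made-up driver list stands in here so the example runs without rgdal.
example_drivers <- c("ESRI Shapefile", "GeoJSON", "KML")
missing_drivers(example_drivers)  # returns c("MapInfo File", "GPX")
```

An empty result (character(0)) means all four drivers are available.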
A note about the data source name (dsn) argument
Although there are specific examples below, it's worth noting that for file types where the dsn argument is a directory rather than a file (shapefiles, for example), you need to remember to leave off the trailing forward slash. For files in your current working directory you can use ".", and you can also use relative paths. Examples here:
# These are valid dsn arguments for reading a shapefile (myfile.shp)
# that I have in both x:/junk and x:/. The working directory is x:/
readOGR(dsn="x:/junk", layer="myfile") # full path to x:/junk
readOGR(dsn=".", layer="myfile") # current working directory X:/
readOGR(dsn="junk", layer="myfile") # relative path X:/junk
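If your paths come from elsewhere (a config file or another function, say), it can be worth defensively stripping any trailing slash before the path reaches readOGR. clean_dsn below is a hypothetical helper, a minimal sketch using base R's sub with a regular expression. It assumes an ordinary directory path: a drive root like "x:/" would become "x:", so it is not meant for roots.

```r
# Hypothetical helper (not part of rgdal): drop any trailing forward or
# back slashes from a directory path before using it as dsn.
# Not intended for drive roots such as "x:/", which would become "x:".
clean_dsn <- function(dsn) {
  sub("[/\\\\]+$", "", dsn)
}

clean_dsn("x:/junk/")  # returns "x:/junk"
clean_dsn("x:/junk")   # returns "x:/junk" (already clean)
# readOGR(dsn = clean_dsn("x:/junk/"), layer = "myfile")
```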
Reading and writing shapefiles with rgdal
With shapefiles the dsn argument requires a directory path (without the file name and without the trailing forward slash) and the layer argument requires a layer name without the suffix.
# no trailing slash in dsn and no suffix in layer! The shapefile is at
# X:/spatial_files/Heritage_Trees_pdx.shp and the current working directory
# is X:/
trees <- readOGR(dsn="spatial_files", layer="Heritage_Trees_pdx")
## OGR data source with driver: ESRI Shapefile
## Source: "spatial_files", layer: "Heritage_Trees_pdx"
## with 291 features
## It has 14 fields
bubble(trees['HEIGHT'], col=rgb(0.5,0.5,1,0.5))
The syntax for writing a shapefile is similar:
# write to current directory: x:/trees2.shp
writeOGR(trees, dsn=".", layer="trees2", driver="ESRI Shapefile")
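If you have the full path to a .shp file, the dsn and layer values can be split out with base R rather than by hand. shp_args below is a hypothetical helper, a minimal sketch using dirname (which returns the directory without a trailing slash) and tools::file_path_sans_ext. The example path is the heritage trees shapefile used above.

```r
# Hypothetical helper (not part of rgdal): split a full shapefile path into
# the dsn (directory, no trailing slash) and layer (name without .shp).
shp_args <- function(path) {
  list(dsn   = dirname(path),
       layer = tools::file_path_sans_ext(basename(path)))
}

trees_args <- shp_args("spatial_files/Heritage_Trees_pdx.shp")
trees_args$dsn    # "spatial_files"
trees_args$layer  # "Heritage_Trees_pdx"

# With rgdal loaded, the list can be passed straight to readOGR:
# trees <- do.call(readOGR, trees_args)
```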
Reading and writing geojson with rgdal
GeoJSON is an increasingly common format. For testing purposes, it's fun to create and save a layer using the geojson.io site. But reading GeoJSON into R can be challenging if you're not careful. For GeoJSON the dsn argument requires the full path including the file name and suffix, and the layer argument should be set to OGRGeoJSON no matter what your actual layer is called! Repeat: no matter what your layer's actual name is, you use layer="OGRGeoJSON".
There are other packages for reading and writing GeoJSON, like the geojsonio package, that I recommend you look at. For me, despite the oddities of readOGR, I like using one package, one function if possible.
park <- readOGR(dsn="spatial_files/providence_park.geojson", layer="OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "spatial_files/providence_park.geojson", layer: "OGRGeoJSON"
## with 1 features
## It has 0 fields
Since the trees and the park have different coordinate reference systems, I need to reproject the park before plotting them together:
park <- spTransform(park, CRS(proj4string(trees)))
plot(trees, pch=16, cex=0.5)
# lwd=10 is not advisable, but at least you can see the park
plot(park, add=TRUE, lwd=10, border="forestgreen")
For some reason, when writing to GeoJSON you cannot use a period in the middle of the dsn name. So I write to a name without a period and then use file.rename to add the period. You can see a conversation about this here. Not ideal, but it works.
# writing file x:/myfile.geojson but I need to write a file called
# myfilegeojson and then rename it with the period
writeOGR(park, dsn="myfilegeojson", layer="", driver="GeoJSON")
file.rename("myfilegeojson", "myfile.geojson")
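If you write GeoJSON often, the write-then-rename workaround can be wrapped in a function. write_then_rename below is a hypothetical helper, a minimal sketch in base R: it takes any function that writes to a given path, calls it with a period-free version of the file name, and then renames the result with file.rename. The demo uses writeLines and a temporary directory as stand-ins for writeOGR so it runs without rgdal.

```r
# Hypothetical helper (not part of rgdal): call write_fun on a temporary
# file name with the periods removed, then rename it to final_path.
write_then_rename <- function(write_fun, final_path) {
  tmp_name <- gsub(".", "", basename(final_path), fixed = TRUE)
  tmp_path <- file.path(dirname(final_path), tmp_name)
  write_fun(tmp_path)
  if (!file.rename(tmp_path, final_path)) {
    stop("Could not rename ", tmp_path, " to ", final_path)
  }
  invisible(final_path)
}

# With rgdal loaded this would be:
# write_then_rename(function(p) writeOGR(park, dsn = p, layer = "",
#                                        driver = "GeoJSON"),
#                   "myfile.geojson")

# Stand-in demo: writeLines plays the role of writeOGR.
out <- file.path(tempdir(), "demo.geojson")
write_then_rename(function(p) {
  writeLines('{"type": "FeatureCollection", "features": []}', p)
}, out)
file.exists(out)  # TRUE
```

Only the file name has its periods removed, so a directory that contains a period is left alone.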
Reading and writing MapInfo files with rgdal
With MapInfo the dsn argument requires the full path including the file name and suffix, and the layer argument requires the layer name without a suffix.
# The MapInfo file is x:/spatial_files/census_place_pdx.tab
place <- readOGR(dsn="spatial_files/census_place_pdx.tab", layer="census_place_pdx")
## OGR data source with driver: MapInfo File
## Source: "spatial_files/census_place_pdx.tab", layer: "census_place_pdx"
## with 1 features
## It has 16 fields
# project place to match the trees
place <- spTransform(place, CRS(proj4string(trees)))
plot(place, col="cadetblue", border="grey")
plot(trees, add=TRUE, col="firebrick", pch=16)
Writing a MapInfo file is very similar to writing a shapefile:
# write to current directory: x:/place2.tab
writeOGR(place, dsn=".", layer="place2", driver="MapInfo File")
Reading and writing GPS (GPX extension) files with rgdal
GPS files (with a gpx extension) can be read in as either tracks (lines) or track points (points). In either case the data source name will be the full path including the file name and suffix. The layer argument, though, no matter what your actual layer name is, will be either "tracks" or "track_points" depending on which you want to read in.
path <- readOGR(dsn = "spatial_files/portland_track.gpx", layer="tracks")
## OGR data source with driver: GPX
## Source: "spatial_files/portland_track.gpx", layer: "tracks"
## with 1 features
## It has 12 fields
waypoints <- readOGR(dsn = "spatial_files/portland_track.gpx", layer="track_points")
## OGR data source with driver: GPX
## Source: "spatial_files/portland_track.gpx", layer: "track_points"
## with 35 features
## It has 24 fields
plot(path, col="coral4", lwd=2)
plot(waypoints, add=TRUE, pch=16, col="chartreuse4")
For writing a GPX file you provide the full path, file name and suffix under dsn and then tracks or track_points for layer:
writeOGR(path, dsn="mypath.gpx", layer="tracks", driver="GPX")
Conclusion
Working with rgdal is not pretty, but it's a powerful and important tool for reading vector data. Knowing the quirks and creating a cheat sheet for yourself will save a lot of hand-wringing and allow you to start having fun with spatial analysis in R.
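As a starting point for that cheat sheet, the dsn and layer rules from this post can be collected in one function. ogr_args below is a hypothetical helper (not part of rgdal), a minimal sketch that picks the arguments from the file extension: a directory and a suffix-free name for .shp, the full path and a suffix-free name for MapInfo .tab, the full path and "OGRGeoJSON" for .geojson, and the full path and "tracks" or "track_points" for .gpx. The GeoJSON layer name reflects the behavior described in this post for the GDAL version used here, so check it against your own setup.

```r
# Hypothetical cheat-sheet helper (not part of rgdal): return the dsn and
# layer arguments this post describes for readOGR, based on file extension.
ogr_args <- function(path, gpx_layer = c("tracks", "track_points")) {
  gpx_layer <- match.arg(gpx_layer)           # "tracks" unless told otherwise
  ext  <- tolower(tools::file_ext(path))      # e.g. "shp", "geojson"
  stem <- tools::file_path_sans_ext(basename(path))
  switch(ext,
    shp     = list(dsn = dirname(path), layer = stem),  # directory, no slash
    tab     = list(dsn = path, layer = stem),           # MapInfo
    geojson = list(dsn = path, layer = "OGRGeoJSON"),
    gpx     = list(dsn = path, layer = gpx_layer),
    stop("No cheat-sheet entry for extension: ", ext)
  )
}

ogr_args("spatial_files/census_place_pdx.tab")
ogr_args("spatial_files/portland_track.gpx", gpx_layer = "track_points")

# With rgdal loaded:
# park <- do.call(readOGR, ogr_args("spatial_files/providence_park.geojson"))
```

Keeping the rules in one place means the trailing-slash and layer-name quirks only have to be gotten right once.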
Postscript, R and package details
I'm going to start ending blog posts with sessionInfo() so that readers can know what R and package versions were used.
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rgdal_1.1-1 sp_1.2-1 knitr_1.10.5 RWordPress_0.2-3
##
## loaded via a namespace (and not attached):
## [1] lattice_0.20-33 XML_3.98-1.2 digest_0.6.8 bitops_1.0-6
## [5] grid_3.2.3 formatR_1.2 magrittr_1.5 evaluate_0.7
## [9] stringi_1.0-1 rmarkdown_0.7 tools_3.2.3 stringr_1.0.0
## [13] RCurl_1.95-4.6 yaml_2.1.13 htmltools_0.2.6 XMLRPC_0.3-0
Great post! How can you read a GeoJSON file from HDFS?
I tried something like:
readOGR(dsn="hdfs:///user/skywalker/test.geojson", layer="OGRGeoJSON")
but I receive the following error message
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
GDAL Error 3: Cannot open file ‘hdfs:///user/skywalker/test.geojson’
Thanks
Great post!
Please write one about reading and writing .DXF files in R.
VGF
Hi!
I want to open a DFX in R using readOGR. How can I do it? Is there a website describing how the commands should be written for all the different formats with readOGR?
Thank you very much,
Alejandra
This post is a few years old now. For reading spatial files you should use the sf package instead. I assume you mean a DXF file; assuming you have the right driver, you can use the st_read() function. Find the drivers you have with st_drivers().