One of my favorite packages for creating maps in R is
ggplot2. No matter what, though, creating maps in R is trickier than doing it in a GIS system, particularly when you don't have 'on the fly' projection as you have in both ArcGIS and QGIS. To help you create maps on your own we share a typical use case below (including a coordinate system mismatch). In this example we use points and polygons by themselves but if you'd like to include tilemaps from Google, Stamen and others you should check out the
1. Read in the point and polygon data
Start by reading in the data. Our point data is in a comma-separated file with latitude and longitude values. Our polygons are a shapefile of NYC counties (also known as boroughs). You can get a similar file from the NYC Department of City Planning. Note that we're using the
readOGR function from the package
rgdal instead of
maptools. You'll see why later in the post.
library(rgdal) library(ggplot2) # read in point data (tabular data) mapdata<-read.csv("mapdata.csv", stringsAsFactors=FALSE) head(mapdata) ## id buildarea latitude longitude ## 1 10023 12.820 40.76 -73.99 ## 2 10027 22.092 40.76 -73.99 ## 3 10030A 26.081 40.76 -73.98 ## 4 10030B 25.597 40.75 -73.98 ## 5 10072 3.775 40.76 -73.82 ## 6 10225 16.416 40.76 -73.97 # the shapefile of NYC counties/boroughs. Careful about how you define the path # and layer. I always find this odd. counties<-readOGR("nybb.shp", layer="nybb") ## OGR data source with driver: ESRI Shapefile ## Source: "nybb.shp", layer: "nybb" ## with 5 features and 6 fields ## Feature type: wkbMultiPolygon with 2 dimensions
2. Make some simple maps using
Now we can create the maps in the same way we make non-geographic charts in
ggplot. (If you know NYC, you know that the map is distorted — don’t worry we will fix this in the last step).
# map the counties ggplot() + geom_polygon(data=counties, aes(x=long, y=lat, group=group))
How about the points
# map the points ggplot() + geom_point(data=mapdata, aes(x=longitude, y=latitude), color="red")
Great, this is easy. Let's combine the two:
# map both polys and points together ggplot()+geom_polygon(data=counties, aes(x=long, y=lat, group=group)) + geom_point(data=mapdata, aes(x=longitude, y=latitude), color="red")
Shucks! Usually when you see something odd like this it's related to inconsisitent projections.
3. Make the projections consistent
Projecting geographic data is a pain no matter how long you've been doing it. Even though I've been mapping with R for many years I still need to refer to my cheat sheet with some common projections and examples of the code needed to project.
ggplot2 does have a function called
coord_map() that can be used as a sort of on-the-fly projection but the function is still experimental so we’re sticking with this approach for now.]
Before projecting we need to know the existing projection/coordinate system for the layers. For the counties we can use the function
proj4string. Note, though, that if you read in the shapefile using
readShapePoly from the library
maptools instead of
readOGR the projection information will not be included.
proj4string(counties) ##  "+proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0"
Well, the proj4 format is about as clear as mud. To get a more readable format you can look at the .prj file that came with shapefile. For this one you would see the following (note that what you’ll actually see is plain text — the nice formatting you see here comes from spatialreference.org):
This is a good, common projection for New York City – Long Island State Plane (NAD83). But looking at the tabular point data file I see latitude and longitude coordinates (not X/Y) so I know that the point data is not projected. I need to project the point data to State Plane to match the polygon data. To do this we use the
spTransform() function (from the
rgdal package). But in order to use it we need to convert the points data frame to class
class(mapdata) ##  "data.frame" coordinates(mapdata)<-~longitude+latitude class(mapdata) ##  "SpatialPointsDataFrame" ## attr(,"package") ##  "sp" # does it have a projection/coordinate system assigned? proj4string(mapdata) # nope ##  NA # we know that the coordinate system is NAD83 so we can manually # tell R what the coordinate system is proj4string(mapdata)<-CRS("+proj=longlat +datum=NAD83") # now we can use the spTransform function to project. We will project # the mapdata and for coordinate reference system (CRS) we will # assign the projection from counties mapdata<-spTransform(mapdata, CRS(proj4string(counties))) # double check that they match identical(proj4string(mapdata),proj4string(counties)) ##  TRUE
4. Back to the maps
# ggplot can't deal with a SpatialPointsDataFrame so we can convert back to a data.frame mapdata<-data.frame(mapdata) # we're not dealing with lat/long but with x/y # this is not necessary but for clarity change variable names names(mapdata)[names(mapdata)=="longitude"]<-"x" names(mapdata)[names(mapdata)=="latitude"]<-"y" # now create the map ggplot() +geom_polygon(data=counties, aes(x=long, y=lat, group=group))+ geom_point(data=mapdata, aes(x=x, y=y), color="red")
Ahh, that’s better. Still not pretty, but we have the pieces working.
5. Final touches
Here we're going make the map a little nicer. Get rid of axes, new color gradient, clean up title etc.
ggplot() + geom_polygon(data=counties, aes(x=long, y=lat, group=group), fill="grey40", colour="grey90", alpha=1)+ labs(x="", y="", title="Building Area Within 1000m")+ #labels theme(axis.ticks.y = element_blank(),axis.text.y = element_blank(), # get rid of x ticks/text axis.ticks.x = element_blank(),axis.text.x = element_blank(), # get rid of y ticks/text plot.title = element_text(lineheight=.8, face="bold", vjust=1))+ # make title bold and add space geom_point(aes(x=x, y=y, color=buildarea), data=mapdata, alpha=1, size=3, color="grey20")+# to get outline geom_point(aes(x=x, y=y, color=buildarea), data=mapdata, alpha=1, size=2)+ scale_colour_gradientn("Building\narea (sq-km)", colours=c( "#f9f3c2","#660000"))+ # change color scale coord_equal(ratio=1) # square plot to avoid the distortion