We use PostgreSQL/PostGIS to manage a lot of our tabular and geographic data from the US Census. In terms of workflow we will either download a shapefile manually from ftp://ftp.census.gov/geo/tiger/ or, if we’re dealing with more than one file (block groups or blocks for example), we will do this from within R (using the download.file()
and unzip()
functions in a loop). In order to import the data to Windows-based PostgreSQL we can generally rely on GDAL’s org2ogr program and this works out well (ogr2ogr can be used to convert between a ton of different formats). For example we might use the code:
ogr2ogr -overwrite -lco GEOMETRY_NAME=geom -a_srs "EPSG:4269" -f "PostgreSQL" PG:"host=localhost user=postgres dbname=census password=xxx" -nlt MULTIPOLYGON existing_shapefile.shp -nln target_table
to write from a shapefile to a PostgreSQL/PostGIS database and we might use R’s system()
function to execute this on the command line.
But we have found that the default character encoding for PostgreSQL (UTF-8) does not handle the occasional names with accents or other non-English words that you find in county or Core Based Statistical Area files. Running the ogr2ogr program on county shapefiles, for example, brings up the error:
ERROR 1: Terminating translation prematurely after failed translation
In order to address this we used R’s cat()
function to create a *.bat file that first sets the encoding to LATIN1 and then run the same ogr2ogr command. So the bat file looks like:
SET PGCLIENTENCODING=LATIN1
ogr2ogr -overwrite -lco GEOMETRY_NAME=geom -a_srs "EPSG:4269" -f "PostgreSQL" PG:"host=localhost user=postgres dbname=census password=spatial" -nlt MULTIPOLYGON existing_shapefile.shp -nln target_table
You can then run the bat file with the system()
function in R and the shapefile will import without errors.