Although we still use ESRI products such as ArcGIS for geographic analysis, more and more we find ourselves turning to open source software like PostGIS and QGIS. In many cases, though, even PostGIS and QGIS are more software than necessary for a given task and we prefer to skip the ‘middleman’ in favor of the stripped down tools available in the Geospatial Data Abstraction Library (GDAL) which can be used to read and write an incredible number of spatial formats and to manipulate geographies (calculating area, converting types, extracting feature details).
For more detail on using GDAL in Python take a look at this great cookbook. Some of the code below comes from the cookbook and we even left in one of their comments for clarification.
We already have a Python function that computes the network distance between two points using a popular mapping API. We want to apply this function to two existing shapefiles (origin, destination). Unfortunately, they have projections and, as a result, we can’t just plug in the X and Y coordinates (the mapping API requires lat/long). We need to project the points and extract the coordinates.
We could use traditional GIS methods and the GUI in ArcGIS or QGIS. We could also use ArcGIS’ Python library
arcpy (in fact, this is how we did it previously) or python with QGIS. With GDAL, however, we don’t need ArcGIS or QGIS and we can extract the coordinates very quickly with very little code.
1. Install GDAL and import the libraries
Depending our your system details you can use
easy_install GDAL or, if you’re on Windows, you can use the binaries available here.
You will need to import the modules
ogr (for vector manipulations) and
osr (to handle spatial references).
from osgeo import ogr, osr
2. Read in the shapefile as a layer
driver = ogr.GetDriverByName('ESRI Shapefile') dataSource = driver.Open("myshapefile.shp", 0) #0 means read-only, 1 means writeable layer = dataSource.GetLayer()
3. Specify the input and output spatial references
You can grab the input spatial references from the layer directly and you can specify the output using the EPSG. To find an appropriate EPSG you can use spatialreference.org. In this case I want to use the unprojected coordinate system WGS84 (EPSG=4326):
sourceprj = layer.GetSpatialRef() targetprj = osr.SpatialReference() targetprj.ImportFromEPSG(4326) transform = osr.CoordinateTransformation(sourceprj, targetprj)
from osgeo import ogr, osr
4. Grab features, project and extract coordinates
Now that you have the layer and the projection details set up you’re ready to iterate through the features, project and extract the coordinates — all done with
osgeo. If you’re more comfortable with a
for loop you can use this code:
results =  for feature in layer: pt = feature.GetGeometryRef() pt.Transform(transform) results.append(pt.GetPoint()[0:2])
And if you prefer using list comprehensions this code works well
def doProjection(feature): pt = feature.GetGeometryRef() pt.Transform(transform) return pt.GetPoint()[0:2] results = [doProjection(feature) for feature in layer]
5. Create functions and timing
Using the code below we timed reading in the shapefile, cycling through the points, projecting to WGS84 and extracting the coordinates on a shapefile of 180 points and it took 0.05 seconds — pretty good!
t1 = time.clock() results = returnCoordSet("myshape.shp") print time.clock()-t1 def returnCoordSet(inputlayer): from osgeo import ogr, osr driver = ogr.GetDriverByName('ESRI Shapefile') dataSource = driver.Open(inputlayer, 0) #0 means read-only, 1 means writeable layer = dataSource.GetLayer() source = layer.GetSpatialRef() target = osr.SpatialReference() target.ImportFromEPSG(4326) transform = osr.CoordinateTransformation(source, target) return [doProjection(feature, transform) for feature in layer] def doProjection(feature, transform): pt = feature.GetGeometryRef() pt.Transform(transform) return [pt.GetPoint()[i] for i in [1,0]]