Pages

Showing posts with label merge. Show all posts
Showing posts with label merge. Show all posts

Monday, May 21, 2012

Advanced Shapefile Merger

Italian GIS blogger Toni sent me a message about a sophisticated OGR-based shapefile merger utility he created.  Last year I posted a simple pyshp example that would find all ".shp" files in a directory and merge the geometry and attributes into a single shapefile.  Toni's version takes this concept much further to include wildcards, recursive directory walking, exclusion lists, and some dbf tricks.  You can find this utility at Toni's blog "Furious GIS":

http://furiousgis.blogspot.it/2012/05/python-shapefile-merger-utility.html

Thursday, February 10, 2011

Merging Lots of Shapefiles (quickly)

Arne, over at GIS-Programming.com, recently posted about merging shapefiles using a batch process. I can't remember the last time I merged two or more shapefiles but after googling around it is a very common use case.  GIS forums are littered with requests for the best way to batch merge a directory full of files.  My best guess is people have to work with automatically-generated, geographically disperse data with a common projection and database schema.  I imagine these files would be the result of some automated sensor output. If you know some use cases requiring merging many shapefiles I'd be curious to hear about it.

Arne pointed out that all the code samples out there iterate through each feature in a shapefile and add them to the merged file.  He says this method is slow. I agree to an extent (no pun intended).  However, at some point the underlying shapefile library MUST iterate through each feature in order to generate the summary information, namely the bounding box, required to write a valid shapefile header.  But it is theoretically slightly more efficient to wait until the merge is finished so there is only one iteration cycle.  At the very least, waiting till the end requires less code.

The following example merges all the shapefiles in the current directory into one file and it is quite fast.

# Merge a bunch of shapefiles with attributes quickly!
import glob
import shapefile
files = glob.glob("*.shp")
w = shapefile.Writer()
for f in files:
  r = shapefile.Reader(f)
  w._shapes.extend(r.shapes())
  w.records.extend(r.records())
w.fields = list(r.fields)
w.save("merged")