Michal Migurski's notebook, listening post, and soapbox.

May 27, 2016 4:20pm

five-minute geocoder for openaddresses

The OpenAddresses project recently crossed 250 million worldwide address points with the addition of countrywide data for Australia. Data from OA is used by Mapbox, the Consumer Financial Protection Bureau, and my company, Mapzen.

Now you can use OpenAddresses in a high-quality geocoder yourself. “Geocoding” is the process of transforming input text, such as an address or the name of a place, into a geographic location on the earth’s surface. Every time you search for a destination on your phone, you’re geocoding. Mapzen’s Search service uses an open source server we call Pelias, and if you’re using the popular Ubuntu Linux operating system, you can get it set up and serving addresses in just a few minutes.

Start with a clean server running a current version of Ubuntu LTS (long-term support); either 14.04 or 16.04 will work. Amazon has ready-made Ubuntu images available on EC2, or a local copy running under VirtualBox will do for testing. Both the address import process and the Elasticsearch index need a lot of memory, so pick a server with 4-8GB of memory to prevent failures.
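
Before installing, it can help to confirm that the server really has the memory suggested above. The following is a minimal sketch, not part of Pelias: the function name `enough_memory` and the 3500 MB default threshold (a little under 4GB, since Linux usually reports slightly less than the nominal size) are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Pelias: succeed if a memory figure in kB
# (as reported by /proc/meminfo) is at least the given minimum in MB.
enough_memory() {
    total_kb=$1
    min_mb=${2:-3500}   # assumed threshold, slightly under a nominal 4GB
    [ $(( total_kb / 1024 )) -ge "$min_mb" ]
}

# Report on the current machine when /proc/meminfo is readable (Linux only).
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if enough_memory "$total_kb"; then
        echo "OK: $(( total_kb / 1024 )) MB of memory"
    else
        echo "Warning: only $(( total_kb / 1024 )) MB of memory; the import may fail"
    fi
fi
```

Pass a second argument to check against a different minimum, for example `enough_memory "$total_kb" 7500` for the upper end of the range.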

Next, install the Pelias software using instructions from OpenAddresses:

# Tell Ubuntu where to find packages (these commands need root privileges):
add-apt-repository ppa:openaddresses/geocoder -y
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list

# Install Pelias and dependencies:
apt-get update && apt-get install pelias

This installs the Pelias geocoder, the OpenAddresses importer, a simple web-based map search interface, and the underlying Elasticsearch index.

After installation, you will need to import data. Visit results.openaddresses.io and pick a processed zip file to download. Start small with a city like Berkeley, CA to test the process. Download and unzip it in the directory `/var/tmp/openaddresses` where Pelias expects to find CSV files, then run `pelias-openaddresses-import` to index the data.

# Get a sample file of address data:
cd /var/tmp/openaddresses
curl -OL https://results.openaddresses.io/latest/run/us/ca/berkeley.zip
apt-get install unzip && unzip berkeley.zip

# Index the addresses:
pelias-openaddresses-import

That’s it!
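
One way to sanity-check an import is to count the address rows in the downloaded CSV files, so there is a number to compare against what ends up in the index. This is a rough sketch rather than an official Pelias tool: `count_addresses` is a hypothetical helper, and it assumes each CSV has a single header line and no quoted fields containing line breaks.

```shell
#!/bin/sh
# Hypothetical helper: count data rows (everything except the header line)
# in every .csv file found under a directory, including subdirectories.
count_addresses() {
    dir=${1:-/var/tmp/openaddresses}
    # FNR restarts at 1 for each file, so "FNR > 1" skips every header.
    find "$dir" -name '*.csv' -exec awk 'FNR > 1' {} + | wc -l | tr -d ' '
}

# Print a count for the import directory used in this post, if it exists.
if [ -d /var/tmp/openaddresses ]; then
    echo "Address rows on disk: $(count_addresses /var/tmp/openaddresses)"
fi
```

If Elasticsearch is listening on its default port 9200, a request such as `curl 'http://localhost:9200/_cat/indices?v'` lists each index with its document count. The two numbers may differ somewhat if the importer skips or merges rows.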

Pelias includes many neat features out of the box, such as reverse geocoding and autocomplete. Read the docs on GitHub.
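
As an illustration of those features, the helpers below call the `/v1/search`, `/v1/reverse`, and `/v1/autocomplete` endpoints of the Pelias API. This is a sketch under assumptions: the base URL `http://localhost:3100` may not match how this package exposes the API, so set `PELIAS_URL` to wherever your install is listening. The function names, sample address, and coordinates are made up for the example.

```shell
#!/bin/sh
# Assumed base URL; override by exporting PELIAS_URL before running.
PELIAS_URL=${PELIAS_URL:-http://localhost:3100}

# Forward geocoding: free text in, matching places out (GeoJSON).
# curl -G sends the --data-urlencode values as a URL-encoded query string.
pelias_search() {
    curl -sG "$PELIAS_URL/v1/search" --data-urlencode "text=$1"
}

# Reverse geocoding: latitude and longitude in, nearby addresses out.
pelias_reverse() {
    curl -sG "$PELIAS_URL/v1/reverse" \
        --data-urlencode "point.lat=$1" \
        --data-urlencode "point.lon=$2"
}

# Autocomplete: partial text in, suggestions out.
pelias_autocomplete() {
    curl -sG "$PELIAS_URL/v1/autocomplete" --data-urlencode "text=$1"
}

# Example calls (illustrative inputs only):
#   pelias_search "1600 Shattuck Ave, Berkeley, CA"
#   pelias_reverse 37.8716 -122.2727
#   pelias_autocomplete "1600 Shat"
```

Letting curl do the URL encoding avoids hand-escaping spaces and commas in addresses.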

The Mapzen Search service includes some additional features that aren’t yet covered here. For example, to include administrative areas like cities or states in searches, it’s necessary to do an admin lookup while importing, and to include data from Who’s On First. I’m also interested to learn more about tuning Elasticsearch for smaller servers with less system RAM. It should be possible to run a geocoder with 1-2GB of memory, though Elasticsearch may require adjustments to make this possible.
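
As a starting point for that kind of tuning, Elasticsearch 1.x reads its heap size from the `ES_HEAP_SIZE` environment variable, which the Debian package picks up from `/etc/default/elasticsearch`. The sketch below prints a candidate setting of half the machine's memory. That is a common rule of thumb, not something tested with Pelias; `suggest_heap_mb` is a hypothetical helper, and the right value for a 1-2GB server remains an open question.

```shell
#!/bin/sh
# Hypothetical helper: given total memory in kB, suggest a heap size in MB
# equal to half of it (a rule of thumb, untested with Pelias).
suggest_heap_mb() {
    echo $(( $1 / 1024 / 2 ))
}

# Print a candidate line for /etc/default/elasticsearch on this machine.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "ES_HEAP_SIZE=$(suggest_heap_mb "$total_kb")m"
fi
```

After editing the file, Elasticsearch has to be restarted for the new heap size to take effect.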

Links to more information about geocoding with OpenAddresses:

Comments (3)

  1. You should probably point out that many of the data sources in OpenAddresses have usage terms that need to be adhered to (for example the Australian GNAF address data, which is not actually "open"), and that you will have to check them for licence compatibility with whatever you want to do with Pelias (and the same goes for WOF).

    Posted by Simon Poole on Saturday, May 28 2016 10:22am UTC

  2. Thanks, Simon!

    Posted by Michal Migurski on Saturday, May 28 2016 4:13pm UTC

  3. @Simon Poole- I'm interested in the restrictions you've been hitting with the Australian G-NAF address data. What part of them has been closed? I'm also using the dataset. My reading of the license is that it falls under Creative Commons 4, which is quite open, except with a clause preventing the data being used for sending spam snail mail: https://data.gov.au/dataset/geocoded-national-address-file-g-naf

    Posted by Timothy Asquith on Friday, July 29 2016 12:37am UTC
