Michal Migurski's notebook, listening post, and soapbox. Subscribe to this blog. Check out the rest of my site as well.

Nov 20, 2014 1:07am

open address machine

The OpenAddresses project is super-interesting right now:

OpenAddresses is a global repository for open address data. In good open source fashion, OpenAddresses provides a space to collaborate. Today, OpenAddresses is a downloadable archive of address files, it is an API to ingest those address files into your application and, more than anything, it is a place to gather more addresses and create a movement: add your government’s address file and if there isn’t one online yet, petition for it. —Launching OpenAddresses.

OA is the free and open global address collection, but it’s just getting off the ground. Ian Dees of longtime OpenStreetMap involvement kicked off the project early this year when OSM balked at bulk address imports. It’s more sensible as a separate project anyway.

I’ve been working on data.openaddresses.io to make the project more legible and responsive.

I’m about six months late to the party, but there’s a ton to do right now. Thinking back on my own involvement in OSM, I remembered that around 2006 the street map tiles were being updated infrequently, and my own willingness to add data was gated by the turnaround time of seeing my input on the real, live map. I’d add some stuff, then twiddle my thumbs for days (or weeks) while the render refreshed. My satisfaction from adding data improved with every advance in OSM’s rendering stack re-render time. Seeing your effect on the data set is an important motivational factor.

OA has a similar issue for me. It’s implemented as a giant bag of JSON files stored in Github, so it’s not immediately obvious where the data lives, how up-to-date it is, or (if you’re submitting new files) whether a data source even works. The processing code works, but it’s not immediately obvious how to make all the pieces fit together.

I have been working on machine, a harness for running the whole process on a more regular cycle. There’s a bunch of interesting moving pieces.

I’ve taken Andy Allan’s chef advice to heart and created a chef recipe collection for preparing OA to run on a bare Ubuntu 14.04 machine. Chef is a no-brainer for me now, and I use it for everything that stands any chance of being important. Andy says:

Configuration management really kicks in to its own when you have dozens of servers, but how few are too few to be worth the hassle? It’s a tough one. Nowadays I’d say if you have only one server it’s still worth it – just – since one server really means three, right? The one you’re running, the VM on your laptop that you’re messing around with for the next big software upgrade, and the next one you haven’t installed yet.

If you want to add a skeletal chef script to any existing repository, start here:

git pull https://github.com/migurski/chefbase.git master

The whole OA codebase is now possible to run on a scratch machine, which means that once each week I can start an EC2-XXXL server and have it set up with complete OA code in minutes. It takes a few hours to run everything. We can keep data.openaddresses.io up-to-date with the status of the data, including a fresh map of data from US states and counties (even though OA is international), a complete listing of cached and processed status for all data, and small data samples to provide hints for correctly mapping (“conforming”) source data to OA’s needs.

There remains a lengthy ticket backlog, but I am hoping that OA provides a way to better expose and unify the world’s municipal government spatial data. Today, addresses. Tomorrow, parcels.

October 2017
Su M Tu W Th F Sa

Recent Entries

  1. planscore: a project to score gerrymandered district plans
  2. blog all dog-eared pages: human transit
  3. the levity of serverlessness
  4. three open data projects: openstreetmap, openaddresses, and who’s on first
  5. building up redistricting data for North Carolina
  6. district plans by the hundredweight
  7. baby steps towards measuring the efficiency gap
  8. things I’ve recently learned about legislative redistricting
  9. oh no
  10. landsat satellite imagery is easy to use
  11. openstreetmap: robots, crisis, and craft mappers
  12. quoted in the news
  13. dockering address data
  14. blog all dog-eared pages: the best and the brightest
  15. five-minute geocoder for openaddresses
  16. notes on debian packaging for ubuntu
  17. guyana trip report
  18. openaddresses population comparison
  19. blog all oft-played tracks VII
  20. week 1,984: back to the map