Michal Migurski's notebook, listening post, and soapbox.

Sep 29, 2009 2:56am

openstreetmap genuine advantage

It should be obvious by now that I've got OpenStreetMap on the brain, and I'm not alone (though I hope I'm able to out-dork Flickr here).

Since we were involved in last month's DataSF launch, I've been thinking a bit about how an anarcho-syndicalist geo data project becomes useful to a city like San Francisco. Right now, we have two broad sets of free-enough SF streets data: the city's own shapefiles and OSM's excellent coverage. There's been a bit of effort expended on moving from the former to the latter; the OSM data itself is largely based on a mass import of the TIGER/Line set. What about movements from OSM to SF, and continued cross feeds between the two? At the launch party with the mayor, I asked whether the city had a plan in place to handle feedback and corrections on its data, which I think is absolutely critical for a mature data curation practice. Right now, I suspect that the DataSF centerlines file contains a large number of paper streets and a few long-since-demolished freeway overpasses.

A technical answer to this issue would address the need for a city to vouch for what it knows and verify changes made by others, as well as preserve the flexibility needed by editors of OpenStreetMap. Public key cryptography, the Right Answer No One Likes, has a feature called signing, where it's possible for the holder of a key to add a forgery-proof signature to a block of data. I've put together a small project called GOSM (Genuine OpenStreetMap) that automates the process of adding signatures to ways in the OSM database and stashing them in tags.

There are a few needs here that are worth thinking about.

First, it's likely that there are multiple overlapping constituencies for any given bit of geography: the city, the county, the state, neighborhood groups, commercial interests, etc. It should be possible for any or all of these groups to offer independent signatures on bits of geography that concern them.

Second, it's important to sign only the aspects of the geography that matter, and to do so in a way that's resistant to noisy changes. For example, a signature on a road that vouches for its name and classification should not be invalidated by the addition of a bike lane tag.

Third, it should be possible for the signing authority to publish a list of their own contributions for comparison or verification.

How do these come into play in GOSM? Using it from the command line is simple. Here we sign the highway, name, and oneway tags on two streets:

python sign.py -u (osm username) -p (osm password) -k (gpg key) -t highway,name,oneway 28518589 23969004 > out.txt

The signature is added to each way as a tag, called gosm:sig:8CBDE645 (that last bit is the key ID - there might be more than one). The value is a string that includes the requested tag names ("highway", etc.), a base64-encoded GPG signature, and a date+time. The signed message is not stored, but it's an encoding of the tag values and the geographic location of each node in the way that's easily derivable from the way itself. I use Bencode for the encoding because each value has one and only one possible encoding, and Geohash for the locations because Bencode doesn't like floating point numbers.
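To make the idea of a derivable message concrete, here is a rough sketch of how one might be built. It is not GOSM's actual code: the function names, the "tags"/"nodes" payload layout, the 12-character geohash precision, and the choice to encode a missing tag as an empty string are all assumptions made for illustration. It does show why Bencode and Geohash fit together: the encoder refuses floats, so each location goes in as a string.

```python
# Illustrative sketch only; the payload layout and names are assumptions,
# not GOSM's real message format.

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # five bits per base32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def bencode(value):
    """Bencode ints, strings, lists, and dicts; floats are rejected."""
    if isinstance(value, bool) or isinstance(value, float):
        raise TypeError("Bencode has no encoding for %r" % (value,))
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are sorted so that a dict has exactly one encoding.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-8"))
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("Bencode has no encoding for %r" % (value,))


def signed_message(way_tags, node_locations, tag_names):
    """Build the bytes to sign from selected tags and node positions.

    way_tags is a dict of tag name to value, node_locations is a list of
    (lat, lon) pairs in way order, tag_names lists the tags being vouched for.
    """
    payload = {
        "tags": {name: way_tags.get(name, "") for name in tag_names},
        "nodes": [geohash(lat, lon) for lat, lon in node_locations],
    }
    return bencode(payload)
```

Because only the requested tags go into the payload, adding an unrelated tag such as a bike lane leaves the message byte-for-byte identical, while renaming the street or moving a node changes it.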

The important bit is that there can be many signatures on a way, one for each interested signing authority. I've signed a few streets I know, and a city could do the same with a good key.
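Since each authority's signature lives under its own tag, finding out who has vouched for a way is just a matter of looking at tag names. A minimal sketch, assuming the way's tags are already in a plain dict; the function name, the second key ID, and the placeholder tag values are made up for the example:

```python
# Illustrative sketch only; signing_keys is a hypothetical helper.

SIG_PREFIX = "gosm:sig:"


def signing_keys(way_tags):
    """Return the key IDs that have a signature tag on this way."""
    return sorted(
        name[len(SIG_PREFIX):]
        for name in way_tags
        if name.startswith(SIG_PREFIX)
    )


way_tags = {
    "highway": "residential",
    "name": "Main Street",
    "gosm:sig:8CBDE645": "(signature value)",
    "gosm:sig:0A1B2C3D": "(signature value)",  # hypothetical second signer
}

print(signing_keys(way_tags))
```

A consumer could then check only the signatures from keys it already has reason to trust and ignore the rest.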

Checking a signature is easy:

python verify.py -k 8CBDE645 28518589 23969004

How many people actually need to verify a signature? Probably not many; public key crypto is sort of a mathematician's backup to common sense. What happens when signatures are invalidated by later edits? I expect that a bit of common sense would apply here as well, with consumers of the data using investigation and judgement to decide whom to trust, and signing authorities keeping an eye on signatures. The outcome I'd find especially worthwhile is a city or county using a mechanism like this to notice when its own database has fallen out of date, treating the new conflicting information as input rather than contradiction, and signing subsequent versions of streets as OSM participants mark them with updates.
