Michal Migurski's notebook, listening post, and soapbox.

Sep 29, 2009 2:56am

openstreetmap genuine advantage

It should be obvious by now that I've got OpenStreetMap on the brain, and I'm not alone (though I hope I'm able to out-dork Flickr here).

Since we were involved in last month's DataSF launch, I've been thinking a bit about how an anarcho-syndicalist geo data project becomes useful to a city like San Francisco. Right now, we have two broad sets of free-enough SF streets data: the city's own shapefiles and OSM's excellent coverage. There's been a bit of effort expended on moving data from official sources into OSM; the OSM data itself is largely based on a mass import of the Census Bureau's TIGER/Line set. What about movements from OSM to SF, and continued cross feeds between the two? At the launch party with the mayor, I asked whether the city had a plan in place to handle feedback and corrections on its data, which I think is absolutely critical for a mature data curation practice. Right now, I suspect that the DataSF centerlines file contains a large number of paper streets and a few long-since-demolished freeway overpasses.

A technical answer to this issue would address the need for a city to vouch for what it knows and verify changes made by others, as well as preserve the flexibility needed by editors of OpenStreetMap. Public key cryptography, the Right Answer No One Likes, has a feature called signing, where the holder of a private key can add a forgery-proof signature to a block of data, and anyone with the matching public key can check it. I've put together a small project called GOSM (Genuine OpenStreetMap) that automates the process of adding signatures to ways in the OSM database and stashing them in tags.

There are a few needs here that are worth thinking about.

First, it's likely that there are multiple overlapping constituencies for any given bit of geography: the city, the county, the state, neighborhood groups, commercial interests, etc. It should be possible for any or all of these groups to offer independent signatures on bits of geography that concern them.

Second, it's important to sign only the aspects of the geography that matter, and to do so in a way that's resistant to noisy changes. For example, a signature on a road that vouches for its name and classification should not be invalidated by the addition of a bike lane tag.

Third, it should be possible for the signing authority to publish a list of their own contributions for comparison or verification.

How do these come into play in GOSM? Using it from the command line is simple. Here we sign the highway, name, and oneway tags on two streets:

python sign.py -u (osm username) -p (osm password) -k (gpg key) -t highway,name,oneway 28518589 23969004 > out.txt

The signature is added to each way as a tag called gosm:sig:8CBDE645 (the last part is the signing key's ID, because a way can carry signatures from more than one key). The value is a string that includes the requested tag names ("highway", etc.), a base64-encoded GPG signature, and a date and time. The signed message itself is not stored, but it's an encoding of the tag values and the geographic location of each node in the way, so it's easily derived from the way itself. I use Bencode for the encoding because each value has one and only one possible encoding, and Geohash for the locations because Bencode has no floating point type.
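
The encoding is the part that makes a signature stable, so here is a minimal Python sketch of how such a message could be built. It is not GOSM's actual code: the function names (bencode, geohash, way_message), the message layout (a dictionary with "tags" and "nodes" entries), and the nine-character Geohash precision are assumptions made for illustration. Bencode's sorted dictionary keys are what give each value a single encoding, and because only the requested tags go in, an unrelated tag such as a bike lane leaves the message untouched while moving any node changes it.

```python
# Minimal sketch, not GOSM's implementation: names, message layout and precision are assumptions.

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def bencode(value) -> bytes:
    """Encode ints, strings, bytes, lists and dicts; anything else (e.g. float) is rejected."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings written in sorted order, so each dict has exactly one encoding.
        items = sorted(
            (((k.encode("utf-8") if isinstance(k, str) else k), v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"bencode has no representation for {type(value).__name__}")


def geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Standard Geohash: bisect lon/lat alternately (lon first), 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon > mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat > mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def way_message(tags, signed_keys, nodes, precision: int = 9) -> bytes:
    """Build the bytes a signer would sign for one way.

    tags: the way's full tag dict; signed_keys: tag names being vouched for;
    nodes: (lat, lon) pairs in way order.
    """
    # Only the requested tags go in, so unrelated edits (say, a new cycleway tag) don't matter.
    selected = {k: tags[k] for k in signed_keys if k in tags}
    # Node positions as Geohash strings, because Bencode cannot carry floats.
    locations = [geohash(lat, lon, precision) for lat, lon in nodes]
    # Hypothetical layout for illustration; GOSM's real message structure may differ.
    return bencode({"tags": selected, "nodes": locations})
```

Calling way_message(tags, ["highway", "name", "oneway"], nodes) yields the bytes a key holder would hand to GPG; the signing step itself is left out of this sketch.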

The important bit is that there can be many signatures on a way, one for each interested signing authority. I've signed a few streets I know, and a city could do the same with a good key.
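
Because each authority's signature lives under its own tag key, finding out who has vouched for a way is a matter of reading tags. The sketch below assumes the standard OSM API 0.6 XML format and uses only Python's standard library; it collects every gosm:sig:* tag per way and lists the ways carrying a given key ID, which is also the kind of list a signing authority could publish for the third need above. The function names are hypothetical, and the tag values are kept opaque because their exact internal layout isn't spelled out here.

```python
import xml.etree.ElementTree as ET

SIG_PREFIX = "gosm:sig:"  # tag key prefix from the post; the suffix is the signing key ID


def signatures_by_way(osm_xml: str) -> dict:
    """Return {way_id: {key_id: raw_tag_value}} for every way carrying gosm:sig:* tags."""
    root = ET.fromstring(osm_xml)
    result = {}
    for way in root.iter("way"):
        sigs = {}
        for tag in way.findall("tag"):
            key = tag.get("k", "")
            if key.startswith(SIG_PREFIX):
                # Keep the value opaque; parsing it is left to the verifier.
                sigs[key[len(SIG_PREFIX):]] = tag.get("v", "")
        if sigs:
            result[int(way.get("id"))] = sigs
    return result


def ways_signed_by(osm_xml: str, key_id: str) -> list:
    """Sorted IDs of the ways that carry a signature from the given key ID."""
    return sorted(w for w, sigs in signatures_by_way(osm_xml).items() if key_id in sigs)
```

Fed the XML the OSM API returns for a way (for example GET /api/0.6/way/28518589), signatures_by_way would report one entry per signing key on that way.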

Checking a signature is easy:

python verify.py -k 8CBDE645 28518589 23969004

How many people actually need to verify a signature? Probably not many; public key crypto is sort of a mathematician's backup to common sense. What happens when signatures are invalidated by later edits? I expect that a bit of common sense would apply here as well, with consumers of the data using investigation and judgement to decide whom to trust, and signing authorities keeping an eye on signatures. The outcome I think would be especially worthwhile would be if a city or county used a mechanism like this to determine when its own database fell out of date, and treated the new conflicting information as input rather than contradiction, signing subsequent versions of streets as OSM participants mark them with updates.
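
A city using signatures to notice that its own database has drifted implies a follow-up step: once a signature stops verifying, someone has to see what changed before deciding whether to re-sign. Here is a hedged Python sketch of that comparison. It assumes the city's copy and the current OSM way have both been loaded into the same simple dict shape, {"tags": {...}, "nodes": [(lat, lon), ...]}, which is hypothetical and not a GOSM format, and it uses an arbitrary tolerance in degrees to absorb coordinate rounding.

```python
import math


def stale_attributes(city_record, osm_way, signed_keys, tol_deg=1e-6):
    """List differences in the signed attributes between the city's copy and the OSM way.

    Both records use a hypothetical shape: {"tags": {name: value}, "nodes": [(lat, lon), ...]}.
    Tags outside signed_keys are ignored, mirroring what a partial signature covers.
    """
    changes = []
    for key in signed_keys:
        old, new = city_record["tags"].get(key), osm_way["tags"].get(key)
        if old != new:
            changes.append(f"tag {key}: {old!r} -> {new!r}")
    old_nodes, new_nodes = city_record["nodes"], osm_way["nodes"]
    if len(old_nodes) != len(new_nodes):
        changes.append(f"node count: {len(old_nodes)} -> {len(new_nodes)}")
    else:
        for i, ((lat1, lon1), (lat2, lon2)) in enumerate(zip(old_nodes, new_nodes)):
            # tol_deg is an assumed tolerance, in degrees, for rounding differences.
            if not (math.isclose(lat1, lat2, abs_tol=tol_deg)
                    and math.isclose(lon1, lon2, abs_tol=tol_deg)):
                changes.append(f"node {i} moved: ({lat1}, {lon1}) -> ({lat2}, {lon2})")
    return changes
```

An empty result means the OSM edits touched only attributes the city never vouched for; anything else is a candidate update to review rather than a contradiction to reject.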

Comments (5)

  1. Why *is* it that nobody likes public key crypto? The almost complete non-uptake has long baffled me.

    Posted by George on Tuesday, September 29 2009 12:25pm EDT

  2. The problem with public key crypto is: 1) Nobody understands it (for common definitions of "nobody", meaning "non-developer/math geeks") 2) It's too conspicuous, or to turn it around, it's not unobtrusive. You generally have to go out of your way to use the tools, and it isn't always obvious/easy how to use them. If Windows, OS X, and Linux (not to mention the mobile platforms) all had an interoperable set of tools, built in at a low level, with (and this is key) *super easy-to-use* tools that weren't scary for average people to use, and with APIs that other software tools could easily hook into for signing and verification, it might start to get some uptake. It would need to be something that stayed out of the way, for the most part. Part of setting up a system login would be to automagically generate a secret key (with the ability for power-users to override with their own key, of course). Applications, including web apps, would need the ability to request approval to sign data. Third-party signatures would be verified, and unobtrusive flags displayed, and easy little "trust this source?" hints provided. When we get to that point, PK might get real uptake. Some of the keyring management stuff out there now (Linux and OS X, dunno about Windows) is a start. But there needs to be more services around it.

    Posted by Dougal Campbell on Thursday, October 1 2009 10:31am EDT

  3. Dougal, I definitely agree: the primary problem with public key crypto is that it solves a problem that most people don't think they have. More deeply, it's a mechanism that requires attention and tending which most people probably aren't interested in offering. Generally speaking, to do crypto right, you have to know what you're doing and when you're doing it. Your example of an easy, automated setup has a serious problem in that the keys used by power users sound like they imply the same level of trust as the keys used by less-interested, just-make-it-work users. I'm not sure that any of this is actually all that relevant to something like the OSM signing idea I'm describing though. I think that there are levels of trust out there, just like with open source code and legislation, where it's assumed that openness allows *someone* who knows what they're doing to keep an eye on things, and raise the alarm if problems were to occur. So in this example, you'd have power users at the local authority doing the hard work of signing, a few power users out in the world doing the periodic hard work of verifying, and some much larger number of normal people not verifying a thing and trusting that they'll hear about problems if they were to come up. Obviously there's a potential for inattention leading to corruption here, but I think that's just a social reality we have to accept and mitigate as it comes. I guess what I'm saying is that PK is and will remain a special tool used by the few, with a resulting halo that protects the many.

    Posted by Michal Migurski on Thursday, October 1 2009 12:36pm EDT

  4. Nice idea, but given that OSM already provides an "audit trail" for you (i.e. it lists all accounts that have touched an object), would it not be sufficient to keep a list of accounts you trust (or a web of trust if you want) instead of putting that info into every single object? Also, and this is a problem with my approach as well, I could move a whole way signed by you from San Francisco to New York without your signature becoming obviously invalid, since I would not modify the way object, just the underlying nodes...

    Posted by Frederik Ramm on Monday, December 14 2009 8:47am EST

  5. Hi Frederik, Moving a way would invalidate the signature, because it's calculated from the location of each node and not the ID. I think the existing audit trail in OSM doesn't account for the intent of the signing party. For example, a local authority might choose to vouch for the names, locations, and highway tags of streets in its area, and choose to sign just those attributes. It would be possible for you or me to go in and add traffic lights or bike lanes without invalidating the signature. It would also be possible for a bike advocacy group to sign just the bicycle network attributes on top of the other signatures. A local authority might also be satisfied with the street layout and make updates solely to signature tags. In both cases, the signature offers information that the edit trail doesn't. The idea in this post is a way for such entities to unambiguously sign off on the state of the map without having to fully engage in an edit race. The signing key allows a designated user to make assertions about the map independently of their account.

    Posted by Michal Migurski on Monday, December 14 2009 12:33pm EST
