Michal Migurski's notebook, listening post, and soapbox.

Jan 8, 2013 9:55pm

work in progress: green means go

Since State Of The Map in Portland, I’ve been applying simple raster methods to OpenStreetMap data to draw a picture of the current state of U.S. Government TIGER/Line data in the project. TIGER data is a street-level dataset of widely varying quality covering the whole of the United States, and much of OSM in this country is built on TIGER. OSM US Board member Martijn Van Exel explains, in his TIGER deserts post:

Back in 2007, we imported TIGER/Line data from the U.S. Census into OpenStreetMap. TIGER/Line was and is pretty crappy geodata, never meant to make pretty maps with, let alone do frivolous things like routing. But we did it anyway, because it gave us more or less complete base data for the U.S. to work with. …there’s lots of places where we haven’t been taking as good care of the data. Vast expanses of U.S. territory where the majority of the data in OSM is still TIGER as it was imported all those years ago. The TIGER deserts.

TIGER data has been a fantastic leg up for the U.S. map, but elsewhere in the world data imports are frowned upon. The German community in particular feels that imports are antithetical to local community mapping. The U.S. is very different from Europe in terms of population density and driving distances. As Toby Murray said in this message last year, the imbalance between mapper population and surface area between Kansas and Germany is potentially insurmountable:

It is a 9 hour drive from Topeka to Denver and I think you go past a total of 3 cities with a population of over 10,000. In fact, out of the 54 counties west of Wichita, only 7 have a population for the whole county of over 10,000. So while we might be able to start OSM communities in some of the larger cities, vast stretches of the country would remain completely empty.

In many rural parts of the country, the prospective local OSM mapping population and the creators of government data are exactly the same people. I talked to an Esri employee at SOTM this year who told me that at every year's User Conference, she gets a regular stream of these folks approaching her with data in hand, asking how they can get it into OSM. They are the local community we want, and it’s not always clear how we can help them help us.

Based on the full history dump, I’ve been working on a map that I’m calling “Green Means Go,” a visualization of the state of TIGER/Line data in OpenStreetMap. The map shows a grid of 1km×1km squares covering the continental United States. Green squares show places where data imports are unlikely to interfere with community mapping, based on a count of unique participating mappers who don’t appear to be part of any of the three big TIGER imports.
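As a rough illustration of the counting described above, here is a minimal Python sketch. It is not the code behind the map: the import account names, the input format (one `(user, x, y)` tuple per edit, with coordinates already projected to meters), and the zero-mapper threshold for a green square are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical placeholders for the accounts behind the three big TIGER imports.
IMPORT_USERS = {"import_account_a", "import_account_b", "import_account_c"}

CELL_SIZE_M = 1000  # 1km x 1km grid squares


def cell_for(x_m, y_m, size=CELL_SIZE_M):
    """Return the (column, row) of the grid square containing a projected point."""
    return (int(x_m // size), int(y_m // size))


def count_mappers(edits, import_users=IMPORT_USERS):
    """Map each grid square to its number of unique non-import mappers.

    `edits` is an iterable of (user, x_m, y_m) tuples; this input shape is
    assumed for the sketch, not taken from the full history dump format.
    """
    users_by_cell = defaultdict(set)
    for user, x_m, y_m in edits:
        cell = cell_for(x_m, y_m)
        users_by_cell[cell]  # record the square even if only imports touched it
        if user not in import_users:
            users_by_cell[cell].add(user)
    return {cell: len(users) for cell, users in users_by_cell.items()}


def is_green(mapper_count, threshold=0):
    """Assumed rule: a square is green when no community mappers have edited it."""
    return mapper_count <= threshold


edits = [
    ("import_account_a", 100, 100),
    ("alice", 200, 900),
    ("alice", 300, 300),
    ("bob", 1500, 100),
    ("import_account_b", 2500, 2500),
]
counts = count_mappers(edits)
```

In this toy input, the square touched only by an import account counts zero community mappers and would be drawn green, while the squares edited by alice or bob would not. A real run would need a projection suited to the continental U.S. and a considered threshold.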

Large, densely populated urban areas show a similar pattern to one another: a dark center where many individual mappers have contributed, surrounded by a green rural fringe where no OSM community members have participated in the cleanup and checking of TIGER data.

This pattern shows a lot of local variety. For example, the area around Portland and Salem in Oregon, where we held last year’s SOTM-US conference, shows a broad swath of edited area. Portland in particular has shown a strong local uptake of OSM, basing its official TriMet trip planner on OpenStreetMap.

Other parts of the country, especially in the Great Plains, show the pattern of relative non-participation described by Toby Murray:

Good data does exist in these places, and in fact can be found in the more recent TIGER data sets which rely much more heavily on data generated directly by local county officials. In an area like the one above, the Green Means Go map should help a GIS data owner see that his or her own data and local knowledge would interfere minimally (if at all) with local community mappers.

In some cases, we see patterns that are worth exploring further. Entire counties in Pennsylvania show up as edited, but it’s not obvious to me that there is a county-wide local community here. Have these areas already been replaced by county-level importers who’ve improved the data, or is there some portion of the 2007 TIGER import that I’m missing?

In this other image, the relative lack of any kind of data (OSM or TIGER) is visible on the grounds of Eglin Air Force Base, south of Interstate 10 and east of Pensacola in Florida:

This work is heavily in progress. I’d also like to write about the process of making it, using the National Landcover Dataset and Hadoop to generate this imagery. Some possible next steps include:

  • Collaborating with Ian Dees, Alex Barth, Ruben Mendoza and others from the US OSM community to develop better ways of seeing TIGER data.
  • Creating static per-county and Census Place views.
  • Developing a plan to regenerate these map tiles for future data updates.