Michal Migurski's notebook, listening post, and soapbox. Subscribe to this blog. Check out the rest of my site as well.

Jul 28, 2013 9:45pm

tiled vectors update, with math

Back in March, I released an experimental format for vector tiles compatible with the Mapnik Python plugin. I’ve been slowly iterating on that work in the months since, and have changed my approach somewhat along the way. I initially imagined these would be useful for speedier rendering of raster tiles, and that’s exactly the use case the Mapbox has defined for the vector tile format they released in May. However, the most interesting and future-facing uses I’ve seen have been Javascript-driven, client-side interfaces:

We just got new memory and faster storage on the OpenStreetMap US server, so I’m getting more comfortable talking about these as a proper service. Everything is available in TopoJSON and GeoJSON, rendering speeds are improving, and I’m adapting TopoJSON to Messagepack to support basic binary output (protobufs are a relative pain to create and use). I’m also starting to pay attention to the lower zoom levels, and adding support for Natural Earth data where applicable. So far, you’ll see that active in the Water and Land layers, with NE lakes, playas, parks and urban areas.

Last month, I followed up on Nelson Minar’s TopoJSON measurements from his State of the Map talk to make some predictions on the overall resource requirements for pre-rendering the earth. The remainder of this post is adapted from an email I sent to a small group of people collaborating on this effort.

Measuring Vector Tiles

I was interested to see how the vector tile rendering process performed in bulk, so I extracted a portion of the OSM + NE databases I’m using for the current vector tiles, and got to work on EC2. I found that render times for TopoJSON and GeoJSON “all” tiles were similar, and tile sizes for TopoJSON the usual ~40% smaller than GeoJSON.

What I most wanted from this was a sense of scope for pre-rendering worldwide vector tiles in order to run a more reliable service on the hardware we have.

I’m not thrilled with the results here; mostly they make me realize that to truly pre-render the world we’ll want to stop at ~Z13/14, and use the results of that process to generate the final JSON tiles sen by the outside world. This would actually be a similar process to that used by Mapbox, with the difference that Mapbox uses weirdly-formatted vector tiles to generate raster tiles, while I’m thinking to use weirdly-formatted vector tiles to generate other, less-weirdly formatted vector tiles. This is all getting into apply-for-a-grant territory, and I continue to be excited about the potential for running a reliable source of these tiles for client-side rendering experiments.

My area of interest is covered by this Z5 tile, including most of CA and NV: http://tile.openstreetmap.org/5/5/12.png

I used tilestache-seed to render tiles up to Z14, to fit everything into about a day. There ended up being about 350k tiles in that area, 0.1% of the total that Mapbox is rendering for the world. Since they I’m including several major cities, I’m guessing that a tile like 5/5/12 represents an above-average amount of data for vector tiles.

The PostGIS data was served from a 2GB database living on a ramfs volume, to mitigate EC2 IO impact.

Rendering Time

Times are generally comparable between the two, and I assume that it should be possible to beat some additional performance out of these with a compiled python module or node magic or… something. It will be interesting to profile the actual code at some point, I don’t know if we’re losing time converting from database WKB to shapely objects, gzipping to the cache, or if this is all good enough. Since our Z14s are not sufficient for rendering at higher zooms, I’ll want to mess with the queries to make something that could realistically be used to render full-resolution tiles from Z14 vector tiles.

The similar rendering times between the two surprised me; I expected to see more of a difference. I was also surprised to see the lower-zoom TopoJSON tiles come out faster. I suspect that with more geometry to encode at those levels, the relative advantage of integers over floats in JSON comes into play.

GeoJSON All 4h31m
TopoJSON All 4h19m 4% faster
GeoJSON Z14 2.67 /sec.
TopoJSON Z14 1.71 /sec. 36% slower
GeoJSON Z12 1.24 /sec.
TopoJSON Z12 1.35 /sec. 9% faster
GeoJSON Z10 0.23 /sec.
TopoJSON Z10 0.25 /sec. 9% faster

Response Size

Nelson’s already done a bunch of this work, but it seemed worthwhile to measure this specific OSM-based datasource. TopoJSON saves more space at high zooms than low zooms. I measured the file length and disk usage of all cached tiles, which are stored gzipped and hopefully represent the actual size of a response over HTTP.

(on disk)
GeoJSON All 2.1GB
TopoJSON All 1.7GB 19% smaller
GeoJSON All 922MB
TopoJSON All 527MB 43% smaller
(95th %)
GeoJSON Z14 673MB 14.6KB 135KB
TopoJSON Z14 365MB 46% smaller 6.7KB 75.2KB
GeoJSON Z13 165MB 13.5KB 176KB
TopoJSON Z13 107MB 35% smaller 7.5KB 112KB
GeoJSON Z12 62.9MB 18.2KB 106KB
TopoJSON Z12 41.8MB 33% smaller 11.4KB 80KB
February 2018
Su M Tu W Th F Sa

Recent Entries

  1. planscore: a project to score gerrymandered district plans
  2. blog all dog-eared pages: human transit
  3. the levity of serverlessness
  4. three open data projects: openstreetmap, openaddresses, and who’s on first
  5. building up redistricting data for North Carolina
  6. district plans by the hundredweight
  7. baby steps towards measuring the efficiency gap
  8. things I’ve recently learned about legislative redistricting
  9. oh no
  10. landsat satellite imagery is easy to use
  11. openstreetmap: robots, crisis, and craft mappers
  12. quoted in the news
  13. dockering address data
  14. blog all dog-eared pages: the best and the brightest
  15. five-minute geocoder for openaddresses
  16. notes on debian packaging for ubuntu
  17. guyana trip report
  18. openaddresses population comparison
  19. blog all oft-played tracks VII
  20. week 1,984: back to the map