Michal Migurski's notebook, listening post, and soapbox.

Jul 29, 2013 1:45am

tiled vectors update, with math

Back in March, I released an experimental format for vector tiles compatible with the Mapnik Python plugin. I’ve been slowly iterating on that work in the months since, and have changed my approach somewhat along the way. I initially imagined these would be useful for speedier rendering of raster tiles, and that’s exactly the use case Mapbox has defined for the vector tile format they released in May. However, the most interesting and future-facing uses I’ve seen have been JavaScript-driven, client-side interfaces:

We just got new memory and faster storage on the OpenStreetMap US server, so I’m getting more comfortable talking about these as a proper service. Everything is available in TopoJSON and GeoJSON, rendering speeds are improving, and I’m adapting TopoJSON to MessagePack to support basic binary output (protobufs are a relative pain to create and use). I’m also starting to pay attention to the lower zoom levels, and adding support for Natural Earth data where applicable. So far, you’ll see it active in the Water and Land layers, with NE lakes, playas, parks and urban areas.
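As a rough illustration of why integer-heavy TopoJSON packs down well as MessagePack, here is a hand-rolled encoder for a subset of the MessagePack spec (nil, booleans, ints, floats, strings, arrays, maps). It is a stdlib-only sketch, not the encoder behind these tiles; in practice the msgpack package's msgpack.packb does this job. The sample topology is made up.

```python
import json
import struct

def packb(obj):
    """Encode a JSON-like value using a subset of the MessagePack format."""
    if obj is None:
        return b"\xc0"
    if obj is True:                        # test bools before ints: bool subclasses int
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj < 0x80:
            return bytes([obj])            # positive fixint
        if -32 <= obj < 0:
            return struct.pack(">b", obj)  # negative fixint
        # Otherwise use the smallest signed int8/16/32/64 format that fits.
        for code, fmt in ((0xD0, ">b"), (0xD1, ">h"), (0xD2, ">i"), (0xD3, ">q")):
            try:
                return bytes([code]) + struct.pack(fmt, obj)
            except struct.error:
                continue
        raise OverflowError(obj)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            head = bytes([0xA0 | n])             # fixstr
        elif n < 0x100:
            head = b"\xd9" + struct.pack(">B", n)
        elif n < 0x10000:
            head = b"\xda" + struct.pack(">H", n)
        else:
            head = b"\xdb" + struct.pack(">I", n)
        return head + data
    if isinstance(obj, (list, tuple)):
        n = len(obj)
        if n < 16:
            head = bytes([0x90 | n])             # fixarray
        elif n < 0x10000:
            head = b"\xdc" + struct.pack(">H", n)
        else:
            head = b"\xdd" + struct.pack(">I", n)
        return head + b"".join(packb(item) for item in obj)
    if isinstance(obj, dict):
        n = len(obj)
        if n < 16:
            head = bytes([0x80 | n])             # fixmap
        elif n < 0x10000:
            head = b"\xde" + struct.pack(">H", n)
        else:
            head = b"\xdf" + struct.pack(">I", n)
        return head + b"".join(packb(k) + packb(v) for k, v in obj.items())
    raise TypeError(f"cannot pack {type(obj).__name__}")

# A tiny, made-up TopoJSON-style object with delta-encoded integer arcs.
topology = {"type": "Topology", "arcs": [[[4000, 2000], [1, 2], [3, -1], [-2, 5]]]}
as_json = json.dumps(topology, separators=(",", ":")).encode("utf-8")
as_msgpack = packb(topology)
print(len(as_json), len(as_msgpack))
```

Small integers cost a single byte each in MessagePack, with no delimiters, which is where most of the savings over even compact JSON come from.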

Last month, I followed up on Nelson Minar’s TopoJSON measurements from his State of the Map talk to make some predictions on the overall resource requirements for pre-rendering the earth. The remainder of this post is adapted from an email I sent to a small group of people collaborating on this effort.

Measuring Vector Tiles

I was interested to see how the vector tile rendering process performed in bulk, so I extracted a portion of the OSM + NE databases I’m using for the current vector tiles, and got to work on EC2. I found that render times for TopoJSON and GeoJSON “all” tiles were similar, and tile sizes for TopoJSON were the usual ~40% smaller than GeoJSON.

What I most wanted from this was a sense of scope for pre-rendering worldwide vector tiles in order to run a more reliable service on the hardware we have.

I’m not thrilled with the results here; mostly they make me realize that to truly pre-render the world we’ll want to stop at ~Z13/14, and use the results of that process to generate the final JSON tiles seen by the outside world. This would actually be similar to the process Mapbox uses, with the difference that Mapbox uses weirdly-formatted vector tiles to generate raster tiles, while I’m thinking of using weirdly-formatted vector tiles to generate other, less-weirdly-formatted vector tiles. This is all getting into apply-for-a-grant territory, and I continue to be excited about the potential for running a reliable source of these tiles for client-side rendering experiments.
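Stopping at Z14 means every deeper request has to be answered from some Z14 tile, and in the standard z/x/y scheme finding that tile is just bit-shifting. A minimal sketch, assuming plain slippy-map numbering; source_tile and max_zoom are hypothetical names, and the real work of clipping and re-encoding geometry for the smaller tile is left out.

```python
def source_tile(z, x, y, max_zoom=14):
    """Find the pre-rendered tile that covers a requested z/x/y tile.

    Returns ((z, x, y) of the source tile, (left, top, size)), where the
    second tuple locates the requested tile inside the source tile as
    fractions of the source tile's width and height.
    """
    if z <= max_zoom:
        return (z, x, y), (0.0, 0.0, 1.0)
    shift = z - max_zoom
    sx, sy = x >> shift, y >> shift   # ancestor tile at max_zoom
    per_edge = 1 << shift             # requested tiles along one edge of the source
    left = (x - (sx << shift)) / per_edge
    top = (y - (sy << shift)) / per_edge
    return (max_zoom, sx, sy), (left, top, 1.0 / per_edge)

print(source_tile(16, 10486, 25330))  # ((14, 2621, 6332), (0.5, 0.5, 0.25))
```

A Z16 request maps to one quarter-by-quarter square of a Z14 tile, so the Z14 data only needs enough precision to survive being stretched 4× along each edge.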

My area of interest is covered by this Z5 tile, including most of CA and NV: http://tile.openstreetmap.org/5/5/12.png
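To see what “most of CA and NV” means in coordinates, the standard slippy-map formulas (the tile naming tile.openstreetmap.org uses) give the bounding box of 5/5/12. A small sketch; tile_bounds is a hypothetical helper name.

```python
import math

def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for slippy-map tile z/x/y."""
    n = 2 ** z

    def lon(tx):
        # Longitude is linear in the tile x coordinate.
        return tx / n * 360.0 - 180.0

    def lat(ty):
        # Inverse spherical Mercator; tile y counts down from the north edge.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * ty / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)

west, south, east, north = tile_bounds(5, 5, 12)
print(f"lon {west:.2f} to {east:.2f}, lat {south:.2f} to {north:.2f}")
```

That works out to roughly 123.75°W to 112.5°W and 32°N to 41°N, which clips the northern edge of California and Nevada but takes in the Bay Area and Los Angeles.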

I used tilestache-seed to render tiles up to Z14, to fit everything into about a day. There ended up being about 350k tiles in that area, 0.1% of the total that Mapbox is rendering for the world. Since I’m including several major cities, I’m guessing that a tile like 5/5/12 represents an above-average amount of data for vector tiles.
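The ~350k and 0.1% figures follow from the quadtree structure of the tile pyramid, where each zoom level has four times as many tiles as the one above it. A quick check, assuming the worldwide render also runs through Z14 (pyramid_size is a hypothetical helper):

```python
def pyramid_size(top_zoom, bottom_zoom):
    """Count tiles under one top_zoom tile, from top_zoom through bottom_zoom inclusive."""
    # Each zoom level splits every tile into four children.
    return sum(4 ** (z - top_zoom) for z in range(top_zoom, bottom_zoom + 1))

area = pyramid_size(5, 14)   # everything under 5/5/12, Z5 through Z14
world = pyramid_size(0, 14)  # the single Z0 tile covers the whole world
print(area, world, f"{area / world:.3%}")
```

That gives 349,525 tiles for the area against about 358 million for the world, just under 0.1%.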

The PostGIS data was served from a 2GB database living on a ramfs volume, to mitigate EC2 IO impact.

Rendering Time

Times are generally comparable between the two, and I assume that it should be possible to squeeze some additional performance out of these with a compiled Python module or Node magic or… something. It will be interesting to profile the actual code at some point; I don’t know if we’re losing time converting from database WKB to Shapely objects, gzipping to the cache, or if this is all good enough. Since our Z14s are not sufficient for rendering at higher zooms, I’ll want to mess with the queries to make something that could realistically be used to render full-resolution tiles from Z14 vector tiles.
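One way to answer the “where does the time go” question is Python’s built-in cProfile. The sketch below profiles a stand-in render path with two hypothetical stages, JSON encoding and gzip compression, over made-up features. The database WKB to Shapely step (for example shapely.wkb.loads) would be a third stage to wrap in the real code; it is left out here to keep the example dependency-free.

```python
import cProfile
import gzip
import io
import json
import pstats

def encode_geojson(features):
    """Hypothetical stage 1: serialize features to a GeoJSON string."""
    return json.dumps({"type": "FeatureCollection", "features": features})

def compress(text):
    """Hypothetical stage 2: gzip the response, as a disk cache might store it."""
    return gzip.compress(text.encode("utf-8"))

def render_tile(features):
    return compress(encode_geojson(features))

# Made-up input: 200 line features of 100 points each.
features = [
    {"type": "Feature", "properties": {"kind": "road"},
     "geometry": {"type": "LineString",
                  "coordinates": [[-122.4 + i * 1e-4, 37.7 + j * 1e-4] for j in range(100)]}}
    for i in range(200)
]

profiler = cProfile.Profile()
profiler.enable()
body = render_tile(features)
profiler.disable()

# Show the ten most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

Because each stage is its own function, the cumulative-time listing splits the cost between encoding and compression directly.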

The similar rendering times between the two surprised me; I expected to see more of a difference. I was also surprised to see the lower-zoom TopoJSON tiles come out faster. I suspect that with more geometry to encode at those levels, the relative advantage of integers over floats in JSON comes into play.
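The integers-versus-floats point comes from how TopoJSON stores geometry: coordinates are quantized onto an integer grid and each arc is delta-encoded, so most positions become small integer pairs. A sketch of those two steps, assuming a 4096-step grid over the 5/5/12 bounding box and a made-up line. It only shows the shorter integer output; whether that accounts for the speed difference is still the hypothesis above.

```python
import json

def quantize(points, bbox, q=4096):
    """Map float (x, y) points onto a q-by-q integer grid spanning bbox."""
    x0, y0, x1, y1 = bbox
    kx, ky = (q - 1) / (x1 - x0), (q - 1) / (y1 - y0)
    return [(round((x - x0) * kx), round((y - y0) * ky)) for x, y in points]

def delta_encode(ipoints):
    """Store each point as an offset from the previous one (the first is absolute)."""
    out, px, py = [], 0, 0
    for x, y in ipoints:
        out.append([x - px, y - py])
        px, py = x, y
    return out

# Made-up line string (lon, lat in degrees) inside tile 5/5/12.
line = [(-122.4194 + i * 0.001, 37.7749 + i * 0.0007) for i in range(50)]
bbox = (-123.75, 31.95, -112.5, 40.98)

deltas = delta_encode(quantize(line, bbox))
as_floats = json.dumps(line, separators=(",", ":"))
as_ints = json.dumps(deltas, separators=(",", ":"))
print(len(as_floats), len(as_ints))
```

Decoding reverses this with a running sum of the deltas followed by the inverse scale and translate, so the only loss is the rounding to the grid.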

Format    Zoom  Time or rate  vs. GeoJSON
GeoJSON   All   4h31m
TopoJSON  All   4h19m         4% faster
GeoJSON   Z14   2.67 /sec.
TopoJSON  Z14   1.71 /sec.    36% slower
GeoJSON   Z12   1.24 /sec.
TopoJSON  Z12   1.35 /sec.    9% faster
GeoJSON   Z10   0.23 /sec.
TopoJSON  Z10   0.25 /sec.    9% faster
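The percentages in the last column can be reproduced from the raw figures, reading the whole-run numbers as elapsed time (lower is faster) and the per-zoom numbers as rates per second (higher is faster). A quick check; relative_speed is a hypothetical helper.

```python
def relative_speed(geo, topo, higher_is_faster):
    """Percent by which TopoJSON is faster (+) or slower (-) than GeoJSON."""
    change = (topo - geo) / geo * 100
    return change if higher_is_faster else -change

# Whole runs are elapsed times in minutes, so lower is faster.
whole = relative_speed(4 * 60 + 31, 4 * 60 + 19, higher_is_faster=False)

# Per-zoom figures are rates, so higher is faster.
rates = {"Z14": (2.67, 1.71), "Z12": (1.24, 1.35), "Z10": (0.23, 0.25)}
per_zoom = {zoom: relative_speed(geo, topo, higher_is_faster=True)
            for zoom, (geo, topo) in rates.items()}

print(f"All: {whole:+.0f}%")
for zoom, pct in per_zoom.items():
    print(f"{zoom}: {pct:+.0f}%")
```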

Response Size

Nelson’s already done a bunch of this work, but it seemed worthwhile to measure this specific OSM-based datasource. TopoJSON saves more space at high zooms than low zooms. I measured the file length and disk usage of all cached tiles, which are stored gzipped and hopefully represent the actual size of a response over HTTP.

Total disk usage
Format    Zoom  Size    vs. GeoJSON
GeoJSON   All   2.1GB
TopoJSON  All   1.7GB   19% smaller

Total file length
Format    Zoom  Size    vs. GeoJSON
GeoJSON   All   922MB
TopoJSON  All   527MB   43% smaller

By zoom level (file length)
Format    Zoom  Size    vs. GeoJSON   Per tile (95th %)
GeoJSON   Z14   673MB                 14.6KB   135KB
TopoJSON  Z14   365MB   46% smaller   6.7KB    75.2KB
GeoJSON   Z13   165MB                 13.5KB   176KB
TopoJSON  Z13   107MB   35% smaller   7.5KB    112KB
GeoJSON   Z12   62.9MB                18.2KB   106KB
TopoJSON  Z12   41.8MB  33% smaller   11.4KB   80KB
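Disk usage and file length can diverge a lot for a cache of small files, since the filesystem allocates whole blocks per file; that is a likely reason the on-disk totals are more than double the file-length totals. A sketch of taking both measurements, plus a 95th-percentile file length, by walking a cache directory. cache_sizes and the fake tile files are hypothetical, and st_blocks is only reported on POSIX systems.

```python
import os
import statistics
import tempfile

def cache_sizes(root):
    """Walk a tile cache; return (total length, total disk usage, 95th pct length)."""
    lengths, usage = [], 0
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            lengths.append(st.st_size)
            # st_blocks is in 512-byte units on POSIX and includes the rounding
            # of each file up to whole filesystem blocks (0 where unsupported).
            usage += getattr(st, "st_blocks", 0) * 512
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(lengths, n=20)[-1] if len(lengths) >= 2 else None
    return sum(lengths), usage, p95

# Demo on a throwaway directory of fake "tiles" of varying length.
with tempfile.TemporaryDirectory() as root:
    for i in range(1, 21):
        with open(os.path.join(root, f"{i}.json.gz"), "wb") as f:
            f.write(b"x" * (i * 100))
    total_length, total_usage, p95_length = cache_sizes(root)

print(total_length, total_usage, p95_length)
```

On a typical filesystem with 4KB blocks, each of those 100 to 2,000 byte files occupies a full block, which shows the same gap between length and usage on a small scale.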

Comments (3)

  1. Mike, any idea whether re-projecting GeoJSON back to 4326 could add to the slowness at lower zooms? (I submitted a pull request to allow them to stay projected.) A little while ago I looked into using numpy for vectorized topojson encoding (vectorized forward() and diff_encode()). Although the encoding was a lot faster, getting the data in/out of the shapely geometries to numpy ate up most performance gain. It probably needs more of an end-to-end numpy treatment that I haven't had a chance to look into yet, but there might be something there. I'd be happy to revive that work if you think it might be worth it...

    Posted by JW on Thursday, August 1 2013 1:34pm UTC

  2. Re: protobuf being a pain, you should look at Cap'n Proto - http://kentonv.github.io/capnproto/ - by the original creator of protobuf. In theory they have applied the learnings from the original implementation.

    Posted by Michael P on Friday, August 2 2013 2:24am UTC

  3. JW, the end-to-end numpy work would be pretty interesting. Even a compiled C extension might do the trick, just something with a fast inner loop. I don't know how the reprojection factors in. I'll look at the pulls, see if anything jumps out! Michael, that's a good tip.

    Posted by Michal Migurski on Friday, August 2 2013 5:51am UTC

