tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Jul 29, 2013 1:45am

tiled vectors update, with math

Back in March, I released an experimental format for vector tiles compatible with the Mapnik Python plugin. I’ve been slowly iterating on that work in the months since, and have changed my approach somewhat along the way. I initially imagined these would be useful for speedier rendering of raster tiles, and that’s exactly the use case Mapbox has defined for the vector tile format they released in May. However, the most interesting and future-facing uses I’ve seen have been JavaScript-driven, client-side interfaces:

My GL-Solar, Rainbow Road edition
VectorMill by Bobby Sudekum
TopoJSON vector maps by Nelson Minar
Vector Tiles demo by Mike Bostock
Polymaps railways and SF: buildings & railways by Paul Mison
Steve Gifford’s iOS WhirlyGlobe demo

We just got new memory and faster storage on the OpenStreetMap US server, so I’m getting more comfortable talking about these as a proper service. Everything is available in TopoJSON and GeoJSON, rendering speeds are improving, and I’m adapting TopoJSON to MessagePack to support basic binary output (protobufs are a relative pain to create and use). I’m also starting to pay attention to the lower zoom levels, and adding support for Natural Earth data where applicable. So far, you’ll see that active in the Water and Land layers, with NE lakes, playas, parks and urban areas.

Last month, I followed up on Nelson Minar’s TopoJSON measurements from his State of the Map talk to make some predictions on the overall resource requirements for pre-rendering the earth. The remainder of this post is adapted from an email I sent to a small group of people collaborating on this effort.

Measuring Vector Tiles

I was interested to see how the vector tile rendering process performed in bulk, so I extracted a portion of the OSM + NE databases I’m using for the current vector tiles, and got to work on EC2. I found that render times for TopoJSON and GeoJSON “all” tiles were similar, and that tile sizes for TopoJSON were the usual ~40% smaller than GeoJSON.

What I most wanted from this was a sense of scope for pre-rendering worldwide vector tiles in order to run a more reliable service on the hardware we have.

I’m not thrilled with the results here; mostly they make me realize that to truly pre-render the world we’ll want to stop at ~Z13/14, and use the results of that process to generate the final JSON tiles seen by the outside world. This would actually be a similar process to that used by Mapbox, with the difference that Mapbox uses weirdly-formatted vector tiles to generate raster tiles, while I’m thinking of using weirdly-formatted vector tiles to generate other, less-weirdly formatted vector tiles. This is all getting into apply-for-a-grant territory, and I continue to be excited about the potential for running a reliable source of these tiles for client-side rendering experiments.

My area of interest is covered by this Z5 tile, including most of CA and NV: http://tile.openstreetmap.org/5/5/12.png
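For a sense of what that tile covers, here is a minimal sketch of the usual spherical mercator tile math, assuming the z/x/y order from the URL above. The function name `tile_bounds` is made up for illustration and is not part of TileStache.

```python
import math

def tile_bounds(zoom, column, row):
    """Return (west, south, east, north) in degrees for a spherical mercator tile."""
    n = 2 ** zoom  # number of tiles across the world at this zoom

    def lon(x):
        return x / n * 360.0 - 180.0

    def lat(y):
        # Inverse of the mercator projection for a tile row boundary.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))

    # Row numbers grow southward, so the bottom edge is row + 1.
    return lon(column), lat(row + 1), lon(column + 1), lat(row)

west, south, east, north = tile_bounds(5, 5, 12)
print(west, south, east, north)
```

The result runs from 123.75°W to 112.5°W and from roughly 32°N to 41°N, which lines up with most of California and Nevada.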

I used tilestache-seed to render tiles up to Z14, to fit everything into about a day. There ended up being about 350k tiles in that area, 0.1% of the total that Mapbox is rendering for the world. Since I’m including several major cities, I’m guessing that a tile like 5/5/12 represents an above-average amount of data for vector tiles.
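The 350k figure falls out of the tile pyramid: each tile splits into four at the next zoom. A quick sketch of that arithmetic follows. The comparison against a full world pyramid through Z14 is my assumption about what “the total” means here; it happens to agree with the 0.1% figure.

```python
def tiles_in_pyramid(start_zoom, end_zoom):
    """Count one tile at start_zoom plus all of its descendants through end_zoom."""
    return sum(4 ** (zoom - start_zoom) for zoom in range(start_zoom, end_zoom + 1))

area_tiles = tiles_in_pyramid(5, 14)    # 349525, one Z5 tile down to Z14
world_tiles = tiles_in_pyramid(0, 14)   # 357913941, the whole world down to Z14
share = area_tiles / world_tiles

print(area_tiles, world_tiles, f"{share:.4%}")
```

The share is very close to 1/1024, since a single Z5 tile is one of 4^5 = 1024 tiles at that zoom.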

The PostGIS data was served from a 2GB database living on a ramfs volume, to mitigate EC2 IO impact.

Rendering Time

Times are generally comparable between the two, and I assume that it should be possible to beat some additional performance out of these with a compiled Python module or node magic or… something. It will be interesting to profile the actual code at some point; I don’t know if we’re losing time converting from database WKB to shapely objects, gzipping to the cache, or if this is all good enough. Since our Z14s are not sufficient for rendering at higher zooms, I’ll want to mess with the queries to make something that could realistically be used to render full-resolution tiles from Z14 vector tiles.

The similar rendering times between the two surprised me; I expected to see more of a difference. I was also surprised to see the lower-zoom TopoJSON tiles come out faster. I suspect that with more geometry to encode at those levels, the relative advantage of integers over floats in JSON comes into play.
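To illustrate the integers-versus-floats point, here is a simplified sketch of quantizing coordinates to an integer grid and storing each point as an offset from the previous one, which is the general idea behind TopoJSON arcs. This is not the actual encoder: the function name, grid size and sample line are all made up, and real TopoJSON also shares arcs between features.

```python
import json

def delta_encode(points, west, south, east, north, size=1024):
    """Quantize lon/lat points to a size x size grid, then delta-encode them."""
    encoded, last_x, last_y = [], 0, 0
    for lon, lat in points:
        x = round((lon - west) / (east - west) * (size - 1))
        y = round((lat - south) / (north - south) * (size - 1))
        encoded.append([x - last_x, y - last_y])  # offset from previous point
        last_x, last_y = x, y
    return encoded

# A hypothetical 100-point line somewhere in Oakland.
line = [(-122.271111 + i * 0.000137, 37.804444 + i * 0.000091) for i in range(100)]
deltas = delta_encode(line, -122.3, 37.8, -122.2, 37.9)

as_floats = json.dumps([[lon, lat] for lon, lat in line], separators=(",", ":"))
as_deltas = json.dumps(deltas, separators=(",", ":"))
print(len(as_floats), len(as_deltas))
```

The delta-encoded version is mostly tiny integers, so there are far fewer characters to format and write out per coordinate than with full-precision floats.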

Tiles	Time	vs. GeoJSON
GeoJSON All	4h31m
TopoJSON All	4h19m	4% faster

Tiles	Speed	vs. GeoJSON
GeoJSON Z14	2.67 /sec.
TopoJSON Z14	1.71 /sec.	36% slower
GeoJSON Z12	1.24 /sec.
TopoJSON Z12	1.35 /sec.	9% faster
GeoJSON Z10	0.23 /sec.
TopoJSON Z10	0.25 /sec.	9% faster
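The percentages in the table are plain relative differences against the GeoJSON baseline. A short sketch of the arithmetic, with a made-up helper name:

```python
def percent_change(baseline, value):
    """Relative difference of value against baseline, as a rounded percentage."""
    return round((value - baseline) / baseline * 100)

geojson_minutes = 4 * 60 + 31
topojson_minutes = 4 * 60 + 19

total_time = percent_change(geojson_minutes, topojson_minutes)  # negative: less time
speed_z14 = percent_change(2.67, 1.71)                          # negative: slower
speed_z12 = percent_change(1.24, 1.35)
speed_z10 = percent_change(0.23, 0.25)

print(total_time, speed_z14, speed_z12, speed_z10)
```

Note the sign flips meaning between the two tables: a negative change in total time is a speedup, while a negative change in per-second speed is a slowdown.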

Response Size

Nelson’s already done a bunch of this work, but it seemed worthwhile to measure this specific OSM-based datasource. TopoJSON saves more space at high zooms than low zooms. I measured the file length and disk usage of all cached tiles, which are stored gzipped and hopefully represent the actual size of a response over HTTP.
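One way to take both measurements is sketched below, assuming a Unix filesystem where `st_blocks` is reported in 512-byte units. The `cache_sizes` name and the throwaway demo directory are made up, and this is not necessarily how the numbers in this post were collected.

```python
import os
import tempfile

def cache_sizes(root):
    """Return (data_bytes, disk_bytes) summed over every file under root."""
    data_bytes = disk_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.stat(os.path.join(dirpath, name))
            data_bytes += info.st_size           # file length
            disk_bytes += info.st_blocks * 512   # allocated blocks (Unix only)
    return data_bytes, disk_bytes

# Demo on a temporary directory holding two small stand-in tiles.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.json", "b.json"):
        with open(os.path.join(root, name), "wb") as tile:
            tile.write(b"x" * 100)
    demo_data, demo_disk = cache_sizes(root)

print(demo_data, demo_disk)
```

With many small files, each one usually occupies at least a whole filesystem block, which is a likely reason for disk usage coming out well above the summed file lengths.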

Tiles	Size (on disk)	vs. GeoJSON
GeoJSON All	2.1GB
TopoJSON All	1.7GB	19% smaller

Tiles	Size (data)	vs. GeoJSON
GeoJSON All	922MB
TopoJSON All	527MB	43% smaller

Tiles	Size (data)	vs. GeoJSON	Tile (95th %)	Tile (max)
GeoJSON Z14	673MB		14.6KB	135KB
TopoJSON Z14	365MB	46% smaller	6.7KB	75.2KB
GeoJSON Z13	165MB		13.5KB	176KB
TopoJSON Z13	107MB	35% smaller	7.5KB	112KB
GeoJSON Z12	62.9MB		18.2KB	106KB
TopoJSON Z12	41.8MB	33% smaller	11.4KB	80KB

Comments (3)

Mike, Any idea whether re-projecting GeoJSON back to 4326 could add to the slowness at lower zooms? (I submitted a pull request to allow them to stay projected.) A little while ago I looked into using numpy for vectorized topojson encoding.. (vectorized forward() and diff_encode()). Although the encoding was a lot faster, getting the data in/out of the shapely geometries to numpy ate up most performance gain. It probably needs more of an end-to-end numpy treatment that I haven't had a chance to look into yet, but there might be something there. I'd be happy to revive that work if you think it might be worth it...

Posted by JW on Thursday, August 1 2013 1:34pm UTC

Re: protobuf being a pain, you should look at Cap'n Proto - http://kentonv.github.io/capnproto/ - by the original creator of protobuf. In theory they have applied the learnings from the original implementation.

Posted by Michael P on Friday, August 2 2013 2:24am UTC

JW, the end-to-end numpy work would be pretty interesting. Even a compiled C extension might do the trick, just something with a fast inner loop. I don't know how the reprojection factors in. I'll look at the pulls, see if anything jumps out! Michael, that's a good tip.

Posted by Michal Migurski on Friday, August 2 2013 5:51am UTC
