tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Aug 26, 2009 7:14am

tile drawer, round one

For better or worse, Mapnik has long had a reputation for being difficult to install. Ditto OSM planet dumps. I know and you know this is not true, but just because Dane has done so much work to drastically simplify the process doesn't mean you want to do it yourself, like how people stopped soldering their own MP3 players when the iPod came out.

Meanwhile, Amazon has a thing called EC2, the Elastic Compute Cloud. It's amazing, and really one of the biggest changes to my work as a technology/research/CTO guy. One of the things you can do with EC2 is create templates for new machines (Amazon Machine Images, or AMIs) and let other people run them. Put these two things together and you get Tile Drawer.

Tile Drawer is a new thing I've been working on that makes it fast and easy to run an OpenStreetMap tile server on Amazon's infrastructure for ten cents per hour, with just about no configuration at all. When you instantiate the Tile Drawer AMI (ami-e1ea0a88), you provide two bits of information:

The geographical bounding box of an area you'd like maps for, and
a Cascadenik stylesheet for how it should look.
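The bounding box is what decides how much rendering work an instance has to do. As a rough illustration, here is a minimal sketch of how a geographic bounding box turns into ranges of map tiles, assuming the standard spherical mercator "slippy map" scheme used for OSM tiles (2^z columns and rows at zoom z, row 0 at the northern edge). The function names and the (west, south, east, north) argument order are my own hypothetical choices, not part of Tile Drawer.

```python
import math

# Hypothetical helpers, not part of Tile Drawer. They assume the common
# spherical mercator "slippy map" scheme used by OSM tiles, where zoom z
# has 2**z columns and 2**z rows, and row 0 is at the northern edge.

def lonlat_to_tile(lon, lat, zoom):
    """Return the (column, row) of the tile containing a WGS84 point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or latitudes beyond ~85.05 degrees stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_range(west, south, east, north, zoom):
    """Inclusive (x_min, x_max, y_min, y_max) covering a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # north-west corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # south-east corner
    return x_min, x_max, y_min, y_max

def count_tiles(west, south, east, north, zooms):
    """Total number of tiles needed to cover the box at every zoom in `zooms`."""
    total = 0
    for z in zooms:
        x_min, x_max, y_min, y_max = tile_range(west, south, east, north, z)
        total += (x_max - x_min + 1) * (y_max - y_min + 1)
    return total
```

For the whole world, `count_tiles(-180, -85, 180, 85, range(3))` gives 21 (1 + 4 + 16). The count roughly quadruples with each zoom level, which is why asking for maps of one city is so much cheaper than asking for the planet. This sketch does not handle boxes that cross the 180th meridian.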

Your new instance will boot itself up, and immediately start pulling down OpenStreetMap data to render map tiles with. This is pretty much the whole story.
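Part of "pulling down OpenStreetMap data" is cutting the requested bounds out of a much larger file; in one test run mentioned in the comments, that extraction took about eight hours against the full planet. The sketch below is a hypothetical illustration of the idea, not Tile Drawer's actual extraction step: it streams OSM XML with Python's `xml.etree.ElementTree.iterparse`, keeps nodes inside the box, and keeps ways that touch any kept node. It assumes the usual planet-file ordering, in which every `<node>` comes before any `<way>`, and it ignores relations.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch, not Tile Drawer's actual extraction step. It streams
# OSM XML with iterparse and assumes the usual planet-file ordering, in which
# every <node> appears before any <way>.

def extract_bbox(source, west, south, east, north):
    """Return (node_ids, way_ids) for nodes inside the box and ways touching it."""
    node_ids = set()
    way_ids = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            lon = float(elem.get("lon"))
            lat = float(elem.get("lat"))
            if west <= lon <= east and south <= lat <= north:
                node_ids.add(elem.get("id"))
            elem.clear()  # drop child tags and attributes we no longer need
        elif elem.tag == "way":
            # A way is kept if any of its node references falls inside the box.
            if any(nd.get("ref") in node_ids for nd in elem.findall("nd")):
                way_ids.add(elem.get("id"))
            elem.clear()
    return node_ids, way_ids
```

For a multi-gigabyte planet file you would also want to detach processed elements from the root so memory stays flat, and a complete extract needs a second pass to pick up the out-of-box nodes that kept ways reference.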

I have a couple of example stylesheets up for your use, and I'm very interested in adding more. I'm excited by the idea of a community-maintained collection of cartographic styles, all of which can be immediately applied to any geographic location covered by OpenStreetMap. I'm also excited by the idea that designers can create and rapidly use new kinds of map renderings: we've been plumbing the depths of customized Mapnik renderings for a few years, and it's time to see them break out into wider use. We're already seeing beautiful new map designs like Craig Mod's Art Space Tokyo book, Wilson Miner's EveryBlock maps, and Matt Jones's Lynchian_Mid Cloudmade style for Dopplr. There should be more.

The availability of the OpenStreetMap data set itself is a whopping great slab of social surplus. What did people do back when getting map data meant calling someone at NavTeq or TeleAtlas and paying an indeterminate amount of money? I was hearing from new Google friends at the Camp Roberts thing a few weeks back that even they reach for OSM when they need to experiment. It's so simple: the whole thing is available, all the time, with regular updates. I'm hoping to see it continue to gradually replace other sources of map data for normal, day-to-day city navigation use.

There are a few things that Tile Drawer does not do yet, because it's new. For example, it does not keep up to date with OSM updates. Since it's so cheap and easy to fire up additional instances, I'm imagining that this need can be addressed for the moment by periodically creating replacement Tile Drawers and killing old ones. It doesn't serve anything other than OpenStreetMap data. It doesn't do WMS. It doesn't render multiple styles at once. None of these things seem important yet, which doesn't mean they won't happen at some point in the future.

But first go check out Tile Drawer.

Comments (22)

This is really, really cool. I've fired up a couple of instances but they are taking hours to download the initial data set - I wonder if the planet.openstreetmap.org server is being clobbered by a bunch of people trying this out at once. It would be nice if there was a mirror of the file on S3 or EC2 somewhere to save on the cost of bandwidth to an outside-Amazon IP.

Posted by Simon Willison on Wednesday, August 26 2009 11:45am UTC
Perhaps a snapshot of OSM should be an AWS Public Data Set.

Posted by Jeremy Dunck on Wednesday, August 26 2009 12:12pm UTC
Indeed, super nice. This will definitely clobber someone's mirror or server. You could simply make an EBS snapshot and make that available for anyone to attach to an instance. The problem with making it an official AWS public data set is that it would be a snapshot for a particular day, so they would want to keep moving the pointer up after diffs are applied. This is particularly useful given the recent (planned) OSM server outage, so it's worth encouraging people to go ahead and fire up their own tiles.

Posted by Andrew Turner on Wednesday, August 26 2009 12:47pm UTC
This is probably the coolest thing I have seen in the geo world for a while.

Posted by john fagan on Wednesday, August 26 2009 1:39pm UTC
This is great. Projects like this are going to help reduce the barrier to entry a lot. Great work, as always from you guys.

Posted by Eric on Wednesday, August 26 2009 3:58pm UTC
Thanks, guys! Simon, it's possible that the planet server is getting a bunch of requests. You could try using one of the Cloudmade extracts, which are smaller and on S3. It would also be a good idea for me to start mirroring the main planet to S3 myself; I'll try to get that going today. Anecdotally, I did a test run with the complete planet file and an extract of the NYC metro area: it took about an hour for the planet file to download, and another *8* hours for the chosen bounds to be extracted. Those files are big! It was more like 30 minutes total for the Cloudmade New York State extract. Andrew, is it possible for multiple people to have the same EBS snapshot attached to individual servers? Does it become a read-only filesystem? I'll investigate this, too. Maybe that's what the public data sets are. I've talked to the InfoChimps guys about getting OSM up as a public data set, but it has the disadvantage of requiring a refresh every week. Also, if you're curious about progress, I think the output of the extract script should be visible on the EC2 console, since it's run as a startup script out of /etc/init.d.

Posted by Michal Migurski on Wednesday, August 26 2009 4:12pm UTC
Thanks, Eric! Someone sent me a link to http://mapbox.com the other day, which also looks like it's seriously pushing the EC2 + Maps joy buzzer. I can't wait to see it!

Posted by Michal Migurski on Wednesday, August 26 2009 4:13pm UTC
Really fantastic and very cool!

Posted by Jay Fienberg on Wednesday, August 26 2009 7:49pm UTC
So will an EC2-based mapwarper/tiledrawer be next? Give it a link to an unrectified image and an array of GCPs and blammo, you get a tileset?

Posted by Marc Pfister on Thursday, August 27 2009 1:59am UTC
Really great! Any chance of an AMI in the EU-West region (for us non-Americans)? Or access to the code, so that we can build it ourselves and/or run it on a local virtual machine? Just something similar to the presenter notes for the MapsFromScratch AMI. This would be really helpful.

Posted by Gerd Kamp on Thursday, August 27 2009 4:41am UTC
Would also be cool if tile drawer could be configured to utilise any vector data source, not just OSM.

Posted by John Fagan on Thursday, August 27 2009 12:03pm UTC
I'm so excited I could plotz. This is absolutely spectacular! Good show! Here's another +1 for creating EBS snapshots of popular datasets. And yes, this fits *squarely* into the realm of public data sets, which Amazon hosts as EBS snapshots for free. Check out the site here: http://aws.amazon.com/publicdatasets/

Posted by Aron Pilhofer on Thursday, August 27 2009 1:28pm UTC
I've been working on a set of scripts that can seed and expire tiles at: http://github.com/aub/tile_flip/tree/master and also another set for our app that will pull osm patches, apply the changes, and expire the appropriate tiles. If that's interesting to you, I'd be happy to have them as part of this project, which sounds great.

Posted by Aubrey Holland on Thursday, August 27 2009 9:06pm UTC
Marc, that sounds suspiciously like a bunch of work, but not out of the question. ;) Gerd, can you explain? I hadn't realized that AMIs were in some way region-bound; can't you just instantiate them in any datacenter? John, do you think calling out additional zipped shapefiles would be enough? This is already effectively happening with the processed coastlines. If there were clear rules for the intake and PostGIS-ification of shapefiles, I think it would be possible to add new data like you're describing. Maybe in a near-future revision! Aron, I'm excited that you're excited. =) Aubrey, I'll have a look at that repository; it looks pretty awesome.

Posted by Michal Migurski on Friday, August 28 2009 4:28pm UTC
Michal: Gerd's right; AMIs that are created in the US can't be used in the EU. To make it available as an EU image, you need to create an EU-based S3 bucket and upload/register the image there. There's a more detailed guide here: http://www.dotanmazor.com/index.php?option=com_content&view=article&id=96

Posted by Paul Mison on Saturday, August 29 2009 8:38pm UTC
Thanks Paul! I'll have a look at that guide and see what I can do.

Posted by Michal Migurski on Sunday, August 30 2009 1:06am UTC
Hey, just one more note about the whole OSM mirrors deal. I'm very interested in the idea of a public data set of Planet.osm and/or a pg_data representation of it (since importing Planet.osm is time-consuming and PostGIS is the preferred backend for rendering), but in the meantime, the king-nerd mirror ( http://planet.king-nerd.com/?C=M;O=A ) is absurdly fast from an EC2 instance. It might actually be an EC2 instance; I just haven't checked. Currently I'm pulling from it at 4-10MB/s, getting a Planet.osm file in about 25 minutes.

Posted by Tom MacWright on Thursday, September 10 2009 6:00pm UTC
Wow, thanks for the tip, Tom. I tried to push planet.osm to S3 but got a "file too big" error; maybe this other mirror is the right thing. One other way to deal with the issue is to do very coarse sub-planets. I imagine just eastern/western hemisphere divisions might be enough to fit within S3's size limit.

Posted by Michal Migurski on Thursday, September 10 2009 10:00pm UTC
I'm loving this whole notion. Quick question: can the renderer be instructed to render everything to every zoom level, producing a complete tileset that could then be slurped off the cloud and used in prosaic, static ways (publicity, slippymaps on extant real servers, etc)? Or is it driven by requests?

Posted by Andy Gates on Friday, September 25 2009 12:25pm UTC
Andy, the renderer is TileCache, which has a tilecache_seed command that can be told to pre-render a full set of tiles. It's driven by requests, but everything gets bagged off into a directory of images that you can use. You could definitely slurp it up into a prosaic, static context!

Posted by Michal Migurski on Friday, September 25 2009 5:02pm UTC
Well, that's my weekend planned out then. :)

Posted by Andy Gates on Saturday, September 26 2009 8:38am UTC
Michal, shoot me an e-mail (http://mailhide.recaptcha.net/d?k=014Gowk4GIygii5Sr2It1xIw==&c=g3Vsjx4cYtPzj9fPiw1GUFaH_YxjtygLbolGFAZH4Zk=) and I will forward you the e-mail conversation I had with Amazon about setting up a Public Data Set. At the time, it seemed like they didn't get how we accessed the data. I imagine this would be a good example for them to look at.

Posted by Ian on Friday, October 16 2009 6:26pm UTC

