...so take it easy.
My name is Michal Migurski. Until December 2012, I was technology head at Stamen, a San Francisco design and development studio focused on data visualization and map-making. You might remember me from such recent projects as Oakland Crimespotting, Walking Papers, Maps From Scratch, Digg Labs and API, Modest Maps, Mappr, or Reblog. Below, you will find my weblog, tecznotes, my link blog (high-frequency, short posts), and a collection of smaller and older things I've worked on.
Background photo by Fred.
- My GL-Solar, Rainbow Road edition
- VectorMill by Bobby Sudekum
- TopoJSON vector maps by Nelson Minar
- Vector Tiles demo by Mike Bostock
- Polymaps railways and SF: buildings & railways by Paul Mison
- Steve Gifford’s iOS WhirlyGlobe demo
We just got new memory and faster storage on the OpenStreetMap US server, so I’m getting more comfortable talking about these as a proper service. Everything is available in TopoJSON and GeoJSON, rendering speeds are improving, and I’m adapting TopoJSON to Messagepack to support basic binary output (protobufs are a relative pain to create and use). I’m also starting to pay attention to the lower zoom levels, and adding support for Natural Earth data where applicable. So far, you’ll see that active in the Water and Land layers, with NE lakes, playas, parks and urban areas.
Last month, I followed up on Nelson Minar’s TopoJSON measurements from his State of the Map talk to make some predictions on the overall resource requirements for pre-rendering the earth. The remainder of this post is adapted from an email I sent to a small group of people collaborating on this effort.
Measuring Vector Tiles
I was interested to see how the vector tile rendering process performed in bulk, so I extracted a portion of the OSM + NE databases I’m using for the current vector tiles, and got to work on EC2. I found that render times for TopoJSON and GeoJSON “all” tiles were similar, and that TopoJSON tile sizes were the usual ~40% smaller than GeoJSON.
What I most wanted from this was a sense of scope for pre-rendering worldwide vector tiles in order to run a more reliable service on the hardware we have.
I’m not thrilled with the results here; mostly they make me realize that to truly pre-render the world we’ll want to stop at ~Z13/14, and use the results of that process to generate the final JSON tiles seen by the outside world. This would actually be a similar process to that used by Mapbox, with the difference that Mapbox uses weirdly-formatted vector tiles to generate raster tiles, while I’m thinking of using weirdly-formatted vector tiles to generate other, less-weirdly formatted vector tiles. This is all getting into apply-for-a-grant territory, and I continue to be excited about the potential for running a reliable source of these tiles for client-side rendering experiments.
My area of interest is covered by this Z5 tile, including most of CA and NV: http://tile.openstreetmap.org/5/5/12.png
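To check what that tile actually covers, here is a minimal sketch of the usual spherical mercator tile arithmetic, assuming the URL is in zoom/column/row order. The function name `tile_bounds` is made up for this illustration and is not part of TileStache.

```python
import math

def tile_bounds(zoom, column, row):
    """Return (west, south, east, north) in degrees for a spherical mercator tile."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom

    def lon(x):
        return x / n * 360.0 - 180.0

    def lat(y):
        # Inverse of the mercator projection for a row edge.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))

    # Rows count downward from the north, so row + 1 is the southern edge.
    return lon(column), lat(row + 1), lon(column + 1), lat(row)

west, south, east, north = tile_bounds(5, 5, 12)
print(f"west {west:.2f}, south {south:.2f}, east {east:.2f}, north {north:.2f}")
```

For 5/5/12 this works out to roughly 123.75°W to 112.5°W and 31.95°N to 40.98°N, which lines up with most of California and Nevada.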
I used tilestache-seed to render tiles up to Z14, to fit everything into about a day. There ended up being about 350k tiles in that area, 0.1% of the total that Mapbox is rendering for the world. Since I’m including several major cities, I’m guessing that a tile like 5/5/12 represents an above-average amount of data for vector tiles.
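The 350k figure can be reproduced with a little arithmetic, since each tile splits into four at the next zoom level. This is a rough sketch; `tiles_in_pyramid` is a hypothetical helper, and the world total assumes a complete pyramid from Z0 to Z14, which may not be exactly how Mapbox counts.

```python
def tiles_in_pyramid(top_zoom, max_zoom):
    """Count a tile at top_zoom plus all of its descendants down to max_zoom."""
    return sum(4 ** (zoom - top_zoom) for zoom in range(top_zoom, max_zoom + 1))

area_tiles = tiles_in_pyramid(5, 14)   # one Z5 tile and everything beneath it
world_tiles = tiles_in_pyramid(0, 14)  # assumed: the whole world down to Z14
share = area_tiles / world_tiles

print(area_tiles, world_tiles, f"{share:.2%}")
```

One Z5 tile down to Z14 comes to 349,525 tiles, or about 0.1% of the 357,913,941 tiles in a full Z0 to Z14 pyramid.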
Times are generally comparable between the two, and I assume that it should be possible to beat some additional performance out of these with a compiled Python module or node magic or… something. It will be interesting to profile the actual code at some point; I don’t know whether we’re losing time converting from database WKB to shapely objects, gzipping to the cache, or if this is all good enough. Since our Z14s are not sufficient for rendering at higher zooms, I’ll want to mess with the queries to make something that could realistically be used to render full-resolution tiles from Z14 vector tiles.
The similar rendering times between the two surprised me; I expected to see more of a difference. I was also surprised to see the lower-zoom TopoJSON tiles come out faster. I suspect that with more geometry to encode at those levels, the relative advantage of integers over floats in JSON comes into play.
| TopoJSON All | 4h19m | 4% faster |
| GeoJSON Z14 | 2.67 /sec. | |
| TopoJSON Z14 | 1.71 /sec. | 36% slower |
| GeoJSON Z12 | 1.24 /sec. | |
| TopoJSON Z12 | 1.35 /sec. | 9% faster |
| GeoJSON Z10 | 0.23 /sec. | |
| TopoJSON Z10 | 0.25 /sec. | 9% faster |
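The faster and slower percentages are relative to the GeoJSON rate at the same zoom. A small sketch of that calculation, assuming the rates are tiles per second and using a made-up helper name, `percent_change`:

```python
def percent_change(geojson_rate, topojson_rate):
    """Relative difference of the TopoJSON rate against the GeoJSON rate, in percent."""
    return (topojson_rate - geojson_rate) / geojson_rate * 100

# (GeoJSON, TopoJSON) rates per zoom level, copied from the table above.
rates = {"Z14": (2.67, 1.71), "Z12": (1.24, 1.35), "Z10": (0.23, 0.25)}

summary = {}
for zoom, (geojson, topojson) in rates.items():
    change = percent_change(geojson, topojson)
    label = "faster" if change > 0 else "slower"
    summary[zoom] = f"{abs(change):.0f}% {label}"
    print(zoom, "TopoJSON", summary[zoom])
```

Run against the measured rates, this gives 36% slower at Z14 and 9% faster at both Z12 and Z10.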
Nelson’s already done a bunch of this work, but it seemed worthwhile to measure this specific OSM-based datasource. TopoJSON saves more space at high zooms than low zooms. I measured the file length and disk usage of all cached tiles, which are stored gzipped and hopefully represent the actual size of a response over HTTP.
| TopoJSON All | 1.7GB | 19% smaller | | |
| TopoJSON All | 527MB | 43% smaller | | |
| TopoJSON Z14 | 365MB | 46% smaller | 6.7KB | 75.2KB |
| TopoJSON Z13 | 107MB | 35% smaller | 7.5KB | 112KB |
| TopoJSON Z12 | 41.8MB | 33% smaller | 11.4KB | 80KB |
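File length and disk usage can differ because filesystems allocate space in whole blocks, which matters when a cache holds many small files. Here is a minimal sketch of measuring both for a directory of cached tiles; it is not the script I actually used, `cache_sizes` is a hypothetical name, and the demo directory simply stands in for a real tile cache.

```python
import os
import tempfile

def cache_sizes(cache_dir):
    """Return (file length, disk usage) in bytes for every file under cache_dir."""
    total_length = 0
    total_disk = 0
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            info = os.stat(os.path.join(dirpath, name))
            total_length += info.st_size        # bytes of content
            total_disk += info.st_blocks * 512  # allocated 512-byte blocks (Unix only)
    return total_length, total_disk

# Demo on a throwaway directory standing in for a tile cache.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, size in (("a.topojson.gz", 100), ("b.topojson.gz", 200)):
        with open(os.path.join(demo_dir, name), "wb") as tile:
            tile.write(b"x" * size)
    length, disk = cache_sizes(demo_dir)

print(length, disk)
```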
Hey look, a month went by and I stopped blogging because I have a new job. Great.
One of my responsibilities is keeping an eye on our sprawling GitHub account, currently at 326 repositories and 151 members. The current fellows are working on a huge number of projects and I frequently need to be able to quickly install, test and run projects with a weirdly-large variety of backend and server technologies. So, it’s become incredibly important to me to be able to rapidly spin up disposable Linux web servers to test with. Seth clued me in to Linux Containers (LXC) for this:
LXC provides operating system-level virtualization not via a full blown virtual machine, but rather provides a virtual environment that has its own process and network space. LXC relies on the Linux kernel cgroups functionality that became available in version 2.6.24, developed as part of LXC. … It is used by Heroku to provide separation between their “dynos.”
I use a Mac, so I’m running these under Virtualbox. I move around between a number of different networks, so each server container had to have a no-hassle network connection. I’m also impatient, so I really needed to be able to clone these in seconds and have them ready to use.
This is a guide for creating an Ubuntu Linux virtual machine under Virtualbox to host individual containers with simple two-way network connectivity. You’ll be able to clone a container with a single command, and connect to it using a simple <container>.local host name.
The Linux Host
Create a new Virtualbox virtual machine to boot from the Ubuntu installation ISO. For a root volume, I selected the VDI format with a size of 32GB. The disk image will expand as it’s allocated, so it won’t take up all that space right away. I manually created three partitions on the volume:
- 4.0 GB ext4 primary.
- 512 MB swap, matching RAM size. Could use more.
- All remaining space btrfs, mounted at /var/lib/lxc.
Btrfs (B-tree file system, pronounced “Butter F S”, “Butterfuss”, “Better F S”, or “B-tree F S”) is a GPL-licensed experimental copy-on-write file system. It will allow our cloned containers to occupy only as much disk space as is changed, which will decrease the overall file size of the virtual machine.
During the OS installation process, you’ll need to select a host name. I used “ubuntu-demo” for this demonstration.
Host Linux Networking
Boot into Linux. I started by installing some basics, for me: git, vim, tcsh, screen, htop, and etckeeper.
Set up /etc/network/interfaces with two bridges for eth0 and eth1, both DHCP. Note that eth0 and eth1 must be commented-out, as in this sample part of my /etc/network/interfaces:
## The primary network interface
#auto eth0
#iface eth0 inet dhcp

auto br0
iface br0 inet dhcp
    dns-nameservers 220.127.116.11
    bridge_ports eth0
    bridge_fd 0
    bridge_maxwait 0

auto br1
iface br1 inet dhcp
    bridge_ports eth1
    bridge_fd 0
    bridge_maxwait 0
Back in Virtualbox preferences, create a new network adapter and call it “vboxnet0”. My settings are 10.1.0.1, 255.255.255.0, with DHCP turned on.
Shut down the Linux host, and add the secondary interface in Virtualbox. Choose host-only networking, the vboxnet0 adapter, and “Allow All” promiscuous mode so that the containers can see inbound network traffic.
The primary interface will be NAT by default, which will carry normal out-bound internet traffic.
- Adapter 1: NAT (default)
- Adapter 2: Host-Only vboxnet0
Start up the Linux host again, and you should now be able to ping the outside world.
% ping 18.104.22.168
PING 22.214.171.124 (126.96.36.199) 56(84) bytes of data.
64 bytes from 188.8.131.52: icmp_req=1 ttl=63 time=340 ms
…
Use ifconfig to find your Linux IP address (mine is 10.1.0.2), and try ssh’ing to that address from your Mac command line with the username you chose during initial Ubuntu installation.
% ifconfig br1
br1       Link encap:Ethernet  HWaddr 08:00:27:94:df:ed
          inet addr:10.1.0.2  Bcast:10.1.0.255  Mask:255.255.255.0
          inet6 addr: …
Next, we’ll set up Avahi to broadcast host names so we don’t need to remember DHCP-assigned IP addresses. On the Linux host, install avahi-daemon:
% sudo apt-get install avahi-daemon
In the configuration file /etc/avahi/avahi-daemon.conf, change these lines to clarify that our host names need only work on the second, host-only network adapter:
Then restart Avahi.
% sudo service avahi-daemon restart
Now, you should be able to ping and ssh to ubuntu-demo.local from within the virtual machine and your Mac command line.
No Guest Containers
So far, we have a Linux virtual machine with a reliable two-way network connection that’s resilient to external network failures, available via a meaningful host name, and with a slightly funny disk setup. You could stop here, skipping the LXC steps and use Virtualbox’s built-in cloning functionality or something like Vagrant to set up fresh development environments. I’m going to keep going and set up LXC.
Linux Guest Containers
% sudo apt-get install lxc
Initial LXC setup uses templates, and on Ubuntu there are several useful ones that come with the package. You can find them under /usr/lib/lxc/templates; I have templates for ubuntu, fedora, debian, opensuse, and other popular Linux distributions. To create a new container called “base” use lxc-create with a chosen template.
% sudo lxc-create -n base -t ubuntu
This takes a few minutes, because it needs to retrieve a bunch of packages for a minimal Ubuntu system. You’ll see this message at some point:
##
# The default user is 'ubuntu' with password 'ubuntu'!
# Use the 'sudo' command to run tasks as root in the container.
##
Without starting the container, modify its network adapters to match the two we set up earlier. Edit the top of /var/lib/lxc/base/config to look something like this:
lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up
lxc.network.hwaddr = 00:16:3e:c2:9d:71

lxc.network.type=veth
lxc.network.link=br1
lxc.network.flags=up
lxc.network.hwaddr = 00:16:3e:c2:9d:72
An initial MAC address will be randomly generated for you under lxc.network.hwaddr; just make sure that the second one is different.
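If you would rather generate the second address than edit one by hand, something like this sketch would do. It keeps the 00:16:3e prefix seen in the sample config and randomizes the last three octets; `random_hwaddr` and `distinct_hwaddrs` are names invented for this example, not LXC tools.

```python
import random

def random_hwaddr(prefix="00:16:3e"):
    """Build a MAC address from a fixed prefix plus three random octets."""
    tail = [random.randint(0x00, 0xff) for _ in range(3)]
    return prefix + "".join(f":{octet:02x}" for octet in tail)

def distinct_hwaddrs(count):
    """Keep generating addresses until there are count different ones."""
    seen = set()
    while len(seen) < count:
        seen.add(random_hwaddr())
    return sorted(seen)

first, second = distinct_hwaddrs(2)
print("lxc.network.hwaddr =", first)
print("lxc.network.hwaddr =", second)
```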
Modify the container’s network interfaces by editing /var/lib/lxc/base/rootfs/etc/network/interfaces (/var/lib/lxc/base/rootfs is the root filesystem of the new container) to look like this:
auto eth0
iface eth0 inet dhcp
    dns-nameservers 184.108.40.206

auto eth1
iface eth1 inet dhcp
Now your container knows about two network adapters, and they have been bridged to the Linux host OS virtual machine NAT and host-only adapters. Start your new container:
% sudo lxc-start -n base
You’ll see a normal Linux login screen at first. Use the default username and password, “ubuntu” and “ubuntu”, from above. The system starts out with minimal packages. Install a few so you can get around, and include language-pack-en so you don’t get a bunch of annoying character set warnings:
% sudo apt-get install language-pack-en
% sudo apt-get install git vim tcsh screen htop etckeeper
% sudo apt-get install avahi-daemon
Make a similar change to the /etc/avahi/avahi-daemon.conf as above:
Shut down to return to the Linux host OS.
% sudo shutdown -h now
Now, restart the container with all the above modifications, in daemon mode.
% sudo lxc-start -d -n base
After it’s started up, you should be able to ping and ssh to base.local from your Linux host OS and your Mac.
% ssh ubuntu@base.local
Cloning a Container
Finally, we will clone the base container. If you’re curious about the effects of Btrfs, check the overall disk usage of the /var/lib/lxc volume where the containers are stored:
% df -h /var/lib/lxc
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        28G  572M   26G   3% /var/lib/lxc
Clone the base container to a new one, called “clone”.
% sudo lxc-clone -o base -n clone
Look at the disk usage again, and you will see that it’s not grown by much.
% df -h /var/lib/lxc
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        28G  573M   26G   3% /var/lib/lxc
If you actually look at the disk usage of the individual container directories, you’ll see that Btrfs is allowing 1.1GB of files to live in just 573MB of space, representing the repeating base files between the two containers.
% sudo du -sch /var/lib/lxc/*
560M    /var/lib/lxc/base
560M    /var/lib/lxc/clone
1.1G    total
You can now start the new clone container, connect to it and begin making changes.
% sudo lxc-start -d -n clone
% ssh ubuntu@clone.local
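Since every new container follows the same clone, start, connect sequence, the steps can be collected in one place. This is only a sketch that prints the commands rather than running them; `clone_commands` is a hypothetical helper, and it assumes the default “ubuntu” user and the <container>.local naming described above.

```python
import shlex

def clone_commands(new_name, base="base", user="ubuntu"):
    """Return the clone, start and connect commands for a new container."""
    return [
        ["sudo", "lxc-clone", "-o", base, "-n", new_name],
        ["sudo", "lxc-start", "-d", "-n", new_name],
        ["ssh", f"{user}@{new_name}.local"],
    ]

commands = clone_commands("clone")
for command in commands:
    # Print instead of executing, so the sketch is safe to run anywhere.
    print("%", shlex.join(command))
```

To actually run the first two steps, each list could be handed to subprocess.run on the Linux host.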
I have been using this setup for the past few weeks, currently with a half-dozen containers that I use for a variety of jobs: testing TileStache, installing Rails applications with RVM, serving Postgres data, and checking out new packages. One drawback that I have encountered is that as the disk image grows, my nightly Time Machine backups grow considerably. The Mac host OS can only see the Linux disk image as a single file.
On the other hand, having ready access to a variety of local Linux environments has been a boon to my ability to quickly try out ideas. Special thanks again to Seth for helping me work through some of the networking ugliness.
Tao of Mac has an article on a similar, but slightly different Virtualbox and LXC setup. They don’t include the promiscuous mode setting for the second network adapter, which I think is why they advise using Avahi and port forwarding to connect to the machine. I believe my way here might be easier.
Shift describes a Vagrant and LXC setup that skips Avahi and uses plain hostnames for internal connectivity.
I’ll write more about the actual thing soon, but for now I’m just basking in the weirdness of being in an office again. I have a desk and a calendar and colleagues and blooming, buzzing confusion. The code I’ve written this week is more angry, productive birds than TileStache or Extractotron, and that’s a funny feeling. There’s a shower at work, so I can ride my bike in and not offend people. The office is in SOMA, so I’ve started bringing my lunch. No one knew me when I was 25 (well, almost) and the organization is young, so the potential feels sky-high.
Soon, I will wear my new track jacket:
I’ve known Jen, Abhi, Meghan and the CfA team for many years, but I also spent most of April doing a big, serious grown-up job search. This was an entirely new experience for me, and very educational. I learned that recruiting is a real job for actual smart people at companies looking for talent, and that some people are mediocre at it while others are amazing. I learned that I enjoy technical interviews, the ones with whiteboards or pair-coding machines and one hour to solve a technical problem. Every single one that I did was fun, as long as I remembered to narrate my process and say “I don’t actually know what I’m doing here” where appropriate. I talked to some of the smartest, most interesting people I’ve ever dealt with and got to spend a whole month playing what-if with a variety of employers. I don’t know if this kind of abundance is something I’ll ever experience again, so I tried to savor it as much as possible.
TileStache, the map tile rendering server I’ve been working on since 2010, hit version 1.47 this weekend. The biggest change comes from Seth, who streamlined and expanded TileStache’s HTTP chops with the new TheTileLeftANote exception. The documentation needs an update, but the gist is that it’s now possible to customize tile HTTP responses from deep inside the rendering pipeline, with control over headers, status codes, and content. I’m excited that this didn’t require a backwards-incompatible change to the API, and that it’s now possible to tweak behavior in concert with Apache X-Sendfile or Nginx X-Accel.
Google Maps gives a nice unintentional before & after view of the construction along the south end of Lake Merritt in Oakland, if you turn the 45° aerials off and on.
The gated-up and piss-drenched pedestrian tunnels are gone. The connection to the bay is wider. There’s a separate pedestrian bridge, more grass, and proper crosswalks to the courthouse and museum.
Modest Maps is a BSD-licensed display and interaction library for tile-based maps in Adobe Flash 7+, written in ActionScript. This is an active project I'm working on with Darren, Shawn, and Tom.
Mappr is a geographic browser of Flickr's photo collection. I wrote a large portion of this application with Tomas and Eric, notably the place-name matching and geolocation bits, and pretty much the entire back-end.
Jitter and 3D Geometry
Updated experiments in 3D geometry handling using OpenGL and PHP.
Photos taken from the roof of the SOMA-SF warehouse space I lived in, summer of 2002.
Collages of freeway satellite imagery to satisfy a fetish for complex interchanges.
Quickdraw and basic 3D
Rough experiments in 3D rendering basics and matrix math.
moveon: fahrenheit 9/11 national town meeting / part of a nationally-broadcast conversation between Michael Moore and MoveonPAC directors.
stamen google news visualizer / data visualisation experiment intended to give a high-level view of who's making news at the moment, and who made the news at specified times in the past.
bmw design priorities / rich internet application development in collaboration with DesignworksUSA Advanced Communications Group
moveon: bush uncovered / map of moveon.org's bush uncovered event series
naral/pro-choice america / map of the march for women's lives
sflnc / web dev political activism on behalf of the san francisco late night community
bipole / audio-video synchronicity courtesy of me & andy w.
video riot / “an edgy electronic tailgate party and a real-time drive-in multiplex”
viberation / event production, multimedia installations, dancing all night
Map Projection / a collection of classes used to project GPS data points onto maps, implemented in PHP 4
OSC hub / PHP-based client and server for Open Sound Control, optimized for use with the Max/MSP implementation.
flash component of the H&K global website, a database-driven worldwide office map
coho / content management display component, for Apache/PHP/MySQL
sordid / command-line mp3 sorting utility for mac OS X, unix