We’ve been working on an update to the technology behind OpenAddresses, and it’s now being used in public.
OpenAddresses is a global repository for open address data. In good open source fashion, OpenAddresses provides a space to collaborate. Today, OpenAddresses is a downloadable archive of address files, it is an API to ingest those address files into your application and, more than anything, it is a place to gather more addresses and create a movement: add your government’s address file and if there isn’t one online yet, petition for it. —Launching OpenAddresses.
Timely feedback is vital to a shared data project like OA, something I learned many years ago when I started contributing to OpenStreetMap. In 2006, tiles rendered many days after edits were made, and the impossibility of seeing the results of your own work gated participation. Today, the infrastructure behind OSM makes it easy to see changes immediately and incorporate them into other projects, feedback vital to keeping editors motivated.
Last year, we automated the OpenAddresses process to cut the update frequency from weeks or months to days. Now, we’d like to cut that frequency from days to hours or minutes.
OpenAddresses is run from Github. If you host a code project there and you’re serious about code quality, it’s likely that you’ve configured Travis or Circle to automatically run your test suite as you work. For external contributors sending pull requests to a project, CI services make it easy to see whether changes will work:
OA has always used Travis CI to verify the syntax correctness of submitted data, and it will tell you that your JSON is valid and that you’re using the right tags. We wanted to be able to see the true results of integrating that data into OA. Ordinarily, Travis itself might be a good tool for this job, but OA sources can take many hours to run, and a single PR might include changes to many sources. So, we needed to roll our own continuous integration service.
Creating our own service to run OA source submissions required three parts:
- A web service to listen for events from Github.
- A pool of workers to act on those events.
- Communication back to Github.
The web service is the easiest part, and consists of a simple Flask-based application listening for events from Github. These events can be signed with a secret to ensure that only real requests are acted on. There are dozens of event types to choose from, but we care about just two: push events when data in the OA repository is changed, and pull request events when a contributor suggests new data from outside. Events come in the form of JSON data, and it takes a bit of rooting around in the Github API to determine what actual files were affected. Git’s underlying data model (more, easier) is helpful here, with commits linked to directory trees and trees linked to individual file blobs. Each event from Github is turned into a job of added or changed source data, and each individual source is queued up for work. Nelson chose PQ for the queue implementation since we’re already using Python and PostgreSQL, and it’s been working very well.
The worker pool is tricky. It’s wasteful to keep a lot of workers standing around and waiting, but you still want to act quickly on a new submissions so people don’t get antsy. There are also a lot of interesting things that can go wrong. Amazon’s EC2 service is a big help here, with a few useful features to use. Auto Scaling Groups make it easy to spin up new workers to do big jobs in parallel. We’ve set up a few triggers based on the size of the queue backlog to determine how large a group is needed. When there have been tasks waiting for a worker for longer than a few minutes, the pool grows. When no new tasks have been waiting for a few hours, the pool shrinks. We use Amazon Cloudwatch to continuously communicate the size of the queue. We have struck a balance here, aiming for results within hours or minutes rather than seconds, so we only grow the pool to a half-dozen workers or so.
Finally, Github needs to know about the work being done. As each task is completed, a status of "pending", "success", or "failure" is communicated back to Github where it is shown to a user along with a link to a detailed page. Commercial CI services use the Commit Status API to integrate with Github, and it’s available to anyone. The tricky part here is how to differentiate between failed jobs and ones that simply take a long time. In our case, we have a hard limit of three hours for any job, and judge a job to have failed when it’s been AWOL for more than three hours. Right now, we’re not retrying failed jobs.
There are still bugs and weird behaviors in the CI service, so I’m shaking those out as I watch it in action.
We are continuing to run the batch job process. There’s nothing else that can generate the summary page at data.openaddresses.io for the time being, so the new continuous integration feature is being used solely to inform data contributors within pull requests. I’d like to replace the batch job with a smaller one that schedules missing sources, renders maps, and summarizes output. Then we can kill off the old batch process.
Two weeks ago I was in New York for State Of The Map 2015, the annual OpenStreetMap conference.
2014’s event was in DC, and I had kind of a hard time with it. Good event, but dispiriting to come back to the community after a year away and encountering all the same arguments and buillshit as 2013 and before.
This year in New York was awesome, though. It felt as though some kind of logjam had been cleared in the collective consciousness of OpenStreetMap. Kathleen Danielson, new international board member, delivered a mic-drop talk on diversity. New York designers and urbanists were in attendance. A wider variety of companies than ever before sponsored and presented.
I was a last-minute addition to the program, invited by Brett Camper to take over moderation of his panel on vector rendering. There is video now on the website (here is a direct Youtube link for when the permalink-haters who do the site each year break all the current URLs).
We had four participnts on the panel.
Mapzen has been doing a bunch of interesting things with 3D, and when I visited the office Peter Richardson had a bunch of printed Manhattan tiles on his desk, including this one of 16/19300/24630 with Grand Central and the NYPL viewed from the north:
Konstantin Käfer from Mapbox works on the new GL rendering product, and he’s been producing a regular stream of new rendering work and data format output throughout the three years I’ve known him. Konstantin shared this gorgeous animated view of a map zooming from Boston to Melbourne, showing off dynamic text rendering and frame-by-frame adjustments:
Hannes Janetzek of OpenScienceMap produces a rendering product intended for scientific use. His work is used to support academic research, and he was unique on the panel for not having a commercial product on offer: “OpenScienceMap is a platform to enable researchers to implement their ideas, to cooperate with others, and to share their results.”
Steve Gifford of WhirlyGlobe-Maply is primarily in the consulting business, and his open source 3D globe rendering platform is used by the current number two app in the Apple app store, Dark Sky. The rendering output of Steve’s work is typically different from the other three, in that he’s mostly delivering zoomed-out views of entire regions rather than the street-level focus of OpenScienceMap, Mapzen, and Mapbox. It was telling that everyone except Steve identified text rendering and labeling as their primary difficulty in delivering vector-driven client-side renders.
Steve’s rendering pipelines commonly cover more raster rendering than the others, such as this screen from Dark Sky showing stormy weather over Ohio and Kentucky:
I produced a bunch of vector rendering work two years after I left Stamen, and I enjoyed moderating a panel on a topic I’m familiar with without having any skin in the game. It’s super exciting to see all of this happening now, and it feels a bit like OSM raster rendering in 2006/07, when Mapnik was still impossible to install but a growing group of people like Dane were nudging it forward into general accessibility. I give vector tiles and vector rendering another 1-2 years before it tips from weird research and supervised, commercial deployment into wide use and hacking.
I’ve installed an internally-geared hub on the Schwinn touring frame.
I had originally expected to get a Sturmey-Archer 5-speed, but numerous bike shops told me that they were seeing a high rate of returns on the Sturmey’s, so I picked a Shimano instead. The rear dropout spacing on the Schwinn is just 126mm while the hub is 130mm across. It needs a bit of stretching to get the wheel in.
I used the equipment and help at the fabulous SF Bike Kitchen to get this made. It needed two trips, one to put together the basics and a second shorter trip to true it all into shape. Adam’s help last year gave me the confidence to try this out, and Esther from Bike Kitchen staff helped me understand dish and tension.
I swapped in a pair of Velo Orange Montmartre handlebars to accommodate the shifter. They look a little nicer than the flipped-back bull horns I had previously, but the shifter itself is kind of a disaster.
It’s big and knobbly and really doesn’t fit anywhere useful. I placed it on the right bar-end like Sheldon Brown’s Raleigh International, but he’s got his on a drop bar while mine’s right under my wrist. I’m looking for alternatives here; most of the Shimano products for this purpose look like cheap stereo equipment. I did run into a guy on BART with an interesting stem-mounted Shimano shifter that looked to be made of milled steel, but apparently it’s a pre-release prototype by Mission Bicycle, and the only one currently on the road. They sounded surprised when I called to ask them about.
The internal hub is a dream, and overall the bike continues to take shape slowly.
Update: Trying this variant with a bar-end extension:
- made its way to iTunes in 2014,
- and got listened to a lot.
10. Voyou: Houseman
Gem pointed me to DJ Jeb Edwards’ Beat Bash mixes early in the year, each a themed collection of late 80’s and early 90’s dance music. Some industrial, some hip-house, some techno. Voyou’s Houseman was included on March 6th in a selection of Cold War tracks.
9. Pet Shop Boys: Bolshy
We saw Pet Shop Boys perform in Oakland in the spring, a fantastic show. I hadn’t realized they were touring in support of a new album. Fluorescent is another good track on that album.
8. Gidge: You
Neb proudly called out Gidge as a new addition to the slooowly-changing music rotation in his convertible.
7. Aphrodite’s Child: The Four Horsemen
I was curious to learn more about Vangelis, composer of the 1982 Blade Runner soundtrack. Aphrodite’s Child was his Greek progressive rock band formed in 1967, with Demis Roussos, Loukas Sideras, Anargyros Koulouris, and Vangelis Papathanassiou on keyboards. I found this song sticky, and was briefly scared that I like new age music. I had a song called Four Horsemen last year too, a popular theme.
6. µ-Ziq: Roy Castle
From the altogether-excellent Trance Europe Express vol. 3 compilation, which also introduced me to Biosphere and other key trance and ambient artists in college.
5. The Melvins: A History Of Bad Men
True Detective featured music selected by T Bone Burnett, and there’s a chunk of music here that I pulled together from track listings for the series. A History Of Bad Men can be heard in the background of the bar scene where Rust goes to meet Ginger.
4. Grinderman: Honey Bee (Let’s Fly To Mars)
More from True Detective. Honey Bee comes in right at the end of the infamous six-minute tracking shot.
3. Bosnian Rainbows: Eli
I played this one when Burrito Justice had me on his show in November. Haunting.
2. SIL: Windows (Original Mix)
Basically perfect 1991 Amsterdam prog-house.
1. Junior Vasquez: Live at Sound Factory 1993
A complete mix of New York disco and house.
The 2015 Code for America fellows show up next week from all over. In 2013, CfA published the book Beyond Transparency. I heard from last year’s class that many had read it cover-to-cover before starting. So, we gathered-I-mean-curated a pile of essays and blog posts on design, culture, and code, assembled them into a 400 page reader, and shipped one to each of our 24 incoming fellows.
We included gems like Leisa Reichelt’s Help Joy Help You, Dan Milstein’s Coding, Fast and Slow, the GDS banned words list, timeless design classic Deep Inside Taco Bell’s Doritos Locos Taco, and dozens of other bits of propaganda that we thought might come in handy this year.
We assembled the materials via Github and delivered scrubbed HTML files. Interior design was handled by Noel Callego via oDesk, and I made the cover. Lulu.com did the printing; overall turnaround was a little over three weeks for design, printing, and shipping.
Here are some photos by Frances: