- made its way to iTunes in 2015,
- and got listened to a lot.
1. Grimes: Flesh without Blood
2. Trust: Shoom
3. Klatsch!: God Save The Queer
4. Zed’s Dead: Lost You
5. The Communards: Disenchanted
6. Delia Gonzalez & Gavin Russom: Relevee (Carl Craig Remix)
7. The Smiths: Rubber Ring
8. Hot Chip: Need You Now
9. The All Seeing I: 1st Man In Space
10. Missy Elliott: WTF (Where They From) ft. Pharrell Williams
After 2½ years as Code for America CTO, I’m moving on to the next thing. Starting December 14, I’ll be joining a crew of former Stamen colleagues & clients, CfA friends, OpenStreetMappers, and co-geobreakfasters at Mapzen, part of Samsung Accelerator. If Mapzen was a game show, it’d be This Is Your Life. I’ll be combining my background in open source mapping and my more recent experience working on CfA technology products to lead a team making writings, demos, tools, and entry points for Mapzen’s work on routing, search, transit, and the brainmelting beauty of Tangram. We’re actively hiring (especially front-end developers), so please get in touch.
I will miss Code for America greatly, particularly the technology and product crew we built to deliver new communications and engagement approaches for digital government, the three years of fellowship classes we collaborated with, the whole staff of people making it work, and that one time my team dressed like me for April Fools.
This month is an especially hard time to go, with a major victory from Dan Hon on Child Welfare Services technology procurement—if you are a California design or dev shop, bid on this project to literally save children’s lives. It’s also an auspicious time to go, with a few key colleagues like Cyd Harrell and Frances Berriman heading out and a break between the 2015 and 2016 fellowship classes.
I’ve started doing long monthly rides with a group of fellow Stamen alums. In honor of Eric, we call ourselves The Rodenbikes. At first, I was using the Schwinn touring bike with an internal hub, but after the July ride toting beers and burritos I decided it was time to switch to a bike better-suited to longer rides. One of us is training for the 2016 AIDS Lifecycle, and my heavy, crunchy retro-grouch bike was leaving me far behind.
Earlier in the year, I had already bought a used old-style Trek frame and wasn’t yet sure what style of bike I wanted to use it for. I decided to make it into a road bike:
This is my first regular road bike, with gears and slicks.
The frame is a 1982 Trek 311 “multi-purpose sport.” It’s at the low end of the 1982 product line, using slightly cheaper tubing and (I imagine) lower-grade components.
The paint job is in fine condition, and I bought it as a raw frame with no attached parts other than a headset to hold the fork on. I had been looking for something with the classic vertical Trek logo:
One of the first challenges I encountered on this project was parts selection. Initially, I attempted to piece together a groupset from separate purchases, pricing cranks and derailleurs individually and trying to arrive at a complete bicycle. After a few weeks of research and talking with bike stores, I learned that it would make more sense to buy a complete groupset from a single manufacturer that was known to work as a unit. I decided on the Shimano Tiagra groupset, the fourth-tier kit for road cycles. Researching bike components is surprisingly difficult. Shimano’s website is sloppy and unreliable, and seems to be written for an audience of mostly distributors and retailers.
Missing Link Bicycle Co-Op had the most helpful sales people, and assisted me in thinking through my options and their effects on performance and weight. I decided to buy all the parts with them, except for the wheels.
I bought the wheels used instead, to take advantage of lower prices and easy compatibility with major manufacturer parts from Shimano.
The other big challenge was cable routing, something I’d never done before with a road bike. On most modern bikes, there are cable stops on the frame and a plastic cable guide that screws into the bottom bracket:
The Trek 311 frame was built for downtube shifters, and lacked stops or a threaded hole for the guide. I had to improvise somewhat, and found Origin8 cable stops as well as a used metal cable guide in my parts bin from previous projects.
It looks like this from the bottom, with a pair of Origin8 singles routing the front derailleur cable under the bottom bracket and the vintage Shimano guide holding the rear derailleur cable taut above:
Finally, with this being the sixth bike in my one-car garage, I had run out of room along the walls. Since this bike was going to be used for big occasional rides instead of regular commuting and shopping like the others, I rigged a pulley system to the ceiling to pull the bike up and out of the way when it wasn’t in use:
The ride has been great. We’ve done three big rides with it: 45 miles from SF to Half Moon Bay and back, 80 miles around the Bay via Dumbarton Bridge, and 70 miles round the San Pablo Reservoir and along the Richmond Bay Trail.
I’m in management these days, and I’m still an active developer of open source projects. I enjoy code, I want to keep my hands in open data initiatives like OpenAddresses, I value a connection to current practice, but it’s a way of spending time that competes directly with Day Job.
I’ve been working on a set of choices that make it possible to advance an open source software project in small, manageable increments, based around three values: predictability, accessibility, and repeatability.
My primary evenings-and-weekends project these days is OpenAddresses and the Machine repository that processes the data. My first commit to Machine was almost a full year ago, and it’s been evolving ever since despite being in continual contention with all the other evening-and-weekend responsibilities I have to my life and health away from the keyboard. At any given moment, development might be interrupted by dinner, a movie, a night out, or a long bike ride. This is a catalog of the tools and practices I’m using to make it possible to work on OA long-term, using time and energy sustainably along the way.
Easy To Predict
Semantic Versioning (Semver) is a requirement for any open source code project.
Under this scheme, version numbers and the way they change convey meaning about the underlying code and what has been modified from one version to the next. … Software using Semantic Versioning MUST declare a public API. This API could be declared in the code itself or exist strictly in documentation. However it is done, it should be precise and comprehensive. A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. (Tom Preston-Werner)
Semver is not to be confused with Sentimental Versioning: “You may explain the system you create, if the beauty is enhanced by understanding it. You may just improvise new numbers from your mood on that day. The only important thing, is that the version number must be meaningful to you, the author.” The documentation clause of Semver requires that a public API be specified for the version numbers to mean anything. It’s a way of making and keeping promises to collaborators such as your bigger, meaner, future self.
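As a rough illustration of the X.Y.Z rule quoted above, here is a minimal Python sketch that accepts only normal version numbers with no leading zeroes. The function names are made up for this example, and pre-release or build suffixes are deliberately left out.

```python
import re

# A "normal" version: three non-negative integers, no leading zeroes.
# Pre-release and build metadata are intentionally not handled here.
_NORMAL_VERSION = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_version(text):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = _NORMAL_VERSION.fullmatch(text)
    if match is None:
        raise ValueError("not a normal X.Y.Z version: %r" % (text,))
    return tuple(int(part) for part in match.groups())


def is_major_bump(old, new):
    """True when the major (X) component increases from old to new."""
    return parse_version(new)[0] > parse_version(old)[0]
```

For example, `parse_version("1.2.3")` gives a tuple that compares correctly against other versions, while `parse_version("01.2.3")` is rejected because of the leading zero.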
If a project is anything other than a library of code, such as a website, a running service, or a specification, it becomes important to coordinate changes in a stable way. OpenAddresses is all of those things. Take this ticket from the Ops repository as an example: it’s a proposal to modify the data source syntax, which will impact both the code that processes OA data sources and the documentation in the sources repository. To maintain continuous stability over the lifetime of this change, each move needs to be small and atomic, able to be interrupted at any moment for any reason.
- Get input in Ops so we know we’re making a good choice.
- Pick a small part of the work to implement and deploy.
- Modify the Machine code so it supports the new behavior without breaking support for the old.
- Deploy changes to any running instance.
- Document the new behavior publicly, once it’s deployed and reflects reality.
I’m on a flight right now, so the last step above could be interrupted by turbulence or failed #planeclub wi-fi. The steps are in place to brace against instability. The image of coordination I try to keep in mind is crossing a river over a series of stepping stones: regain balance at each step, and never fall in when the wind kicks up.
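The "support the new behavior without breaking the old" step might look something like the sketch below. It is not taken from Machine; the helper name and the `protocol` and `kind` keys are invented purely to show the shape of a change that can be deployed before any data sources are updated.

```python
def read_setting(source, new_key, old_key):
    """Read a value from a source definition (a dict), preferring the new
    key name while still accepting the old one.

    Both key names are hypothetical, chosen only for illustration.
    """
    if new_key in source:
        return source[new_key]
    if old_key in source:
        return source[old_key]
    raise KeyError("source has neither %r nor %r" % (new_key, old_key))


# Old-style and new-style sources are both understood during the transition.
old_style = {"kind": "http"}
new_style = {"protocol": "http"}
assert read_setting(old_style, "protocol", "kind") == "http"
assert read_setting(new_style, "protocol", "kind") == "http"
```

Once every source uses the new key and the documentation reflects it, the fallback branch can be removed in its own small step.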
Easy To Access
To ensure your work is accessible, you need documentation, you need eyes on that documentation, and you need software unit and integration tests on everything.
It's called Accessibility, and it's the most important thing in the computing world. The. Most. Important. Thing. … When software—or idea-ware for that matter—fails to be accessible to anyone for any reason, it is the fault of the software or of the messaging of the idea. It is an Accessibility failure. (Steve Yegge)
To make the Machine accessible to new contributors, we worked out a high-level set of documents that focus on four main areas: moving pieces like commands and scripts, persistent things like datastores, processes like the lifecycle of test jobs, and externalities like Amazon Web Services account information. This is the result, and this is where the work happened. Everything is cross-referenced and linked to source code, and I’ve attempted to ensure that everything can be linked with a stable URL.
So much of Machine lives in my head that I reached out to Nelson Minar for help. We set up a sort of interrogation, where I attempted to draw everything, he poked at my drawing with questions, and we added and removed detail until it made sense. Nelson recommended the four-part focus on components, processes, storage, and externals above, and the work is in this thread. The docs are not auto-generated. Even on older projects like TileStache, we wrote human documentation defining the API for Semver purposes that’s separate from machine documentation enumerating the code. They serve different purposes.
Tests are the other critical piece of this puzzle. In dynamic languages like Python, tests fill a need that overlaps with static types in other languages—they help ensure that a method does what it’s supposed to do, and back-fill the clarity exiled by rigid DRY. Machine is covered by bucketloads of tests, and I’ve been slowly improving my instincts for good unit tests as I’ve worked.
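To make the overlap with static types concrete, here is a small sketch using the standard library’s `unittest`. The `format_address` helper is hypothetical and does not come from Machine; the point is that one test pins down the return type and another pins down behavior.

```python
import unittest


def format_address(number, street):
    """Hypothetical helper: join a house number and a street name."""
    return "{} {}".format(str(number).strip(), street.strip())


class TestFormatAddress(unittest.TestCase):

    def test_returns_string(self):
        # Plays the role a declared return type would in a static language.
        self.assertIsInstance(format_address(123, "Main St"), str)

    def test_strips_whitespace(self):
        self.assertEqual(format_address(" 123 ", " Main St "), "123 Main St")


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFormatAddress)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```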
Easy To Repeat
Andy Allan’s 2012 post about getting started with Chef hooked me on using it for configuration management, for this reason:
Configuration management really kicks in to its own when you have dozens of servers, but how few are too few to be worth the hassle? It’s a tough one. Nowadays I’d say if you have only one server it’s still worth it–just–since one server really means three, right? The one you’re running, the VM on your laptop that you’re messing around with for the next big software upgrade, and the next one you haven’t installed yet. (Andy Allan)
Previously, I’d always used shell scripts to configure services. Chef can be used the same way, with the key difference that a Chef recipe/cookbook/whatever defines an end-state rather than a process. It will “converge” the state of your system, taking whatever actions are needed to adjust it to match the declared intent (using a tedious, cringeworthy metaphor-cloud of kitchen words). In OpenAddresses, we have a few example pieces of Chef use:
- Complete cookbooks and recipes in the chef directory.
- A run script to call instead of chef-solo commands.
- Role files like this one where configuration details end up.
- Recipes consisting of resources such as this simple one.
At this point, I will probably never again run a server without involving configuration management in some form. The converge concept is too much of an improvement over simple bash scripts to ignore. If you’ve already started from a bash script, try adapting this simple base chef configuration with its one weird recipe. You don’t need to install a Chef server, or do any of the other crazy multi-machine fleet things either; chef-solo alone provides much of the benefit.
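The converge idea can be sketched without Chef at all. The Python function below is only an analogy, not Chef’s API: it declares an end-state ("this file contains this line") and changes the file only when that state isn’t already true, so running it repeatedly is harmless.

```python
import os


def ensure_line(path, line):
    """Converge a text file toward containing `line`.

    Returns True if the file had to be changed, False if it already
    matched the declared end-state.
    """
    existing = []
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read().splitlines()

    if line in existing:
        return False  # already converged, nothing to do

    # Rewrite the whole file so a missing trailing newline can't glue
    # the new line onto the previous one.
    with open(path, "w") as f:
        f.write("\n".join(existing + [line]) + "\n")
    return True
```

A plain shell script that appends the line would add a duplicate on every run; the converging version makes one change and then reports that nothing is left to do.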
Lastly, it’s important to repeatability that your environment doesn’t slip around too much while you’re off doing something else. If months or years might pass between opportunities to devote significant time to a project, you don’t want your welcome back to be a continual module update juggling act. For this reason, stick to a language’s standard library wherever possible. Python advertises itself as “batteries included,” which will often rescue you from unpredictably changing requirements. Your code is the interesting part; rely only on modules outside a standard library if they’re stable, provide significant functionality, and themselves comply with Semver. At the system level, try to rely on package managers like Debian’s Apt and look for signs of stability in system releases such as Ubuntu’s Long-Term Support concept. Packaging is hard, especially if you’re reliant on someone upstream from you doing it right:
I am not at all happy with the one package manager per programming language situation. I am old and crotchety and I’m tired of how every programming language keeps rediscovering just how fucking hard packaging and software distribution is. It really deserves to be elevated to hardest problem in computer science, ahead of cache invalidation and naming things. (JordiGH comments on Kevin Burke)
These practices and tools have made it possible to make steady, incremental progress on OpenAddresses with a group of collaborators over the past year. I’ve kept the work predictable, accessible, and repeatable. Even though Machine’s bus number is still close to 1, I’ve tried to keep the bus moving so slowly that it poses less of a risk while keeping the excitement focused on the 213m+ addresses we’ve managed to collect.
I enjoy answering questions about geospatial techniques, and I’ve been trying to do it on this blog where others might benefit. I’ve done one about historical map projections and another about extracting point data from OSM.
I got an email from Djordje Spasic, a Serbian architect working on a project in Barcelona and trying to understand how to use elevation data in a modeling environment:
I am trying to create a digital terrain model of Barcelona, based on an .asc file, but in Rhino 5 application without the usage of NumPy. I attached the .asc file below. … I googled a bit and looks like both cell width and height should be 0.000833333333 decimal degrees. … There is one problem with this: I noticed that terrain model looks a bit stretched in east-west direction. Meaning, that the width of the cell is a bit too large. This got me thinking: is this due to significant latitude to which .asc file corresponds to (41.35 North)? I thought that maybe I could somehow not use equal cell width and height.
Djordje got in touch because of a hillshading script I had written, though I’ve also written a full library for tiling digital elevation models. Elevation data can be a pain to work with if you’re not familiar with either geographic projections or tools for working with raster data. Fortunately, GDAL does both if you know how to use it, and it comes built-in to QGIS, the desktop GIS application.
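For anyone in a similar spot without NumPy, an ASCII grid is simple enough to read with plain Python. This is a minimal sketch that assumes the usual Esri ASCII grid layout: a few `name value` header lines (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, optionally `NODATA_value`) followed by rows of numbers, northernmost row first. The sample values are made up.

```python
def parse_ascii_grid(text):
    """Parse Esri ASCII grid text into (header dict, list of rows)."""
    header = {}
    values = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():
            # Header line such as "ncols 3"; keys are lowercased.
            header[parts[0].lower()] = float(parts[1])
        else:
            # Data line: whitespace-separated elevation values.
            values.extend(float(p) for p in parts)

    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError("expected %d values, found %d" % (ncols * nrows, len(values)))

    # First row in the file is the northernmost row of cells.
    rows = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, rows


SAMPLE = """ncols 3
nrows 2
xllcorner 2.0
yllcorner 41.0
cellsize 0.000833333333
NODATA_value -9999
10 11 12
13 14 15
"""

header, rows = parse_ascii_grid(SAMPLE)
```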
Based on the 0.00083 degree resolution, his data probably comes from the Shuttle Radar Topography Mission (SRTM) 3-arc-second data set, which has worldwide coverage at 1/1200° resolution. It’s also often known as the 90-meter dataset, because 1/1200° of latitude is a little over 90m on Earth.
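The "little over 90m" figure is easy to check. This sketch assumes a spherical Earth with a mean radius of 6,371,008.8 m, which is an approximation but close enough for a sanity check.

```python
import math

# Mean Earth radius in meters; treating the Earth as a sphere is an
# assumption made for this back-of-the-envelope calculation.
EARTH_RADIUS_M = 6371008.8


def latitude_span_m(degrees):
    """North-south ground distance covered by an angle of latitude."""
    return math.radians(degrees) * EARTH_RADIUS_M


# One 3-arc-second SRTM cell, i.e. 1/1200 of a degree.
cell_height_m = latitude_span_m(1 / 1200)
print(round(cell_height_m, 1))
```

The result lands between 92 and 93 meters, consistent with the dataset’s nickname.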
The tricky part is that degrees of longitude aren’t a consistent width on a spherical earth, and Djordje was getting confused about how to apply this dataset to an architectural model in a small, flat location. I pulled together some OpenStreetMap metro extract data for Barcelona, opened everything in QGIS, and set the projection to the UTM grid zone for Barcelona, 31N. UTM uses the conformal transverse Mercator projection, and visualizing the layers immediately showed the stretching that Djordje described:
The cells are indeed about 90m North-South, but quite a bit less than that East-West due to the curvature of the earth’s surface. At the equator, they’d be square. At the poles, they’d be infinitely narrow. Here, the aspect ratio is about 1.2:1. Djordje’s first instinct was to simply scale the cells. That’d work for a small patch such as this, but would introduce distortions at larger sizes such as a whole city (approx. 27m of difference from southern to northern Barcelona over 1/4° of latitude).
It’s a relatively simple two-part operation to get the .asc file into a more correct form: first warp it, then translate it to a usable data format.
Warping on the command line is pretty easy:
gdalwarp -r cubic \
    -s_srs EPSG:4326 -t_srs EPSG:32631 \
    Barcelona_elevations.asc out.tif
Skipping the command line and warping in QGIS is also easy, using the menu command Raster ➤ Projections ➤ Warp.
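The `-t_srs` value in the warp command is specific to Barcelona. For other locations, a rough sketch like this one picks the matching WGS84 UTM code from a longitude and latitude. It assumes the standard 6°-wide zones and ignores the irregular zones around Norway and Svalbard; the 2.17°E longitude used for Barcelona is approximate.

```python
def utm_zone(lon):
    """UTM zone number (1-60) for a longitude in degrees."""
    zone = int((lon + 180) // 6) + 1
    return min(zone, 60)  # lon == 180 would otherwise give zone 61


def utm_epsg(lon, lat):
    """EPSG code for the WGS84 UTM zone containing (lon, lat).

    Codes 326xx cover the northern hemisphere, 327xx the southern.
    """
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)


# Barcelona, roughly 2.17 E, 41.35 N
print("EPSG:%d" % utm_epsg(2.17, 41.35))
```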
EPSG:4326 is the spatial reference for unprojected degrees, and matches the source dataset of SRTM. EPSG:32631 is a spatial reference for transverse Mercator meters in UTM zone 31N, a convenient choice for this location. I might also have chosen Google Maps Mercator if I didn’t care about meters specifically, or created a new projection centered on Barcelona if I wanted to be much more precise and have geographic North pointing exactly up. -r cubic uses a smoother interpolation function to generate new elevation values between the existing ones, similar to resizing a photo in Photoshop. The output file is a GeoTIFF, and looks like this in QGIS:
The individual pixels are now square, and at a known size of 78.9m on the ground. The grid is 34×34 instead of 34×29, after warping and interpolating new values to cover the same area. The visual differences between the two are minimal:
The format translation needs to be a second step, because GDAL and QGIS don’t know how to write to the ASCII grid format from the warp operation. I’m not sure why this is the case, but here is the second command:
gdal_translate -of AAIGrid out.tif out.asc
Translation is also available in QGIS as a menu command, under Raster ➤ Conversion ➤ Translate.
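If the two steps need to be repeated for many files, they could be wrapped in a short script. This sketch only builds the same two command lines shown above and hands them to `subprocess`; the function names are invented, and it assumes the `gdalwarp` and `gdal_translate` programs are installed and on the PATH.

```python
import subprocess


def warp_command(src, dst, s_srs="EPSG:4326", t_srs="EPSG:32631"):
    """Argument list for the gdalwarp step, using cubic resampling."""
    return ["gdalwarp", "-r", "cubic", "-s_srs", s_srs, "-t_srs", t_srs, src, dst]


def translate_command(src, dst):
    """Argument list for converting a raster to the ASCII grid format."""
    return ["gdal_translate", "-of", "AAIGrid", src, dst]


def warp_and_translate(src_asc, tmp_tif, dst_asc):
    """Run both steps in order, stopping if either command fails."""
    subprocess.check_call(warp_command(src_asc, tmp_tif))
    subprocess.check_call(translate_command(tmp_tif, dst_asc))
```

Calling `warp_and_translate("Barcelona_elevations.asc", "out.tif", "out.asc")` would run the same pair of commands as above.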