Michal Migurski's notebook, listening post, and soapbox.

Dec 8, 2014 11:28pm

bike ten: schwinn touring

Between the yellow cargo bike and the green commuter bike, I have a mid-90s Trek that I use for errands around town. It’s not very interesting, so it’s time to replace it with a new project like the others.

Earlier in the fall, I picked up this 1985 Schwinn Tour De Luxe frame:

Here it is in its original glory, minus the Surly fork and Box Dog sticker. I’m borrowing a page from the religious fervor for agile/lean development process at work and building this one up starting with the pile of bike parts I already have in the garage.

I had to pick up the stem, front wheel, brake levers, and backwards-mounted bullhorn handlebars. Everything else is from other bike projects. I’ve started riding it a bit this week, and I’m really happy with the stiff frame and upright riding position. I did find one problem with the rear brake bosses: one of them is slightly loose, so I’m using a caliper brake instead of a cantilever until I can get a welder to help.

The back wheel is clearly about to fall apart. I want to replace it with an internally-geared hub, probably one of the Sturmey-Archer 5-speeds that can fit in the 126mm of rear spacing.

Dec 2, 2014 9:37pm

more open address machine

I spent a substantial portion of Thanksgiving break working on OpenAddresses Machine and publishing new data to data.openaddresses.io. Introducing any kind of reliable automation to a process like this is going to be a bumpy ride.

I like Ivan Sagalaev’s take:

Ever since I made an automatic publishing infrastructure for highlight.js releases… there wasn't a single time when it really worked automatically as planned!
There was always something: directory structure changes that require updates to the automation tool itself, botched release tagging, out of date dependencies on the server, our CDN partners having their own bugs with automatic updates, etc. … You can't really automate anything. You just shift maintenance from your thing to your automation tools. But! It still makes sense because by introducing automation you can do more complex things at the next level and keep maintenance essentially constant.

So that’s where the non-daylight hours of my holiday weekend went: shifting maintenance from the thing to the automation tool so OA can do more complex things. Previously, I ran the OA data process by hand in five steps:

  1. Start up a new EC2 server, stepping through the console wizard.
  2. Clone the machine code to the new server.
  3. Run chef to install all the pieces.
  4. Run openaddr-process and wait until it completes.
  5. Kill the server.

I’ve introduced a new script, openaddr-ec2-run, that collapses the steps above into a single command, and I learned a bunch of annoying things along the way.
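As a rough illustration of what collapsing those steps means, here is a minimal Python sketch. It is not the actual openaddr-ec2-run code: the four callables (start_server, provision, process, kill_server) are hypothetical stand-ins for the EC2 and shell work. The point is that step 5 lives in a finally clause, so the server gets killed even if provisioning or processing blows up.

```python
from typing import Callable


def run_once(start_server: Callable[[], str],
             provision: Callable[[str], None],
             process: Callable[[str], None],
             kill_server: Callable[[str], None]) -> None:
    """Run the five manual steps as one call (hypothetical helpers)."""
    server_id = start_server()      # step 1: start a new EC2 server
    try:
        provision(server_id)        # steps 2-3: clone the code, run chef
        process(server_id)          # step 4: run openaddr-process to completion
    finally:
        kill_server(server_id)      # step 5: kill the server, even on failure
```

Keeping the AWS calls behind plain callables makes the ordering easy to check with fakes before pointing it at real, billable servers.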

On Monday, I encountered the excitement of Ruby dependency hell (“either you or the maintainer of a gem you depend on will fuck up the dependencies at some point”) for the first time when Opscode released ohai 7.6.0 and ruined a bunch of people’s days. Running an automation process that relies on external services like RubyGems or NPM can be a risky business, but on balance I prefer this type of risk to the delaying strategy of virtual images, Docker, and friends. It’s a way to keep maintenance constant, as Ivan says. Opscode fixed the problem, I removed my workaround, and only my already-frayed trust in the Ruby ecosystem was harmed.

Today, I encountered a set of finicky NPM issues connected to machine’s use of EC2’s user data shell scripts. iconv (or really node-gyp, I guess) insists on having HOME in the environment variables and will not build without it. Fixing this took a bit of debugging with env, and I discovered some more Ruby derp along the way:
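A minimal sketch of the HOME fix, assuming the run script builds the EC2 user data as a shell script string. make_user_data and the example npm command are hypothetical. The generated script sets HOME only when it is missing or empty, using the shell’s ${HOME:-default} expansion, before any build commands run.

```python
from typing import Sequence


def make_user_data(commands: Sequence[str], home: str = "/root") -> str:
    """Build a user data shell script that guarantees HOME is set.

    User data scripts can run without HOME in the environment, and
    node-gyp won't build native modules without it.
    """
    lines = [
        "#!/bin/sh",
        "set -e",  # stop at the first failing command
        # Keep an existing HOME; otherwise fall back to the given default.
        f'export HOME="${{HOME:-{home}}}"',
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"
```

For example, make_user_data(["npm install"]) yields a script whose third line is export HOME="${HOME:-/root}", followed by the npm command.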

This error happens because { } has two different meanings in Ruby: Hash value expressions and method blocks. If a procedure is called in poetry mode (no parens) then there is an ambiguity if the parser encounters a { after a method name.

“Poetry mode” is a thing in Ruby, and it will fuck your shit up because Matz didn’t read PEP 20:

There should be one - and preferably only one - obvious way to do it.


Boto and Amazon Web Services are blessedly stable; I was able to copy-paste code from three years ago to guess a reasonable EC2 spot request bid and start up a server instance, and it just worked. One behavior I added came from finding a runaway 8xlarge EC2 instance from earlier this month that I had forgotten about and kept paying for (it adds up): the instance now terminates itself when it’s done, the run script monitors the instance in a loop, and canceling the script with a KeyboardInterrupt at any point immediately terminates the instance and cancels the reservation. Just because a computer is in the sky doesn’t mean I don’t want a convincing illusion of running it in my own terminal.
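Here is a hedged Python sketch of the two behaviors described above. guess_bid is a made-up heuristic (median of recent spot prices plus a markup), not necessarily how the real script picks a bid. watch takes hypothetical is_running and terminate callables instead of calling Boto directly, so nothing here depends on a particular AWS API.

```python
import time
from statistics import median
from typing import Callable, Sequence


def guess_bid(recent_prices: Sequence[float], markup: float = 1.25) -> float:
    """Hypothetical heuristic: bid somewhat above the median recent price."""
    if not recent_prices:
        raise ValueError("need at least one recent price")
    return round(median(recent_prices) * markup, 4)


def watch(is_running: Callable[[], bool],
          terminate: Callable[[], None],
          interval: float = 60.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the instance shuts itself down; clean up on Ctrl-C.

    Returns True if the instance finished on its own, or False if the
    user interrupted and terminate() was called.
    """
    try:
        while is_running():
            sleep(interval)
        return True
    except KeyboardInterrupt:
        # Caller-supplied: terminate the instance and cancel the request.
        terminate()
        return False
```

Passing sleep in as a parameter keeps the loop testable; in real use it defaults to time.sleep.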

Nov 19, 2014 10:07pm

open address machine

The OpenAddresses project is super-interesting right now:

OpenAddresses is a global repository for open address data. In good open source fashion, OpenAddresses provides a space to collaborate. Today, OpenAddresses is a downloadable archive of address files, it is an API to ingest those address files into your application and, more than anything, it is a place to gather more addresses and create a movement: add your government’s address file and if there isn’t one online yet, petition for it. —Launching OpenAddresses.

OA is the free and open global address collection, but it’s just getting off the ground. Ian Dees, a longtime OpenStreetMap contributor, kicked off the project early this year when OSM balked at bulk address imports. It’s more sensible as a separate project anyway.

I’ve been working on data.openaddresses.io to make the project more legible and responsive.

I’m about six months late to the party, but there’s a ton to do right now. Thinking back on my own involvement in OSM, I remembered that around 2006 the street map tiles were updated infrequently, and my willingness to add data was gated by the turnaround time for seeing my input on the real, live map. I’d add some stuff, then twiddle my thumbs for days (or weeks) while the render refreshed. My satisfaction from adding data grew with every speedup in OSM’s rendering stack. Seeing your effect on the data set is an important motivational factor.

OA has a similar issue for me. It’s implemented as a giant bag of JSON files stored in GitHub, so it’s not immediately obvious where the data lives, how up-to-date it is, or (if you’re submitting new files) whether a data source even works. The processing code works, but it’s hard to see how all the pieces fit together.

I have been working on machine, a harness for running the whole process on a more regular cycle. There’s a bunch of interesting moving pieces.

I’ve taken Andy Allan’s chef advice to heart and created a chef recipe collection for preparing OA to run on a bare Ubuntu 14.04 machine. Chef is a no-brainer for me now, and I use it for everything that stands any chance of being important. Andy says:

Configuration management really kicks in to its own when you have dozens of servers, but how few are too few to be worth the hassle? It’s a tough one. Nowadays I’d say if you have only one server it’s still worth it – just – since one server really means three, right? The one you’re running, the VM on your laptop that you’re messing around with for the next big software upgrade, and the next one you haven’t installed yet.

If you want to add a skeletal chef script to any existing repository, start here:

git pull https://github.com/migurski/chefbase.git master

The whole OA codebase can now run on a scratch machine, which means that once each week I can start an EC2-XXXL server and have it set up with the complete OA code in minutes. Running everything takes a few hours. We can keep data.openaddresses.io up-to-date with the status of the data, including a fresh map of data from US states and counties (even though OA is international), a complete listing of cached and processed status for all data, and small data samples to provide hints for correctly mapping (“conforming”) source data to OA’s needs.
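For the small data samples, something like this would do, assuming a source has already been cached as CSV text. sample_rows is hypothetical, and JSON as the output format is an assumption. The idea is just to keep the header plus a few rows so someone writing a conform can see which columns hold house numbers and street names.

```python
import csv
import io
import json
from itertools import islice


def sample_rows(csv_text: str, n: int = 5) -> str:
    """Return the header row plus the first n data rows as a JSON list."""
    reader = csv.reader(io.StringIO(csv_text))
    # n + 1 because the first row yielded by the reader is the header.
    rows = list(islice(reader, n + 1))
    return json.dumps(rows)
```

With made-up column names, sample_rows("NUMBER,STREET\n1,Main St\n", n=1) returns '[["NUMBER", "STREET"], ["1", "Main St"]]'.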

There remains a lengthy ticket backlog, but I am hoping that OA provides a way to better expose and unify the world’s municipal government spatial data. Today, addresses. Tomorrow, parcels.

Apr 24, 2014 9:07am

making the right job for the tool

In the second half of most nerd debates, your likelihood of hearing the phrase “pick the right tool for the job” approaches 100% (cf. frameworks, rails, more rails, node, drupal, jquery, rails again). “Right tool for the job” is a conversation killer, because no shit. You shouldn’t be using the wrong tool. And yet, working in code is working in language (naming things is the second hard problem) so it’s equally in-bounds to debate the choice of job for the tool. “Right tool” assumes that the Job is a constant and the Tool is a variable, but this is an arbitrary choice and notably contradicted by our own research into the motivations of idealistic geeks. Volunteers know the tools they know, and are looking for ways to use their existing powers for good. They are selecting a job to fit the tool. Martin Seay’s brilliant essay on pop music, Ke$ha’s TiK ToK, pro wrestling and conservatism critiques the type of realist resignation that assumes the environment (the job) is immutable:

It is a sterling example of what a number of commentators—I’ll refer you to k-punk—have characterized as the fantasy of realism: an expedient and comfortable confusion of what is politically difficult with what is physically impossible. … This kind of “realism” offers something even more desirable than a clear-eyed assessment of your current circumstances, namely the feeling that you’ve made such an assessment, and that you’ve come away with the conclusion that this is as good as it gets. … This is professional wrestling again: the comforting notion that you know what you need to know, that everything is clear.

At some level, our tools come preselected. At Day Job, we have tool guy Eric Ries on our board, and by design stick to the universes of web scripting languages and user needs research. Going into a government partnership, we know that the set of jobs for which we are suited is bounded by time and scale. Instead, we look for opportunities where governments are creating the wrong jobs based on the tools they have available. One example is Digital Front Door, an emerging project on publishing and content management where we’re looking at the intertwined evils of CMS software and omnibus vendor contracts. Given a late-90s consensus on content publishing, it seems inevitable that every website project must result in a massive single-source RFP, design, and migration effort. So much risk to pile onto a single spot. How would a city government change the scope of a job if it knew it had other tools available? Would the presence of static site generators and workflows based on git-style sharing models influence the redefinition of the job to be smaller, lower-risk, more agile? I think yes.

“Pick the right tool” is common-sense advice that elides a more interesting set of possibilities. When you can redefine the job, the best tool may be the one you already have.

Apr 12, 2014 10:03pm

the hard part

The hard part of coming to State of the Map is that I’m only a little bit connected to the OpenStreetMap project right now, and not spending most of my time on geospatial open source like I used to. I’ll come back to it, but today I’ve had a number of conversations about projects of mine, their status, and whether I have abandoned them. Metropolitan Extracts have not yet been run during 2014, TileStache is stable but has a few outstanding pull requests, and it’s high time I merged Walking Papers with Stamen’s more-stable Field Papers offshoot. Thankfully Vector Tiles remain happily running on the US OSM server.

I wish I could say I had easy answers for these projects; they seem genuinely useful to people, but they’re not something I can maintain at the moment, nor something I can exactly delegate at CfA.
