tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Dec 3, 2014 12:37am

more open address machine

I spent a substantial portion of Thanksgiving break working on Openaddresses Machine and publishing new data to data.openaddresses.io. Introducing any kind of reliable automation to a process like this is going to be a bumpy ride.

I like Ivan Sagalaev’s take:

Ever since I made an automatic publishing infrastructure for highlight.js releases… there wasn't a single time when it really worked automatically as planned!
There was always something: directory structure changes that require updates to the automation tool itself, botched release tagging, out of date dependencies on the server, our CDN partners having their own bugs with automatic updates, etc. … You can't really automate anything. You just shift maintenance from your thing to your automation tools. But! It still makes sense because by introducing automation you can do more complex things at the next level and keep maintenance essentially constant.

So that’s where the non-daylight hours of my holiday weekend went: shifting maintenance from the thing to the automation tool so OA can do more complex things. Previously, I ran the OA data process by hand in several steps:

  1. Start up a new EC2 server, stepping through the console wizard.
  2. Clone the machine code to the new server.
  3. Run chef to install all the pieces.
  4. Run openaddr-process and wait until it completes.
  5. Kill the server.

I’ve introduced a new script, openaddr-ec2-run, that pulls the steps above into just one, and learned a bunch of annoying things along the way.

On Monday, I encountered the excitement of Ruby dependency hell (“either you or the maintainer of a gem you depend on will fuck up the dependencies at some point”) for the first time when Opscode released ohai 7.6.0 and ruined a bunch of people’s days. Running an automation process that relies on external services like RubyGems or NPM can be a risky business, but on balance I prefer this type of risk to the delaying strategy of virtual images, Docker, and friends. It’s a way to keep maintenance constant, as Ivan says. Opscode fixed the problem, I removed my workaround, and only my already-frayed trust in the Ruby ecosystem was harmed.

Today, I encountered a set of finicky NPM issues connected to machine’s use of EC2’s user data shell scripts. iconv, or really node-gyp I guess, really wants HOME to be present in the environment variables and will not build if it’s not found. Fixing this took a bit of debugging with env, and I discovered some more Ruby derp along the way:

This error happens because { } has two different meanings in Ruby: Hash value expressions and method blocks. If a procedure is called in poetry mode (no parens) then there is an ambiguity if the parser encounters a { after a method name.

“Poetry mode” is a thing in Ruby, and it will fuck your shit up because Matz didn’t read PEP 20:

There should be one - and preferably only one - obvious way to do it.
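Back to the HOME problem for a moment. A minimal sketch of the kind of fix involved, assuming the user data script is assembled as a string in Python before launch: export HOME near the top, so node-gyp finds it. The function name `user_data_script`, the `/root` fallback, and the `npm install` command in the usage note are illustrative assumptions, not what machine actually does.

```python
def user_data_script(commands, fallback_home="/root"):
    """Build a shell script for EC2 user data that always defines HOME.

    `commands` is a list of shell command strings to run in order.
    The /root fallback is an assumption: user data scripts normally
    run as root, but may start without HOME in the environment.
    """
    lines = [
        "#!/bin/sh",
        # Keep an existing HOME if there is one, otherwise fall back.
        'export HOME="${HOME:-%s}"' % fallback_home,
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"
```

Calling `user_data_script(["npm install"])` yields a three-line script with the export ahead of the install step. Dropping `env` in as the first command is a quick way to see what the instance really starts with.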

Anyway.

Boto and Amazon Web Services are blessedly stable; I was able to copy-paste code from three years ago and have it just work to guess a reasonable EC2 spot request bid and start up a server instance. One interaction I introduced was the result of finding a runaway 8xlarge EC2 instance from earlier this month that I had forgotten about and continued to pay for (it adds up): the instance terminates itself when it’s done, the run script monitors the instance on a loop, and canceling the script with a KeyboardInterrupt at any point will immediately terminate the instance and cancel the reservation. Just because a computer is in the sky doesn’t mean I don’t want a convincing illusion of running it in my own terminal.
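The monitor-and-clean-up loop can be sketched roughly like this. It is not the actual openaddr-ec2-run code: `watch_instance` is a hypothetical name, `instance` is assumed to be any object with an `update()` method that returns the current state string and a `terminate()` method, and `cancel_request` is assumed to be a callable that cancels the spot request.

```python
import time


def watch_instance(instance, cancel_request, poll_seconds=60, sleep=time.sleep):
    """Poll an instance until it terminates itself; clean up on Ctrl-C.

    `instance` needs update() -> state string and terminate().
    `cancel_request` is a zero-argument callable that cancels the
    spot request. Both are stand-ins for the real EC2 client objects.
    """
    try:
        while True:
            state = instance.update()
            if state == "terminated":
                # The instance shut itself down when its work finished.
                return state
            sleep(poll_seconds)
    except KeyboardInterrupt:
        # Ctrl-C in the terminal: never leave a server running unattended.
        instance.terminate()
        cancel_request()
        return "interrupted"
```

Putting the whole loop inside a single `try` means an interrupt at any point, while polling or while sleeping, takes the same clean-up path. The `sleep` parameter exists only so the loop can be exercised without waiting.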
