tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Aug 12, 2010 1:30am

census-tools

The U.S. Census publishes an astonishing volume of data, notably with the most recent 2000 count. The demographic data contained in each of the summary files is precise, detailed, and distributed in a difficult-to-understand text format. The documentation for summary file #1 alone (race, age, sex) is a 637-page PDF file, and the actual data is stored in a maze of zip files all alike.

I've poked at these before, but I recently got a bee in my bonnet about making them available in a more useful form so they could be mapped. I talked to Josh Livni (of Land Summary) quite a while back about his plans for a demographic summary site that would store everything in a database in the cloud. Then Amazon made it available as a public dataset. Still I was not satisfied - both approaches to handling the data seemed a bit ocean-boiling in retrospect.

I've been experimenting with something I'm tentatively calling census-tools that seeks to make this data a bit more accessible. I'm motivated by the idea that predictably-structured zip files stored on a web server and accessed with Python's excellent stream-handling libraries might actually be considered quite a good API, so the first tool in the repository proceeds from there. It does a very simple thing: given an optional U.S. state, a geographic summary level (e.g. census tract or county), and a type of data, it unzips those remote files into memory and converts them to a tab-separated values file.
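To make the "zip files on a web server as an API" idea concrete, here is a minimal sketch of the general technique, not the actual census2text.py code. It assumes the zip member is a comma-delimited text file; the function names, the member name, and the latin-1 encoding are all hypothetical choices for illustration.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a remote file fully into memory (no temp file on disk)."""
    with urlopen(url) as response:
        return response.read()


def zip_member_to_tsv(zip_bytes, member_name, out):
    """Convert one comma-delimited member of an in-memory zip to TSV.

    zip_bytes   -- the raw bytes of a zip archive
    member_name -- name of the file inside the archive (hypothetical here)
    out         -- any writable text stream
    Returns the number of rows written.
    """
    archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    count = 0
    with archive.open(member_name) as raw:
        # Assumption: the member is plain text; latin-1 never fails to decode.
        text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in csv.reader(text):
            writer.writerow(row)
            count += 1
    return count
```

Given a URL for one of the zip files, `zip_member_to_tsv(fetch_bytes(url), member_name, sys.stdout)` would stream the converted rows without ever unpacking the archive to disk.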

Here's an example:

python census2text.py --verbose --wide --state=Hawaii --geography=county --table=P18 --output=hawaii-households.txt

It writes a chatty text file of household data for every county in Hawaii to hawaii-households.txt. It takes about a minute to churn through a 2.8MB zip file and output the results. Omitting the state name gets you every county in the U.S. in about 20 minutes:

python census2text.py --verbose --wide --geography=county --table=P18 --output=national-households.txt
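For readers curious how options like the ones in the two commands above might be handled, here is a hedged sketch using Python's argparse. It is not taken from census2text.py: the flag names come from the commands, but which ones are required, what `--wide` actually changes, and the help text are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the example commands."""
    parser = argparse.ArgumentParser(prog="census2text.py")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress messages (assumed meaning)")
    parser.add_argument("--wide", action="store_true",
                        help="output layout switch (exact meaning not specified here)")
    # State is optional: leaving it out means the whole country.
    parser.add_argument("--state", default=None)
    # Assumption: geography, table, and output must always be given.
    parser.add_argument("--geography", required=True)
    parser.add_argument("--table", required=True)
    parser.add_argument("--output", required=True)
    return parser


# Parse the national example; no --state means args.state is None.
args = build_parser().parse_args(
    ["--verbose", "--wide", "--geography=county",
     "--table=P18", "--output=national-households.txt"]
)
```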

I tested with Hawaii because it's small, and immediately discovered the strangely underpopulated Kalawao County:

The county is coextensive with the Kalaupapa National Historical Park, and encompasses the Kalaupapa Settlement where the Kingdom of Hawai'i, the territory, and the state once exiled persons suffering from leprosy (Hansen's disease) beginning in the 1860s. The quarantine policy was lifted in 1969, after the disease became treatable on an outpatient basis and could be rendered non-contagious. However, many of the resident patients chose to remain, and the state has promised they can stay there for the rest of their lives. No new patients, or other permanent residents, are admitted. Visitors are only permitted as part of officially sanctioned tours. State law prohibits anyone under the age of 16 from visiting or living there.

Fascinating.

Anyway, this small amount of information can be quite hard to get to. Between the impenetrable formatting of the geographic record files, the bewildering array of different kinds of geographic entities, and the depth of geographic minutiae, it can take quite a bit of head-scratching to extract even the first bits of information from the U.S. Census.

I hope this first tool makes it a little bit less of a hassle. I'd accept whatever patches people choose to offer: support for summary files beyond SF1, additional geographic summary levels, general patches, and more.
