The U.S. Census publishes an astonishing volume of data, notably with the most recent 2000 count. The demographic data contained in each of the summary files is precise, detailed, and distributed in a difficult-to-understand text format. The documentation for summary file #1 alone (race, age, sex) is a 637-page PDF file, and the actual data is stored in a maze of zip files, all alike.
I've poked at these before, but I recently got a bee in my bonnet about making them available in a more useful form so they could be mapped. I talked to Josh Livni (of Land Summary) quite a while back about his plans for a demographic summary site that would store everything in a database in the cloud. Then Amazon made it available as a public dataset. Still I was not satisfied - both approaches to handling the data seemed a bit ocean-boiling in retrospect.
I've been experimenting with something I'm tentatively calling census-tools that seeks to make this data a bit more accessible. I'm motivated by the idea that predictably-structured zip files stored on a web server and accessed with Python's excellent stream-handling libraries might actually be considered quite a good API, so the first tool in the repository proceeds from there. It does a very simple thing: given an optional U.S. state, a geographic summary level (e.g. census tract or county), and a type of data, it unzips those remote files into memory and converts them to a tab-separated values file.
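As a rough sketch of the idea (not the actual census2text.py code), the core step might look something like the following. It assumes the data member inside the zip is comma-delimited text in a Latin-1-compatible encoding; the function names `fetch_bytes` and `zip_member_to_tsv` are made up for illustration.

```python
import csv
import io
import urllib.request
import zipfile


def fetch_bytes(url):
    """Download a remote file fully into memory (no temp files on disk)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def zip_member_to_tsv(zip_bytes, member_name, out):
    """Convert one comma-delimited member of an in-memory zip to TSV.

    zip_bytes   -- the raw bytes of a zip archive
    member_name -- name of the file inside the archive (hypothetical here)
    out         -- any writable text stream
    Returns the number of rows written.
    """
    rows_written = 0
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Wrap the bytes in a file-like object so zipfile never touches disk.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            # Assumption: the member is comma-delimited, Latin-1-safe text.
            text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
            for row in csv.reader(text):
                writer.writerow(row)
                rows_written += 1
    return rows_written
```

With a real archive, usage would be along the lines of `zip_member_to_tsv(fetch_bytes(url), name, open("out.txt", "w"))`, where `url` and `name` stand in for whichever zip file and member you are after. Filtering by state, geography, or table is left out of this sketch.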
Here's an example:
python census2text.py --verbose --wide --state=Hawaii --geography=county --table=P18 --output=hawaii-households.txt
It outputs a chatty text file of household data for every county in Hawaii into a file called hawaii-households.txt. It takes about a minute to churn through a 2.8MB zip file and output the results. Omitting the state name gets you every county in the U.S. in about 20 minutes:
python census2text.py --verbose --wide --geography=county --table=P18 --output=national-households.txt
I tested with Hawaii because it's small, and immediately discovered the strangely underpopulated Kalawao County:
The county is coextensive with the Kalaupapa National Historical Park, and encompasses the Kalaupapa Settlement where the Kingdom of Hawai'i, the territory, and the state once exiled persons suffering from leprosy (Hansen's disease) beginning in the 1860s. The quarantine policy was lifted in 1969, after the disease became treatable on an outpatient basis and could be rendered non-contagious. However, many of the resident patients chose to remain, and the state has promised they can stay there for the rest of their lives. No new patients, or other permanent residents, are admitted. Visitors are only permitted as part of officially sanctioned tours. State law prohibits anyone under the age of 16 from visiting or living there.
Anyway, this small amount of information can be quite hard to get to. Between the impenetrable formatting of the geographic record files, the bewildering array of different kinds of geographic entities, and the depth of geographic minutiae, it can take quite a bit of head-scratching to extract even the first bits of information from the U.S. Census.
I hope this first tool makes it a little bit less of a hassle. I'd accept whatever patches people choose to offer: support for summary files beyond SF1, additional geographic summary levels, general patches, and more.
Have you considered accessing the 2000 U.S. Census using Linked Data? http://www.rdfabout.com/demo/census/
To be honest, I haven't. This all seemed easier and more fun than consuming RDF. Also, the Census data here isn't really "linked", or if it is that's not really its primary characteristic. It's plain old tables, but they're very big and need to be filtered.
Any 2010 plans?
Ben, I'd love to know what they have in store for publishing! I'm told there will be Census people at this weekend's State Of The Map conference in Atlanta. I plan to get them drunk and find out. =D
Any update from SOTMUS re: Census 2010 data publishing plans? We're in the midst of converting our 2010 Census Hard To Count mapping site (http://www.censushardtocountmaps.org/) to one that focuses on post-2000 demographic change, and are casting about for better approaches. Things like Census Tools, TileStache, and Polymaps are all quite intriguing.