I was chatting with Sha this weekend about how to get data out of OpenStreetMap and into a database, and realized that it’s possible no one’s really explained a full range of current options in the past few years. Like a lot of things with OSM, information about this topic is plentiful but rarely collected in one place, and often consists of half-tested theories, rumors, and trace amounts of accurate fact based on personal experience earned in a hurry while at the point of loaded yak.
At first glance, OSM data and Postgres (specifically PostGIS) seem like a natural, easy fit for one another: OSM is vector data, PostGIS stores vector data. OSM has usernames and dates-modified, PostGIS has columns for storing those things in tables. OSM is a worldwide dataset, PostGIS has fast spatial indexes to get to the part you want. When you get to OSM’s free-form tags, though, the row/column model of Postgres stops making sense and you start to reach for linking tables or advanced features like hstore.
There are three basic tools for working with OSM data, and they fall along a continuum from raw data on one end to render-ready map linework on the other.
Osmosis is the granddaddy of OpenStreetMap tools:
The tool consists of a series of pluggable components that can be chained together to perform a larger operation. For example, it has components for reading from database and from file, components for writing to database and to file, components for deriving and applying change sets to data sources, components for sorting data, etc.
Mostly Osmosis is used to retrieve full planet dumps or local extracts of OSM data and convert it to various formats, strip it of certain features, or merge multiple parts together. Osmosis can write to Postgres using a snapshot schema, with nodes stored as real PostGIS point geometries and ways optionally stored as Linestrings. OSM tags are stored in hstore columns as key/value mappings. User information goes into a separate table.
The Osmosis snapshot schema is great if you want:
- Original nodes and you don’t care what streets they’re a part of.
- Metadata about OSM such as owners of particular ways.
- Downstream extracts from your database.
Snapshot schema tables: nodes (geometry), ways (optional geometry), relation_members, relations, schema_info, users, way_nodes.
I’m fairly new to Osmosis myself, just starting to get my head around its pitfalls.
Osm2pgsql is in common use for cartography, “often used to render OSM data visually using Mapnik, as Mapnik can query PostgreSQL for map data.” Osm2pgsql is lossy, and you can tweak its terribly-named “style file” to decide which original tags you want to keep for rendering. Tags are converted into Postgres column names, with newer versions of Osm2pgsql also supporting the use of hstore to save complete tag sets. Data ends up in three main tables, grouped into points, lines and polygons.
Osm2pgsql will get you:
- An easy, no-configuration direct import organized by geometry type.
- Reasonably fast selections and renders by Mapnik.
- Automatic spherical mercator projection.
- A list of affected tiles if you use it on a changeset.
I’ve never successfully run Osm2pgsql without using the “slim” option, which stores additional data in Postgres and makes it possible apply changes later. Osm2pgsql tables: point (geometry), line (geometry), polygon (geometry), nodes (data), rels (data), ways (data), roads (never used this one).
I’ve included ready-to-use shapefiles of Osm2pgsql data in the Metro Extracts.
Imposm is at the far end of the scale, producing topic-specific tables that you define.
Imposm is an importer for OpenStreetMap data. It reads XML and PBF files and can import the data into PostgreSQL/PostGIS databases. It is designed to create databases that are optimized for rendering/WMS services.
Where Osm2pgsql gets you a single table for all linestrings, Imposm can give you a table for roads, another for rail, a third for transit points, and so on. It does this by using a mapping file written in Python which selects groups of tags to map to individual tables. Imposm can perform a ton of upfront work to make your downstream Mapnik rendering go faster, and requires significantly more data preparation effort if you want to depart from default schemas. I’ve been working on an all-purpose mapping based on High Road just to make this easier. The cost of upfront work is tempered somewhat by Imposm’s use of concurrency and its separation of database population from deployment, something you normally do manually in Osm2pgsql.
Imposm is great for:
- Faster, smaller, more render-ready data.
- Automatic simplified geometry tables.
Use it if you’re working on cartography and already know how you want to map OSM data. I’ve had good experiences using Osm2pgsql together with Imposm, building up complex queries from generic Osm2pgsql data and then translating them into Imposm mappings. I’ve also been using it to perform street name abbreviations at import time, greatly improving the visual quality of road names in final maps.
Sample tables from my own mapping: green_areas, places, buildings, water_areas, roads and rails divided up by zoom level.
I’ve included ready-to-use shapefiles of Imposm data in the Metro Extracts.
Use Osmosis if you’re interested in data. Start with Osm2pgsql if you want to render maps. Move to Imposm once you know exactly what you want.
Nice article. By reading the OSM wiki, I knew that osm2pgsql was going to be the shortest path to start rendering. Someday I'll get into Osmosis.
Imposm is great, but the killer limitation is the lack of diff update support. So if you design a mapnik style based on Imposm tables, head towards a production environment and then ever think about doing (minutely, hourly, daily) data updates rather than full database reloads, it'll come as a nasty surprise! There's been various talk over the years of improving the customisability of osm2pgsql outputs - the style file and lack of output table control (especially regarding relations) is beginning to show its age. But I don't know whether it is easier to improve osm2pgsql, or to get diff update support built into Imposm.
Good overview of some of the tools out there to load OSM data into PostGIS. Another tool to mention is GO Loader, that we develop here at Snowflake in the UK. GO Loader is aimed at loading large chunks of raw OSM data (countries, continents, planet) at high speeds. Unlike some other solutions, GO Loader has a point&click GUI which makes it easy to use for people not very familiar with command line interfaces. After initially loading the OSM data, GO Loader builds the geometries and does a bit of data cleaning as well, particularly on the polygons. To get around some of the issues with storing the tag/key values, users can select just the tag/key values that they are interested in.
Good point about the changesets, Andy. I suspect that diff support added to Imposm would be identical to that found in Osm2pgsql: a set of additional tables with node/way/rel IDs and their relationships so it’s easy to see what needs to change. In practice, I’ve never even used diff support in Osm2pgsql so I haven't noticed the lack of it in Imposm. Richard: GO Loader looks pretty good!
Thanks for this write up Michal I'm just getting in to this stuff and as you say information is scattered around but hard to put together, as such this is a hugely useful. Do you know any good places to discuss this kind of thing i.e. is there a mailing list or a forum where I can ask stupid questions?
Great overview Mike. I second Andy. I've seen imposm and appears to have a lot going for it and clearer docs (it supports PBF, I don't think osm2pgsql does) but the lack of diff support is holding me back from using it day to day. Tom, the best places for support on osm2pgsql are #osm and #osm-dev on openstreetmap's IRC server - irc.oftc.net dev mailing list http://lists.openstreetmap.org/listinfo isn't a bad idea either.
Sorry, no new comments on old posts.