UPDATE: the post below was based on premature assumptions that my new PBF code is actually working. It turns out it had a number of serious bugs which made reading look faster than it actually is. Here’s a followup post.

For the last couple of days I’ve been working on a PBF file reader for Maperitive. PBF file is a binary file for storing OSM geo data using Google’s protocol buffers.

It’s been a steep learning curve, since I had to learn three things at the same time: protocol buffers, using protobuf-net library for .NET and understanding the PBF format. I’m mostly satisfied with the protobuf-net library, although the lack of any new development activity worries me a little bit.

I’ve finished most of the PBF reading stuff this evening and I was eager to test the new code against the old XML reader. I’ve used Geofabrik’s Denmark data, here are some rough results:

  • PBF file loads 7.6 times quicker than the .OSM.bz2 file. This is a really good result, mostly thanks to the way the PBF format has been designed.
  • Loading of PBF data uses a quarter less memory than the XML file. I’m talking about the memory used in the process of loading, not for storing the loaded OSM data – the data is internally stored in the exactly same way both for PBF and XML reading. This result surprised me a bit, I guess the extra memory consumed by the XML reader is due to the XML parser itself and/or the fact that a lot more strings are generated when reading XML OSM tags. PBF uses string tables and thus saves a lot of space by reusing common strings.