Published by breki on 18 Jan 2011 at 11:03 pm
UPDATE: the post below was based on premature assumptions that my new PBF code is actually working. It turns out it had a number of serious bugs which made reading look faster than it actually is. Here’s a followup post.
It’s been a steep learning curve, since I had to learn three things at the same time: protocol buffers, using protobuf-net library for .NET and understanding the PBF format. I’m mostly satisfied with the protobuf-net library, although the lack of any new development activity worries me a little bit.
I’ve finished most of the PBF reading stuff this evening and I was eager to test the new code against the old XML reader. I’ve used Geofabrik’s Denmark data, here are some rough results:
PBF file loads 7.6 times quicker than the .OSM.bz2 file. This is a really good result, mostly thanks to the way the PBF format has been designed. Loading of PBF data uses a quarter less memory than the XML file. I’m talking about the memory used in the process of loading, not for storing the loaded OSM data – the data is internally stored in the exactly same way both for PBF and XML reading. This result surprised me a bit, I guess the extra memory consumed by the XML reader is due to the XML parser itself and/or the fact that a lot more strings are generated when reading XML OSM tags. PBF uses string tables and thus saves a lot of space by reusing common strings.