Supporting the Liberty (fries?)
Creative Commons License photo credit: Omar Eduardo

My previous post about PBF reading successes was written way too prematurely. It turned out my PBF reading code had some serious bugs which made reading look much faster than it actually was (one of the reasons was that I neglected to read OSM node keys/values when written in PBF dense node format).

I’ve subsequently written some extensive tests, comparing OSM database contents from XML and PBF file of the same area (thanks Geofabrik) on an object by object basis, so I’m now 95% sure the PBF code works OK. Performance-wise the (final?) results are much less glamorous than it looked initially: PBF reading is “only” 2.5 times faster than reading OSM.bz2 files, while in memory consumption terms, they are pretty much the same. I curious what other OSM software like osmosis has to say about these results.

I had hoped I could speed the PBF reading by spreading the work on several processor cores. What I did is to use Microsoft’s Parallel Extensions library to separate the fetching of PBF file blocks from the actual parsing of them into two (or more) cores. This resulted in only about 10% increase of the overall speed (tested on my two-core machine, so on more cores the result could be better).

It actually proved pretty hard to do a decent job of separating work in some balanced fashion. Since the file reading is sequential, this can only be done by one thread/core, so you want to put as little other work to that core as possible. As soon as file block bytes are fetched from the file, they are delegated to another core to parse it (in terms of protocol buffers) and then extract OSM objects from it. The problem is that you don’t want to enqueue too many file blocks at the same time, since this takes up valuable memory (which is already filled with extracted OSM objects). So I ended up using a blocking queue, which means the main thread (which reads the file) will wait until at least one core is available before filling the queue with another file block.

I’ve also tried micro-management strategy – using multiple cores to extract individual OSM objects, but this only really works for ways and relations. Current PBF extracts use dense nodes format, which is delta-encoded and thus forces you to read things sequentially on a single thread of execution. I guess this is the price of having a format that wants to satisfy two different (and inherently conflicting) goals: less space and less CPU.

I’m fairly new to Parallel Extensions and there are probably better ways of handling this, but I’ll leave it for the future.

Anyway, a new Maperitive release is out, grab it from the usual place.