This was discussed on kdevelop-devel initially.
In a nutshell, I am not particularly charmed by the idea that a key/value database of potentially thousands of pairs is stored as individual files in a single directory, using as many mmaps on top of that. A comment in the code itself claims that this may be problematic on some platforms. As I said in that ML thread, I am usually all for simple databases that use the filesystem, but I get an uncomfortable feeling when that involves this many files (around 6000 for KDevelop's source tree alone). In an application whose main purpose is handling lots of files, I'd prefer a solution that uses only a single file, even without hard proof that this is better.
When I asked whether there had been a recent assessment of how suitable modern simple key/value databases are as a backend for this feature, Sven said:
I think Milian investigated this once and concluded that no database on the market can do the lookups we need to do in an efficient-enough manner. Also, this is a huge amount of work to change. That said, if there is a solution with the same or superior performance and somebody is willing to do the porting, I'm all for it ...
I had been looking for an excuse to get some experience with database libraries like LMDB, so I started to work on an approach that would hopefully not be a "huge amount of work". The result is attached, with working backend implementations for the original QFile-based approach, LMDB (with the lmdbxx C++ bindings and additional LZ4 value compression), LevelDB, and (LZO-compressed) KyotoCabinet (the latter two currently deactivated).
The backend abstraction is based on the QFile API: a very thin wrapper around QFile itself, or more complex classes for the three database backends. The desired backend is currently selected via a preprocessor token in topducontextdynamicdata_p.h (requiring a rebuild of only two files from KDevPlatformLanguage).
I've been doing extensive testing in real-life use and with the test_topcontextstore unittest/benchmark, and I think the patch is now at a stage where it can be presented to the world as (at least) a proof of concept.
In practice, the LMDB backend corresponds best to what I was hoping to achieve: minimal overhead compared to the existing implementation (at least on Linux), a potential increase in disk-space efficiency (in addition to using a single file), and possibly a performance gain on slow media. In short, compared to the file-based backend: writing TopContexts is slower, but not noticeably so in real life; reading is comparably fast (even in the benchmark) on local storage. On slow (remote) storage, writing can be comparably fast while reading is as fast as on local storage. (I'm not really certain how to explain that, btw.)
More complete benchmark results (on Mac and Linux) are included in test_topcontextstore.cpp. The LZ4 compression reduces the average stored value size by up to 2x, with a negligible impact on performance (probably because the benchmark first fills the database file with incompressible random values).
Re: LMDB and NFS: there is a known issue with LMDB on NFS shares. As far as I understand, this concerns a different use case (databases shared between hosts) than the one at hand (a private database used by a single application at a time). The unittest has not shown any problems when XDG_CACHE_HOME points to a directory on an NFS share.
I have not tested LevelDB extensively, as it too can use multiple (many) files. In addition, LevelDB builds with TCMalloc by default, which can lead (and in my testing has led) to issues (deadlocks).
I looked into KyotoCabinet because the LevelDB performance notes mention it can be faster for random read access. In my implementation it does not live up to that reputation at all.
For now I have kept those backend implementations in the patch for reference (and as potential "junk DNA").