After some discussion with Martin Steigerwald, he came up with a great idea without realizing it:
We could add another table to the database that maps FS UUIDs to a monotonically increasing counter; let's call it "FsUuidMappingDB". It would use an FS UUID as the key and the counter as the value. Each time we detect a previously unknown UUID, we increase the counter by one and write a new mapping.
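To make the table idea concrete, here is a minimal in-memory sketch in Python (Baloo itself is C++ and would persist this in its database). The class name `FsUuidMapping`, the method `lookup_or_assign`, and the choice that the first filesystem gets the value 1 are assumptions for illustration only, not existing Baloo code.

```python
class FsUuidMapping:
    """Hypothetical in-memory stand-in for the proposed FsUuidMappingDB table."""

    def __init__(self):
        self._ids = {}     # FS UUID (str) -> counter value (int)
        self._counter = 0  # assumption: starts at 0, so the first filesystem gets 1

    def lookup_or_assign(self, fs_uuid):
        """Return the counter value for fs_uuid, creating a mapping if it is unknown."""
        if fs_uuid not in self._ids:
            # Previously unknown UUID: increase the counter and write a new mapping.
            self._counter += 1
            self._ids[fs_uuid] = self._counter
        return self._ids[fs_uuid]


mapping = FsUuidMapping()
root_id = mapping.lookup_or_assign("uuid-of-root-fs")  # made-up UUID strings
home_id = mapping.lookup_or_assign("uuid-of-home-fs")
again = mapping.lookup_or_assign("uuid-of-root-fs")    # known UUID, value unchanged
```

The point of the counter is that, unlike `st_dev`, the value assigned to a UUID never changes once it has been written.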
When we now scan for files, we can XOR the 64-bit inode number with the bit-reversed UUID-mapping value to create a DocID. I'm not sure how likely this is to produce a collision at some point, but currently I think it would be much better than using only 32-bit values with unstable 32-bit `st_dev` numbers. Thanks to using bit-reversed values and XOR, each newly discovered filesystem would only cut our inode namespace in half. On a typical system this means we would lose maybe 3-4 bits for the most important filesystems to be indexed. Even if someone swaps a lot of portable disks and those are indexed by Baloo, it is unlikely to collide early, because the least frequently changing bits of the counter would only interfere with the oldest inode numbers, which may no longer be used by the system at all (because file changes, rewrites, and package updates have replaced all those inodes).
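The encoding described above can be sketched as follows, again in Python for brevity. The function names `reverse_bits64` and `make_doc_id` are hypothetical, and the sketch assumes the counter values start at 1.

```python
MASK64 = (1 << 64) - 1


def reverse_bits64(value):
    """Reverse the bit order of a 64-bit value (bit 0 becomes bit 63, and so on)."""
    result = 0
    for _ in range(64):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def make_doc_id(inode, fs_counter):
    """XOR the 64-bit inode number with the bit-reversed filesystem counter."""
    return (inode & MASK64) ^ reverse_bits64(fs_counter & MASK64)


# Small counters end up in the topmost bits, far away from typical inode numbers.
print(hex(make_doc_id(42, 1)))  # 0x800000000000002a

# Collisions only appear once an inode number reaches into those top bits:
# this inode on filesystem 1 yields the same DocID as inode 5 on filesystem 2.
big_inode = (1 << 63) | (1 << 62) | 5
collides = make_doc_id(big_inode, 1) == make_doc_id(5, 2)
```

Note that the XOR result alone does not say which bits came from the counter and which from the inode, so the filesystem cannot be recovered from the DocID without extra information.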
T9805 T8066 T8054
**Update:** Encoding a device ID as outlined above doesn't work very well because many functions in Baloo currently expect to be able to do a reverse mapping. Maybe it's better to expand the ID storage to 128 bits. The reverse lookup just compares the device ID to the mounted file systems, but `st_dev` may be unstable across reboots, and even between unmounts and remounts.
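One possible shape for a 128-bit ID that keeps the reverse mapping trivial is to store the filesystem counter and the inode number side by side instead of mixing them. This is only a sketch of one layout; the post does not specify how the 128 bits would be used, and `pack_doc_id` / `unpack_doc_id` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def pack_doc_id(fs_counter, inode):
    """Assumed layout: upper 64 bits hold the filesystem counter, lower 64 bits the inode."""
    return ((fs_counter & MASK64) << 64) | (inode & MASK64)


def unpack_doc_id(doc_id):
    """Reverse mapping: recover (fs_counter, inode) from a 128-bit DocID."""
    return (doc_id >> 64) & MASK64, doc_id & MASK64


doc_id = pack_doc_id(3, 42)
fs_counter, inode = unpack_doc_id(doc_id)
```

With this layout no inode bits are lost and no collisions are possible between filesystems, at the cost of doubling the size of every stored ID.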