Understanding and reducing bad performance impact of running Baloo under certain conditions
Open, Needs Triage, Public

Description

While the overall design of Baloo seems quite elaborate and well done, and it already uses almost every available kernel feature to reduce its priority, there are still some issues with how it negatively impacts real-world configurations in the wild.

This task should track some of the observed issues and help in understanding them:

  • IO thrashing in low-mem situations D24540
  • Re-indexing files on every reboot due to unstable DocId: affects multiple filesystems (e.g., btrfs); some filesystems even have unstable inode numbers, and Baloo should blacklist such filesystems (it already does so for tmpfs) BUG:404057 T9805 T11861
  • Kernel memory management overhead due to huge mmap and IO behavior (but there's no real way around it currently) D24540 T9873
  • High-impact transaction behavior (40 files/trx may have a high memory overhead, while fewer files/trx increase the fsync/fdatasync overhead); a more dynamic handling could fix this T9873 BUG:400704 BUG:356357
  • Database design issues (that's always a difficult balance between DB size, lookup latency, and memory usage at both index time and search time) T9805
  • General memory overhead or leaks which can accumulate over time and cause crashes D24502 T9873
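
As a rough illustration of what "more dynamic handling" of transactions could mean, here is a minimal sketch in Python. It is not Baloo code: the class `AdaptiveBatcher`, the `commit` callback, and the `max_bytes` budget are all hypothetical names and values chosen for this example. The idea is simply to commit when either a file count or an estimated memory budget is reached, whichever comes first.

```python
class AdaptiveBatcher:
    """Hypothetical sketch: commit a batch when a file-count limit OR an
    estimated memory budget is hit, instead of always after N files."""

    def __init__(self, commit, max_files=40, max_bytes=8 * 1024 * 1024):
        self._commit = commit        # callback standing in for a DB commit
        self._max_files = max_files  # 40 mirrors the files/trx figure above
        self._max_bytes = max_bytes  # assumed budget, not a Baloo setting
        self._pending = []
        self._pending_bytes = 0

    def add(self, path, estimated_bytes):
        """Queue one file; commit early if the batch grew too large."""
        self._pending.append(path)
        self._pending_bytes += estimated_bytes
        if (len(self._pending) >= self._max_files
                or self._pending_bytes >= self._max_bytes):
            self.flush()

    def flush(self):
        """Commit whatever is queued and start a fresh batch."""
        if self._pending:
            self._commit(self._pending)
            self._pending = []
            self._pending_bytes = 0
```

With many small files this keeps the number of fsync/fdatasync calls low, while a few very large documents trigger an early commit before memory use grows too much. How to estimate the per-file cost is left open here.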

Discussion of the individual problems already exists in some other tasks or changesets which can be linked here. This should mostly be seen as an entry point into all the related topics so we hopefully avoid any duplicate efforts.

New ideas and references can be added to the list above and followed up in individual threads.

Some notes taken later (from T13298):

All in all, a lot of investigation has already been done, and most of your ideas have already been implemented almost to perfection when looking at the source code. But a few things stand out:

  1. Baloo cannot work correctly with filesystems that use unstable device numbers, and it is also built around a wrong assumption about device and inode numbers: POSIX does not guarantee that these are stable across reboots or even across umount/mount. NFS, as a virtual filesystem (it has no block device associated with it), is problematic here because its device number changes with every mount. This confuses Baloo: it reindexes the same files all over again after each remount because the device id changed, but it doesn't purge the files associated with the previous device id from the database because it cannot know whether those files are only temporarily unavailable. So the database only grows with every reboot and easily surpasses your RAM size (which in itself is not too bad, but read below). The same problem applies to btrfs and ZFS, which do not use physical block devices but map their pool of partitions through virtual block devices: the device ids are not stable across reboots. Similar problems may exist for filesystems with potentially unstable inode numbers (probably FAT filesystems, or maybe also NTFS). So, all in all, Baloo is currently only compatible with ext and xfs, and maybe a few others which are less common.
  2. Having outlined (1), let's see what could be fixed: Baloo should migrate to using its own device id database. That could be easily coded by adding another table to the database which stores a mapping of device id to filesystem uuid. The nice aspect of this approach is that we could just map the currently known device ids at table creation time, and all the database entries stay compatible. Newly added devices would then allocate a free device number from this table and remember the uuid; a reappearing device would match the uuid and use the device id mapped in the table. This will fix all filesystems with unstable device ids (as reported by the fstat call). But it will not fix filesystems with unstable inode numbers. Also, there's a problem with modern filesystems using 64-bit inode numbers: Baloo only has 32 bits (device id) + 32 bits (inode id) for file mapping, and it currently just discards the upper 32 bits if they exist. This is okay most of the time but creates a new set of problems.
  3. While the Baloo database uses lmdb, and that database system is very powerful, it is not tuned optimally for use as a background service. Actually, I don't believe lmdb was designed to work as a background system, but all its other properties are very well designed. I already submitted patches to tune some of the parameters, e.g. to disable read-ahead, as it is mostly useless for the way Baloo uses the database. This reduces filesystem cache thrashing. And actually, that is what you want to care most about: Baloo should be able to tell the system early that it no longer needs some data in cache.
  4. When the Baloo database grows very large, this may become problematic for the kernel. While lmdb doesn't really use memory (it's memory mapped and works more like a swap file), it still occupies a good amount of page table space. Especially in combination with components that do a lot of memory allocations (like btrfs), this introduces a vast number of micro-latencies and puts the memory manager under pressure. We will probably also see high memory fragmentation due to the many small allocations lmdb needs. Memory compaction, when needed and running, severely affects the interactive responsiveness of the system. It may even lead to early out-of-memory situations for hardware that needs contiguous memory allocations (mostly GPU applications). So it's important to keep the memory footprint of lmdb as low as possible. That's not really possible, though, as lmdb needs to map the whole database all the time. It may be possible to start with a small database file and grow it dynamically when running out of space, but lmdb is not designed to do this while the database is open, and Baloo probably needs some redesign to do this, as far as I understand the source code.
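
The device id table idea from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DeviceIdTable` and `make_doc_id` are hypothetical names, a plain dict stands in for the extra database table, the caller is assumed to already know the filesystem uuid, and the bit order used to pack the id is arbitrary and not necessarily Baloo's actual layout.

```python
class DeviceIdTable:
    """Hypothetical stand-in for an extra DB table: filesystem uuid -> stable id."""

    def __init__(self, known=None):
        # Seeding with the currently known ids keeps existing entries valid.
        self._by_uuid = dict(known or {})

    def stable_id(self, fs_uuid):
        """Return the id mapped to this uuid, allocating a free one if new."""
        if fs_uuid not in self._by_uuid:
            used = set(self._by_uuid.values())
            candidate = 1  # 0 is skipped here by choice, as a "no device" value
            while candidate in used:
                candidate += 1
            self._by_uuid[fs_uuid] = candidate
        return self._by_uuid[fs_uuid]


def make_doc_id(device_id, inode):
    """Pack 32 bits of device id and 32 bits of inode into one 64-bit id.

    The upper 32 bits of a 64-bit inode are discarded, as described above.
    The bit order is an assumption made for this sketch.
    """
    return ((device_id & 0xFFFFFFFF) << 32) | (inode & 0xFFFFFFFF)
```

A reappearing filesystem gets the same id back regardless of what the kernel reports for it after a remount. The sketch also makes the 64-bit inode problem visible: `make_doc_id(1, 5)` and `make_doc_id(1, 5 + 2**32)` produce the same value, so two different files on the same filesystem would collide.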

So, all in all, that's a lot more complex than I may have thought, and most of the points you are suggesting are already implemented. The devil is in the details, like "nice" not doing exactly what you think it does on modern kernels, or memory latency issues, or unstable ids... Actually, once Baloo is under control (correct filesystem, reduced database size, plenty of RAM to reduce early latency from memory fragmentation), you'd probably not even notice it is running. Memory consumption itself isn't really the problem: I tried it, and reducing the memory used by Baloo doesn't solve it. The problems come from its interaction with cache, readahead, and memory fragmentation. It would probably be easier to solve this with a database system that locks a fixed memory block into memory and only operates within that window (by shifting the mmap around in the file). There are databases with a very similar design to lmdb that could do that, but they have different problems, and it's a lot of effort before even reaching a point where you could compare both implementations and test them against each other. So we should probably make the best of lmdb.
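
To make the "shifting the mmap around in the file" idea more concrete, here is a minimal Python sketch of a bounded, read-only mapping that is remapped on demand. It is not how lmdb works and not a proposal for a specific database. The class name `SlidingWindow` and its methods are hypothetical, and the sketch only bounds the size of the mapping; it does not lock the window into memory.

```python
import mmap
import os


class SlidingWindow:
    """Read a file through one bounded mmap window that moves on demand."""

    def __init__(self, path, window_size=4 * mmap.ALLOCATIONGRANULARITY):
        gran = mmap.ALLOCATIONGRANULARITY
        # mmap offsets must be multiples of the granularity, so round up.
        self._window_size = max(gran, (window_size + gran - 1) // gran * gran)
        self._file = open(path, "rb")
        self._file_size = os.fstat(self._file.fileno()).st_size
        self._map = None
        self._start = 0

    def _remap(self, offset):
        """Drop the old window and map the aligned one containing offset."""
        if self._map is not None:
            self._map.close()
        self._start = offset - offset % self._window_size
        length = min(self._window_size, self._file_size - self._start)
        self._map = mmap.mmap(self._file.fileno(), length,
                              access=mmap.ACCESS_READ, offset=self._start)

    def read_byte(self, offset):
        """Return the byte at a file offset, moving the window if needed."""
        if not 0 <= offset < self._file_size:
            raise IndexError(offset)
        if (self._map is None
                or not self._start <= offset < self._start + len(self._map)):
            self._remap(offset)
        return self._map[offset - self._start]

    def mapped_length(self):
        """Size of the current mapping in bytes (0 if nothing is mapped)."""
        return 0 if self._map is None else len(self._map)

    def close(self):
        if self._map is not None:
            self._map.close()
            self._map = None
        self._file.close()
```

The address space and page table footprint stay bounded by the window size no matter how large the file grows, at the cost of a remap whenever an access falls outside the current window.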

hurikhan77 updated the task description. Oct 11 2019, 6:09 PM
ngraham added subscribers: Baloo, bruns, Frameworks.
hurikhan77 updated the task description. Oct 11 2019, 6:14 PM
hurikhan77 updated the task description.
hurikhan77 renamed this task from "Reduce bad performance impact of running Baloo under certain conditions" to "Understanding and reducing bad performance impact of running Baloo under certain conditions". Oct 11 2019, 6:18 PM
hurikhan77 updated the task description.
hurikhan77 updated the task description. Oct 11 2019, 6:25 PM
hurikhan77 updated the task description. Oct 11 2019, 11:17 PM
hurikhan77 updated the task description. Oct 12 2019, 1:00 PM
hurikhan77 updated the task description. Jun 20 2020, 1:51 AM
hurikhan77 updated the task description.