[balooctl] Add possibility to create a copy of the index without freelist
Needs Review · Public

Authored by poboiko on Nov 14 2018, 2:10 PM.

Details

Reviewers
None
Group Reviewers
Frameworks
Baloo
Summary

Currently, after indexing is done, there is a large difference between Actual Size and Expected Size in balooctl indexSize
(I would estimate it at ~30%, based on personal experience). The difference comes from how LMDB works: the database file never shrinks, and when we ask
it to remove or replace something, the freed pages go to the freelist (i.e. empty pages waiting to be reused). That wasted space is
indeed the freelist (this can be checked via mdb_stat -nf ~/.local/share/baloo/index; see Free pages).
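
To put a number on that check, the Free pages count can be converted into bytes. The following is a minimal sketch, not part of the patch: the helper name freelist_bytes is hypothetical, the 4096-byte page size is an assumption (the real value is system dependent), and the parser assumes mdb_stat prints a line of the form "Free pages: N".

```python
import re

# Assumption: 4096-byte pages. The real page size depends on the system
# and should be checked before trusting the result.
ASSUMED_PAGE_SIZE = 4096


def freelist_bytes(stat_output: str, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Return the bytes held by the freelist, given text printed by mdb_stat -f.

    Assumes the output contains a line of the form "Free pages: N".
    """
    match = re.search(r"^\s*Free pages:\s*(\d+)\s*$", stat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Free pages' line found in mdb_stat output")
    return int(match.group(1)) * page_size
```

Feeding it the captured output of mdb_stat -nf ~/.local/share/baloo/index gives a rough figure to compare against the gap between Actual Size and Expected Size.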

LMDB cannot compact the DB and drop all free pages in place, but it can create a copy of the DB without the freelist
(see mdb_env_copy2 with the MDB_CP_COMPACT flag).

Because this creates another copy (which might take a lot of disk space), I would not suggest doing it automatically, but rather adding a way to do it manually.
This patch adds a copy command to the balooctl tool which does that. It allows creating a backup of the index (either in a desired location or in a temporary directory),
and its main use is to shrink the DB, as follows:

$ balooctl copy /tmp/index
$ balooctl stop
$ mv /tmp/index ~/.local/share/baloo/index
$ balooctl start
Test Plan

It works as expected: before the copy, indexSize gave me ~500MB; after the copy, 327MB
(funnily, the actual size after the copy is even smaller than the expected size, which is 389MB).
The DB seems to be fine, i.e. searching works.
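
As a quick sanity check of these figures against the ~30% estimate in the summary, the saving can be worked out directly. This is only an arithmetic sketch; the helper name percent_saved is hypothetical, and the "before" value is the approximate ~500MB quoted above.

```python
def percent_saved(before_mb: float, after_mb: float) -> float:
    """Share of the original file size that the compacting copy reclaimed."""
    return (before_mb - after_mb) / before_mb * 100.0


# Figures quoted in the test plan, in MB; "before" is approximate.
actual_before, actual_after, expected = 500, 327, 389

print(f"reclaimed by the copy: {percent_saved(actual_before, actual_after):.1f}%")
print(f"actual minus expected before the copy: {actual_before - expected} MB")
```

With these numbers the copy reclaims roughly a third of the file, which is in line with the estimate.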

Diff Detail

Repository
R293 Baloo
Branch
freepages (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4931
Build 4949: arc lint + arc unit
poboiko created this revision. (Nov 14 2018, 2:10 PM)
Restricted Application added projects: Frameworks, Baloo. (Nov 14 2018, 2:10 PM)
Restricted Application added a subscriber: kde-frameworks-devel.
poboiko requested review of this revision. (Nov 14 2018, 2:10 PM)
bruns added a subscriber: bruns. (Nov 14 2018, 6:13 PM)

You are replicating mdb_copy -c here.

According to the man page:

Compact while copying. Only current data pages will be copied; freed or unused pages will be omitted from the copy. [...] Currently it fails if the environment has suffered a page leak

This is e.g. the case for the dreaded assertion in mdb_page_dirty.

There is one way to avoid the freelist issues: do what mdb_dump ... | mdb_load ... does, i.e. read the contents from the DB and write them to a new one. This completely ignores the freelist.

You are replicating mdb_copy -c here.
[...]

Yep, I know. It's just that there is no proper documentation for this issue (well, this applies to the whole of baloo, see T7843: Documentation), so I thought users would be more likely to discover it if there were such a feature in balooctl (which people use sometimes, hopefully).