Resolve symlinks in exclude folders
Needs Review · Public

Authored by poboiko on Nov 14 2018, 3:14 PM.

Details

Reviewers
None
Group Reviewers
Frameworks
Baloo
Summary

Assume the user has a ~/stuff folder that is a symlink to, e.g., /storage/stuff.
The user doesn't want it indexed, so they add ~/stuff in the KCM. However, this entry is silently ignored when the indexer
runs over /storage (trivially, because the real path does not match ~/stuff), so the excluded folder ends up being indexed.

Instead, I propose resolving symlinks right when the exclude folder list is loaded from the config.
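
A minimal sketch of the idea, not the actual patch: it uses C++17 std::filesystem rather than Baloo's Qt-based config code, and the helper name resolveExcludeFolders is hypothetical. Each configured entry is canonicalized once at load time, so later comparisons see /storage/stuff instead of the symlink. Note that std::filesystem does not expand "~", so entries are assumed to be absolute paths already.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper, not Baloo's actual code: canonicalize each configured
// exclude folder so that later prefix checks compare real paths.
std::vector<std::string> resolveExcludeFolders(const std::vector<std::string>& configured)
{
    std::vector<std::string> resolved;
    for (const std::string& entry : configured) {
        std::error_code ec;
        // weakly_canonical resolves symlinks in the existing part of the path
        // and does not fail when the folder itself is missing.
        fs::path real = fs::weakly_canonical(fs::path(entry), ec);
        // If resolution fails, fall back to the entry exactly as configured.
        resolved.push_back(ec ? entry : real.string());
    }
    return resolved;
}
```

With this in place, an entry pointing at a symlink and an entry pointing at its target would end up as the same string in the list.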

Test Plan

It compiles, it seems to be working

Diff Detail

Repository
R293 Baloo
Branch
resolve-exclude-symlinks (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4935
Build 4953: arc lint + arc unit
poboiko created this revision. Nov 14 2018, 3:14 PM
Restricted Application added projects: Frameworks, Baloo. Nov 14 2018, 3:14 PM
Restricted Application added a subscriber: kde-frameworks-devel.
poboiko requested review of this revision. Nov 14 2018, 3:14 PM
bruns added a subscriber: bruns. Nov 14 2018, 5:55 PM

IMHO we should just disallow specifying symlinks in both include/excludeFolders. The user can just use excludeFolders = /storage/stuff if they want to exclude it.

> IMHO we should just disallow specifying symlinks in both include/excludeFolders. The user can just use excludeFolders = /storage/stuff if they want to exclude it.

Why not? This kind of (partial) symlink support is quite easy to implement. And the more restrictions users have, the worse their experience is.
(I stumbled upon this issue by chance: I added a folder, not even realizing at the time that it was a symlink, and some time later found that the files there had been indexed and my exclusion rule had been silently ignored.)

bruns added a comment. Nov 15 2018, 3:17 PM

Because it can never be consistent.
What happens when I create two symlinks to the same folder, and put one link into includeFolders, the other one in excludeFolders?

What really should happen is that the indexer never follows symlinks and only adds files by their canonical path. This avoids a bunch of problems: symlink loops, nondeterministic paths for files when they are added to the index, ...

I believe we can do something better here.
I think if we stick to canonical paths everywhere, and resolve symlinks ASAP (but still follow them), that might solve all the problems.

> Because it can never be consistent.
> What happens when I create two symlinks to the same folder, and put one link into includeFolders, the other one in excludeFolders?

Here the behavior would be the same as if the user had just added the same folder to both lists. I'm not quite sure how that works now (I guess one of the two rules just matches first).
But undefined behavior would be acceptable here, because it looks like the user is shooting themselves in the foot intentionally :)

> What really should happen is that the indexer never follows symlinks and only adds files by their canonical path. This avoids a bunch of problems: symlink loops, nondeterministic paths for files when they are added to the index, ...

As for symlink loops, QDirIterator with the FollowSymlinks flag seems to handle them nicely (at least according to the Qt docs).
As for nondeterministic paths: just add the canonical path, and that's it.

The only problem I see here is that if we run baloosearch -d folder/ and there is a symlink folder/subfolder -> /somewhere/else inside it, we won't find any results from subfolder.
This is less trivial: to handle it we would need to store the whole FS graph (which is no longer a tree) including symlinks. But for now we can live without it, I suppose.
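
To illustrate that limitation with a small sketch: if the index holds only canonical paths, a purely lexical directory filter cannot see files that are reachable only through a symlink. The function isUnder is hypothetical and only assumes that a directory filter boils down to a component-wise prefix check; it is not how baloosearch is actually implemented.

```cpp
#include <algorithm>
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical lexical filter: true when every component of `folder` is a
// leading component of `file`. `folder` is expected without a trailing
// separator. No symlinks are resolved here.
bool isUnder(const fs::path& file, const fs::path& folder)
{
    auto [folderIt, fileIt] =
        std::mismatch(folder.begin(), folder.end(), file.begin(), file.end());
    (void)fileIt; // only the position reached in `folder` matters
    return folderIt == folder.end();
}
```

A file indexed as /somewhere/else/report.txt fails isUnder(file, "/home/user/folder") even when /home/user/folder/subfolder is a symlink to /somewhere/else, which is exactly the case described above.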