[FilteredDirIterator] Combine all suffixes into one large RegExp
ClosedPublic

Authored by bruns on Tue, Jun 4, 6:08 PM.

Details

Summary

Most of the about 50 default filters are suffix matches, which can be
easily combined into one large regular expression. Doing so reduces
the matching cost by about 80%.

Test Plan

valgrind --tool=callgrind baloo_file

Diff Detail

Repository
R293 Baloo
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
bruns created this revision.Tue, Jun 4, 6:08 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptTue, Jun 4, 6:08 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
bruns requested review of this revision.Tue, Jun 4, 6:08 PM
bruns updated this revision to Diff 59148.Tue, Jun 4, 6:13 PM

replace one QLatin1String with QStringLiteral

Would it be make more sense to just have a single file type regex in the config file rather than parsing the existing data and re-processing it in the code?

bruns added a comment.Tue, Jun 4, 11:01 PM

Would it be make more sense to just have a single file type regex in the config file rather than parsing the existing data and re-processing it in the code?

The config is quite messy in this regard - there are actually two distinct filters in the sources, one for directories and one for files. Currently, both are fed into the same regex, and both are merged into one config key. Also, the config does not diffferentiate between filters deleted from the defaults, or filters added to the defaults, everything is flattened into one config key. This makes config updates quite problematic.

I think having file globs in the config is fine, converting these into regexes wouldn' t be a benefit. Converting the globs into regexes is a insignificant overhead, it is just an implementation detail. Doing the conversion in a low overhead way is nice, but out of scope for the config file syntax.

Ideally, the filter were "default - deleted + added", instead of a flattened list (the config would only save deleted and added filters). File globs are sufficient here.

ngraham accepted this revision.Wed, Jun 5, 2:43 AM

All right, sounds good for now then. We can refactor the config file later.

This revision is now accepted and ready to land.Wed, Jun 5, 2:43 AM
This revision was automatically updated to reflect the committed changes.

This change unfortunately does not compile on any platform tracked by the CI system.
Please see https://build.kde.org/job/Frameworks/job/baloo/

This is something which needs correcting relatively urgently, as without it running any Dependency Build job is impossible.

bruns added a comment.Thu, Jun 6, 1:37 PM

This change unfortunately does not compile on any platform tracked by the CI system.
Please see https://build.kde.org/job/Frameworks/job/baloo/

This is something which needs correcting relatively urgently, as without it running any Dependency Build job is impossible.

Its always nice to see how easy it is to merge the wrong branch with phabricator ...

This fixes it for me:

diff --git a/src/file/regexpcache.cpp b/src/file/regexpcache.cpp
index f47757ff..dce5cdcb 100644
--- a/src/file/regexpcache.cpp
+++ b/src/file/regexpcache.cpp
@@ -78,9 +78,9 @@ void RegExpCache::rebuildCacheFromFilterList(const QStringList& filters)
     }
 
     // Combine all suffixes into one large RE: "^.*(foo|bar|baz)$"
-    auto suffixMatch = QLatin1String("^.*\\.(");
-    suffixMatch += suffixes.join(QChar('|'));
-    suffixMatch += QLatin1String(")$");
+    auto suffixMatch = QLatin1String("^.*\\.(")
+        + suffixes.join(QChar('|'))
+        + QLatin1String(")$");
     qCDebug(BALOO) << suffixMatch;
     m_regexpCache.prepend(QRegularExpression(suffixMatch));
 }