Most of the about 50 default filters are suffix matches, which can be
easily combined into one large regular expression. Doing so reduces
the matching cost by about 80%.
Details
- Reviewers
ngraham poboiko - Group Reviewers
Baloo - Commits
- R293:3282105c6cc6: [FilteredDirIterator] Combine all suffixes into one large RegExp
valgrind --tool=callgrind baloo_file
Diff Detail
- Repository
- R293 Baloo
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Would it be make more sense to just have a single file type regex in the config file rather than parsing the existing data and re-processing it in the code?
The config is quite messy in this regard - there are actually two distinct filters in the sources, one for directories and one for files. Currently, both are fed into the same regex, and both are merged into one config key. Also, the config does not diffferentiate between filters deleted from the defaults, or filters added to the defaults, everything is flattened into one config key. This makes config updates quite problematic.
I think having file globs in the config is fine, converting these into regexes wouldn' t be a benefit. Converting the globs into regexes is a insignificant overhead, it is just an implementation detail. Doing the conversion in a low overhead way is nice, but out of scope for the config file syntax.
Ideally, the filter were "default - deleted + added", instead of a flattened list (the config would only save deleted and added filters). File globs are sufficient here.
This change unfortunately does not compile on any platform tracked by the CI system.
Please see https://build.kde.org/job/Frameworks/job/baloo/
This is something which needs correcting relatively urgently, as without it running any Dependency Build job is impossible.
This fixes it for me:
diff --git a/src/file/regexpcache.cpp b/src/file/regexpcache.cpp index f47757ff..dce5cdcb 100644 --- a/src/file/regexpcache.cpp +++ b/src/file/regexpcache.cpp @@ -78,9 +78,9 @@ void RegExpCache::rebuildCacheFromFilterList(const QStringList& filters) } // Combine all suffixes into one large RE: "^.*(foo|bar|baz)$" - auto suffixMatch = QLatin1String("^.*\\.("); - suffixMatch += suffixes.join(QChar('|')); - suffixMatch += QLatin1String(")$"); + auto suffixMatch = QLatin1String("^.*\\.(") + + suffixes.join(QChar('|')) + + QLatin1String(")$"); qCDebug(BALOO) << suffixMatch; m_regexpCache.prepend(QRegularExpression(suffixMatch)); }