less expensive findByUrl in KCoreDirListerCache
AbandonedPublic

Authored by jtamate on Mar 13 2018, 11:09 AM.

Details

Reviewers
dfaure
Group Reviewers
Frameworks
Summary

Changed the Data Structure used by findByUrl from a QList to a QHash.

CCBUG: 320231

Test Plan

findByUrl was slow, for example, renaming 50.000 small files, it has to go through a list of 50.000 items 50.000 times, so renaming that number of files takes more than an hour, now it takes less time, but baloo re-scanning and dirlister re-scanning the directory doesn't help to reduce the time.
Renaming 3.000 small local files is done in less than 10 seconds.

Moving 50.000 small files from sftp://127.0.0.1/borrar to /borrar1, the first step, fetching data from the dirlister took more than 1 minute, now is instantaneous.

Deleting 30.000 small local files from dolphin takes less than a second.

Diff Detail

Repository
R241 KIO
Lint
Lint Skipped
Unit
Unit Tests Skipped
jtamate created this revision.Mar 13 2018, 11:09 AM
Restricted Application added a project: Frameworks. · View Herald TranscriptMar 13 2018, 11:09 AM
jtamate requested review of this revision.Mar 13 2018, 11:09 AM
mwolff added a subscriber: mwolff.Mar 13 2018, 3:03 PM

have you considered using a hash map instead?

jtamate updated this revision to Diff 29574.Mar 15 2018, 9:18 AM
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

Depends on https://phabricator.kde.org/D10742, including the diff.

I'm not into the code base, just adding some comments to ensure everything is taken into account - maybe your initial attempt was better after all...

src/core/kcoredirlister.cpp
852

is it OK that you operate on a copy of the item here? the item in the hash won't be modified, is that on purpose?

1970

O(N) iteration over a large hash is extremely slow, is this done elsewhere? if so, then you may need to find an alternative approach - potentially via multiple containers or by using a sorted vector after all like you proposed initially

2532

this can be slow, see above

dfaure requested changes to this revision.Apr 28 2018, 10:37 AM

This definitely needs a benchmark. (use Q_BENCHMARK in a qtestlib-based unittest)

src/core/kcoredirlister.cpp
852

That was the case already.... hmm but not in my version of kio. It looks like this patch is on top of other patches.

This revision now requires changes to proceed.Apr 28 2018, 10:37 AM
bruns added a subscriber: bruns.May 17 2018, 5:17 PM

@jtamate - can you use arc --diff next time you create a revision, instead of using the web upload?

Not using arc results in phabricator being unable to show the context ("Context not available"), which makes reviewing considerably harder.

See https://secure.phabricator.com/T5029 for background info.

Restricted Application added a subscriber: kde-frameworks-devel. · View Herald TranscriptMay 17 2018, 5:17 PM
bruns added inline comments.May 17 2018, 6:06 PM
src/core/kcoredirlister.cpp
852

D10742 - this makes it even harder to review ...

1970

My proposal:
A few lines earlier, a KFileItemList (~QList<KFileItem) is created.
Sort this list by name.

Use a ordered container (e.g. QMap) instead of QHash.

You can iterate the QMap and the KFileItemList in parallel, which makes erasing an O(m + n) iteration, instead of O(m * n). (Of course, on top you have sorting of the KFileItemList - O(n log n) - and the overhead when inserting into the QMap<QUrl, KFileItem>)

bruns added inline comments.May 17 2018, 11:41 PM
src/core/kcoredirlister.cpp
2532

To make this more efficient, a little bit of reshuffling is needed IMHO:

  1. Move the filter implementation from KDL::Private::addNewItem(...) to KDL::Private::addNewItems(...), i.e. sort the items based on ->matchesMimeFilter(item) into two new QHashes
  2. For each partition, use QHash::unite() to merge it into the respective list
  1. Make KDL::Private::addNewItem(...) a wrapper function for KDL::Private::addNewItems(...).

About the KCoreDirLister::Private::addNewItems method, benchmarking the current and sorted list implementation with 5000 fileItems:

It only takes 1.5 msecs per iteration without filters and 1.9 msecs with a nameFilter and a mimeFilter.
For reference, doing a std::partition of a copy of the list with isItemVisible also takes 1.5 msecs per iteration without filters and 1.4 with filters.

src/core/kcoredirlister.cpp
852

The item in the hash will be refreshed, but not the item returned.
This is not called when a rename is detected, therefore the key(url) is the same.

jtamate abandoned this revision.Oct 10 2018, 6:42 AM