Changed the Data Structure used by findByUrl from a QList to a QHash.
CCBUG: 320231
Reviewer: dfaure
Tags: Frameworks
findByUrl was slow: when renaming 50,000 small files, for example, it had to walk a list of 50,000 items 50,000 times, so renaming that many files took more than an hour. Now it takes much less time, although baloo and the dirlister re-scanning the directory still don't help to reduce it.
Renaming 3,000 small local files is done in less than 10 seconds.
Moving 50,000 small files from sftp://127.0.0.1/borrar to /borrar1: the first step, fetching data from the dirlister, took more than 1 minute; now it is instantaneous.
Deleting 30,000 small local files from Dolphin takes less than a second.
Lint: Skipped
Unit Tests: Skipped
I'm not deeply into this code base, just adding some comments to make sure everything is taken into account; maybe your initial attempt was better after all...
src/core/kcoredirlister.cpp
- Line 852: Is it OK that you operate on a copy of the item here? The item in the hash won't be modified; is that on purpose?
- Line 1970: O(N) iteration over a large hash is extremely slow. Is this done elsewhere? If so, then you may need to find an alternative approach, potentially via multiple containers or by using a sorted vector after all, like you proposed initially.
- Line 2532: This can be slow, see above.
This definitely needs a benchmark (use QBENCHMARK in a QtTest-based unit test).
src/core/kcoredirlister.cpp
- Line 852: That was the case already... hmm, but not in my version of KIO. It looks like this patch is on top of other patches.
@jtamate - can you use `arc diff` next time you create a revision, instead of using the web upload?
Not using arc results in Phabricator being unable to show the context ("Context not available"), which makes reviewing considerably harder.
See https://secure.phabricator.com/T5029 for background info.
src/core/kcoredirlister.cpp
- Line 852: D10742 - this makes it even harder to review...
- Line 1970: My proposal: use an ordered container (e.g. QMap) instead of QHash. You can iterate the QMap and the KFileItemList in parallel, which makes erasing an O(m + n) iteration instead of O(m * n). (Of course, on top of that you have the sorting of the KFileItemList, O(n log n), and the overhead of inserting into the QMap<QUrl, KFileItem>.)
src/core/kcoredirlister.cpp
- Line 2532: To make this more efficient, a little bit of reshuffling is needed IMHO:
Benchmarking the current and the sorted-list implementations of the KCoreDirLister::Private::addNewItems method with 5,000 file items:
It only takes 1.5 msecs per iteration without filters, and 1.9 msecs with a nameFilter and a mimeFilter.
For reference, doing a std::partition of a copy of the list with isItemVisible also takes 1.5 msecs per iteration without filters, and 1.4 with filters.
src/core/kcoredirlister.cpp
- Line 852: The item in the hash will be refreshed, but not the item returned.