get rid of the raw KFileItem pointers in KCoreDirListerCache
ClosedPublic

Authored by jtamate on Feb 22 2018, 12:25 PM.

Details

Summary

Do a TODO introduced in 4b498196899223692e8a7d334618b2874bb6c77e in 2014.
Don't depend on the internal memory layout of the containers, and get rid of NonMovableFileItem.
Use the new 2 operands < of KFileItem to sort by url.

Sorting lstItems and the use of binary search in findByUrl gets the following results:
adding items to lstItems is almost as fast as before, only 3 msecs slower on average.
findByName is as slow as before.
findByUrl is on average >200x faster.

Test Plan

Pass kio unittests (behind a proxy, therefore ktcpsockettest fails at 261 sec).

Before:
Total Test time (real) = 706.64 sec
After:
Total Test time (real) = 352.63 sec

Using dolphin, in a folder with 15000 files, select them all and rename them to a# (second time to b#), and as soon as it start renaming, go up two directories to not interfere with the refreshes.
Before: 4m 30s to finish
After: 1m 12s to finish

Diff Detail

Repository
R241 KIO
Branch
dirlister (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
jtamate created this revision.Feb 22 2018, 12:25 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptFeb 22 2018, 12:25 PM
jtamate requested review of this revision.Feb 22 2018, 12:25 PM
markg added a subscriber: markg.Feb 22 2018, 3:25 PM

I unfortunately have no clue how to answer your questions.

src/core/kcoredirlister.cpp
825–829 ↗(On Diff #27768)

Hmm, this looks weird to me.
Sure, it works. "KFileItem retKFileItem = *it;" makes a copy.

A more efficient way (but requires you to change the lists this is backed by to a std::vector) is to:

  1. Take the element out of the vector. Something like "KFileItem item = std::move(*it);
  2. Now the vector would be in a valid state but with one invalid object (will post a problem if you iterate over it later on) so you have to remove that element from the vector like you did.

I'm not sure if the benefits of this justify the path of changing the list (dirItem->lstItems) to a std::vector, if possible at all.
That's up to you :)

jtamate updated this revision to Diff 27909.Feb 24 2018, 12:13 PM
jtamate edited the test plan for this revision. (Show Details)

The first patch had 12 unit test failing.
This one has 2 1/2 unit test failing.

I've changed findByUrl, instead of removing the item from the list,
tell the item to refresh.

I do not know why the refreshed mimetype of a file ending in .html becomes
application/x-extension-html instead of text/html.

I couldn't resist, I've removed also the NonMovableFileItem class.

Can someone explain to me how switching from pointers to values is making anything faster, or is a first step towards making anything faster? This step in itself can only make things slower due to more "copying" (refcount updating).

markg added a comment.Mar 4 2018, 12:30 PM

Can someone explain to me how switching from pointers to values is making anything faster, or is a first step towards making anything faster? This step in itself can only make things slower due to more "copying" (refcount updating).

I don't know what the ultimate goal of @jtamate is here (speeding things up, that i do know).
But in general, putting something on the stack (aka, no new) is measurable faster. For small objects and in small quantities that doesn't matter much. But for large objects (KFileItem is large) and in large quantities (also fitting for KFileItem) it could very well be a nice speedup!

The other side of this optimizing story is copies where you don't intent them to happen but they just do. For instance, a std::vector<KFileItem> allocates a bunch ahead of time and then copies it in. Or moves since my move semantics commits and where applicable. If this were a std::vector<KFileItem *> you'd had no such copy or move.

Back on this patch. It might allow faster routes, but it really depends on @jtamate next plans here if the current patch is worth it.
Sounds like we need more info :)

Can someone explain to me how switching from pointers to values is making anything faster, or is a first step towards making anything faster? This step in itself can only make things slower due to more "copying" (refcount updating).

I don't know what the ultimate goal of @jtamate is here (speeding things up, that i do know).

Hi,

To speedup findByUrl, there are strategies like using a QHash for DirItem* or just having it sorted and search using std::binary_search.
I guess this will be easier to do if there are less pointers in the path, therefore  I'm just trying to remove a TODO: introduced in 4b498196899223692e8a7d334618b2874bb6c77e in 2014.
I haven't done any benchmarks on this code yet, first it needs to pass all the tests.
jtamate updated this revision to Diff 29572.Mar 15 2018, 8:06 AM
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

Pass all the unittests, without modifications, finally.
Use as much as possible the recently added move semantics in KFileItem.

mwolff added a subscriber: mwolff.Mar 15 2018, 10:09 AM

no real review, just want to mention that you'll still have heap allocations for every item - now once per container it is in (due to QList and QSet and sizeof the KFileItem)

jtamate planned changes to this revision.Mar 19 2018, 2:59 PM

I've noticed this creates/exposes some crashes in other code/tests, for example in kdirmodeltest the second and later times the test is run.
I'll check all the failing tests (given enough time).

jtamate updated this revision to Diff 34708.May 23 2018, 1:47 PM
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

Based on the tests done in D12945 and D11282, the best solution is to have the result of qHash(url) in KFileItem to compare items in the binary search.
In two cases, the KFileItem in the list has to be moved to the right position, this is still faster than before.
Introduce two methods to insert an item into the list and to move the item to the right position.

The crash in kdirmodeltest was due to getting twice the signal of a file changed into a directory.

Restricted Application added a subscriber: kde-frameworks-devel. · View Herald TranscriptMay 23 2018, 1:47 PM
bruns added a subscriber: bruns.May 23 2018, 1:54 PM
bruns added inline comments.
src/core/kfileitem.cpp
1248 ↗(On Diff #34708)

This is incomplete for two cases:

  1. Same URL
  2. Hash collision
src/core/kfileitem.h
490 ↗(On Diff #34708)

The description does not match the implementation ...

bruns added inline comments.May 23 2018, 2:30 PM
src/core/kcoredirlister_p.h
302 ↗(On Diff #27768)

This can be better implemented with std::move and std::move_backward from <algorithm>
http://www.cplusplus.com/reference/algorithm/move/

  1. calculate start and end iterators from the old and new URL
  2. move old item to tmp
if (old_it < new_it) {
   std::move(old_it + 1, new_it, old_it); // move downwards
} else {
   std::move_backward(new_it , old_it - 1, old_it);
}
*new_it = tmp;
src/core/kfileitem.cpp
180 ↗(On Diff #34708)

You can probably leave this out if you use the following for ordering:

bool operator< (const KFileItem& other) {
  if m_url.size() != other.m_url.size() {
    return m_url.size() < other.m_url.size()
  }
  return m_url < other.m_url;
}

You don't need lexicographical sorting, but just a total ordering for the lookup.

jtamate updated this revision to Diff 34775.May 24 2018, 7:26 AM
jtamate marked 5 inline comments as done.
jtamate edited the summary of this revision. (Show Details)

Removed m_hash, after implementing the right checks it was slower than comparing QUrls.

dfaure requested changes to this revision.May 28 2018, 8:01 AM
dfaure added inline comments.
src/core/kcoredirlister.cpp
963 ↗(On Diff #27768)

Where did the old code call refresh() (which you now call inside findByUrl)? I don't see it, I only see more specific calls in more specific cases. So this looks slower and possibly incorrect (for non-local-files).

1004 ↗(On Diff #27768)

This used to modify the fileitem in dirItem->lstItems, now it's modifying a copy.
Same for all other fileitem.setFoo calls below.

Is there a call to reinsert missing?

1843

This used to modify the item in dir->lstItems, so you need to reinsert in order to not lose the changes.

2042 ↗(On Diff #27768)

Needs to be reinserted afterwards.

This revision now requires changes to proceed.May 28 2018, 8:01 AM
jtamate updated this revision to Diff 35305.Jun 1 2018, 8:57 AM
jtamate marked 4 inline comments as done.
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

The change in kdirmodel is not needed anymore.
The methods that were const are const again.
Better documented the new reinsert method.

dfaure requested changes to this revision.EditedJul 6 2018, 8:33 AM

Since the goal is to speed up findByUrl, can you add a Q_BENCHMARK for findByUrl?
(there is https://phabricator.kde.org/D12945 but that's more experimental/research, not a benchmark for the production code -- which can still evolve over time)

I do expect it to be faster of course --- a little bit at the expense of the code which modifies items and now has to reinsert them (and that code is hard to benchmark, it's mostly called from async slots...).

Thanks.

src/core/kcoredirlister.cpp
2463 ↗(On Diff #27768)

The parenthesis around lstNewItems can be removed now.

2476 ↗(On Diff #27768)

end() inconsistent with cbegin() on the line just above.

Works because the container is const, but still, better be consistent.

2489 ↗(On Diff #27768)

The parenthesis around lstNewItems can be removed now.

2691 ↗(On Diff #27768)

Is this loop necessary? Can't we return KFileItemList(*allItems) ? There's such a ctor in KFileItemList.

src/core/kcoredirlister_p.h
308 ↗(On Diff #27768)

This is invalid code in case lower_bound() returns end().

If you're sure it "shouldn't happen" because reinsert is always called for items that are already in lstItems, then add a Q_ASSERT. Otherwise if().

This revision now requires changes to proceed.Jul 6 2018, 8:33 AM
jtamate updated this revision to Diff 42345.Sep 26 2018, 6:46 AM
jtamate marked 5 inline comments as done.
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

I've tried to do an automatic benchmark of the rename case, without success.
Changed code as requested.

dfaure accepted this revision.Oct 4 2018, 2:10 AM

I think I'm fine with it now, but please wait until next Monday (Oct 8) before pushing, so it doesn't break the upcoming KF5 release. I'm not 100% confident.... (given that earlier versions of the patch had obvious bugs that were not detected by the unittests).

This revision is now accepted and ready to land.Oct 4 2018, 2:10 AM
This revision was automatically updated to reflect the committed changes.
elvisangelaccio reopened this revision.Dec 14 2018, 10:17 PM
elvisangelaccio added a subscriber: elvisangelaccio.
elvisangelaccio added inline comments.
src/core/kcoredirlister.cpp
1576–1577 ↗(On Diff #27768)

Stacktrace from bug #401552 points here. Reverting this commit seems to fix the crash.

@jtamate Can you have a look?

This revision is now accepted and ready to land.Dec 14 2018, 10:17 PM

I've been able to reproduce the bug:

#10 0x00007f44d257ae1d in qt_assert (assertion=assertion@entry=0x7f44d456fc43 "it != dirItem->lstItems.end()", file=file@entry=0x7f44d456fc10 "/g/5kde/frameworks/kio/src/core/kcoredirlister_p.h", line=line@entry=308) at ../../include/QtCore/../../src/corelib/global/qlogging.h:91
#11 0x00007f44d4535c7a in KCoreDirListerCache::reinsert (this=this@entry=0x7f44d45a5440 <(anonymous namespace)::Q_QGS_kDirListerCache::innerFunction()::holder>, item=..., oldUrl=...) at /g/5kde/frameworks/kio/src/core/kcoredirlister_p.h:310
#12 0x00007f44d452825c in KCoreDirListerCache::renameDir (this=this@entry=0x7f44d45a5440 <(anonymous namespace)::Q_QGS_kDirListerCache::innerFunction()::holder>, oldUrl=..., newUrl=...) at /g/5kde/frameworks/kio/src/core/kcoredirlister.cpp:1584
#13 0x00007f44d452aa29 in KCoreDirListerCache::slotFileRenamed (this=0x7f44d45a5440 <(anonymous namespace)::Q_QGS_kDirListerCache::innerFunction()::holder>, _src=..., _dst=..., dstPath=...) at /g/5kde/frameworks/kio/src/core/kcoredirlister.cpp:969

Let's see if this assert/crash can be avoided without reverting all the patch. Any help is welcome.

src/core/kcoredirlister.cpp
825–829 ↗(On Diff #27768)

Changed, using the reinsert semantic.

963 ↗(On Diff #27768)

Changed to use the reinsert semantic.

1004 ↗(On Diff #27768)

Thanks, reinsert has a better semantics than refresh.

src/core/kcoredirlister_p.h
302 ↗(On Diff #27768)

I've looked at those methods, they create a new "undefined" status in the list. I don't want to worry about this new status.

src/core/kfileitem.cpp
180 ↗(On Diff #34708)

I've tried something similar. As QUrl doesn't have a size method, I've tried with path().size(). With findByUrl it is no problem, it gives me 28 msecs vs. 1795 msecs, but inserting in the list gives me 37 msecs vs. the original 19 msecs.
Only checking m_url < other.m_url (without m_hash) gives me 21 msecs inserting and 15 msecs in findByUrl, faster than using m_hash with the right checks for collisions (22-23 msecs inserting).

src/core/kfileitem.h
490 ↗(On Diff #34708)

Changed, thanks. Now they match.

I'm able to reproduce the bug with the following steps:

  • Create a folder structure X/X1/X2/X3/X4 and add a new empty text file into each folder.
  • Within Dolphin, open a tab for each folder.
  • In the tab showing X contents, rename X1 to X1_ and the crash is there.

Next step, transform this into a unit test that always crashes.

Let's see if this assert/crash can be avoided without reverting all the patch. Any help is welcome.

Does this valgrind log help?

Invalid read of size 8
   at 0x56B5628: QSharedDataPointer<KFileItemPrivate>::QSharedDataPointer(QSharedDataPointer<KFileItemPrivate> const&) (qshareddata.h:92)
   by 0x56AF5D6: KFileItem::KFileItem(KFileItem const&) (kfileitem.h:47)
   by 0x575FDFB: KCoreDirListerCache::renameDir(QUrl const&, QUrl const&) (kcoredirlister.cpp:1576)
   by 0x575B041: KCoreDirListerCache::slotFileRenamed(QString const&, QString const&, QString const&) (kcoredirlister.cpp:969)
   by 0x577BF39: QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1, 2>, QtPrivate::List<QString const&, QString const&, QString const&>, void, void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&)>::call(void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&), KCoreDirListerCache*, void**) (qobjectdefs_impl.h:152)
   by 0x577A630: void QtPrivate::FunctionPointer<void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&)>::call<QtPrivate::List<QString const&, QString const&, QString const&>, void>(void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&), KCoreDirListerCache*, void**) (qobjectdefs_impl.h:185)
   by 0x5777250: QtPrivate::QSlotObject<void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&), QtPrivate::List<QString const&, QString const&, QString const&>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) (qobjectdefs_impl.h:414)
   by 0x75C93BF: QMetaObject::activate(QObject*, int, int, void**) (in /usr/lib/libQt5Core.so.5.12.0)
   by 0x5798ED3: OrgKdeKDirNotifyInterface::FileRenamedWithLocalPath(QString const&, QString const&, QString const&) (moc_kdirnotify.cpp:224)
   by 0x579889D: OrgKdeKDirNotifyInterface::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (moc_kdirnotify.cpp:103)
   by 0x5798DAB: OrgKdeKDirNotifyInterface::qt_metacall(QMetaObject::Call, int, void**) (moc_kdirnotify.cpp:203)
   by 0x72B19EE: ??? (in /usr/lib/libQt5DBus.so.5.12.0)
 Address 0x11350758 is 72 bytes inside a block of size 128 free'd
   at 0x4839D7B: realloc (vg_replace_malloc.c:826)
   by 0x7439992: QListData::realloc_grow(int) (in /usr/lib/libQt5Core.so.5.12.0)
   by 0x7439A3B: QListData::append(int) (in /usr/lib/libQt5Core.so.5.12.0)
   by 0x576BE43: QList<KFileItem>::insert(QList<KFileItem>::iterator, KFileItem const&) (qlist.h:521)
   by 0x576B1EC: KCoreDirListerCache::DirItem::insert(KFileItem const&) (kcoredirlister_p.h:422)
   by 0x576AC8A: KCoreDirListerCache::reinsert(KFileItem const&, QUrl const&) (kcoredirlister_p.h:310)
   by 0x5760053: KCoreDirListerCache::renameDir(QUrl const&, QUrl const&) (kcoredirlister.cpp:1584)
   by 0x575B041: KCoreDirListerCache::slotFileRenamed(QString const&, QString const&, QString const&) (kcoredirlister.cpp:969)
   by 0x577BF39: QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1, 2>, QtPrivate::List<QString const&, QString const&, QString const&>, void, void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&)>::call(void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&), KCoreDirListerCache*, void**) (qobjectdefs_impl.h:152)
   by 0x577A630: void QtPrivate::FunctionPointer<void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&)>::call<QtPrivate::List<QString const&, QString const&, QString const&>, void>(void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&), KCoreDirListerCache*, void**) (qobjectdefs_impl.h:185)
   by 0x5777250: QtPrivate::QSlotObject<void (KCoreDirListerCache::*)(QString const&, QString const&, QString const&), QtPrivate::List<QString const&, QString const&, QString const&>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) (qobjectdefs_impl.h:414)
   by 0x75C93BF: QMetaObject::activate(QObject*, int, int, void**) (in /usr/lib/libQt5Core.so.5.12.0)
 Block was alloc'd at
   at 0x4839D7B: realloc (vg_replace_malloc.c:826)
   by 0x7439992: QListData::realloc_grow(int) (in /usr/lib/libQt5Core.so.5.12.0)
   by 0x7439C1A: QListData::insert(int) (in /usr/lib/libQt5Core.so.5.12.0)
   by 0x576BE43: QList<KFileItem>::insert(QList<KFileItem>::iterator, KFileItem const&) (qlist.h:521)
   by 0x576B1EC: KCoreDirListerCache::DirItem::insert(KFileItem const&) (kcoredirlister_p.h:422)
   by 0x575D78C: KCoreDirListerCache::slotEntries(KIO::Job*, QList<KIO::UDSEntry> const&) (kcoredirlister.cpp:1252)
   by 0x577C268: QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1>, QtPrivate::List<KIO::Job*, QList<KIO::UDSEntry> const&>, void, void (KCoreDirListerCache::*)(KIO::Job*, QList<KIO::UDSEntry> const&)>::call(void (KCoreDirListerCache::*)(KIO::Job*, QList<KIO::UDSEntry> const&), KCoreDirListerCache*, void**) (qobjectdefs_impl.h:152)
   by 0x577AAC8: void QtPrivate::FunctionPointer<void (KCoreDirListerCache::*)(KIO::Job*, QList<KIO::UDSEntry> const&)>::call<QtPrivate::List<KIO::Job*, QList<KIO::UDSEntry> const&>, void>(void (KCoreDirListerCache::*)(KIO::Job*, QList<KIO::UDSEntry> const&), KCoreDirListerCache*, void**) (qobjectdefs_impl.h:185)
   by 0x5777D4A: QtPrivate::QSlotObject<void (KCoreDirListerCache::*)(KIO::Job*, QList<KIO::UDSEntry> const&), QtPrivate::List<KIO::Job*, QList<KIO::UDSEntry> const&>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*) (qobjectdefs_impl.h:414)
   by 0x75C93BF: QMetaObject::activate(QObject*, int, int, void**) (in /usr/lib/libQt5Core.so.5.12.0)
   by 0x5703C23: KIO::ListJob::entries(KIO::Job*, QList<KIO::UDSEntry> const&) (moc_listjob.cpp:236)
   by 0x57027CE: KIO::ListJobPrivate::slotListEntries(QList<KIO::UDSEntry> const&) (listjob.cpp:154)

Let's continue on D17619

aacid added a subscriber: aacid.Jul 21 2019, 5:22 PM

@jtamate Does that "let's continue in that other review" mean we should cancel this one? Still shows on the list of reviews to consider

jtamate closed this revision.Jul 21 2019, 6:11 PM

@jtamate Does that "let's continue in that other review" mean we should cancel this one? Still shows on the list of reviews to consider

Yes, it should be closed. It was already commited, unfortunately with a bug associated to it, fixed in the other review.