Fix Duplicates on Quick Open File
ClosedPublic

Authored by tcanabrava on Sep 27 2018, 5:28 PM.

Details

Summary

The logic on the quick open was overcomplex, and it led to
bugs while filling the document list.
The new logic fills a vector with all the files,
organizes them by Url, remove the files based on the
duplicated url's (so name matching doesn't matter) and only
then starts to populate the view.

I belive this is also faster than the old code as I minimized
the amount of memory allocations and deallocations.

I'm unsure about the use of QFileInfo in the end of the code as
we can have too many files there and creating a temporary
QFileInfo just to get the name is a huge overhead, we can do a split('/').last()
for the file name. I'll do some measurements with that.

Diff Detail

Repository
R40 Kate
Branch
fixDuplicatesQuickOpen2
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3949
Build 3967: arc lint + arc unit
tcanabrava created this revision.Sep 27 2018, 5:28 PM
Restricted Application added a project: Kate. · View Herald TranscriptSep 27 2018, 5:28 PM
Restricted Application added a subscriber: kwrite-devel. · View Herald Transcript
tcanabrava requested review of this revision.Sep 27 2018, 5:28 PM
tcanabrava edited the summary of this revision. (Show Details)Sep 28 2018, 7:20 AM
tcanabrava added reviewers: cullmann, brauch, neundorf.

In principle the code looks even cleaner this way, thought the speed should be profiled.
I assume it is a bit ugly to write a unit test for this.

Given I didn't touch this code since long, a second opinion here would be good.
Sven did some changes here in the past, perhaps he can take a look.

About the QFileInfo: I am not sure, how expensive the pure construction and that one function is.
Should be profiled.

In principle the code looks even cleaner this way, thought the speed should be profiled.
I assume it is a bit ugly to write a unit test for this.

Given I didn't touch this code since long, a second opinion here would be good.
Sven did some changes here in the past, perhaps he can take a look.

About the QFileInfo: I am not sure, how expensive the pure construction and that one function is.
Should be profiled.

Profilling.

anthonyfieroni added inline comments.
kate/katequickopen.cpp
148

Why from latin?

Profilling done:
New Code:

takes ~90 millisecconds to load 73.000 files (my work project)
takes ~30 milliseconds to load 15.000 files (GammaRay)

Old Code:

takes ~70 milliseconds to load 73.000 files
takes ~25 milliseconds to load 15.000 files

Bottom line is: New code is slower, but not that much, and it feels instant even when the folder has a huge file tree as my work project does, but the code is much more readable and does not have the bug that displayed duplicated files.
Also:

replacing QVector by std::vector improved the speed by around 10 milliseconds in the worst case.

kate/katequickopen.cpp
148

this is from the old code and I didn't changed.

Could you do some perf profile with e.g. hotspot to see what is the "slow" part?
Nothing against std::vector instead of QVector, we need no sharing here anyways.

Could you do some perf profile with e.g. hotspot to see what is the "slow" part?
Nothing against std::vector instead of QVector, we need no sharing here anyways.

I did. the slow part was copying the QStrings into the allDocuments.
I tried to use QStringRef but since the QStringifycation comes from the QUrl::toString() I didn't managed to get a ref of it.

Could you do some perf profile with e.g. hotspot to see what is the "slow" part?
Nothing against std::vector instead of QVector, we need no sharing here anyways.

I did. the slow part was copying the QStrings into the allDocuments.
I tried to use QStringRef but since the QStringifycation comes from the QUrl::toString() I didn't managed to get a ref of it.

This is the slow hotspot.

foreach(const QString& file, projectDocs) {
      QFileInfo fi(file);
      allDocuments.push_back({ fi.fileName(), QUrl::fromLocalFile(file).toString(QUrl::NormalizePathSegments), false });

Followed by the sort + erase.
But this is not that slow anyway (remember, to be noticeable it needs to be more than 70.000 files in my 5y old machine)

We have some bugs complaining about the quick open speed, like https://bugs.kde.org/show_bug.cgi?id=399151

Therefore I was a bit concerned ;=)

Perhaps somebody else could take a look, too.
I would be ok with this changes.

btw. linecount += 1; is easier to read written as either ++linecount or linecount++; Not that it matters for the compiler.

We have some bugs complaining about the quick open speed, like https://bugs.kde.org/show_bug.cgi?id=399151

Therefore I was a bit concerned ;=)

Perhaps somebody else could take a look, too.
I would be ok with this changes.

I belive that this will fix some of the slowness in cornercases.
I tested in *huge* directories (as I told you, 70.000 files), I can try with bigger ones if you are interested.

The other thing we can do is to actually change this to a QTableModel instead of manually creating QStandardItems.

The biggest source tree that I have is my

btw. linecount += 1; is easier to read written as either ++linecount or linecount++; Not that it matters for the compiler.

We disagree in the last bit (it's clear that's incrementing by one, while '++' makes no sense in tons of languages), but if you want I can remove.

Indeed I just ported a QStandardItemModel to a QAbstractItemModel. The speed gain was huge since many QString -> QVariant conversions are not needed anymore. Do you see this in perf as well?

I can't say. I'll port this to a model tomorrow and test. Either way this
was always supposed to be a model so it's a win win situation.

Em sex, 28 de set de 2018 às 22:50, Dominik Haumann <
noreply@phabricator.kde.org> escreveu:

dhaumann added a comment. View Revision
https://phabricator.kde.org/D15804

Indeed I just ported a QStandardItemModel to a QAbstractItemModel. The
speed gain was huge since many QString -> QVariant conversions are not
needed anymore. Do you see this in perf as well?

*REPOSITORY*
R40 Kate

*REVISION DETAIL*
https://phabricator.kde.org/D15804

*To: *tcanabrava, cullmann, brauch, neundorf
*Cc: *dhaumann, anthonyfieroni, kwrite-devel, michaelh, kevinapavew,
ngraham, demsking, cullmann, sars

tcanabrava updated this revision to Diff 43117.Oct 8 2018, 9:51 AM
tcanabrava edited the summary of this revision. (Show Details)
  • Implement the QuickOpen as a TableModel

Sorry ;=)
I need some time to test this.
I appreciate the improvements but want to give i a try on some large projects locally.

I think the main performance issue here was fixed in 90f51f5830e32998d41a710c448212e49e1be04a, in my profiles that took >95% of the time. It's still worth optimizing, but I think no matter what you do there should not be complaints like the one linked in the bug report any more ...

@brauch: have you tried this patch and played bit if it makes it better or worse?

But you are right, after the fix that already landed, it was again very usable anyways.

@brauch: have you tried this patch and played bit if it makes it better or worse?

But you are right, after the fix that already landed, it was again very usable anyways.

The main point of this patch is not the speed, but the bug that shows duplicates. Picture Attached.

I can try it out.

I didn't mean to speak against the patch with the speed point, I was in contrast saying that maybe the performance point is not so relevant any more ;)

Code-wise this looks fine to me otherwise.

kate/katequickopenmodel.cpp
18

I'd put if parent.isValid() { return 0; } instead.

87

strange indent

tcanabrava updated this revision to Diff 43463.Oct 12 2018, 2:40 PM
  • Implement the QuickOpen as a TableModel
  • Code Cleanup
  • Fix displayign in bold the open files
tcanabrava marked 2 inline comments as done.Oct 12 2018, 2:40 PM
cullmann accepted this revision.Oct 13 2018, 2:09 PM

Hi,

had time to try it with my linux.git clone.
Works reasonable fast, like the last implementation.
If you add copyright headers to the new files, I think this can go in.

Thanks for the contribution!

This revision is now accepted and ready to land.Oct 13 2018, 2:09 PM
shubham added inline comments.
kate/katequickopenmodel.cpp
1 ↗(On Diff #43463)

Please add a copyright header.

kate/katequickopenmodel.h
1 ↗(On Diff #43463)

Please add a copyright header.

shubham removed a subscriber: shubham.Oct 13 2018, 2:18 PM
gregormi added inline comments.
kate/katequickopenmodel.h
14 ↗(On Diff #43463)

I wonder if there is a good way to make it more clear that these enum symbols are used as implicitly casted column index numbers later.

dhaumann added inline comments.Oct 15 2018, 7:34 PM
kate/katequickopenmodel.h
14 ↗(On Diff #43463)

Could be done with enum class. Then you'd have to write Column::FileName etc (along with static_cast<int>(...)).

gregormi added inline comments.Oct 15 2018, 7:44 PM
kate/katequickopenmodel.h
14 ↗(On Diff #43463)

Hmm, that would be 11 instances of std::get<...> which then will look not so nice. The current version seems like a good compromise.

tcanabrava updated this revision to Diff 43737.Oct 16 2018, 2:09 PM
  • Implement the QuickOpen as a TableModel
  • Code Cleanup
  • Fix displayign in bold the open files
  • Add License
  • Use struct instead of std::tuple

There, no more std::get nor cast from enum ;)

tcanabrava marked 5 inline comments as done.Oct 16 2018, 2:13 PM

now I need a re-acceptance.

cullmann accepted this revision.Oct 16 2018, 5:40 PM

Here you go :)

This revision was automatically updated to reflect the committed changes.
gregormi added inline comments.Oct 17 2018, 7:16 PM
kate/katequickopenmodel.h
40 ↗(On Diff #43463)

Cool. Now this could be an "enum class". And Bold is not needed anymore.