Fill lists of default-constructed types directly, not append any by value
ClosedPublic
Actions

Authored by kossebau on Dec 7 2017, 8:45 AM.

Details

Reviewers

aaronpuchert
brauch

Group Reviewers

KDevelop

Commits

R32:4b1484607578: Fill lists of default-constructed types directly, not append any by value

Summary

When creating lists of known sizes for types whose instances are set up each
by initially creating a default contructed one, which then is further
modified before being appended by value to the list, it should be quicker to
instead fill the list in one go with all default constructed instances and
then walk over each for the further modification.

Diff Detail

Repository

R32 KDevelop

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

kossebau created this revision.Dec 7 2017, 8:45 AM

Restricted Application added a subscriber: kdevelop-devel. · View Herald TranscriptDec 7 2017, 8:45 AM

kossebau requested review of this revision.Dec 7 2017, 8:45 AM

Not exactly sure if readability here is not reduced, especially when it comes to list-filling-code patterns people might be used to.
So more a RFC of some minimal (unmeasured) performance improvement code I had found possible when looking at the QList->QVector stuff, asking whether you are fine with such code in KDevelop or prefer the current pattern for (which?) reason.

Is this actually faster? Why?

In D9236#177318, @brauch wrote:

Is this actually faster? Why?

It saves a copy constructor and destructor call per list item.

Old code:

alloc heap memory for list items
for each item
    default constructor item on stack
    setup item on stack
    copy constructor for item on heap from item on stack
    destructor item on stack

New code:

alloc heap memory for list items
for each item
    default constructor item on heap
for each item
    setup item on heap

It might not actually be faster though, because you go through the memory twice. If the vector doesn't fit into the cache, that could introduce quite some slowdown.

On the other hand, copying deep objects can obviously be costly. Some of the vector element types seem to carry around some baggage.

The C++11 way would be to move such objects, can we do that? It seems to be supported via QVector<T>::push_back(T&& t).

Edit: Tested it for plugins/quickopen/projectfilequickopen.cpp, and it compiles. It could be faster, because ProjectFiles contain two KDevelop::Paths, which is essentially a QVector<QString>, which has a move constructor. However, since the Paths have a user-declared copy constructor, the move constructor has to be explicitly activated using = default.

Edit: This should also do the job

--- a/kdevplatform/util/path.h
+++ b/kdevplatform/util/path.h
@@ -122,7 +122,7 @@ public:
      *
      * @sa addPath()
      */
-    Path(const Path& base, const QString& subPath = QString());
+    Path(const Path& base, const QString& subPath);
 
     /**
      * Equality comparison between @p other and this Path.

By the way, it's usually a good idea to have vector elements nothrow-move constructible. This can be checked with

static_assert(std::is_nothrow_move_constructible<Type>::value, "...");

for a QVector<T>. This doesn't always seem to be the case. Maybe I'll have a look what can be done there.

In D9236#177743, @aaronpuchert wrote:

It might not actually be faster though, because you go through the memory twice. If the vector doesn't fit into the cache, that could introduce quite some slowdown.

I see. (Thanks for reply, really interested in this but never looked at details in the last years, so happy about input and discussion. Please bear with me being naive and uninformed for now)

On the other hand, copying deep objects can obviously be costly. Some of the vector element types seem to carry around some baggage.

The C++11 way would be to move such objects, can we do that? It seems to be supported via QVector<T>::push_back(T&& t).

I still have to complete my C++11 classes sadly. In my naive mind I would have hoped something like this would be possible, in general:

alloc heap memory for list items
for each item
    default constructor item on heap
    setup item on heap

I would not see any principal need to prepare the item on the stack. Is there? Why would C++11 still prefer a move operator, instead of going directly for the final memory destination?

With QVector it seems there is some practical need for a separate item instance only because the API does not have any hooks for some iterator-based filling, any item construction is hardcoded to the default constructor (Edit: besides the append() logic which does copy/move constructor, which though does not gain us something for avoiding the temp copy on the stack). One could imagine some fill(int count, functor constructor) or some T& appendNew(). But it isn't there after all the years, despite list filling from a loop surely being a common need. Would that be an anti-pattern for some reason?

No real clue yet about move operators, but how does it spare us here any constructor and deconstructor calls? And with the QType-sub-datastructures being implicitly-shared usually, does the move operator make a difference when it comes to unneeded temporary allocations?

In D9236#177787, @kossebau wrote:
I still have to complete my C++11 classes sadly. In my naive mind I would have hoped something like this would be possible, in general:
alloc heap memory for list items
for each item
    default constructor item on heap
    setup item on heap
I would not see any principal need to prepare the item on the stack. Is there? Why would C++11 still prefer a move operator, instead of going directly for the final memory destination?

You are completely right, I would just advise against going through the vector twice, as you'd do with resize(). So if you still use reserve(), then add single elements and initialize them that's fine.

With QVector it seems there is some practical need for a separate item instance only because the API does not have any hooks for some iterator-based filling, any item construction is hardcoded to the default constructor (Edit: besides the append() logic which does copy/move constructor, which though does not gain us something for avoiding the temp copy on the stack). One could imagine some fill(int count, functor constructor) or some T& appendNew(). But it isn't there after all the years, despite list filling from a loop surely being a common need. Would that be an anti-pattern for some reason?

STL's std::vector has emplace_back that constructs a new vector element in place. Sadly QVector doesn't have it yet, and I've seen Qt developers express their doubts it's going to happen soon.

What I was trying to say is that compilers have a lot of freedom with local variables - they might not ever be written to actual memory, sometimes not even into registers. The only requirement imposed is that the observed behavior is not allowed to change. This is why debugging optimized code can be so hard. Eliminating copies of local variables may not actually improve performance.

But even if the compiler is not able to optimize out copies, all that stuff is still happening in the L1 cache. Drastic performance degradation usually happens when there is too much data and stuff gets evicted to main memory. Hence it's preferable to do everything in one pass.

No real clue yet about move operators, but how does it spare us here any constructor and deconstructor calls? And with the QType-sub-datastructures being implicitly-shared usually, does the move operator make a difference when it comes to unneeded temporary allocations?

The benefit for implicitly-shared data types might be small, but I think not all types are. So maybe I should clarify that moves might not always be cheaper than copies (for simple data types they do the same), but sometimes there is a considerable benefit, usually when types own (meaning it's not shared) a chunk of data on the heap.

In D9236#177745, @aaronpuchert wrote:
By the way, it's usually a good idea to have vector elements nothrow-move constructible. This can be checked with
static_assert(std::is_nothrow_move_constructible<Type>::value, "...");
for a QVector<T>. This doesn't always seem to be the case. Maybe I'll have a look what can be done there.

Well, "QVector still doesn't move elements on reallocations."

Thanks for detailed reply @aaronpuchert Does not conflict with what I heard elsewhere, so big picture got more clear to me.

So my take-away is: default to one-pass for best practice, unless actual measurements are done.

So will discard this patch or, well, reduce it to the changes which are still one-pass afterwards (the ones where no further modification besides default constructor is done).

reduce patch to those single-pass list filling where default constructed
value is added

aaronpuchert accepted this revision.Dec 14 2017, 12:32 AM

This revision is now accepted and ready to land.Dec 14 2017, 12:32 AM

Not much is left of the original patch but this change makes sense to me ;) thanks!

Closed by commit R32:4b1484607578: Fill lists of default-constructed types directly, not append any by value (authored by kossebau). · Explain WhyDec 14 2017, 10:17 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			plugins/debuggercommon/registers/registercontroller_arm.cpp (10 lines)
M			plugins/debuggercommon/registers/registercontroller_x86.cpp (10 lines)

Diff	ID	Base	Description	Created	Lint	Unit
Base			Base
Diff 1	23601	e377315		Dec 7 2017, 8:45 AM	★	★
Diff 2	23875	9b140d2	reduce patch to those single-pass list filling where default constructed	Dec 13 2017, 5:21 PM	★	★
Diff 3	23893	9956d5c	R32:4b14846075780e797c9bbd8cfa2902552022669e	Dec 14 2017, 10:16 AM	★	★

Fill lists of default-constructed types directly, not append any by valueClosedPublicActions

Details

Diff Detail

Revision ContentsChangeset List

Diff 23893

plugins/debuggercommon/registers/registercontroller_arm.cpp

plugins/debuggercommon/registers/registercontroller_x86.cpp

Fill lists of default-constructed types directly, not append any by value
ClosedPublic
Actions

Revision Contents
Changeset List