two new UDS structures
ClosedPublic

Authored by jtamate on May 2 2018, 11:05 AM.

Details

Summary

The two new UDS structures are similar to Frank's, but instead of using two vectors they use only one, with the index stored next to the data. The first structure uses linear search and is the fastest in the autotests.
The second structure uses binary search and scales better with the number of fields.

For the last 3 structures I've added a new test that measures read time as used in KFileItemPrivate::cmp.

If you like one of the new structures, it can replace the one currently used in KIO::UDSEntryPrivate.

Test Plan

Run the test several times.

The interesting results on my PC (using the -vb parameter):

0.00056 msecs per iteration (total: 74, iterations: 131072)
0.00032 msecs per iteration (total: 86, iterations: 262144)
0.00053 msecs per iteration (total: 70, iterations: 131072)

0.00036 msecs per iteration (total: 95, iterations: 262144)
0.00014 msecs per iteration (total: 77, iterations: 524288)
0.00048 msecs per iteration (total: 64, iterations: 131072)

0.00041 msecs per iteration (total: 54, iterations: 131072)
0.00021 msecs per iteration (total: 57, iterations: 262144)
0.00048 msecs per iteration (total: 64, iterations: 131072)

Diff Detail

Repository
R241 KIO
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
jtamate created this revision.May 2 2018, 11:05 AM
Restricted Application added a project: Frameworks. May 2 2018, 11:05 AM
jtamate requested review of this revision.May 2 2018, 11:05 AM

Thanks for that investigation. Interesting that linear search is faster than binary search in the same data structure... maybe the compiler optimizes it better? Did you profile V2 to find out where the time is spent, or do you have a better explanation?
But even if both were equal performance-wise I'd favor linear search because sorted insertion is easy to get wrong - as this patch shows ;)

When you say "scales better", we're talking about the number of fields in the udsentry, not the number of items. But kioslaves don't fill in 1000 fields, so I have the feeling that scaling with the number of fields isn't a requirement.

Are those benchmarks run in Release (or RelWithDebInfo) mode, rather than Debug (which is a big no no for benchmarks)? Qt should be compiled with optimizations enabled too.

PS: for all alternatives where the API is the same, we could use a template function (templated on the type of UDSEntry implementation) to perform the tests; this would ensure that all tests are actually testing the same thing ;-)

autotests/udsentry_benchmark.cpp
464

m_long should be initialized too (in-place initialized in the member declaration would help not forgetting that)

468

const

487

wouldn't it be faster to assign to index->m_str here?

index->m_str = value;
491

std::vector with emplace_back would be interesting here, performance-wise.

If you keep using QVector, though, then Q_DECLARE_TYPEINFO(Field, Q_MOVABLE_TYPE) would help, especially when reserving too small.

498

wouldn't it be faster to assign to index->m_long here?

index->m_long = value;
510

alternative: std::find_if.

560

QVERIFY(equal)

593

const

619

This relies on insert being called in ascending "field" order, for lower_bound to work.
But that is not necessarily the case in kioslaves, so you'd have to insert at "index" here, instead of appending.

634

const

642

const

bruns added a subscriber: bruns.May 2 2018, 3:50 PM

Thanks for that investigation. Interesting that linear search is faster than binary search in the same data structure... maybe the compiler optimizes it better? Did you profile V2 to find out where the time is spent, or do you have a better explanation?

Binary search has a (small) overhead; you can hardly beat linear search when the number of entries is small. When you use binary search, you pay when inserting: find the right position, probably move other items. When you do inserts one item at a time, you have O(n^2) behavior.

But even if both were equal performance-wise I'd favor linear search because sorted insertion is easy to get wrong - as this patch shows ;)

When you say "scales better", we're talking about the number of fields in the udsentry, not the number of items. But kioslaves don't fill in 1000 fields, so I have the feeling that scaling with the number of fields isn't a requirement.

Using one list is better than two lists (one heap allocation instead of two), one size check when inserting ...

A good data structure here is one contiguous vector storing both key and (small value or d-pointer). Inserting is O(1) when the space has been reserved, lookup is O(n). (For binary search, lookup is O(log n): for 8 entries that is 3 comparisons, versus an average of 8/2 = 4 for linear search, but linear search has no overhead.)

I would suggest QVector<QPair<uint, QVariant>>. Each element, taking alignment into account, is 24 bytes.

QMap and QHash are clearly overkill here. The involved pointer chasing ruins any performance benefit from the asymptotically faster lookup.

dfaure added a comment.May 2 2018, 4:04 PM

Thanks for that investigation. Interesting that linear search is faster than binary search in the same data structure... maybe the compiler optimizes it better? Did you profile V2 to find out where the time is spent, or do you have a better explanation?

Binary search has a (small) overhead; you can hardly beat linear search when the number of entries is small. When you use binary search, you pay when inserting: find the right position, probably move other items. When you do inserts one item at a time, you have O(n^2) behavior.

I know that, but that's not what the code was doing: he is appending in (hopefully) sorted order, so the difference was only at lookup time, where I can't see any overhead due to binary search.

When you say "scales better", we're talking about the number of fields in the udsentry, not the number of items. But kioslaves don't fill in 1000 fields, so I have the feeling that scaling with the number of fields isn't a requirement.

Using one list is better than two lists (one heap allocation instead of two), one size check when inserting ...

Sure, but I'm only comparing "Another" and "AnotherV2", which both use a single vector.

A good data structure here is one contiguous vector storing both key and (small value or d-pointer). Inserting is O(1) when the space has been reserved, lookup is O(n). (For binary search, lookup is O(log n): for 8 entries that is 3 comparisons, versus an average of 8/2 = 4 for linear search, but linear search has no overhead.)
I would suggest QVector<QPair<uint, QVariant>>. Each element, taking alignment into account, is 24 bytes.

If anyone attempts this, please name the struct and its members, don't use QPair ;-)
But no, that cannot possibly be faster. QVariant has lots of overhead itself.

jtamate marked 9 inline comments as done.May 2 2018, 4:13 PM

If anyone attempts this, please name the struct and its members, don't use QPair ;-)
But no, that cannot possibly be faster. QVariant has lots of overhead itself.

I've tried, just before reading your comment :-)
Three tests: fill the structure, compare two structures and read 3 values.

AnotherV2 (if someone finds a better name, it will be welcome).

0.00041 msecs per iteration (total: 55, iterations: 131072)
0.00022 msecs per iteration (total: 59, iterations: 262144)
0.00048 msecs per iteration (total: 64, iterations: 131072)

QPair+QVariant:

0.00056 msecs per iteration (total: 74, iterations: 131072)
0.00020 msecs per iteration (total: 55, iterations: 262144)
0.00049 msecs per iteration (total: 65, iterations: 131072)
autotests/udsentry_benchmark.cpp
619

Yes, it was badly done. Changed the fill order to detect these problems.

bruns added a comment.May 2 2018, 4:26 PM

Thanks for that investigation. Interesting that linear search is faster than binary search in the same data structure... maybe the compiler optimizes it better? Did you profile V2 to find out where the time is spent, or do you have a better explanation?

Binary search has a (small) overhead; you can hardly beat linear search when the number of entries is small. When you use binary search, you pay when inserting: find the right position, probably move other items. When you do inserts one item at a time, you have O(n^2) behavior.

I know that but that's not what the code was doing, he is appending in (hopefully) sorted order, so the difference was only at lookup time, where I can't see any overhead due to binary search.

Even when appending in sorted order, the correct position has to be checked, which is O(n), the appending itself is O(1), as no movement is involved. Appending unsorted is always O(1) (when capacity is sufficient).

When you say "scales better", we're talking about the number of fields in the udsentry, not the number of items. But kioslaves don't fill in 1000 fields, so I have the feeling that scaling with the number of fields isn't a requirement.

Using one list is better than two lists (one heap allocation instead of two), one size check when inserting ...

Sure, but I'm only comparing "Another" and "AnotherV2", which both use a single vector.

The comment was addressing the difference between this approach and the "FrankUDSEntry".

A good data structure here is one contiguous vector storing both key and (small value or d-pointer). Inserting is O(1) when the space has been reserved, lookup is O(n). (For binary search, lookup is O(log n): for 8 entries that is 3 comparisons, versus an average of 8/2 = 4 for linear search, but linear search has no overhead.)
I would suggest QVector<QPair<uint, QVariant>>. Each element, taking alignment into account, is 24 bytes.

If anyone attempts this, please name the struct and its members, don't use QPair ;-)
But no, that cannot possibly be faster. QVariant has lots of overhead itself.

QVariant has hardly any overhead in this case: the struct here stores one uint and one QString (which is just its d-ptr), while the QVariant uses one uint for the type info and one union for the data. For both the uint case and QString, the data is stored inline in the QVariant data union.

Using QVariant has the benefit of avoiding a homegrown structure. It is not faster, but it is also not slower.

bruns added inline comments.May 2 2018, 4:32 PM
autotests/udsentry_benchmark.cpp
467

The default == will not only compare the key, but also the value (that's the reason for being slow), so you have completely different behaviour. When comparing only the index, you will update the value for an existing key; otherwise you will append a new value to the same key.

jtamate updated this revision to Diff 33505.May 2 2018, 4:47 PM
jtamate marked 3 inline comments as done.
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

Fixed the ordered insertion.
Using std::vector and std::find_if.
Initialize everything, to try to detect a change of type when "insert" is called with a different type.

Using templates to somewhat reduce the code size.
The last 3 structures are tested 3 times:

  • Fill the structure
  • Compare two structures
  • Read 3 values

When you say "scales better", we're talking about the number of fields in the udsentry, not the number of items. But kioslaves don't fill in 1000 fields, so I have the feeling that scaling with the number of fields isn't a requirement.

Yes, I was talking about the number of fields in the udsentry. I had to test it, just in case.

Are those benchmarks run in Release (or RelWithDebInfo) mode, rather than Debug (which is a big no no for benchmarks)? Qt should be compiled with optimizations enabled too.

Yes, since the last comment of D11487 everything is compiled with -O2 -mtune=native
Qt is the one provided by OpenSuse.

jtamate updated this revision to Diff 33507.May 2 2018, 6:08 PM

Another way to improve insertion speed in release mode could be:
Transform "insert" so that it really only inserts, with asserts if it is used as "replaceOrInsert", and add new "replaceOrInsert" methods.

The measurements on the first structure were done with the asserts commented out.
before: 0.00039 msecs per iteration (total: 52, iterations: 131072)
after: 0.00033 msecs per iteration (total: 88, iterations: 262144)

bruns added inline comments.May 2 2018, 9:32 PM
autotests/udsentry_benchmark.cpp
503

This comment is still wrong - you want to compare the key only, not the whole entry

Also, you no longer need it, as you use a lambda for the comparison now.

508

non-POD types should not be initialized explicitly

510

unsigned int and -1?

518

bad naming, you use field and class Field for different things

526

bad naming again, index is not an index but an entry or field

dfaure added a comment.May 3 2018, 8:06 AM

I like the replaceOrInsert idea, and the assert in insert... we might have to fix some kioslaves, but in general they have no good reason to insert twice for the same key.

In general I like the std::vector + find_if + emplace_back solution. Fast and readable.

autotests/udsentry_benchmark.cpp
520

storage.constBegin(), storage.constEnd() -- same in next method

521

constEnd() here too

522

storage.emplace_back(field, value);

533

emplace_back

558

call it "it", find_if returns an iterator.

621

constBegin() etc. to avoid detaching
it = ...

645

!index->m_str.isEmpty()

QStringLiteral() without arguments never makes sense, there's no literal...

jtamate updated this revision to Diff 33543.May 3 2018, 9:17 AM
jtamate marked 13 inline comments as done.
jtamate edited the summary of this revision. (Show Details)
jtamate edited the test plan for this revision. (Show Details)

I've included some methods that will be needed if AnotherUDSEntry replaces the current one.
I hope to have addressed all your comments.

jtamate updated this revision to Diff 33607.May 4 2018, 8:10 AM

Removed some methods that are not used in the benchmarks.
Added the asserts to check the type of the data at the beginning of every insert and replaceOrInsert.

FYI: I'm checking the new implementation live; so far only two asserts in listjob running kfind.

dfaure accepted this revision.May 4 2018, 8:33 AM

Thanks

This revision is now accepted and ready to land.May 4 2018, 8:33 AM
This revision was automatically updated to reflect the committed changes.