Use Key API in DataStoreQuery
ClosedPublic

Authored by rnicole on Jul 13 2018, 4:11 PM.

Details

Summary

Depends on D13902

Summary

Key modifications:

  • Add default constructor but it should not be used -> deprecated (needed to store Keys in Qt containers)

EntityStore modifications:

  • Add readLatest functions with Identifier as argument
  • Change first readRevision to use a callback taking a Key as argument
  • Change all readPrevious (except the templated one) to take an Identifier as argument
  • Change one indexLookup to return a vector of Identifiers

fixes "bug" of whole keys as argument of readLatest (whole keys were passed to Source::add -> mIncrementalIds -> DataStoreQuery::readEntity -> EntityStore::readLatest)

TypeIndex modifications

  • Change lookup to return a vector of Identifiers

Benchmarks

Run benchmarks:

DevelopD13902This patch
Current Rss usage [kb]: 40700Current Rss usage [kb]: 38484Current Rss usage [kb]: 38564
Peak Rss usage [kb]: 40700Peak Rss usage [kb]: 38484Peak Rss usage [kb]: 38564
Rss growth [kb]: 15920Rss growth [kb]: 13660Rss growth [kb]: 13352
Rss growth per entity [byte]: 3260Rss growth per entity [byte]: 2797Rss growth per entity [byte]: 2734
Rss without db [kb]: 29736Rss without db [kb]: 29504Rss without db [kb]: 29248
Percentage peak rss error: 0Percentage peak rss error: 0Percentage peak rss error: 0
On disk [kb]: 10788On disk [kb]: 8804On disk [kb]: 9140
Buffer size total [kb]: 898Buffer size total [kb]: 898Buffer size total [kb]: 898
Write amplification: 12.0075Write amplification: 9.79923Write amplification: 10.1732

Note: "On disk" has increased and I'm not sure why

Test Disk Usage:

DevelopD13902This patch
Free pages: 412Free pages: 312Free pages: 309
Total pages: 760Total pages: 607Total pages: 599
Used size: 1425408Used size: 1208320Used size: 1187840
Calculated key + value size: 856932Calculated key + value size: 702866Calculated key + value size: 702866
Calculated total db sizes: 970752Calculated total db sizes: 950272Calculated total db sizes: 954368
Main store on disk: 3112960Main store on disk: 2486272Main store on disk: 2453504
Total on disk: 3293184Total on disk: 2666496Total on disk: 2633728
Used size amplification: 1.66339Used size amplification: 1.71913Used size amplification: 1.68999
Write amplification: 3.63268Write amplification: 3.53733Write amplification: 3.49071

Note: pages has increased and I'm not sure why (maybe the benchmark ran more samples?)
Like above, disk space has also increased

Diff Detail

Repository
R9 Sink
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
rnicole requested review of this revision.Jul 13 2018, 4:11 PM
rnicole created this revision.

fixes "bug" of whole keys as argument of readLatest (whole keys were passed to Source::add -> mIncrementalIds -> DataStoreQuery::readEntity -> EntityStore::readLatest)

This changes the bahviour slightly in that instead of replaying the exact revision we'd be replaying the latest revision. This should be fine I think, but we should double-check.

Looks good other than that. I suppose the end goal would be to get rid of pretty much all fromDisplayByteArray conversions. Internally we shouldn't have to convert at all.
But it's fine if you want to stage that with another patch.

common/datastorequery.cpp
53

Should we perhaps replace default construction with a static createIdentifier() method?

299

Most places here should change as well to avoid unnecessarily converting back and forth.
This is not to say that you have to add this to the same patch, it's fine if you have it planned for another patch.

common/storage/entitystore.cpp
505

Empty method?

rnicole marked an inline comment as done.Jul 16 2018, 8:58 AM
rnicole added inline comments.
common/datastorequery.cpp
53

Replacing the behavior of the default construction is definitely a consideration. I initially wanted to avoid "null" Keys in the database: in the case something goes wrong and the default constructor is called without our knowledge, we will have new unique keys for each elements, which is arguably better than multiple elements with the same "null" key.

299

This conversion is here since it will be forwarded to the callback in next, which takes a ResultSet::Result as argument. But ResultSet is used in many places, and here seemed like the most straightforward place to convert it. Should we plan on doing a patch for ResultSet?

common/storage/entitystore.cpp
505

I can't believe I missed it. Thanks and sorry about that…

rnicole updated this revision to Diff 37856.Jul 16 2018, 9:05 AM
rnicole marked an inline comment as done.

Fix empty function

rnicole marked an inline comment as done.Jul 16 2018, 9:05 AM
cmollekopf added inline comments.Jul 17 2018, 8:16 AM
common/datastorequery.cpp
53

Empty keys are much easier to catch though. We can easily build the safety net to make sure no empty keys ever end up in the database (I think lmdb doesn't even support that, so it will fail anyways). We can't distinguish between a legitimate uid and one that is accidentally created, so worst case that would even hide the root cause from us.

299

Yes. I think everything internal should be using the binary representation, and ResultSet is the primary means to "stream" results through the query that then get filtered.

rnicole updated this revision to Diff 38068.Jul 19 2018, 8:10 AM
  • Default constructor of Identifier now produces null ids
  • Remove deprecation of Key's default constructor
  • Safety net: assert the uid is not null when converting to binary representation (when we pass it to the database)
rnicole edited the summary of this revision. (Show Details)Jul 19 2018, 8:57 AM
rnicole updated this revision to Diff 38081.Jul 19 2018, 10:26 AM
rnicole marked 3 inline comments as done.

Remove invalid comments about avoiding default construction

cmollekopf accepted this revision.Jul 19 2018, 5:15 PM
This revision is now accepted and ready to land.Jul 19 2018, 5:15 PM
rnicole updated this revision to Diff 38226.Jul 23 2018, 8:40 AM
  • Since Keys and Identifiers can now be "null", we can convert the selection part of the Reduce filter to use Identifiers
  • Also added the operator== and operator!= to Identifier, Revision and Key
  • Benchmark update coming up
rnicole requested review of this revision.Jul 23 2018, 8:48 AM
rnicole edited the summary of this revision. (Show Details)

The benchmark tables are now updated, for some reason, memory went up in "Run benchmarks", but down in "Test Disk Usage"

The benchmark tables are now updated, for some reason, memory went up in "Run benchmarks", but down in "Test Disk Usage"

I don't know why that is either. If you rerun the benchmark a couple of times it should be possible to see if there is a certain amount of fluctuation that we always see. While it would be interesting to know where that fluctuation comes from exactly, it might be that the increase you see is just within the margin of error.

cmollekopf accepted this revision.Jul 23 2018, 9:04 AM
This revision is now accepted and ready to land.Jul 23 2018, 9:04 AM
This revision was automatically updated to reflect the committed changes.