Add test for adding properties to result
Needs ReviewPublic

Authored by astippich on Dec 2 2018, 11:28 AM.

Details

Reviewers
bruns
Group Reviewers
Baloo
Summary

Adds a simple test for the result class
that tests how multiple properties from the extractors
are added to the variant map

Test Plan

test passes

Diff Detail

Repository
R293 Baloo
Branch
test_result
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 11097
Build 11115: arc lint + arc unit
astippich created this revision.Dec 2 2018, 11:28 AM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptDec 2 2018, 11:28 AM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
astippich requested review of this revision.Dec 2 2018, 11:28 AM

Didn't found what I wanted to find, but since the code is there we can add it.

bruns added inline comments.Dec 2 2018, 1:56 PM
autotests/unit/file/resulttest.cpp
3

^_^

40

Spaces and linebreaks ...

41

No need to reset the default constructed Baloo::Document with a default constructed one.

astippich updated this revision to Diff 46710.Dec 2 2018, 2:48 PM
astippich marked 2 inline comments as done.
  • small fixes
astippich added inline comments.Dec 2 2018, 2:49 PM
autotests/unit/file/resulttest.cpp
41

Actually it is. The TermGenerators are otherwise only initialized with nullptr, and consequently will crash. That is something that should be fixed (separately).

bruns added inline comments.Dec 2 2018, 4:04 PM
autotests/unit/file/resulttest.cpp
41
astippich updated this revision to Diff 47329.Dec 11 2018, 8:31 AM
  • updates for master

I was recently wondering if this is actually desired behavior. Right now it is definitely required since KFileMetaData wrongly outputs QStrings instead of a QStringList, but when this is fixed, this behavior should be removed imho. Otherwise, querying metadata via KFileMetaData and via Baloo differs in output, which it shouldn't.

astippich updated this revision to Diff 48688.Jan 4 2019, 7:01 PM
  • extend the test
astippich retitled this revision from add simple test for string merging to Add test for adding properties to result.Jan 4 2019, 7:02 PM
astippich edited the summary of this revision. (Show Details)

Since nobody seems to care and this is just testing the status quo, I'll merge it next week unless somebody objects

bruns added inline comments.Feb 4 2019, 8:45 PM
autotests/unit/file/resulttest.cpp
64

This looks wrong to me ...
How many items do you get when you append "keyword3" first and ["keyword1", "keyword2"] next?

astippich added inline comments.Feb 5 2019, 6:42 PM
autotests/unit/file/resulttest.cpp
64

It's the same. The properties will get merged regardless of their form if the QVariantMap already contains an item with the same key, see https://phabricator.kde.org/source/baloo/browse/master/src/file/extractor/result.cpp$52 so this works as currently intended.
Right now, we rely on this behavior because KFileMetaData (incorrectly) outputs multiple properties with the same key instead of a single property made of a QStringList. I wrote the test to prepare the switch to string lists at some point.
My favorite solution is to remove this merging, once KFileMetaData outputs string lists when necessary. This requires a custom function for the conversion of the output JSON to a PropertyMap, see https://phabricator.kde.org/source/baloo/browse/master/src/lib/file.cpp$118
QJsonDocument::toVariantMap currently does not handle duplicated keys. IIRC JSON with duplicated keys is strictly not compliant, but since it is only internal, this is okay IMHO.
The own conversion function will save us an extra for loop, and makes sure that the same PropertyMap is generated when either querying via Baloo (file.load()) or directly via KFileMetaData's SimpleExtractionResult.

Is this a please don't merge or can I land it?

bruns added a comment.EditedFeb 11 2019, 10:24 PM

Currently, both
Result::add(prop, "value1"); Result::add(prop, "value2");
and
Result::add(prop, {"value1", "value2"});
are serialized (JSON) in the same way as {prop: ["value1", "value2"]} by Baloo, which is IMHO fine.
On the other hand,
Result::add(prop, "value1"); Result::add(prop, {"value2", "value3"});
ends up as {prop: ["value1", ["value2", "value3"] ]}, which is nonsense, should be {prop: ["value1", "value2", "value3"]}

After deserializing, we should have KFM::Propertymap pm.values(prop) -> QList<QVariant<QString>>({"value1", "value2", "value3"}).

I am currently moving the serialization (in file/result.cpp) /deserialization (in file/file.cpp) into a separate function so it becomes testable.

Currently, both
Result::add(prop, "value1"); Result::add(prop, "value2");
and
Result::add(prop, {"value1", "value2"});
are serialized (JSON) in the same way as {prop: ["value1", "value2"]} by Baloo, which is IMHO fine.

This can lead to issues if e.g. two ReleaseYears are added from different extractors. That is probably impossible right now since KFileMetaData does not have multiple extractors for the same mimetype, but still.
The result after deserialization is then a QVariantList with the values, and probably no client currently handles that since they expect no list.
IMHO Baloo should not alter the output of KFileMetaData in any way (e.g. merging, which it currently relies on for stringlist).

On the other hand,
Result::add(prop, "value1"); Result::add(prop, {"value2", "value3"});
ends up as {prop: ["value1", ["value2", "value3"] ]}, which is nonsense, should be {prop: ["value1", "value2", "value3"]}

I have patches ready for that and was waiting on this revision to land in order to continue...
But as stated above, I also changed my mind, I do not think this is a good idea. This can happen if there are two extractors for the same mimetype.
In that case, the entries may also be duplicated and merging is not a good idea.

After deserializing, we should have KFM::Propertymap pm.values(prop) -> QList<QVariant<QString>>({"value1", "value2", "value3"}).

I am currently moving the serialization (in file/result.cpp) /deserialization (in file/file.cpp) into a separate function so it becomes testable.

Anyways, the test tests how Baloo::Result _currently_ behaves. If another patch then modifies this behavior, the test should be changed when it lands.

bruns added a comment.Feb 22 2019, 6:39 PM

Currently, both
Result::add(prop, "value1"); Result::add(prop, "value2");
and
Result::add(prop, {"value1", "value2"});
are serialized (JSON) in the same way as {prop: ["value1", "value2"]} by Baloo, which is IMHO fine.

This can lead to issues if e.g. two ReleaseYears are added from different extractors. That is probably impossible right now since KFileMetaData does not have multiple extractors for the same mimetype, but still.
The result after deserialization is then a QVariantList with the values, and probably no client currently handles that since they expect no list.
IMHO Baloo should not alter the output of KFileMetaData in any way (e.g. merging, which it currently relies on for stringlist).

No, this is no issue. Either the client uses QMap::find(), and it will get exactly one value (QVariant), or it uses QMap::values() and it will always get a list (QVariantList == QList<QVariant>) - which may have a single element.

On the other hand,
Result::add(prop, "value1"); Result::add(prop, {"value2", "value3"});
ends up as {prop: ["value1", ["value2", "value3"] ]}, which is nonsense, should be {prop: ["value1", "value2", "value3"]}

I have patches ready for that and was waiting on this revision to land in order to continue...
But as stated above, I also changed my mind, I do not think this is a good idea. This can happen if there are two extractors for the same mimetype.
In that case, the entries may also be duplicated and merging is not a good idea.

After deserializing, we should have KFM::Propertymap pm.values(prop) -> QList<QVariant<QString>>({"value1", "value2", "value3"}).

I am currently moving the serialization (in file/result.cpp) /deserialization (in file/file.cpp) into a separate function so it becomes testable.

Anyways, the test tests how Baloo::Result _currently_ behaves. If another patch then modifies this behavior, the test should be changed when it lands.

No, it codifies that both, add(value1); add(value2) and add({value1, value2}) have the same result. And next you say this is wrong ... Iff it is wrong, write down the *correct* behavior and make it QEXPECT_FAIL.

Currently, both
Result::add(prop, "value1"); Result::add(prop, "value2");
and
Result::add(prop, {"value1", "value2"});
are serialized (JSON) in the same way as {prop: ["value1", "value2"]} by Baloo, which is IMHO fine.

This can lead to issues if e.g. two ReleaseYears are added from different extractors. That is probably impossible right now since KFileMetaData does not have multiple extractors for the same mimetype, but still.
The result after deserialization is then a QVariantList with the values, and probably no client currently handles that since they expect no list.
IMHO Baloo should not alter the output of KFileMetaData in any way (e.g. merging, which it currently relies on for stringlist).

No, this is no issue. Either the client uses QMap::find(), and it will get exactly one value (QVariant), or it uses QMap::values() and it will always get a list (QVariantList == QList<QVariant>) - which may have a single element.

I disagree. Right now, clients (Dolphin, Baloo-widgets, Elisa) only use QMap::find(), ignoring all other entries. Currently, properties of the integer are merged into list and will be given as a single property of QVariantList after deserialization. The clients then directly call QVariant::toInt()/toDouble() etc., which fails if the result is a QVariantList. (This will be fixed by D19087 and D19088)

On the other hand,
Result::add(prop, "value1"); Result::add(prop, {"value2", "value3"});
ends up as {prop: ["value1", ["value2", "value3"] ]}, which is nonsense, should be {prop: ["value1", "value2", "value3"]}

I have patches ready for that and was waiting on this revision to land in order to continue...
But as stated above, I also changed my mind, I do not think this is a good idea. This can happen if there are two extractors for the same mimetype.
In that case, the entries may also be duplicated and merging is not a good idea.

After deserializing, we should have KFM::Propertymap pm.values(prop) -> QList<QVariant<QString>>({"value1", "value2", "value3"}).

I am currently moving the serialization (in file/result.cpp) /deserialization (in file/file.cpp) into a separate function so it becomes testable.

Anyways, the test tests how Baloo::Result _currently_ behaves. If another patch then modifies this behavior, the test should be changed when it lands.

No, it codifies that both, add(value1); add(value2) and add({value1, value2}) have the same result. And next you say this is wrong ... Iff it is wrong, write down the *correct* behavior and make it QEXPECT_FAIL.

Well, that is the current behavior, after deserialization you cannot differentiate between these two. And it is actually a nice behavior considering a smooth transition period. It ensures that multiple entries, falsely added individually by the extractors, are read as list, which is expected by the clients.
If the extractors are switched to output stringlists instead of multiple properties, the same output is generated. Once all extractors are switched, the merging of the values can be removed, and the QMap will not be changed when serialized/deserialized.
I can mark the merging of the strings as QEXPECT_FAIL if you prefer, but it looked wrong to me as the clients currently rely on that behavior.

astippich updated this revision to Diff 52894.Mar 1 2019, 12:10 PM
  • mark expected fail

IMHO this test is pointless, as the contents of the variant map is inaccessible, it is only an intermediate storage. The correct way to retrieve the data is via Result::document(). Or you can just accept D19087, which already checks this is correct.

IMHO this test is pointless, as the contents of the variant map is inaccessible, it is only an intermediate storage. The correct way to retrieve the data is via Result::document(). Or you can just accept D19087, which already checks this is correct.

I would appreciate it next time if you think it is pointless, you say so right away, instead of providing hints on how to improve it. Otherwise it is a waste of time.
Also, testing Result::document() was always the next step. And at the time of writing (4 month ago!), it certainly was not pointless as this is the code responsible that Baloo outputs lists after serialization/deserialization.

astippich updated this revision to Diff 56729.Mon, Apr 22, 11:42 AM
  • Test result document