Add an option to extract image data and add front cover property
Needs ReviewPublic

Authored by astippich on Nov 25 2019, 11:01 AM.

Details

Reviewers
bruns
mgallien
ngraham
Group Reviewers
Baloo
Summary

prepare for the following patch to taglibextractor

Diff Detail

Repository
R286 KFileMetaData
Branch
binaryData
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23901
Build 23919: arc lint + arc unit
astippich created this revision.Nov 25 2019, 11:01 AM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptNov 25 2019, 11:01 AM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
astippich requested review of this revision.Nov 25 2019, 11:01 AM
ngraham accepted this revision.Nov 25 2019, 3:21 PM
This revision is now accepted and ready to land.Nov 25 2019, 3:21 PM
astippich updated this revision to Diff 70303.Nov 25 2019, 3:55 PM
  • fix label comment
bruns requested changes to this revision.Nov 26 2019, 11:23 PM
bruns added inline comments.
src/extractionresult.h
61

Binary is how, not what.

This revision now requires changes to proceed.Nov 26 2019, 11:23 PM
astippich added inline comments.Nov 27 2019, 8:09 AM
src/extractionresult.h
61

Do you have a suggestion for a better name? I tried to be as generic as possible. It could be named more specifically ExtractImages, since this is what it will be primarily used for

astippich updated this revision to Diff 70723.Dec 2 2019, 8:29 AM
  • fix label comment
  • tweaks
astippich updated this revision to Diff 71623.Dec 15 2019, 7:36 PM
  • rename flag to ImageData
astippich retitled this revision from Add an option to extract binary data and add front cover property to Add an option to extract image data and add front cover property.Dec 15 2019, 7:37 PM

friendly ping! I would like to continue work on this. Is the new name better? If so, I will rebase follow-up patches

bruns added a comment.Jan 5 2020, 12:25 AM

I have thought about this and have come to the conclusion this is not really future proof.

The Property::FrontCover is IMHO fine, i.e. the changes to propertyinfo.cpp and properties.h could go in as is.

The request of additional properties should be done differently. How about the following:

  1. divide properties into (currently) two sets, "Simple Metadata" (all the current ones but frontcover) and "Complex Metadata" (currently frontcover).
  2. allow to specify extra properties for SimpleExtractionResult
SimpleExtractionResult result(fileName, mimeType);
result.setExtraProperties({Property::FrontCover});
plugin.extract(&result);

You can then also do:

SimpleExtractionResult result(fileName, mimeType, ExtractionResult::ExtractNothing);
result.setExtraProperties({Property::FrontCover, Property::BandLogo});
plugin.extract(&result);

I have thought about this and have come to the conclusion this is not really future proof.

The Property::FrontCover is IMHO fine, i.e. the changes to propertyinfo.cpp and properties.h could go in as is.

The request of additional properties should be done differently. How about the following:

  1. divide properties into (currently) two sets, "Simple Metadata" (all the current ones but frontcover) and "Complex Metadata" (currently frontcover).
  2. allow to specify extra properties for SimpleExtractionResult

Not strictly against your proposal, but how much do we gain? What is your concern about future proof?
From my Elisa application perspective, it would be very convenient to use the proposed ExtractionResult::ExtractEverythingIncludingImageData (or renamed ExtractionResult::ExtractEverything for KF6) and the ExtractionResult::ExtractImageData flags.
For the current proposal the API does not need to be changed.

SimpleExtractionResult result(fileName, mimeType);
result.setExtraProperties({Property::FrontCover});
plugin.extract(&result);

You can then also do:

SimpleExtractionResult result(fileName, mimeType, ExtractionResult::ExtractNothing);
result.setExtraProperties({Property::FrontCover, Property::BandLogo});
plugin.extract(&result);

This is also possible with the current proposal (when the extractors honor the ExtractionResult::ExtractNothing/ExtractMetaData flags properly, which I think currently none does) by just setting the ExtractionResult::ExtractImageData.
It is just that one does not have fine-grained control over each individual "complex" property, but IMHO we do not need that.
I also fear that it may not be clear to distinguish between "simple" and "complex" properties and that users of the API would expect that every property works for the setExtraProperties, which would require an immense amount of work for all extractors.

I also fear that it may not be clear to distinguish between "simple" and "complex" properties and that users of the API would expect that every property works for the setExtraProperties, which would require an immense amount of work for all extractors.

I agree.

bruns added a comment.Jan 5 2020, 4:59 PM

The problem is the extractor serves two different use cases:

  • retrieval for searching/caching properties
  • on-demand extraction

Baloo falls into the first category. It is only interested in properties which can be queried/compared in a
meaningful way. Even storing the properties as a cache for e.g. Dolphin is a little bit of a stretch.

Elisa falls into the second category. It does not store the extracted data persistently, but is fine
extracting e.g. the front cover on demand.

Currently the only large/blob property is the front cover. But what about BackCover, Leaflet pages,
..., Video Thumbnails? Returning all of these even when not required is wasteful. If you want a
front cover, say so.

For KF6, ExtractEverything should just be removed. It is not especially useful, not even for unit tests.
Request the data you want and are able to handle, and do so explicitly.

src/extractionresult.h
62

Does not belong in the API. It is not really clear what it does.

If you want a convenience flag in your application, please define it locally.

src/propertyinfo.cpp
585

shouldBeIndexed is irrelevant for ByteArray, omit it.

astippich updated this revision to Diff 73097.Jan 8 2020, 8:29 PM
  • adress review comments
astippich marked 2 inline comments as done.Jan 12 2020, 3:56 PM

The problem is the extractor serves two different use cases:

  • retrieval for searching/caching properties
  • on-demand extraction

    Baloo falls into the first category. It is only interested in properties which can be queried/compared in a meaningful way. Even storing the properties as a cache for e.g. Dolphin is a little bit of a stretch.

    Elisa falls into the second category. It does not store the extracted data persistently, but is fine extracting e.g. the front cover on demand.

    Currently the only large/blob property is the front cover. But what about BackCover, Leaflet pages, ..., Video Thumbnails? Returning all of these even when not required is wasteful. If you want a front cover, say so.

Yeah, in that case it would be a little bit wasteful. But again, I think if we implement this fine-grained control, it should be done for all properties.
And this is out-of-scope for this patch.

For KF6, ExtractEverything should just be removed. It is not especially useful, not even for unit tests.
Request the data you want and are able to handle, and do so explicitly.

I have adjusted this as requested.

friendly ping for the series

astippich updated this revision to Diff 75306.Feb 9 2020, 4:00 PM
  • udpate since tag

Where are we with this? @bruns are you accepting the proposed solution or do you reject it completely?
I think it is at least way better than the current solution and maintains current API.
A larger refactoring could be done for KF6

Ping! At least a reply would be nice

ngraham accepted this revision.Mon, Mar 16, 4:02 PM

I agree, let's get this in now and refactor it later.

ping @bruns
this has now been open and unanswered for month!
Same for the dependent patches

astippich updated this revision to Diff 77954.Wed, Mar 18, 6:43 PM
  • fix label comment
  • tweaks
  • rename flag to ImageData
  • adress review comments
  • udpate since tag
  • update since tag

Since I got one accept and no other response, I will merge on the weekend

Due to the recent circumstances I will probably not implement the support in Elisa, so I won't land this unless @bruns explicitly is okay with it (I still think this is an improvement over the status quo).