handle more tags in taglibextractor
ClosedPublic

Authored by astippich on Feb 24 2018, 6:10 PM.

Details

Summary

adds the ability to read more tags through taglib when supported by the format. tests including sample files are adjusted for the new features.

Diff Detail

Repository
R286 KFileMetaData
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
There are a very large number of changes, so older changes are hidden. Show Older Changes
Restricted Application added a project: Frameworks. · View Herald TranscriptFeb 24 2018, 6:10 PM
Restricted Application added a subscriber: Frameworks. · View Herald Transcript
astippich requested review of this revision.Feb 24 2018, 6:10 PM

First of all, please review carefully for ABI compatibility, I do not know what to watch for that.

So I decided to walk into the minefield that tags are. Everyone implements it differently. I've implemented some tags that seem to be supported across a wide range of applications (Kid3, Amarok, Kodi... ) but no guarantee that they will work everywhere. Some tags were requested on LWN.net in the comments for the first alpha (regarding classical music), but most of them only work for Vorbis comments as the others lack support.
The album cover image is currently passed as a QByteArray which then needs to be further processed by the application. I don't know if that is the best approach, but I think is the most generic one. Comments welcome.

Nice! How do the infopanel or tooltips of dolphin look with this?

src/properties.h
164

?

169

Will we break ABI with correct 'Language'?

src/propertyinfo.cpp
590

Language?

I don't know dolphin works, but given how KFileMetadata is designed, the new tags should be ignored until explicit support is added in dolphin.

src/properties.h
164

Oops, seems like a wrong copy & paste :) will fix

169

I noticed the typo as well, but I think this is a matter of API compatibility, so I left it unchanged. Matthieu should know.

mgallien added inline comments.
src/properties.h
169

It is part of the source compatibility promise for a major version of KFileMetaData. However, you can add a new one and push the data to both properties. This way, we could deprecate the one with the typo. @dfaure do you have another solution ?

I don't know dolphin works, but given how KFileMetadata is designed, the new tags should be ignored until explicit support is added in dolphin.

Please try that. For unknown types baloo-widgets (responsible for infopanel/tooltips) falls back to value.toString(), no idea what will happen when you feed it with binary data. The cover properties return binary data, right?

I have created T8079 to work on baloo database update when extractors are modified and returned different data.

astippich planned changes to this revision.Feb 26 2018, 9:51 PM

I don't know dolphin works, but given how KFileMetadata is designed, the new tags should be ignored until explicit support is added in dolphin.

Please try that. For unknown types baloo-widgets (responsible for infopanel/tooltips) falls back to value.toString(), no idea what will happen when you feed it with binary data. The cover properties return binary data, right?

The cover is binary data, right. And I think I was wrong. While I don't fully understand yet how baloo-widgets work, from what I've read all available properties are automatically queried. The binary data would then be converted to a string as you said. So adding the cover this way may not be a good solution. A different interface should be created for image/binary data.
I think I will split this into two diffs and do the cover images separately.

A thumbnail extractor might be better suited for this job.
On my system(tumbleweed) there is a kde-audio-thumbnail. I don't know anything more about it.
Maybe ffmpeg-thumbnails can do that?

I don't know dolphin works, but given how KFileMetadata is designed, the new tags should be ignored until explicit support is added in dolphin.

Please try that. For unknown types baloo-widgets (responsible for infopanel/tooltips) falls back to value.toString(), no idea what will happen when you feed it with binary data. The cover properties return binary data, right?

The cover is binary data, right. And I think I was wrong. While I don't fully understand yet how baloo-widgets work, from what I've read all available properties are automatically queried. The binary data would then be converted to a string as you said. So adding the cover this way may not be a good solution. A different interface should be created for image/binary data.
I think I will split this into two diffs and do the cover images separately.

This is not necessary. You could use a QImage or a QPixmap. You can put them in a QVariant and they should be converted to an empty QString.
As far as I understand, in Elisa, we would need to access them with QQuickImageProvider. All in all, well done job.

I am not sure for the rating given that there is also a rating (the one used in Elisa) that is stored as an extended file system attribute independent from the file mime type. One limit of this rating is that it is specific to the KFileMetaData API users. Do you have an idea ?

This is not necessary. You could use a QImage or a QPixmap. You can put them in a QVariant and they should be converted to an empty QString.

Please consider writing a plugin for KIO:PreviewJob (for the covers) anyway. It would be more widely useable. I'd like to see a/the cover/s in Dolphin too.
:-) We could also argue a little about covers being metadata or primary data. For starters I'd consider a reference to a cover as metadata. Still :-)

I don't know dolphin works, but given how KFileMetadata is designed, the new tags should be ignored until explicit support is added in dolphin.

Please try that. For unknown types baloo-widgets (responsible for infopanel/tooltips) falls back to value.toString(), no idea what will happen when you feed it with binary data. The cover properties return binary data, right?

The cover is binary data, right. And I think I was wrong. While I don't fully understand yet how baloo-widgets work, from what I've read all available properties are automatically queried. The binary data would then be converted to a string as you said. So adding the cover this way may not be a good solution. A different interface should be created for image/binary data.
I think I will split this into two diffs and do the cover images separately.

This is not necessary. You could use a QImage or a QPixmap. You can put them in a QVariant and they should be converted to an empty QString.
As far as I understand, in Elisa, we would need to access them with QQuickImageProvider. All in all, well done job.

I am not sure for the rating given that there is also a rating (the one used in Elisa) that is stored as an extended file system attribute independent from the file mime type. One limit of this rating is that it is specific to the KFileMetaData API users. Do you have an idea ?

I think QImage is no different than QByteArray for baloo-widgets. With QByteArray, it shows a garbage string. With QImage, it still displays the label and an empty string afterwards. So not really a nice solution as can be seen in the picture. One could add a special method to ExtactionResult to query the picture data, but I think this breaks ABI. I think the issue here is baloo-widgets unconditionally displaying everything it gets while it should actually check for the type. Judging from a first glance, this should be possible, but since baloo widgets is in kde applications and has a different release schedule, one could still see the issue when a new kfilemetadata is used with an older kde applications release. Is this acceptable?

Regarding the rating, I'm also not entirely happy about this. I implemented it because I thought it would be a nice fallback for Elisa if the file has not been rated before (and maybe also for different operating systems). We also had this as a feature request somewhere. But there is no real standard regarding how the rating is stored and it is probably hard to get it right. So right know I'm inclined to remove that tag again. Thoughts?

Btw, lots of more tags incoming

This is not necessary. You could use a QImage or a QPixmap. You can put them in a QVariant and they should be converted to an empty QString.

Please consider writing a plugin for KIO:PreviewJob (for the covers) anyway. It would be more widely useable. I'd like to see a/the cover/s in Dolphin too.

That would be a nice feature indeed.

Restricted Application added a project: Baloo. · View Herald TranscriptFeb 28 2018, 6:40 PM

[...]

This is not necessary. You could use a QImage or a QPixmap. You can put them in a QVariant and they should be converted to an empty QString.
As far as I understand, in Elisa, we would need to access them with QQuickImageProvider. All in all, well done job.

I am not sure for the rating given that there is also a rating (the one used in Elisa) that is stored as an extended file system attribute independent from the file mime type. One limit of this rating is that it is specific to the KFileMetaData API users. Do you have an idea ?

I think QImage is no different than QByteArray for baloo-widgets. With QByteArray, it shows a garbage string. With QImage, it still displays the label and an empty string afterwards. So not really a nice solution as can be seen in the picture. One could add a special method to ExtactionResult to query the picture data, but I think this breaks ABI. I think the issue here is baloo-widgets unconditionally displaying everything it gets while it should actually check for the type. Judging from a first glance, this should be possible, but since baloo widgets is in kde applications and has a different release schedule, one could still see the issue when a new kfilemetadata is used with an older kde applications release. Is this acceptable?

I am not really sure bout putting images inside the baloo database.

@vhanda do you have any advice about this ? I am really hesitating handling all of this into final applications or directly inside baloo and in baloo-widgets for the benefits of all clients of baloo.

Regarding the rating, I'm also not entirely happy about this. I implemented it because I thought it would be a nice fallback for Elisa if the file has not been rated before (and maybe also for different operating systems). We also had this as a feature request somewhere. But there is no real standard regarding how the rating is stored and it is probably hard to get it right. So right know I'm inclined to remove that tag again. Thoughts?

Btw, lots of more tags incoming

Let's remove all the undecided stuff and land this one quickly for the benefit of everybody. Do you agree ?

This is not necessary. You could use a QImage or a QPixmap. You can put them in a QVariant and they should be converted to an empty QString.

Please consider writing a plugin for KIO:PreviewJob (for the covers) anyway. It would be more widely useable. I'd like to see a/the cover/s in Dolphin too.

That would be a nice feature indeed.

It could be shown by baloo-widgets if the data is stored by baloo. Let's wait a bit for possible help and feedback on this.

astippich updated this revision to Diff 28300.Feb 28 2018, 10:58 PM
  • handle more tags and remove cover

With this many tags we need to be more considerate because it affects other applications which will need to adapt.

  1. Plural or singular. E.g. If it was 'performers' the property could reused by video files. (In KF6 the same should apply to 'Languages')
  2. If there is too much information in them the tooltips in dolphin will refuse to display (not enough room). This behaviour depends on the position of the hovered file relative to the screen. For the same file tooltip sometimes displays sometimes not. That will most likely confuse users.
  3. The property selection dialog for dolphin's infopanel currently is neat and small. If the list of items in it becomes too long users will have a hard time to find their way through it. We have to check if it needs another layout.
  4. The property names should be choosen to be as generic as possible to make them reusable by other file types.

With this many tags we need to be more considerate because it affects other applications which will need to adapt.

  1. Plural or singular. E.g. If it was 'performers' the property could reused by video files. (In KF6 the same should apply to 'Languages')

While I think you have a point here, KFileMetaData sticked to singular names before, e.g. author, where also multiple persons are possible. So I think we should keep that behavior.

  1. If there is too much information in them the tooltips in dolphin will refuse to display (not enough room). This behaviour depends on the position of the hovered file relative to the screen. For the same file tooltip sometimes displays sometimes not. That will most likely confuse users.

To be honest, I don't know which tooltip you mean. Can you post a picture? Is this an option I have to switch on?

  1. The property selection dialog for dolphin's infopanel currently is neat and small. If the list of items in it becomes too long users will have a hard time to find their way through it. We have to check if it needs another layout.

I think it behaves quite well and reasonable with many tags as it shows a scrollbar. I don't think that there is a better solution.

  1. The property names should be choosen to be as generic as possible to make them reusable by other file types.

I went through the properties again and only found that we might want to use the "generator" property instead of a new "encodedby" property. Do you have any other suggestion?

While I think you have a point here, KFileMetaData sticked to singular names before, e.g. author, where also multiple persons are possible. So I think we should keep that behavior.

We're stuck with that. Nevertheless, it's wrong IMO. All I'm saying is: Before simply repeat this error, let's give it some thought.

To be honest, I don't know which tooltip you mean. Can you post a picture? Is this an option I have to switch on?

Configure Dolphin > General > Show Tooltips
Currently tooltips are a little broken. see D9973

I think it behaves quite well and reasonable with many tags as it shows a scrollbar. I don't think that there is a better solution.

I haven't tested this. So I don't really know. And you're probably right. I forgot that the choice of properties depends on the file type. A screenshot of the dialog would be nice (large enough to display everthing).

  1. The property names should be choosen to be as generic as possible to make them reusable by other file types.

I went through the properties again and only found that we might want to use the "generator" property instead of a new "encodedby" property. Do you have any other suggestion?

Not yet. To illustrate what I'm thinking of:
Conductor and Director (of a Movie) are analogous. It would be nice to have a term covering both. . Ditto Lyricist and Screenwriter, and maybe more.
I don't know if that is possible - at least I have no idea yet. We should just give it some (more) thought.
Or the other way around:
I find the distinction between 'Release Year' for media and 'Creation Date' (which should correctly be 'Publication Date') in for EBooks annoying and confusing. We should not repeat that, if possible.

The property names are important for searching with baloo. Everything comes together here. (I'll elaborate this on demand)
The code you are working on exemplifies pretty well, that everybody is brewing his own soup when is comes to metadata and tags. And this is only audio!
I'm not saying you're doing this. But instead of just throwing in what is there we should agree on a concept how KDE is handling metadata. This is not the place to discuss that, though. I haven't this much thought yet it's just a feeling we might get into trouble in the future.

src/extractors/taglibextractor.cpp
93

Lyrics are not metadata. Unless a song is about itself ;-)

src/properties.h
168

This may become a plural in KF6, should we decide on it.

302

Could you give an example for what 'opus' is describing?

307

Could you give an example for what 'label' is describing?

michaelh added inline comments.Mar 2 2018, 8:12 AM
autotests/taglibextractortest.cpp
80

Just a hint: The code would be easier to read if the title of the sample file was
'I love you so much' or 'Desaster will come' just not the same as the propery name. (Same for most of the props below)

What do you think of using https://schema.org/AudioObject Vocabulary?

While I think you have a point here, KFileMetaData sticked to singular names before, e.g. author, where also multiple persons are possible. So I think we should keep that behavior.

We're stuck with that. Nevertheless, it's wrong IMO. All I'm saying is: Before simply repeat this error, let's give it some thought.

Well, if we would go to plurals, it would be wrong the other way if there are only single entries . Also, I think the properties don't have to plural, they are internal. The labels of them should adjust for the singular or plural case. That said, I don't think we can achieve that without breaking the programming interface, so we're stuck with that.

To be honest, I don't know which tooltip you mean. Can you post a picture? Is this an option I have to switch on?

Configure Dolphin > General > Show Tooltips
Currently tooltips are a little broken. see D9973

Okay, so here is a screenshot. Looks not so nice with lots of entries. Imho there should be a maximum number of properties, and a scrollbar should be used if that threshold is exceeded.

I think it behaves quite well and reasonable with many tags as it shows a scrollbar. I don't think that there is a better solution.

I haven't tested this. So I don't really know. And you're probably right. I forgot that the choice of properties depends on the file type. A screenshot of the dialog would be nice (large enough to display everthing).

The infopanel looks perfectly fine:

  1. The property names should be choosen to be as generic as possible to make them reusable by other file types.

I went through the properties again and only found that we might want to use the "generator" property instead of a new "encodedby" property. Do you have any other suggestion?

Not yet. To illustrate what I'm thinking of:
Conductor and Director (of a Movie) are analogous. It would be nice to have a term covering both. . Ditto Lyricist and Screenwriter, and maybe more.
I don't know if that is possible - at least I have no idea yet. We should just give it some (more) thought.
Or the other way around:
I find the distinction between 'Release Year' for media and 'Creation Date' (which should correctly be 'Publication Date') in for EBooks annoying and confusing. We should not repeat that, if possible.

I agree that we should think that through. Once committed, we're stuck with that for the time being. In this specific case, I would actually argue that the conductor and director are different. Imho there is only limited potential to find tags that work for everything. Video, Music, text etc. are different after all.

The property names are important for searching with baloo. Everything comes together here. (I'll elaborate this on demand)
The code you are working on exemplifies pretty well, that everybody is brewing his own soup when is comes to metadata and tags. And this is only audio!
I'm not saying you're doing this. But instead of just throwing in what is there we should agree on a concept how KDE is handling metadata. This is not the place to discuss that, though. I haven't this much thought yet it's just a feeling we might get into trouble in the future.

We pretty much have to cope with whats available out there in the wild as de facto standard. We're only consumers here.

autotests/taglibextractortest.cpp
80

Hmm, I disagree since they are simple and they don't leave room for ambiguity

src/extractors/taglibextractor.cpp
93

But they are commonly stored as metadata tags in the audio file.

src/properties.h
302
307

Phew, what a conversation!

For the issue of the rating, perhaps we could sync the tag metadata with the baloo metadata, if the baloo rating metadata is already empty. I might even go so far as to overwrite the baloo rating metadata with the tag rating metadata; I can't think of a use case for wanting to maintain two separate rating values for the same file. And it would be really cool to be able to see and manipulate the rating in Dolphin, and have that rating reflected in your music player.

For the issue of singular vs plural, is baloo-widgets smart enough to conditionally display a plural form of the string when there are multiple entries (e.g
"performer: X"; "performers X, Y, Z")? Could we make it smart enough? If not, let's stick with singular to avoid making our lives too difficult for now.

For the issue of Dolphin's Tooltip, perhaps we should make it become multi-row when there's a huge amount of content, or the font size should automatically reduce in size. Or maybe the tooltip should only display content from among a curated selection of tags that we deem to be the most important, and we can rely on the Information panel using used to access the full set

Let's remove all the undecided stuff and land this one quickly for the benefit of everybody. Do you agree ?

+1

astippich updated this revision to Diff 28467.Mar 3 2018, 9:57 AM
  • remove lyrics and map encodedby to generator

I played around a bit with your previous diff (lyrics included). And modified some test.ogg and test.mp3. Lyrics is not metadata, but if we include them they will be searchable:

$ baloosearch lyrics:carefree
~/samplefiles/test.ogg
~/samplefiles/test.mp3
Elapsed: 0.76411 msecs

which is very nice, IMO. Will work in KRunner too.
Info panel looks good.
Tooltips: buggy. see here

I'm a bit sorry, that I'm so back and forth with this. Finding a proper compromise takes time.

In conclusion for me:
Singular is OK
For a first shot go for

  • Performer
  • Conductor
  • Ensemble
  • (and maybe) Arranger

Fixing the tooltips is only half-done. Let me finish fixing them and then do the rest of the tags.

Side note:
There is a big advantage of using extended attributes for Rating: It does not write to the file, hence file times are preserved. Otherwise that information will be lost. Rating is the most volatile tag, changing over time.

Also, since this discussion also applies for the taglibextractor: Is a string list preferred for new properties in KFileMetadata when multiple entries are possible? I think right now most of the properties are using a single concatenated string.

Looks like except for composer it is a string list when persons are involved. In general, I think, going from stringlist -> string causes less pain/errors than the other way around.

mgallien accepted this revision.Mar 26 2018, 8:17 PM

I am really sorry. I had missed that you had modified the diff to only include the safe set of changes. I should really apologize.
Thanks for your work, it is very much appreciated.

This revision is now accepted and ready to land.Mar 26 2018, 8:17 PM
michaelh accepted this revision.Mar 27 2018, 10:19 AM
michaelh added a reviewer: michaelh.

Tooltips cope well

ngraham accepted this revision.Mar 27 2018, 2:01 PM

Thanks for the heads-up, Matthieu. Looks good and works well for me too!

All of a sudden :)
I've been quiet about this because I'd like to resolve the issues regarding strings/stringlists first, so if you have some spare time, please have a look at D11365. I will rebase after this one landed

astippich updated this revision to Diff 30950.Mar 30 2018, 4:44 PM
  • rebase on top of master
astippich updated this revision to Diff 30951.Mar 30 2018, 4:46 PM
  • update value types

Okay, I changed my mind, I rebased on top of master. Since the other revision needs more time and the new properties behave like the old ones, I'd like to end this long journey and land this revision if no-one objects :)

michaelh requested changes to this revision.Mar 30 2018, 6:00 PM

Except for the white space stuff I'm still fine with this.

src/extractors/taglibextractor.cpp
230

New line please

297

Nitpick: These comments aren't really informative. The tagname is explicit enough.
I don't mind if you leave them in.

621

Unrelated white-space change

This revision now requires changes to proceed.Mar 30 2018, 6:00 PM
astippich marked 3 inline comments as done.Mar 30 2018, 6:19 PM

Except for the white space stuff I'm still fine with this.

Oops, sorry, messed that up during rebase.

src/extractors/taglibextractor.cpp
297

You're right, I copied them from MP3s, but they make no sense for APE and OGG tags

astippich updated this revision to Diff 30955.Mar 30 2018, 6:19 PM
astippich marked an inline comment as done.
  • fix formatting issues during rebase
astippich updated this revision to Diff 30956.Mar 30 2018, 6:25 PM
  • remove duplicate genre property for mp3
michaelh accepted this revision.Mar 30 2018, 7:32 PM

Thank you.

This revision is now accepted and ready to land.Mar 30 2018, 7:32 PM

@mgallien this should be safe to land now since KF 5.45 has been tagged, right?

astippich edited the summary of this revision. (Show Details)Apr 7 2018, 8:13 PM
bruns added a subscriber: bruns.Apr 7 2018, 11:00 PM
bruns added inline comments.
autotests/taglibextractortest.cpp
80

Maybe just add a Prefix or Suffix for all QStrings?
Same applies to any Int values, you can never tell if the right property has been fetched.

ping. Can I push? I'm asking again because I don't want to mess up anything before the frameworks release, and this one contains string changes

ping. Can I push? I'm asking again because I don't want to mess up anything before the frameworks release, and this one contains string changes

I am not sure. I tried asking on #kde-devel.

@dfaure should we wait after the final v5.45 tag to push this diff ?

No. As soon as the RC1 tags are there, the message freeze is lifted.

In case of a bugfix that comes in after RC1, I cherry-pick it into the release branch, so that master doesn't have to stay frozen for a week.

This revision was automatically updated to reflect the committed changes.