add Baloo DBus signals for moved or removed files
ClosedPublic

Authored by mgallien on Mar 3 2017, 7:11 AM.

Details

Summary

Add new DBus signals sent by Baloo indexer for removed or moved files.
I still need to extend existing automatic tests with checks that the signals are sent

Diff Detail

Repository
R293 Baloo
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
mgallien created this revision.Mar 3 2017, 7:11 AM
Restricted Application added a project: Frameworks. · View Herald TranscriptMar 3 2017, 7:11 AM
Restricted Application added a subscriber: Frameworks. · View Herald Transcript
apol added a subscriber: apol.Mar 3 2017, 10:59 AM
apol added inline comments.
src/file/metadatamover.cpp
71

make const

73

const auto &

99

That reserve is wrong, it should be 3.

How about this?
message.setArguments({QVariant(fileList), from, to});

150

This would be much more readable using the initializer syntax.

mgallien updated this revision to Diff 12116.Mar 3 2017, 12:09 PM

Fix issues reported by apol

Not sure the list iteration on quint64 needs a const auto &.

Will add automatic tests later and as soon as possible

mgallien marked 4 inline comments as done.Mar 3 2017, 12:15 PM
apol added a comment.Mar 3 2017, 12:46 PM

autotests would be very welcome!

src/engine/transaction.cpp
134

Return right away?

src/file/metadatamover.cpp
73

Uh you are right, no need for a const&, I assumed it were QString for some reason.

mgallien updated this revision to Diff 12134.Mar 3 2017, 4:22 PM

fix two new issues reported by apol

mgallien marked 2 inline comments as done.Mar 3 2017, 4:23 PM
apol added a comment.Mar 3 2017, 4:38 PM

:) good, thanks, can you look into the autotests?

mgallien updated this revision to Diff 12137.Mar 3 2017, 4:57 PM

add checks that the correct signal is sent during tests of MetadataMover

apol added a reviewer: vhanda.Mar 3 2017, 11:21 PM
vhanda edited edge metadata.Mar 3 2017, 11:37 PM

I'm not the maintainer of Baloo any more, so I don't want to give it a clear Yes / No.

This patch is going to be a big CPU hog. For files this will barely have an impact, but for folders of a large enough size, it's going to result in tons of dbus signals, and more importantly, tons of database lookups and full path constructions. It really will add up. In an earlier version of Baloo, we used to store the full file paths, which meant when a folder moved, every sub-file/folder's URL had to be updated. That was a huge CPU burner. This patch isn't that bad, but it's half way there.

I cannot see what advantage these signals add, apart from possibly making Baloo more introspect-able. If it's to build applications on top of Baloo, I would recommend against it. The kernel provides file system monitoring APIs which should be used instead.

In D4911#92473, @vhanda wrote:

I'm not the maintainer of Baloo any more, so I don't want to give it a clear Yes / No.

This patch is going to be a big CPU hog. For files this will barely have an impact, but for folders of a large enough size, it's going to result in tons of dbus signals, and more importantly, tons of database lookups and full path constructions. It really will add up. In an earlier version of Baloo, we used to store the full file paths, which meant when a folder moved, every sub-file/folder's URL had to be updated. That was a huge CPU burner. This patch isn't that bad, but it's half way there.

For any top folders, only one signal is sent with my patch. Not sure that would add significant CPU overhead.
If I understand you correctly you fear that fetching the names of the files under the directory will add significant CPU overhead. Do you have any numbers ?
I believe I could do some benchmarks to see how much CPU is used to get the list of removed paths and not only the Baloo internal ids. What would convince you ?
I believe one worst case would be somebody deleting a lot of folders selected in dolphin and each one having a very deep hierarchy. Each top folders would mean a DBus signal with a lot of content. Do you think I should benchmarks this worst case scenario ?
Currently for changed files, one property changes and two signals are sent.

I cannot see what advantage these signals add, apart from possibly making Baloo more introspect-able. If it's to build applications on top of Baloo, I would recommend against it. The kernel provides file system monitoring APIs which should be used instead.

This is the most important part of the question.
I think that software like Baloo should provide a way to have live refresh of queries for applications using it. Currently Baloo does not provide this.
This a real blocker in my opinion for an application. Sure, I could do polling but the user experience would not be ideal at all.

The other alternative is applications using Baloo also watch the file system. That would mean that the user of an application using Baloo would have double quantity of inotify watches. This would also mean that if there is a limit in their number, that would be reached more quickly. Those duplicated watches would exist only to know when the results of the Baloo queries have changed all done by each application using it.

I will probably do that since this way I am independent of any new patches for Baloo and release for me will be easier (no need to bump a dependency, ...).

Just my 2 cents from the sideline:

  1. baloo is unmaintained and the bugs pile up, just check bugs.kde.org for that, not sure if adding yet-an-other feature to it is a good idea
  2. as vhanda said, perhaps better use other API for that

I tried to replace baloo with tracker, see https://cgit.kde.org/clones/baloo/cullmann/tbaloo.git/ but failed out of time (searching works, but not all API), that won't support such things anyways.

Just my 2 cents from the sideline:

  1. baloo is unmaintained and the bugs pile up, just check bugs.kde.org for that, not sure if adding yet-an-other feature to it is a good idea
  2. as vhanda said, perhaps better use other API for that

Those APIs are not free and takes some kernel resources.

I tried to replace baloo with tracker, see https://cgit.kde.org/clones/baloo/cullmann/tbaloo.git/ but failed out of time (searching works, but not all API), that won't support such things anyways.

I was hoping to support the same file indexer that Plasma is using hence my work on Baloo. On my roadmap, I plan to also support Tracker.
For Windows and Android, I have implemented an indexer with Qt + KFileMetaData APIs. I was just hoping that Baloo would provide live refresh of queries. I have to check if Tracker does it or not.

Just my 2 cents from the sideline:

  1. baloo is unmaintained and the bugs pile up, just check bugs.kde.org for that, not sure if adding yet-an-other feature to it is a good idea
  2. as vhanda said, perhaps better use other API for that

Those APIs are not free and takes some kernel resources.

I tried to replace baloo with tracker, see https://cgit.kde.org/clones/baloo/cullmann/tbaloo.git/ but failed out of time (searching works, but not all API), that won't support such things anyways.

I was hoping to support the same file indexer that Plasma is using hence my work on Baloo. On my roadmap, I plan to also support Tracker.
For Windows and Android, I have implemented an indexer with Qt + KFileMetaData APIs. I was just hoping that Baloo would provide live refresh of queries. I have to check if Tracker does it or not.

I have installed again Tracker and an application using it on my Debian unstable. With qdbusviewer, I see a DBus signal for signaling changes to its database. i have to check but I think this is exactly the feature I need to be able to really use Baloo. This way, an application know when to execute once more its query to get an updated result.

I will also modify my patch to implement only one signal fired without parameters (i.e. no list of modified content) when Baloo database is modified. Vishesh do you think this would be problematic performance wise ? This would allow my application to always get an updated list of audio files whenever files are added, removed or modified. This would avoid duplicating file system watches.

I will try using Tracker with your work compatibility layer. Did you get feedback from frameworks maintainer ?

I was hoping to support the same file indexer that Plasma is using hence my work on Baloo. On my roadmap, I plan to also support Tracker.
For Windows and Android, I have implemented an indexer with Qt + KFileMetaData APIs. I was just hoping that Baloo would provide live refresh of queries. I have to check if Tracker does it or not.

I have installed again Tracker and an application using it on my Debian unstable. With qdbusviewer, I see a DBus signal for signaling changes to its database. i have to check but I think this is exactly the feature I need to be able to really use Baloo. This way, an application know when to execute once more its query to get an updated result.

Than I would more go the "use tracker" route than implement this in baloo.

I will also modify my patch to implement only one signal fired without parameters (i.e. no list of modified content) when Baloo database is modified. Vishesh do you think this would be problematic performance wise ? This would allow my application to always get an updated list of audio files whenever files are added, removed or modified. This would avoid duplicating file system watches.

Hmm, baloo can modify the database the whole day around, do we really want signals on that?

I will try using Tracker with your work compatibility layer. Did you get feedback from frameworks maintainer ?

None, I play around with baloo-search, that seemed to work fine with my patch on my tracker DB, but given I really not use a lot of baloo features (as my normal experience is that of instant segfault or OOM), I can't comment on the usability.
(beside baloo is a large security hole, tracker at least partly fixed the biggest issues, remembering the "oh, tracker loads your exploit on the fly after downloading some hacked file" flames)

mgallien abandoned this revision.Mar 10 2017, 9:47 AM

I think this review is going nowhere. I prefer to cancel it instead of spending my energy on it.
I will limit Elisa usage of Baloo to getting an initial list of files hopefully faster than by looking at file system. All later events will go through other APIs.

I will limit Elisa usage of Baloo to getting an initial list of files hopefully faster than by looking at file system.

I would actually not even do that, as baloo isn't working that well e.g. on NFS and co. and that will limit the use of your application (which is cool btw.!).

mgallien added a comment.EditedMar 10 2017, 10:12 AM

I will limit Elisa usage of Baloo to getting an initial list of files hopefully faster than by looking at file system.

I would actually not even do that, as baloo isn't working that well e.g. on NFS and co. and that will limit the use of your application (which is cool btw.!).

This is one way to get tracks in parallel with other ways to get tracks. If Baloo is active I can only hope that the user as made actions to ensure that it is not just crashing., If Baloo is not active, I should check that my application still behave nominally.

I almost forgot to say thank you for your motivating comment about Elisa.

This is one way to get tracks in parallel with other ways to get tracks. If Baloo is active I can only hope that the user as made actions to ensure that it is not just crashing., If Baloo is not active, I should check that my application still behave nominally.

That sounds very reasonable!

Greetings
Christoph

mgallien reclaimed this revision.Mar 28 2017, 9:05 AM

Reopening since I am still convinced that getting signals from Baloo is a lot more sane than adding workaround in each users of Baloo.

My point is that for a music player like the one I am working on (Elisa), I have to do the following:

  • ask Baloo for an initial list of files (using Baloo APIs) ;
  • add file system watches for all those files (needed for removed files or directory and moved files or directories) ;
  • listen to some DBus signals from Baloo (needed to understand new files have been detected by Baloo in directories I am watching or not). May be used to detect changes to already known files ;
  • hope that my code is good enough to not miss any changes.

One of the worst thing in this schema is that Baloo is slower than file system watches. Changes will not be detected in the order they happen. Changes detected through Baloo will happen after changes detected by watches. User Experience may suffer from that (needs more test to see if this is real).

As I already said, if the cost of getting exact changes is too high, just a signal saying something changed is good enough if we get notified of *all* changes.

Could this be made more lightweight by adding a precise subscription method rather than signal broadcasts?

The interested application would register into baloo (using dbus or C++, not sure what's applicable here), saying "please do inform me about changes" or maybe even "please inform me about changes <in these dirs> or <for files of this mimetype>". And the MetadataMover code would only emit a dbus signal if 1) someone registered, 2) the signal passes the dir or mimetype filter.

This way, nothing changes performance-wise for people not using an app that needs these signals.

Could this be made more lightweight by adding a precise subscription method rather than signal broadcasts?

I will work on this solution.

The interested application would register into baloo (using dbus or C++, not sure what's applicable here), saying "please do inform me about changes" or maybe even "please inform me about changes <in these dirs> or <for files of this mimetype>". And the MetadataMover code would only emit a dbus signal if 1) someone registered, 2) the signal passes the dir or mimetype filter.

This way, nothing changes performance-wise for people not using an app that needs these signals.

Thanks David for your suggestion. This looks like a great idea and should really help me find a solution that works for applications.

mgallien updated this revision to Diff 14953.May 29 2017, 6:41 PM

add a new method to register applications wanting to watch moved files and
define (via xml) the interface to implement for applications

Not to throw cold water on your patch (it's nice to see someone working on Baloo!), but perhaps we should continue @cullmann's work to move over to Tracker instead. It's an active project, and we would benefit from all the work that the GNOME people are putting into it. I fear that Baloo is destined to suffer a slow death from bitrot unless it gains at least one full-time maintainer. If that happens, Elisa won't be able to rely on it anyway, and that code will need to be removed or ported over to use Tracker (or whatever else we use instead).

Unfortunately I never finished the https://cgit.kde.org/clones/baloo/cullmann/tbaloo.git/ port :/
Even more unfortunately there are close to zero useful baloo commits either since that time.

Hello @ngraham and @cullmann,
I lack time to properly maintain stuff I added to KFileMetaData. I do not think I may be able to put a lot of energy in Baloo.
At the same time, this is a solution that quite work for me and the usage I do in Elisa. I am just trying to come to a conclusion in this review such that I can move on.
I still plan to properly support tracker since it is used in a lot of default configuration of Linux distributions, just not now.

michaelh added a project: Baloo.

By the way, I take back everything I said up-thread about abandoning Baloo. We've got some new blood working on it and I feel confident about its future now! So it would be great to get some experienced eyeballs on this patch.

Considering that I'm more or less the only one working on baloo, I'd prefer to postpone this for a while instead of adding more complexity. At least until I have understood the inner workings of baloo.
Also the list of bugs reporting baloo's crashes is quite long. Fixing those has become my top priority.

@mgallien: Unless it really thwarts you I ask for your patience.

I would actually not even do that, as baloo isn't working that well e.g. on NFS and co. and that will limit the use of your application (which is cool btw.!).

@cullmann: T7860 lists some of my observations wrt samba shares. I don't know NFS, can you enlighten me regarding deviceids and inodes on NFS?

@cullmann: T7860 lists some of my observations wrt samba shares. I don't know NFS, can you enlighten me regarding deviceids and inodes on NFS?

I can only tell that baloo can't work at all on NFS/SMB home dirs with the current DB and that its idea of inode number usage to encode the file won't work there either (it already doesn't work on large file systems with 64-bit inodes).

I am very happy that somebody is working on fixing the current bugs, but I am still not sure it would not just be better to port away to some better maintained and more secure thing like tracker, but my https://cgit.kde.org/clones/baloo/cullmann/tbaloo.git/ port stagnated
as there was no interest.

@cullmann: T7860 lists some of my observations wrt samba shares. I don't know NFS, can you enlighten me regarding deviceids and inodes on NFS?

I can only tell that baloo can't work at all on NFS/SMB home dirs with the current DB and that its idea of inode number usage to encode the file won't work there either (it already doesn't work on large file systems with 64-bit inodes).

I am very happy that somebody is working on fixing the current bugs, but I am still not sure it would not just be better to port away to some better maintained and more secure thing like tracker, but my https://cgit.kde.org/clones/baloo/cullmann/tbaloo.git/ port stagnated
as there was no interest.

I did some test today on tbaloo and noticed one problem when fetching the results from a query. The paths are encoded like URLs but without the scheme. I had to modify my code to use it. Apart from that, nice work. It just works. Are you still interested to work on that ?

I did some test today on tbaloo and noticed one problem when fetching the results from a query. The paths are encoded like URLs but without the scheme. I had to modify my code to use it. Apart from that, nice work. It just works. Are you still interested to work on that ?

I would be willing to work more on that, if there is consensus that it should replace baloo and we use the synergy with tracker instead of doing all things tracker does just once again for the sake of it (beside running in all the same security problems).
But until now, that doesn't look like there is such a consensus.

dfaure accepted this revision.Apr 28 2018, 10:27 AM

My objections no longer apply, thanks for the redesign for more performance.

This revision is now accepted and ready to land.Apr 28 2018, 10:27 AM
This revision was automatically updated to reflect the committed changes.