Add support for detecting avahi crash - cleanup invalid dbus objects in that case
Needs ReviewPublic

Authored by jpalecek on Mar 4 2019, 2:52 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This is a first take at tackling a problem caused by avahi going out of service -- usually its restart during an upgrade. In the current state, these things happen:

  • kdnssd holds dbus paths of avahi objects that no longer exist. Eventually, it calls methods on those paths, which might or might not exist or designate different objects
  • the client won't discover that published addresses/browsing addresses etc. don't work anymore. Ironically (this happens with kopete), the client will get some error message in the moment avahi comes alive again, not when it's going dead
  • KDNSSD objects used with this avahi instance stop working, and are unusable in the future

In this diff, mostly the first problem is tackled. I add a new singleton class, AvahiDisconnectListener, which will provide a signal when avahi goes away. Every KDNSSD object would listen for that signal and perform the necessary cleanup. This means, in this diff, mostly deleting dbus objects and clearing dbus addresses.

On top of that, the PublicService class is changed that it reports failure to publish with a published(false) signal, and a RemoteService reports failure with resolved(false). The latter should be able to restart when convienient with resolveAsync.

Other objects are simply left in a consistent, but nonworking state and can't be used anymore, because they lack the signals to signal error, and methods to restart their operations.

Looking forward, this patch could be extended by adding aforementioned error signals, or even, providing a much better user experience, automatically restarting normal operation when avahi comes on again. That is doable in the client, but is tedious and error prone, as it needs listening for DBus NameOwnerChanged, and then correctly wait because Avahi isn't ready for business right away. Also, you need to manage all the race conditions.

Test Plan

Launch kopete with an (online) Bonjour Account and run

WARNING: this stops avahi
# systemctl mask avahi-daemon.socket; systemctl stop avahi-daemon;

kopete should notice immediately that Bonjour isn't online anymore. Ideally, when you restart avahi with

# systemctl unmask avahi-daemon.socket; systemctl start avahi-daemon

kopete should make Bonjour online immediately, however that neds more work

Diff Detail

Repository
R272 KDNSSD
Branch
wip-merged
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9155
Build 9173: arc lint + arc unit
jpalecek created this revision.Mar 4 2019, 2:52 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptMar 4 2019, 2:52 PM
jpalecek requested review of this revision.Mar 4 2019, 2:52 PM
jpalecek edited the summary of this revision. (Show Details)Mar 4 2019, 2:54 PM
jpalecek edited the test plan for this revision. (Show Details)
jpalecek edited the test plan for this revision. (Show Details)Mar 4 2019, 2:55 PM
jpalecek added inline comments.Mar 4 2019, 3:01 PM
src/avahi-publicservice_p.h
42

Initialize the QObject with the parent object to ensure deleting the Private object. In this case, the PublicService object simply doesn't do it in its destructor AFAIK

src/avahi_listener.cpp
28

m_connection is needed to manually manage the connection, because AvahiListener is not a QObject. It is inherited by QObjects, but relying on that (and casting) would need that the QObject is initialized first which would be brittle. OTOH saving the connection handle seems hassle-free.

jpalecek edited the summary of this revision. (Show Details)Mar 4 2019, 3:02 PM
sitter added a comment.Mar 4 2019, 4:02 PM

I haven't had a proper look, so only style complaints for now... but...

Isn't the problem rather that the daemon shouldn't be randomly restarted on a live system? I.e. this ultimately is an integration issue on the distro-level.
I doubt many if any avahi clients handle this properly and disadvantageous side effects are rather expected. Same as pulseaudio for example.

So I am not sure if this is worth handling at all, but if we handle it we should do it properly and republish/rediscover services. The client shouldn't get broken frontend objects just because the backend of the backend restarted.

src/CMakeLists.txt
22

disconnect

src/avahi-disconn-handler.cpp
1

needs a license header

8

no need for QObject::

9

& go right of the space

use descriptive variable names

15

don't use trailing return type but simply use return type instead of auto

src/avahi-disconn-handler.h
1

needs a license header

8

we do not indent classes within a namespace

12

privates at the bottom of the class in a private section please

13

the signal section goes after the public section

src/avahi-domainbrowser.cpp
40

deleteLater may be better

the cleanup in this slot seems insufficient compared to the cleanup done for ServiceBrowser?

src/avahi-remoteservice.cpp
37

no need for QObject::

goes for all? other classes as well

40

space after if

src/avahi-servicebrowser.cpp
42

no space after instance

also I think you need to bind the connection to this seeing as you are capturing this

42

no need for QObject::

src/avahi-servicebrowser_p.h
38

put this in an actual private section at the bottom please

src/avahi-servicetypebrowser.cpp
37

no need for QObject::

39

deleteLater may be better seeing as this is a slot and deletes in slots are dangerous

40

nullptr

src/avahi_listener.cpp
28

use a member initializer list here

32

The other classes ctors look super repetitive and could maybe just be moved in here. couldn't you simply make a pure virtual reset() in here that is implemented in all avahi listeners and connect to that? Specifically since the expectation is that every dbus based object switches state when avahi disappears/appears the best way to express that in a way that one doesn't forget about it moving forward would be making the compiler throw a tantrum on a not implemented reset functionality.

src/avahi_listener_p.h
26

sort alphabetically

37

not very descriptive name

privates at the bottom of the class in a private section please

bruns added a subscriber: bruns.Mar 4 2019, 8:37 PM

I haven't had a proper look, so only style complaints for now... but...

Isn't the problem rather that the daemon shouldn't be randomly restarted on a live system? I.e. this ultimately is an integration issue on the distro-level.
I doubt many if any avahi clients handle this properly and disadvantageous side effects are rather expected. Same as pulseaudio for example.

As Avahi is mostly stateless, I see no reason why it shouldn't be restarted after e.g. an update.

So I am not sure if this is worth handling at all, but if we handle it we should do it properly and republish/rediscover services. The client shouldn't get broken frontend objects just because the backend of the backend restarted.

+1 - especially, as the client side is almost guaranteed to be implemented.

bruns added a comment.Mar 4 2019, 8:39 PM

I think some repetitive code can be avoided when the T* inside the FooPrivate classes are replaced by e.g. QScopedPointer<T> or std::unique_ptr<T>.

sitter removed a reviewer: sitter.Aug 6 2019, 10:15 AM