flatpak: starting kube twice is not supported.
Open, Normal, Public

Description

I can fairly reliably reproduce a crash when trying to start two instances of kube in flatpak.
It does not seem to be reproducible with gdb attached.

SIGABRT received
1 0x405338 kube() [0x405338]
2 0x3153a330b0 /lib/libc.so.6() [0x3153a330b0]
3 0x3153a3304f gsignal + 207
4 0x3153a3447a abort + 362
5 0x4053b8 kube() [0x4053b8]
6 0x3154a8d646 /lib/libstdc++.so.6() [0x3154a8d646]
7 0x3154a8d691 /lib/libstdc++.so.6() [0x3154a8d691]
8 0x3154a8d8a8 /lib/libstdc++.so.6() [0x3154a8d8a8]
9 0x3154ab60f2 /lib/libstdc++.so.6() [0x3154ab60f2]
10 0x7fc8535090f2 std::function<void (Sink::Storage::DataStore::Error const&)>::operator()(Sink::Storage::DataStore::Error const&) const + 50
11 0x7fc853509346 Sink::Storage::DataStore::Transaction::Private::startTransaction() + 230
12 0x7fc853505723 Sink::Storage::DataStore::Transaction::Transaction(Sink::Storage::DataStore::Transaction::Private*) + 35
13 0x7fc85350b517 Sink::Storage::DataStore::Private::initEnvironment(QString const&, Sink::Storage::DbLayout const&) + 2295
14 0x7fc8535073dc Sink::Storage::DataStore::Private::Private(QString const&, QString const&, Sink::Storage::DataStore::AccessMode, Sink::Storage::DbLayout const&) + 876
15 0x7fc853507533 Sink::Storage::DataStore::DataStore(QString const&, QString const&, Sink::Storage::DataStore::AccessMode) + 147
16 0x7fc85317ec01 /app/lib/libsink.so.0.6(+0x137c01) [0x7fc85317ec01]
17 0x7fc85317eaf2 /app/lib/libsink.so.0.6(+0x137af2) [0x7fc85317eaf2]
18 0x7fc85317e914 /app/lib/libsink.so.0.6(+0x137914) [0x7fc85317e914]
19 0x7fc8531b74fa /app/lib/libsink.so.0.6(+0x1704fa) [0x7fc8531b74fa]
20 0x7fc8531b71ec /app/lib/libsink.so.0.6(+0x1701ec) [0x7fc8531b71ec]
21 0x7fc8531b5fcb /app/lib/libsink.so.0.6(+0x16efcb) [0x7fc8531b5fcb]
22 0x7fc8531b6478 /app/lib/libsink.so.0.6(+0x16f478) [0x7fc8531b6478]
23 0x7fc8531b5aaa /app/lib/libsink.so.0.6(+0x16eaaa) [0x7fc8531b5aaa]
24 0x7fc8531a743c /app/lib/libsink.so.0.6(+0x16043c) [0x7fc8531a743c]
25 0x7fc8531b207d /app/lib/libsink.so.0.6(+0x16b07d) [0x7fc8531b207d]
26 0x7fc8531b0fbe /app/lib/libsink.so.0.6(+0x169fbe) [0x7fc8531b0fbe]
27 0x7fc8531b08e4 /app/lib/libsink.so.0.6(+0x1698e4) [0x7fc8531b08e4]
28 0x7fc8531b0614 /app/lib/libsink.so.0.6(+0x169614) [0x7fc8531b0614]
29 0x7fc8531bd2da /app/lib/libsink.so.0.6(+0x1762da) [0x7fc8531bd2da]
30 0x7fc8531bcfdc /app/lib/libsink.so.0.6(+0x175fdc) [0x7fc8531bcfdc]
31 0x7fc8531bb492 /app/lib/libsink.so.0.6(+0x174492) [0x7fc8531bb492]
32 0x7fc8531bbcb8 /app/lib/libsink.so.0.6(+0x174cb8) [0x7fc8531bbcb8]
33 0x7fc8531baf7a /app/lib/libsink.so.0.6(+0x173f7a) [0x7fc8531baf7a]
34 0x7fc8531becdc /app/lib/libsink.so.0.6(+0x177cdc) [0x7fc8531becdc]
35 0x4086e5 kube() [0x4086e5]
36 0x406d68 kube() [0x406d68]
37 0x405dff kube() [0x405dff]
38 0x3153a20291 __libc_start_main + 241
39 0x404e7a _start + 42

cmollekopf triaged this task as Normal priority.

A more useful variant of the backtrace:

[I] ⋊> ~/flatpak on master  flatpak run org.kolab.kube
Log:     kube.store                : Upgrading...
Warning: kube.storage_lmdb         : Failed to to open transaction:  "Resource temporarily unavailable" true 0x636e657265666552
Warning: kube.storage_common       : Database error in  "{c1baf4b8-19d2-483d-9299-b248f1040d2f}" , code  0 , message:  "Error while opening transaction: Resource temporarily unavailable"
SIGSEV received
1             0x405378 kube() [0x405378]
2         0x3153a330b0 /lib/libc.so.6() [0x3153a330b0]
3       0x7f7700288f04 mdb_txn_begin + 20
4       0x7f77135f6457 Sink::Storage::DataStore::NamedDatabase::Private::openDatabase(bool, std::function<void (Sink::Storage::DataStore::Error const&)>) + 743
5       0x7f77135f171d Sink::Storage::DataStore::Transaction::openDatabase(QByteArray const&, std::function<void (Sink::Storage::DataStore::Error const&)> const&, bool) const + 493
6       0x7f77134f0fb6 Sink::Storage::DataStore::databaseVersion(Sink::Storage::DataStore::Transaction const&) + 134
7       0x7f77131c3f1f /app/lib/libsink.so.0.7(+0x849f1f) [0x7f77131c3f1f]
8       0x7f77131c3d82 /app/lib/libsink.so.0.7(+0x849d82) [0x7f77131c3d82]
9       0x7f77131c3ba4 /app/lib/libsink.so.0.7(+0x849ba4) [0x7f77131c3ba4]
10      0x7f7713be1b2a std::function<KAsync::Job<void> (QSharedPointer<Sink::ApplicationDomain::SinkResource>)>::operator()(QSharedPointer<Sink::ApplicationDomain::SinkResource>) const + 106
11      0x7f7713be1922 /app/lib/libkubeframework.so(+0x118922) [0x7f7713be1922]
12      0x7f7713be08f4 /app/lib/libkubeframework.so(+0x1178f4) [0x7f7713be08f4]
13      0x7f7713be0cae KAsync::Private::Executor<QSharedPointer<Sink::ApplicationDomain::SinkResource>, void, QSharedPointer<Sink::ApplicationDomain::SinkResource> >::runExecution(KAsync::Future<QSharedPointer<Sink::ApplicationDomain::SinkResource> > const*, QSharedPointer<KAsync::Private::Execution> const&, bool) + 318
14      0x7f7713be0540 KAsync::Private::Executor<QSharedPointer<Sink::ApplicationDomain::SinkResource>, void, QSharedPointer<Sink::ApplicationDomain::SinkResource> >::exec(QSharedPointer<KAsync::Private::ExecutorBase> const&, QSharedPointer<KAsync::Private::ExecutionContext>) + 816
15      0x7f7713b77bf5 KAsync::Private::Executor<void, void>::exec(QSharedPointer<KAsync::Private::ExecutorBase> const&, QSharedPointer<KAsync::Private::ExecutionContext>) + 261
16      0x7f7713bdd4a8 KAsync::Job<void, QSharedPointer<Sink::ApplicationDomain::SinkResource> >::exec() + 104
17      0x7f7713bdc897 KAsync::Future<void> KAsync::Job<void, QSharedPointer<Sink::ApplicationDomain::SinkResource> >::exec<QSharedPointer<Sink::ApplicationDomain::SinkResource> >(QSharedPointer<Sink::ApplicationDomain::SinkResource>) + 407
18      0x7f7713bdc452 /app/lib/libkubeframework.so(+0x113452) [0x7f7713bdc452]
19      0x7f7713bdc1b4 /app/lib/libkubeframework.so(+0x1131b4) [0x7f7713bdc1b4]
20      0x7f7713b87cda std::function<KAsync::Job<void> (QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> >)>::operator()(QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> >) const + 106
21      0x7f7713b87ad2 /app/lib/libkubeframework.so(+0xbead2) [0x7f7713b87ad2]
22      0x7f7713b864d4 /app/lib/libkubeframework.so(+0xbd4d4) [0x7f7713b864d4]
23      0x7f7713b868ae KAsync::Private::Executor<QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> >, void, QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> > >::runExecution(KAsync::Future<QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> > > const*, QSharedPointer<KAsync::Private::Execution> const&, bool) + 318
24      0x7f7713b86120 KAsync::Private::Executor<QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> >, void, QList<QSharedPointer<Sink::ApplicationDomain::SinkResource> > >::exec(QSharedPointer<KAsync::Private::ExecutorBase> const&, QSharedPointer<KAsync::Private::ExecutionContext>) + 816
25      0x7f771320fdac KAsync::Private::Executor<void, Sink::Store::UpgradeResult>::exec(QSharedPointer<KAsync::Private::ExecutorBase> const&, QSharedPointer<KAsync::Private::ExecutionContext>) + 268
26            0x408865 kube() [0x408865]
27            0x406ea8 kube() [0x406ea8]
28            0x405e3f kube() [0x405e3f]
29        0x3153a20291 __libc_start_main + 241
30            0x404eba _start + 42
Sleeping for 10s to attach a debugger: gdb attach 5
OK closing connection

This seems to be an EAGAIN error code returned from one of the operations that lmdb performs in mdb_txn_begin:

  • mdb_reader_pid (via mdb_txn_renew0) is doing some fcntl calls.
  • sem_wait is being called (via LOCK_MUTEX -> LOCK_MUTEX0). EAGAIN should only be returned from sem_trywait, so this shouldn't actually be the source.

Looks like it's coming from mdb_reader_pid's fcntl call (we're trying to set the reader lock):

F_SETLK (struct flock *)
             Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock (when l_type is F_UNLCK) on the bytes specified by the l_whence, l_start, and l_len fields of lock.  If a conflicting lock is held  by  another  process,  this  call
             returns -1 and sets errno to EACCES or EAGAIN.  (The error returned in this case differs across implementations, so POSIX requires a portable application to check for both errors.)

This might be a flatpak incompatibility.

	for (;;) {
		int rc;
		struct flock lock_info;
		memset(&lock_info, 0, sizeof(lock_info));
		lock_info.l_type = F_WRLCK;
		lock_info.l_whence = SEEK_SET;
		lock_info.l_start = pid;
		lock_info.l_len = 1;
		if ((rc = fcntl(env->me_lfd, op, &lock_info)) == 0) {
			if (op == F_GETLK && lock_info.l_type != F_UNLCK)
				rc = -1;
		} else if ((rc = ErrCode()) == EINTR) {
			continue;
		}
		return rc;
	}

The code assumes that no two processes can ever have the same PID (it sets a lock at the byte offset equal to the PID). I suppose we break that assumption when starting the flatpak twice, because the files are shared between the sandboxes, but the PID namespaces are not.
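
For illustration, here is a minimal standalone sketch (not lmdb's actual code) of that per-PID byte lock; the lock file path is made up for the demo. If two sandboxed instances share the lock file but report the same getpid(), the second F_SETLK fails with EAGAIN/EACCES, which is the "Resource temporarily unavailable" seen in the backtrace above.

    // Minimal sketch (not lmdb's code) of the per-PID byte lock taken by
    // mdb_reader_pid; the lock file path is made up for the demo.
    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>

    int main()
    {
        // Both instances see the same shared lock file ...
        int lfd = open("/tmp/per-pid-lock-demo", O_CREAT | O_RDWR, 0600);
        if (lfd < 0) { perror("open"); return 1; }

        // ... but may report the same getpid(), since each flatpak runs in
        // its own PID namespace. lmdb locks the single byte at offset == pid.
        struct flock lock_info;
        memset(&lock_info, 0, sizeof(lock_info));
        lock_info.l_type = F_WRLCK;
        lock_info.l_whence = SEEK_SET;
        lock_info.l_start = getpid();
        lock_info.l_len = 1;

        if (fcntl(lfd, F_SETLK, &lock_info) == -1) {
            // A second process that collides on the PID ends up here with
            // EAGAIN or EACCES: "Resource temporarily unavailable".
            fprintf(stderr, "F_SETLK failed: %s\n", strerror(errno));
            return 1;
        }
        printf("acquired reader-slot lock at offset %ld\n", (long)getpid());
        pause(); // hold the lock so a second instance can collide with it
        return 0;
    }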

cmollekopf renamed this task from "Crash when starting kube twice." to "flatpak: Crash when starting kube twice." (Jul 6 2018, 7:55 AM)

So: the lmdb assumption that PIDs are unique simply no longer holds with PID namespaces. I suppose this could be seen as an lmdb bug.
I'm also not sure how well the separate synchronizer processes would work if we were able to start multiple flatpaks. The sockets would have to be shared, and then each synchronizer would just live on in the sandbox of the kube process that started it...

In any case, the simplest solution to the problem is probably to enforce that no two flatpaks can be started at the same time, thus sidestepping all those issues.

cmollekopf renamed this task from "flatpak: Crash when starting kube twice." to "flatpak: starting kube twice is not supported." (Jul 27 2018, 8:45 AM)

At the moment we avoid starting kube twice by using a lockfile (in flatpak only).
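
For reference, such a guard can be as simple as the following QLockFile sketch; the lock file path and messages are illustrative, not the actual kube/flatpak wrapper code. The lock file has to live in a location that is shared between the sandboxes, otherwise the guard is ineffective.

    // Illustrative QLockFile guard around startup (not the actual kube code).
    #include <QCoreApplication>
    #include <QDebug>
    #include <QDir>
    #include <QLockFile>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        // Must be a path that every flatpak instance can see.
        QLockFile lock(QDir::temp().filePath("kube.lock"));
        if (!lock.tryLock(100)) {
            qWarning() << "Another kube instance appears to be running, exiting.";
            return 1;
        }

        // ... normal startup ...
        return app.exec();
    }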

In the long run it probably makes sense to handle this properly over D-Bus.

D-Bus is appropriate because that is the infrastructure that flatpak provides and it is probably the easiest to implement (custom socket communication would probably also work, but would be more work); a sketch of the usual pattern follows.
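
The general shape (not the actual implementation; org.kolab.kube is just used as an example bus name here):

    // Sketch of the usual D-Bus single-instance pattern (not the actual
    // kube implementation).
    #include <QCoreApplication>
    #include <QDBusConnection>
    #include <QDebug>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        QDBusConnection bus = QDBusConnection::sessionBus();
        if (!bus.isConnected()) {
            qWarning() << "No session bus available.";
            return 1;
        }

        // Only one process on the (shared) session bus can own this name.
        // A second instance fails here and could instead ask the running
        // instance to raise its window via a D-Bus call.
        if (!bus.registerService(QStringLiteral("org.kolab.kube"))) {
            qWarning() << "kube is already running.";
            return 0;
        }

        // ... normal startup ...
        return app.exec();
    }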

Restarting the flatpak can still result in:

Warning: kolabnowSmtp.storage_common : Database error: "Error while opening transaction: Resource temporarily unavailable" Code:  0 Db:  "kolabnowSmtp.changereplay"
SIGSEV received
1             0x406c88 crashHandler(int) + 184
2         0x3153a330b0 /lib/libc.so.6() [0x3153a330b0]
3       0x7fbad7f4df04 mdb_txn_begin + 20
4       0x7fbad9c31fc1 Sink::Storage::DataStore::NamedDatabase::Private::openDatabase(bool, std::function<void (Sink::Storage::DataStore::Error const&)>) + 1601
5       0x7fbad9c2d162 Sink::Storage::DataStore::Transaction::openDatabase(QByteArray const&, std::function<void (Sink::Storage::DataStore::Error const&)> const&, int) const + 482
6       0x7fbad9bb9438 Sink::ChangeReplay::getLastReplayedRevision() + 168
7       0x7fbad9c1939b Sink::CommandProcessor::setSynchronizer(QSharedPointer<Sink::Synchronizer> const&) + 347
8       0x7fbad9a7e99b Sink::GenericResource::setupSynchronizer(QSharedPointer<Sink::Synchronizer> const&) + 155
9       0x7fbaccf495af MailtransportResource::MailtransportResource(Sink::ResourceContext const&) + 975
10      0x7fbaccf49c69 MailtransportResourceFactory::createResource(Sink::ResourceContext const&) + 57
11      0x7fbad9b134a3 Listener::loadResource() + 243
12      0x7fbad9b13369 Listener::checkForUpgrade() + 25
13            0x409a2e main + 3662
14        0x3153a20291 __libc_start_main + 241

I can reproduce it by restarting the kube flatpak (close the window, start it again from the console immediately afterwards).

When running ps aux just before starting kube again, I can see the synchronizer processes still around; however, the lockfile doesn't trigger, so the crash must happen somewhere after the synchronizer's lockfile is released but before the lmdb database lock is released.

I tried reproducing the problem with something like the following, but without success:

sudo kill -9 $(pidof sink_synchronizer); sudo unshare -fp env PATH=/install/bin/ QT_PLUGIN_PATH=/install/lib64/plugins/ sink_synchronizer kolabnowImap sink.imap

The location of the QLockFile should be shared across flatpaks, so I'm not sure what the problem is.

lmdb uses a POSIX record lock (fcntl); QLockFile uses flock, I think (https://gavv.github.io/blog/file-locks/).
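
If that guess is right, the two lock mechanisms would not exclude each other anyway: on Linux, flock() locks and fcntl() record locks are independent of each other, as the following sketch demonstrates (the path is made up):

    // Demo (Linux): an flock() lock and an fcntl() record lock on the same
    // file do not conflict with each other.
    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/file.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main()
    {
        int fd = open("/tmp/lock-style-demo", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (flock(fd, LOCK_EX) != 0) { perror("flock"); return 1; } // BSD-style lock

        if (fork() == 0) {
            // Child: take a POSIX record lock (what lmdb uses) on the same file.
            int cfd = open("/tmp/lock-style-demo", O_RDWR);
            if (cfd < 0) { perror("open"); _exit(1); }
            struct flock fl;
            memset(&fl, 0, sizeof(fl));
            fl.l_type = F_WRLCK;
            fl.l_whence = SEEK_SET;
            fl.l_start = 0;
            fl.l_len = 1;
            if (fcntl(cfd, F_SETLK, &fl) == 0)
                printf("fcntl record lock succeeded despite the flock() lock\n");
            else
                printf("fcntl record lock failed: %s\n", strerror(errno));
            _exit(0);
        }
        wait(nullptr);
        return 0;
    }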

We now execute "kill $(pidof sink_synchronizer)" at the end of the wrapper script to hopefully avoid running multiple synchronizer instances in parallel.