[WIP] Restrict file extractor with Seccomp
Needs ReviewPublic

Authored by davidk on Oct 28 2017, 7:22 AM.

Details

Reviewers
apol
ossi
smithjd
bruns
Group Reviewers
Frameworks
Summary

Use Seccomp for implementing a sandbox for baloo_file_extractor

This change introduces a new optional dependency on libseccomp.
Libseccomp allows to forbid syscalls. With that we can constrain the
extractor process.

This is important because the extractors are parsing multimedia files with
libraries like FFmpeg. These libraries have sometimes bugs in their parsers,
allowing code execution. This can result in remote code execution
vulnerabilities, because webbrowsers like Google Chrome are downloading files
from websites automatically, and place them in your download folder. Then,
baloo will pick them up and parse them, which allows the attacker to trigger
bugs in e.g. FFmpeg. Such an attack has been demonstrated for Gnome Tracker[1].

If the attacker is able to execute code on your system, he/she might try to
mess with your home directory, e.g. encrypt it ("ransomware", very widespread
on Windows), or send your files to a webserver. This change tries to defeat
suchs attacks, by forbidding all syscall which could mess up your
files or send them to a server by using Seccomp. This is done using a blacklist
of systemcalls. Using a whitelist would be possible, but the we don't know what
syscalls third-party extractors might use, so I tried to make the impact on
them as small as possible. Extractors opening files in read-write mode will
fail and need to be fixed (I don't think they should do this). If a user uses an
extractor which gets restricted by Seccomp, he/she can set the
BALOO_DISABLE_SECCOMP environment variable to disable the sandboxing.
One can debug such issues by running "dmesg | grep baloo_file_extractor", as
Seccomp violations show up in dmesg (you may need to enable this with auditctl).

This would also be possible using AppArmor or SELinux (maybe even better),
but these aren't available on all distributions, while Seccomp is AFAIK
available on most distributions. Of course one could combine them, and use
e.g. Seccomp and AppArmor.

Unfortunately kidletime opens a X11 socket, so an attacker can control every
X11 application. Wayland fixes this.

The Seccomp filter gets only installed if the Seccomp dependency is
available.
Credits go to Martin Flöser. He did something similar for kscreenlocker,
and my work is mostly an adaption of his work.

This is work in process: I want to get some feedback on the general idea and on
these TODO items:

  • Maybe use a whitelist instead of a blacklist?
  • FindSeccomp.cmake is now in ecm, this one should be removed as soon as we can depend on it.
  • Check whether CI has seccomp installed.

FEATURE: 408570

Test Plan
  • Ran a full reindex with wiped config and database, including the kfilemetadata samplefiles
  • Got no Seccomp violations (checked with strace and dmesg)
  • Writing files from the extractor fails with EPERM
  • Seccomp Test is green

[1] https://scarybeastsecurity.blogspot.de/2016/11/0day-poc-risky-design-decisions-in.html?m=1

Diff Detail

Repository
R293 Baloo
Branch
seccomp
Lint
No Linters Available
Unit
No Unit Test Coverage
davidk created this revision.Oct 28 2017, 7:22 AM
Restricted Application added a project: Frameworks. · View Herald TranscriptOct 28 2017, 7:22 AM
Restricted Application added a subscriber: Frameworks. · View Herald Transcript
ossi added a comment.Dec 3 2017, 2:10 PM

you *really* should use a whitelist. it's ok if that breaks some 3rdparty extractor; you'll get a bug report which you can properly evaluate.
you could go totally overboard and assign fine-grained syscall capabilities to individual extractors, but i can't really think of legitimate reasons why that would be necessary in this context.

davidk added a comment.Jan 5 2018, 9:24 AM
In D8532#175079, @ossi wrote:

you *really* should use a whitelist. it's ok if that breaks some 3rdparty extractor; you'll get a bug report which you can properly evaluate.
you could go totally overboard and assign fine-grained syscall capabilities to individual extractors, but i can't really think of legitimate reasons why that would be necessary in this context.

It would be more secure, of course. But the downside is a higher maintenance cost, as one need to check whether the filter works for every QT version, because if a QT function starts using another syscall, baloo may get problems.
I'm not sure which way to go here.

I think we cannot use different (less strict) filters for different extractors, as a child process has at least the same restrictions as its parent process. Making filters for external extractors more strict would be possible, but i doubt it would be useful.

davidk updated this revision to Diff 26108.Jan 28 2018, 10:04 AM

Update TODO items.

davidk edited the summary of this revision. (Show Details)Jan 29 2018, 2:04 PM
davidk edited the test plan for this revision. (Show Details)

So, are there any more opinions on the whitelist vs. blacklist topic?
Personally I still prefer the blacklist as I fear regressions in the future, especially because baloo is unmaintained.

davidk edited the summary of this revision. (Show Details)Jan 29 2018, 3:52 PM
detlefe added a comment.EditedJan 31 2018, 2:15 PM

A whitelist, even if it is broad, would be desirable to reduce the attack surface of the kernel, and is also the way it has been done for Gnome Tracker. But the concerns about maintenance remain, it probably should be tested regularly. Are there ways this can be automated?

In case the decision goes in favor of the blacklist, would it be possible to add ptrace, process_vm_readv, process_vm_writev?

In any case thank you very much for working on this!

Sorry for the late reply and the slow process in general. Reallife keeps me busy...

A whitelist, even if it is broad, would be desirable to reduce the attack surface of the kernel, and is also the way it has been done for Gnome Tracker. But the concerns about maintenance remain, it probably should be tested regularly. Are there ways this can be automated?

If we want to test this, we would need a directory with files for each extractor (kfilemetadata includes such files for its autotests). Then, we should configure seccomp to kill the process if it calls a prohibited syscall. The test should then index all files in the directory. Unfortunately we can't test some things, e.g. the dbus integration and communication with baloo_file. This would need a test which starts the whole extractor as a child process. But i'm not sure if thats feasible. What do you think?

In case the decision goes in favor of the blacklist, would it be possible to add ptrace, process_vm_readv, process_vm_writev?

That's possible of course.

ossi added a comment.Feb 27 2018, 9:08 PM

But i'm not sure if thats feasible.

launching subprocesses is no biggie; e.g., the kprocess test does it.
if you're lucky, a few env variables will be sufficient. in the worst case, you'd need to chroot and do bind mounts or something like that, which is of course "a bit more challenging".

michaelh added a subscriber: Baloo.

I don't know Seccomp. But as far as I understood this, the same concers apply to the baloo_file_temp_extractor baloo-widgets is using. Naivly I suggest to implement this KFileMetadata because both executables are using it. I don't know if that is possible or reasonable though.

I don't know Seccomp. But as far as I understood this, the same concers apply to the baloo_file_temp_extractor baloo-widgets is using. Naivly I suggest to implement this KFileMetadata because both executables are using it. I don't know if that is possible or reasonable though.

I think Seccomp would be usefull in the baloo_file_temp_extractor too. What API would you suggest? I think on something like

KFileMetaData::setProcessReadOnly(KFileMetaData::SeccompAction action, QList<KFileMetaData::SeccompFilter> addionalWhitelist)

This would be a whitelist approach, so we would need to make sure we have good autotests. The autotests should be in KFileMetaData (fairly easy) and in baloo_file{,temp}_extractor. I would try to make an integration test, which starts up the baloo_file{,temp}_extractor and let's it index an directory containing test files. It would be good to use the KFileMetadata/autotests/samplefiles directory, but I'm not sure how to do this, without copying it. Do you have any ideas about this?

davidk added a comment.Jul 9 2018, 3:30 PM

I was asked in private about the current state of libseccomp integration and why there was no progress in a long time.
The current state is, that I have implemented seccomp support in kfilemetadata using this API:

bool setProcessReadOnly(uint32_t defaultAction, std::vector<SeccompFilter> addionalWhitelist)

But there are two blockers, related to external plugins:

  • External plugins based on interpreters like python/lua/perl etc. need a huge whitelist. This is problematic as I want to keep the list of allowed syscalls as small as possible (the list would be huge). Additionally, it would be difficult to get a list of all needed syscalls. Thus, we would break many external plugins.
  • Baloo is basically unmaintained. Thus, if something breaks, fixing it should be as easy as possible. But what if QT requires a new syscall and thus, the tests (and deployments) are failing? We need a way to know which syscall failed. This works for kfilemetadata plugins, but not for external plugins (because they are separate processes). The only way I can image, would be running the whole test with strace.

So, if anyone is willing to continue this work, I would be happy to share my current state.
Otherwise, if everyone agrees that we don't care about external plugins (users of external plugins can disable Seccomp support with an environment variable), I can finish the patches.

Restricted Application edited subscribers, added: kde-frameworks-devel; removed: Frameworks. · View Herald TranscriptJul 9 2018, 3:30 PM

I'm just an interested user and cannot comment on the question of external plugins. But before this enters a deep sleep, I wonder if at least the current patch should find its way into the extractors or into kfilemetadata.

A whitelist is the real thing, and it would be awesome if that could be made possible. But if I had the choice between nothing and the blacklist, I would pick... the blacklist.

+1 for something rather than nothing. No comment on the technical aspect, but I'm adding more reviewers who can hopefully help un-wedge this patch.

mgallien added a subscriber: mgallien.EditedSep 23 2018, 9:14 PM

I was asked in private about the current state of libseccomp integration and why there was no progress in a long time.
The current state is, that I have implemented seccomp support in kfilemetadata using this API:

bool setProcessReadOnly(uint32_t defaultAction, std::vector<SeccompFilter> addionalWhitelist)

But there are two blockers, related to external plugins:

  • External plugins based on interpreters like python/lua/perl etc. need a huge whitelist. This is problematic as I want to keep the list of allowed syscalls as small as possible (the list would be huge). Additionally, it would be difficult to get a list of all needed syscalls. Thus, we would break many external plugins.
  • Baloo is basically unmaintained. Thus, if something breaks, fixing it should be as easy as possible. But what if QT requires a new syscall and thus, the tests (and deployments) are failing? We need a way to know which syscall failed. This works for kfilemetadata plugins, but not for external plugins (because they are separate processes). The only way I can image, would be running the whole test with strace.

    So, if anyone is willing to continue this work, I would be happy to share my current state. Otherwise, if everyone agrees that we don't care about external plugins (users of external plugins can disable Seccomp support with an environment variable), I can finish the patches.

Sorry for my late reply
The external extractors tests of KFileMetaData have always failed on Windows and nobody ever fixed them. This makes me think that they are not really maintained.
Is there any use of them apart from the unit tests ?

fvogt added a subscriber: fvogt.Oct 4 2018, 3:16 PM

AFAICT this won't actually protect much - the open DBus socket is enough to execute arbitrary programs.

The best design would be (IMO, not sure how well the current architecture fits) to have a fully sandboxed executable which can only communicate with baloo over a single socket.
Over that socket it receives a (read-only) file descriptor for the to be dissected file and then sends the result to baloo.

In D8532#336584, @fvogt wrote:

AFAICT this won't actually protect much - the open DBus socket is enough to execute arbitrary programs.

The best design would be (IMO, not sure how well the current architecture fits) to have a fully sandboxed executable which can only communicate with baloo over a single socket.
Over that socket it receives a (read-only) file descriptor for the to be dissected file and then sends the result to baloo.

I love the proposal(!), but just in case a full sandbox is wanted here and large changes to (unmaintained) baloo are out of scope, maybe a combination of Apparmor and seccomp could offer a way out. Note that this was also proposed by Martin Flöser for the original kscreenlocker seccomp sandbox, but as far as I know never implemented.

Apparmor has the advantage that access to DBus can be controlled in a very fine grained way, that writing a profile should be relatively straightforward and that users can debug (or adapt) the profile easily. The main disadvantage is that it is not available everywhere, as pointed out already by @davidk in the summary. Regarding this latter point, however, it is maybe worth pointing out that also a proper sandbox is not necessarily available everywhere. The question here is how would you do it? You need root to set it up, and there are in the moment two ways to get there: Unprivileged user namespaces, which are not available in all distributions (Arch, Debian, ...), or SUID root, which is risky and hard to get it right. Bubblewrap or something like that would introduce an additional dependency.

One advantage of seccomp is that it works basically everywhere. Also I tend to disagree on the suggestion that it is useless. Whitelists can have properties of a real sandbox, but usually the role of seccomp is to reduce attack surface... this is why it exists, what seccomp was designed for, and why IMHO it makes a lot of sense for Baloo even if there is not a perfect solution/if Apparmor is unavailable.

Dropping one more comment, in case someone wants to give it a try: Apparmor profile transitions don't work if a seccomp filter has been installed before. This makes it probably rather difficult to integrate DrKonqi into an Apparmor policy.

Ping! What's going on with this?

ngraham edited the summary of this revision. (Show Details)Jun 11 2019, 2:59 PM
bruns added a comment.Jun 11 2019, 4:55 PM

I totally agree with fvogt here - the extractors should just receive a readonly file descriptor.

For this, there are several steps required:

  1. let the extractors work with file descriptors (KFileMetaData)
  2. make sure the extractor plugins are fully initialized before receiving file descriptors
  3. actually feed file descriptors to the extractor

(1.) is trivial for some extractors (e.g. taglib), for others it may be hard.
(2.) depends on several things - the plugins must be instantiated early (which clashes with the lazy loading), and the plugin may not load any external resources later on.

Using file descriptors has another benefit - currently, the file is stat'ed and so on, and then the corresponding path is fed to the extractor. It would be much better to open the file, use fstatat and friends, run the extractor and close the file again.

fvogt added a comment.Jun 11 2019, 5:04 PM
In D8532#478224, @bruns wrote:

I totally agree with fvogt here - the extractors should just receive a readonly file descriptor.

For this, there are several steps required:

  1. let the extractors work with file descriptors (KFileMetaData)
  2. make sure the extractor plugins are fully initialized before receiving file descriptors
  3. actually feed file descriptors to the extractor

    (1.) is trivial for some extractors (e.g. taglib), for others it may be hard. (2.) depends on several things - the plugins must be instantiated early (which clashes with the lazy loading), and the plugin may not load any external resources later on.

    Using file descriptors has another benefit - currently, the file is stat'ed and so on, and then the corresponding path is fed to the extractor. It would be much better to open the file, use fstatat and friends, run the extractor and close the file again.

What could also be done as an intermediate step is to whitelist opening read-only fds for metadata extractions. That way something like plugin loading is also covered and not many changes are required.
The sandbox could be opt-in for plugins which just specify that they support sandboxing using the specified whitelist, with plugins which don't support sandboxing disabled by default.
I used this approach in a (private so far) branch for sandboxing the thumbnail kio slave and it works well.

Hmm, looks like @davidk has not been active in almost a year, so I suspect someone else may need to commandeer this...