KDevelop/Shell: set dedicated TMPDIR
Needs RevisionPublic

Authored by rjvbb on Dec 1 2018, 7:56 PM.

Details

Reviewers
kfunk
mwolff
Group Reviewers
KDevelop
Summary

This is a simple patch implementing a dedicated KDevelop temporary directory (on Unix). With this, every session has its own directory for temporary files under the default QSP::TempLocation directory, set via the TMPDIR env. variable. The clang parser's preamble* files go here, for instance, but also any files the session creates under QSP::TempLocation itself.

The entire directory is removed during a clean exit, and also at startup. This takes care of preventing the accumulation of (large) old temporary files I tend to get on Mac.

Test Plan

Tested on Mac and Linux. Processes launched by KDevelop will also see the new directory as their TempLocation but I haven't noticed any unwanted side-effects in my workflow. Processes spawned by the session and outliving it may run into issues but how common are those? (The only such processes I create are other KDevelop sessions which are evidently not concerned.)

Setting TEMPDIR (or TEMP) instead of TMPDIR may have the same on MS Windows(?)

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 7538
Build 7556: arc lint + arc unit
rjvbb requested review of this revision.Dec 1 2018, 7:56 PM
rjvbb created this revision.
kfunk requested changes to this revision.Dec 2 2018, 9:16 AM
kfunk added a subscriber: kfunk.

Not usable as-is.

  • You're changing behavior for all of KDevelop, while you only want to fix the Clang backend's behavior. This can lead to subtle side effects. I'm not sure we want this. Not sure how to implement this properly the way we use libclang right now either, without further thought.
  • Why not using QTemporaryDir in the first place? Also see patch in QtCreator (https://codereview.qt-project.org/#/c/142566). (Yes, I know QtC usesd libclang out-of-process, we don't. Not relevant here. The interesting bits are the env vars and the QTemporaryDir usage.)

This needs more thought.

This revision now requires changes to proceed.Dec 2 2018, 9:16 AM
kfunk added a comment.Dec 2 2018, 9:23 AM

Forgot: Also needs documentation *in* source code why were are doing this. Noone can figure out that the temp dir is changed due to issues with the Clang backend here...

rjvbb added a comment.Dec 2 2018, 9:26 PM

Actually, issues with clang temp files is why I started to think about this kind of change but ultimately I realised it didn't seem such a bad idea at all to put all KDevelop-related temporary files in a dedicated location. I often clean out a bunch of KDevelop's own temp files that were left behind, e.g. after a crash. I just didn't mention them because they're negligible in size (and I purge before their numbers really start to grow).

I'm not sure what QTemporaryDir would improve here; I use a deterministic name for a reason here. The relevant Qt Creator code I've seen also sets TMPDIR. Probably because it gets picked up by QSP, and most non-Qt applications also seem to recognise TMPDIR as the variable that defines the (user) location for storing temporary files.

I'm curious what kind of subtle side effects others here can think of. The only one I could think of is what could happen with processes that outlive their parent KDevelop process. For the rest, every application spawned by KDevelop should see the same TMPDIR location and I'm guessing (hoping) that none expect to exchange files with other processes under the assumption that everyone uses the same temp. directory.
Files in TMPDIR are usually private, or else their full path is handed to external apps that need to use them (like sockets), no?

Note that there is a design oversight in the initial version: if the QSP::TempLocation ends with /kdevelop-temp-session-XXXX that substring should be clipped to avoid recursion when launching another session through KDevelop.

kfunk added a comment.Dec 3 2018, 6:01 AM

Okay, trying to keep this short.

I'm not sure what QTemporaryDir would improve here;

It would implement this feature with around 3 lines of code instead of 18, because it is /exactly/ suited for this use-case.

I use a deterministic name for a reason here.

You can instruct QTemporaryDir to use hard-coded names; just do not use "XXXXXX" in the template path passed to it. Less code is preferrable.

The relevant Qt Creator code I've seen also sets TMPDIR.

That's *after* creating the directory. You'd still have to do it here as well.

pino added a subscriber: pino.Dec 3 2018, 6:10 AM

Also, considering that "session" in Core::initialize is user-specified and defaults to an empty string, please do not use a 100% deterministic name, otherwise two different users will have a temporary directory conflict when launching kdevelop. Please use the "XXXXXX" variable part.

rjvbb added a comment.Dec 3 2018, 10:21 AM
Also, considering that "session" in `Core::initialize` is user-specified and defaults to an empty string, please do not use a 100% deterministic name,

From what I can see the function is only called from main() and the session variable will not ever be an empty string in practice. I just tested with a newly created session: even there a new UUID was created before calling Core::initialize().

I can still add the user ID to the dirname just in case session is ever empty after all. That's actually useful if TMPDIR isn't set to start with and the directory is created in /tmp .

I just tested QTemporaryDir. It only relieves me from handling the directory creation and deletion because I still have to create the absolute path (and thus prune it): giving just the directory name (instead of a full path) means the directory will be created in the current directory.

So I end up with a slightly cleaner snippet

#if defined(Q_OS_UNIX)
    auto tmpPath = QStandardPaths::writableLocation(QStandardPaths::TempLocation);
    const QString tmpName(QStringLiteral("/kdevelop-tmp-%1-").arg(getuid()));
    const auto pos = tmpPath.lastIndexOf(tmpName);
    if (pos >= 0) {
        // we must have been started by another KDevelop session,
        // restore the default TempLocation
        tmpPath = tmpPath.left(pos);
    }
    tmpPath += tmpName + session;
    m_tmpLocation = new QTemporaryDir(tmpPath);
    if (m_tmpLocation->isValid()) {
        qputenv("TMPDIR", tmpPath.toLatin1());
        qCDebug(SHELL) << "temp. directory:" << m_tmpLocation->path();
    }
#endif

and still don't have a deterministic name because QTemporaryDir will always add something random to the name even if you don't include an "XXXXXX" pattern in the template. From the docs:

If the templatePath ends with XXXXXX it will be used as the dynamic portion of the directory name, otherwise it will be appended.

and this is obligatory on Unix as mkdtemp() is used underneath. So the snippet above is actually broken, the temp dir is not at tmpLocation.
Come to think of it, the directory is also not emptied on creation as I want it to be; the class guarantees it never reuses an existing directory.

rjvbb updated this revision to Diff 46779.Dec 3 2018, 11:01 AM

FIxed and streamlined version.

rjvbb set the repository for this revision to R32 KDevelop.Dec 3 2018, 11:02 AM
mwolff requested changes to this revision.Jan 12 2019, 12:20 PM
mwolff added a subscriber: mwolff.

-2 from my side - we change the behavior of all (sub)process launched by KDevelop and that's not something I want to do. you've seen it affect a sub-kdevelop process, but what if a user launches his own app through kdevelop? we shouldn't change the temp dir for these apps.

please let us instead fix the issue where it affects us. if it's about leaked clang dirs, then add API upstream that allows us to specify where these files get stored when using libclang. Though probably it's going to be enough for now to set TMPDIR temporarily in the background thread where we launch the parse job via libclang?

This revision now requires changes to proceed.Jan 12 2019, 12:20 PM
pino added a comment.Jan 12 2019, 12:22 PM

Though probably it's going to be enough for now to set TMPDIR temporarily in the background thread where we launch the parse job via libclang?

Note that environment variables are per-process, not per-thread.

rjvbb added a comment.Jan 12 2019, 1:40 PM
Note that environment variables are per-process, not per-thread.

The bg thread could cache the old value and restore it once libclang is launched. I'm not certain that's enough though. It would be if KDevelop used a clang parser client/server model like Creator does, but that's a whole different can of worms (apparently).

IMHO we shouldn't care too much about affecting the environment of applications launched from KDevelop. I think of it as "they run in the KDevelop universe" and if that includes a specific $TMPDIR value so be it. No appliction should let its behaviour depend on where $TMPDIR is, normally, nor should it assume that the value is exclusive. Any app with special TMPDIR requirements should set it itself, not unlike I'm doing here.

Possible middle ground: add an option to KDevelop that makes it apply the default environment profile at startup (= to itself), so that anyone can set TMPDIR in there. Or add a 2nd env. profile that's loaded at startup but I don't see a reason myself when you'd want that profile to be different.

Good point Pino - setting the env var, even temporarily, would induce a race - so we can't do it there.

one potential option would be to remember the original tempdir and then use that for all process environments and thus all qprocesses we launch. maybe this works, but you'd have to check that.

Actually, issues with clang temp files is why I started to think about this kind of change but ultimately I realised it didn't seem such a bad idea at all to put all KDevelop-related temporary files in a dedicated location. I often clean out a bunch of KDevelop's own temp files that were left behind, e.g. after a crash. I just didn't mention them because they're negligible in size (and I purge before their numbers really start to grow).

Temporary files being left behind after a crash is not unusual. This is why systemd has systemd-tmpfiles --clean.

Files in TMPDIR are usually private, or else their full path is handed to external apps that need to use them (like sockets), no?

Not necessarily: if you open a file from your browser, it's typically stored in /tmp and then opened with the associated application.

No appliction should let its behaviour depend on where $TMPDIR is, normally, nor should it assume that the value is exclusive. Any app with special TMPDIR requirements should set it itself, not unlike I'm doing here.

The way I see it, it's exactly the other way around. Just like any other environment variable, TMP_DIR is there to let a user control an application's behavior. So an application setting that variable for itself is a bit pointless.

I definitely see your point though, and it seems that libclang doesn't allow users to control where the files go except through TMP_DIR. It should be possible to reset the TMP_DIR for subprocesses started from KDevelop, at least exec supports that. So we could set TMP_DIR for ourselves (based on the user-provided TMP_DIR), but pass the original environment variable to all subprocesses.

rjvbb added a comment.Wed, Jan 23, 3:46 PM

Temporary files being left behind after a crash is not unusual. This is why systemd has systemd-tmpfiles --clean.

Not really relevant outside of the systemd realm, right? :)

Files in TMPDIR are usually private, or else their full path is handed to external apps that need to use them (like sockets), no?

Not necessarily: if you open a file from your browser, it's typically stored in /tmp and then opened with the associated application.

Which falls into the "their path is handed to external apps" category...

The way I see it, it's exactly the other way around. Just like any other environment variable, TMP_DIR is there to let a user control an application's behavior. So an application setting that variable for itself is a bit pointless.

Note that my patch still uses the TMPDIR value set by the user, as long as QSP::TempLocation does. Which should be the case everywhere.

So we could set TMP_DIR for ourselves (based on the user-provided TMP_DIR), but pass the original environment variable to all subprocesses.

Milan also suggested that, and I have no qualms with it except possibly for helper apps that are really specific to or used in a way specific to KDevelop. Those c/should use the KDevelop-specific tempdir, IMHO.
The question is, can we do this env. tweak in a central way? I can start digging but it would help if someone already knows a formal answer.
For instance, is KDevelop's environment profile feature applied to *every* subprocess that is created? If so we could prepend the original TMPDIR value to the profile's list of key/values (prepend so that the user can still set a value via the profile).

We'd also have to remember if TMPDIR was NOT set initially...

Temporary files being left behind after a crash is not unusual. This is why systemd has systemd-tmpfiles --clean.

Not really relevant outside of the systemd realm, right? :)

I was just trying to point out that there is basically no way around regularly cleaning up temporary directories. This has nothing to do with systemd. I wanted to add that this is the reason for facilities like systemd-tmpfiles. If you don't use systemd, for example because you're not on Linux, there are certainly other tools for doing the same thing.

So we could set TMP_DIR for ourselves (based on the user-provided TMP_DIR), but pass the original environment variable to all subprocesses.

Milan also suggested that, and I have no qualms with it except possibly for helper apps that are really specific to or used in a way specific to KDevelop. Those c/should use the KDevelop-specific tempdir, IMHO.
The question is, can we do this env. tweak in a central way? I can start digging but it would help if someone already knows a formal answer.
For instance, is KDevelop's environment profile feature applied to *every* subprocess that is created?

I can't talk about Qt, but from the POSIX point of view there are exec variants with and without explicit environment argument. Those without take the environment of the existing process. So if we don't want to copy our environment over to subprocesses, we need to always call one of the variants with explicit environment argument. However, as you noted, that would be hard to enforce. I was just trying to point out that there is a way to reset TMP_DIR for childs that doesn't introduce race conditions.

However, because of the first point, I'm actually not sure if this is worth the trouble. We shouldn't deliberately leave garbage in /tmp, but we don't need to guarantee that we clean it up either. That's pretty much the point of having /tmp in the first place, otherwise we could just put the files somewhere else.

rjvbb added a comment.Thu, Jan 24, 8:34 AM
If you don't use systemd, for example because you're not on Linux, there are certainly other tools for doing the same thing.

How does that systemd thing clean tmp dirs at runtime, IOW, how can it know it's safe to clean up a given file if the application that created it doesn't do something explicit to guarantee cleanup?

I'm not aware of anything like this for Mac (besides a reboot) or MSWin ...

a way to reset TMP_DIR for childs that doesn't introduce race conditions.

How could this be a race condition?

However, because of the first point, I'm actually not sure if this is worth the trouble. We shouldn't deliberately leave garbage in `/tmp`, but we don't

I'd be fine with that if libclang never left its big pch files behind even on a clean exit. The argument that disk space is cheap doesn't really fly here because users can make certain that tmp dirs which will hold performance-critical files end up in places with stricter restrictions, like a ram disk.
They could of course do what my patch does in a wrapper script ... but they could also just use vi and make for software development ;)

If you don't use systemd, for example because you're not on Linux, there are certainly other tools for doing the same thing.

How does that systemd thing clean tmp dirs at runtime, IOW, how can it know it's safe to clean up a given file if the application that created it doesn't do something explicit to guarantee cleanup?

Of course it can't technically know if it is safe to delete a file. The assumption is that if it hasn't been used in a long time, it's probably not needed anymore. There is a configurable age parameter, set to a few days/weeks usually. “The age of a file system entry is determined from its last modification timestamp (mtime), its last access timestamp (atime), and (except for directories) its last status change timestamp (ctime). Any of these three (or two) values will prevent cleanup if it is more recent than the current time minus the age field.”

There is also tmpwatch which allows restricting the cleanup to files that are no longer open. It seems that systemd doesn't offer this because it's too slow and shouldn't be needed.

I know that Windows also has a disk cleanup program. There is an article claiming that the only way to clean up temporary files on Mac is a reboot.

a way to reset TMP_DIR for childs that doesn't introduce race conditions.

How could this be a race condition?

I was referring to @mwolff's comment D17289#395163.

I'd be fine with that if libclang never left its big pch files behind even on a clean exit.

Well, maybe you can solve that instead? Perhaps we're using it wrong, or there is a bug in Clang.

rjvbb added a comment.Fri, Jan 25, 6:16 AM

I was referring to @mwolff's comment D17289#395163.

Without much thought about what he meant with them? ;)
Seriously, the only race condition I can think of is already handled in the code. When KDevelop launches a copy of itself the app-specific bits are detected in and removed from the input TMPDIR variable.

There is an article claiming that the only way to clean up temporary files on Mac is a reboot.

That's indeed the only way I know to get automatic cleanup.

Well, maybe you can solve that instead? Perhaps we're using it wrong, or there is a bug in Clang.

No, you're not using it wrong as far as I've been able to see. Whether or not the llvm people see this as a bug is another thing, and beyond leaving a vague bug report plus feature request to give backwards-compatible control over the tmp. dir used there isn't much I can do about it.

Meanwhile I don't know what to do with this ticket?

rjvbb updated this revision to Diff 50304.Fri, Jan 25, 11:12 PM

This version saves the original TMPDIR value, and adds a wrapper around QProcessEnvironment::systemEnvironment() that restores that original value. To keep changes to a minimum I've added that wrapper to the Core class itself.

The idea is to replace the QProcessEnvironment call with Core()->systemEnvironment() everywhere (where the return value is passed to a child process). Please let me know if I can go ahead and do that, or else if this is not the way to go.

rjvbb set the repository for this revision to R32 KDevelop.Fri, Jan 25, 11:13 PM

I've had a look at the Clang code. They actually go to some lengths to make sure all temporary files are deleted on a clean exit — they register all open temporary files and clean them up in an exit-time destructor. (See TemporaryFiles in lib/Frontend/PrecompiledPreamble.cpp.) However, this won't work in the case of a crash. They have some mechanisms for cleanup in the case of crashes, but I suspect that this works by setting a signal handler and is thus only available in the stand-alone executable. There are mechanisms to ensure cleanup in any event by removing an open file on POSIX compatibles (close will then automatically delete it), or FILE_FLAG_DELETE_ON_CLOSE on Windows. But that doesn't work for Clang, because they work with paths and don't pass file descriptors around.

By the way, does someone know why we are using CXTranslationUnit_CreatePreambleOnFirstParse? This seems overly wasteful to me — most of these preambles are not reused again. It was introduced by f14a8fc9ceb38a91c2d718552779da666d509bf1, but I fail to see how this relates to the problem the commit message describes. The documentation for CXTranslationUnit_PrecompiledPreamble says that “An implicit precompiled header is used as an optimization when a particular translation unit is likely to be reparsed many times when the sources aren't changing that often.” For CXTranslationUnit_CreatePreambleOnFirstParse it says “Used to indicate that the precompiled preamble should be created on the first parse. Otherwise it will be created on the first reparse. This trades runtime on the first parse (serializing the preamble takes time) for reduced runtime on the second parse (can now reuse the preamble).” I don't understand why we are doing this - we're creating preambles for all files in a project, although most of them are probably not even open in the editor at the time.

rjvbb added a comment.Sat, Jan 26, 4:18 PM

There are mechanisms to ensure cleanup in any event by removing an open file on POSIX compatibles (close will then

I've tried to unlink certain files after they were created but reverted that again when I started getting crashes in the parser. Not that reverting helped against that...

>  By the way, does someone know why we are using `CXTranslationUnit_CreatePreambleOnFirstParse`? This seems overly wasteful to me

I've wondered about this too, but tried to convince myself there must be a reason. I also wonder if this option explains why ObjC(++) files are so much more expensive to parse than C and C++ (processing time and required RAM, which is NOT freed when you close the file).
Sounds like /me is going to see what changes if you remove that option...

I've tried to unlink certain files after they were created but reverted that again when I started getting crashes in the parser. Not that reverting helped against that...

Yes, I believe that's not going to work. Clang doesn't keep the preamble files open, but expects them to stay.

Sounds like /me is going to see what changes if you remove that option...

Please try it out, I've already pushed a change for this: D18551. I've tested it briefly, and didn't notice any issues. It generated preambles only when I started editing files.

mwolff requested changes to this revision.Sun, Jan 27, 9:14 PM

we use CXTranslationUnit_CreatePreambleOnFirstParse to get code completion results fast. otherwise the first code completion request would create the preamble, which felt much worse

otherwise this patch has some issues, but it goes in the right direction. you'll have to audit our whole platform then for use of [QK]Process and ensure the original system env is used. I think these two greps should show you the places:

ack '[QK]Process \w' # full object instances
ack '[QK]Process::start' # static utilities

that's ~50ish code locations... Quite a lot :-/

kdevplatform/shell/core.cpp
134

imo this should also be handled on windows via TMP and TEMP, according to

https://superuser.com/questions/709953/temp-vs-tmp-in-environment-variables

136

all of this code shouldn't be required - the system env used to launch subprocesses should not use the changed tmpdir after all, so it cannot be recursive

150

it exists, you create the path in the line above

151

instead of this code, you should do something like

#if defined(Q_OS_UNIX)
m_originalTmpDir = qEnvironmentVariable("TMPDIR");
qputenv("TMPDIR", tmpLocation.toLatin1());
#elif defined(Q_OS_WIN)
m_originalTmpDir = qEnvironmentVariable("TEMP");
if (m_originalTmpDir.isEmpty()) {
    m_originalTmpDir = qEnvironmentVariable("TMP");
}
qputenv("TEMP", tmpLocation.toLatin1());
qputenv("TMP", tmpLocation.toLatin1());
#else
#error unhandled OS
#endif
341

it must exist, no? we should always create it after all

343

don't make this a ptr

482

this should essentially be

auto env = QProcessEnvironment::systemEnvironment();
env.insert(QStringLiteral("TMPDIR"), d->m_originalTmpDir);
483

don't check whether it's empty or not, always insert the original value. this will essentially remove the TMPDIR env var then and whatever system default is picked up then

485

insert will overwrite, so this line is not required

kdevplatform/shell/core.h
100

don't use out params, just return a QProcessEnvironment similar to how it's done in QProcessEnvironment::systemEnvironment()

kdevplatform/shell/core_p.h
79

no ptr

This revision now requires changes to proceed.Sun, Jan 27, 9:14 PM
we use CXTranslationUnit_CreatePreambleOnFirstParse to get code completion results fast. otherwise the first code completion request would create the preamble, which felt much worse

Shall we keep that discussion to D18551?

that's ~50ish code locations... Quite a lot :-/

Erm, yes, so many that it begs the question if "this is worth it"... Is it?

I only looked at places where the existing env. profile feature is used. There were already enough of those that I didn't feel like changing them all potentially for nothing, but I don't think there were this many.

Why would we not limit the change to those locations, or in other words, why is the selected env. profile not used for every process KDevelop spawns?

Would it be feasible to reduce the footprint of this overhaul by using a wrapper around writableLocation(QSP::TempLocation) instead? I'd have to check in how many locations libclang reads TMPDIR and so if it's feasible to change that variable temporarily around the parser's entrypoints into libclang.

(replying by email)

+#if defined(Q_OS_UNIX)
+ auto tmpLocation = QStandardPaths::writableLocation(QStandardPaths::TempLocation);

imo this should also be handled on windows via TMP and TEMP, according to

Aren't those used in the MSWin version of TempLocation? I can try to implement this for MSWin too but only if someone promises to test it (I can't).

BTW, cf. https://codereview.qt-project.org/#/c/239952

all of this code shouldn't be required - the system env used to launch subprocesses should not use the changed tmpdir after all, so it cannot be recursive

Evidently this can go once that restored system env. is used.

+ m_tmpDir->mkpath(tmpLocation);

it exists, you create the path in the line above

Except when mkpath failed because tmpLocation was messed up. KDevelop does NOT like that. That happened to me during development and I decided to leave the check in.

+ void systemEnvironment(QProcessEnvironment& env) const;

don't use out params, just return a QProcessEnvironment similar to how it's done in QProcessEnvironment::systemEnvironment()

This was out of a concern that certain spawned processes might be performance critical. I guess the overhead of the extra copy can be neglected then?