Kdevelop CMake plugin : use canonical paths to build.dir
Needs ReviewPublic

Authored by rjvbb on Sep 21 2017, 9:06 PM.

Details

Reviewers
apol
mwolff
Group Reviewers
KDevelop
Summary

The build system of projects like KDevelop itself depends internally on the use of relative paths to (auto-generated) headerfiles. The build and thus also the parsing of such projects can break because of this if there are symlinks on the build.dir path and the host OS doesn't normalise (canonicalise) paths by default.

This modification introduces optional build.dir normalisation and uses it wherever required. I kept the default to off since I'm not certain what subtle things and user expectation would be broken by turning it on by default.

Normalisation is done on the fly, so the path is preserved in the configuration dialog and settings files as entered initially by the user.

Test Plan

works as intended. It also restores parsing for KDevelop (and similar projects) when imported with the JSON importer (probably by reintroducing #a8d7c1bd2351d25faa47e9ba9c17b3d52d173579).

Question: should the source directory also be canonicalised?

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 3097
Build 3115: arc lint + arc unit
rjvbb created this revision.Sep 21 2017, 9:06 PM
apol requested changes to this revision.Sep 21 2017, 11:45 PM
apol added a subscriber: apol.

Which part of the system doesn't work if there are symlinks?

Also adding a bool argument is just bad API, makes code harder to read and bound to fail.

If you're sure this is required (I'm not) instead of adding the clunky API, please create a canonicalizePath method or similar.

plugins/cmake/settings/cmakepreferences.cpp
335

Are you trying to confuse whoever wants to read this code again?

This revision now requires changes to proceed.Sep 21 2017, 11:45 PM
rjvbb updated this revision to Diff 19767.Sep 22 2017, 1:07 AM

It wasn't clear if you wanted to add a method to CMake or to KDevelop::Path so I went with the former which I find gives more readable code.

Why is this needed? Did you read the commit message of Milian's commit I referenced?

Take a working dir that's somewhere hidden behind a long access path (like what Linux will give to auto-mounted external drives), let's assume that you're using an out-of-source build approach where the source dir and the build dir live under a same parent. (Not crucial for the argument but I just work that way.)

You do the initial cmake run using the actual path, and then at some point you get tired of typing the long path so you create a symlink from a regular work directory under your $HOME to that working dir on the external.

KDevelop's build tree will contain auto-generated files that include headerfiles with paths like ../../../kdevelop-source/kdevplatform/interfaces which are representative for the actual locations on disk. Accessing that build directory via (cd link-to-work/build ; make) will lead to all kinds of failures if make (and thus potentially cmake) finds itself executing under /home/joe/work/coding/link-to-work/build instead of something like /media/joe/JoesExternal/coding/KDE/KDevelop/build. Headerfiles expected under relative paths may not longer be found.

It's taken me a few days to realise why I kept getting those non-canonical shortcut paths in my compile commands after I rewrote my wrapper scripts to use canonical paths

There's another argument why this is a good idea: cross-platform consistency. On Mac it's impossible to avoid path normalisation and there may be other platforms where this is true too.

rjvbb set the repository for this revision to R32 KDevelop.Sep 22 2017, 1:08 AM
mwolff added a subscriber: mwolff.Sep 22 2017, 11:09 AM

please write a unit test

And btw: I'm in favor of the concept, i.e. to use canonical paths everywhere. I've run into issues myself, but I wonder why I'm not seeing the issue you are referencing here. My paths are also containing symlinks. It sounds to me like you are solving a fringe case (i.e. changing paths after the fact). I'm all for handling this gracefully, but to ensure it won't break in the future, we will need a unit test. That will also show us what actually is broken.

You did see the issue in the past, judging from that referenced commit message.

I'm not changing paths after the fact voluntarily, but you could say that's what I'm doing with creating a shortcut link to the build directory, in a way. The result of that is that certain paths can be changed after the fact by cmake itself (easily visible in the directory entries in compile_commands.json). But it's KDevelop who causes that to happen (or the Makefile), not me directly.

But even if you use the application in a standard workflow where all development, cmake'ing and make'ing takes place from inside the IDE you can still run into this problem. All it takes is that the build system calls a utility (say, moc) which canonicalises its work directory. Those symlinks on the paths used in (C)Makefiles will then all of a sudden behave like they are actual directories, and the 2 paths to the current workdir will not only be different in a string comparison but also in terms of depth w.r.t. the root. It's the latter aspect that breaks header inclusion using relative paths, of course. And that can (will!) happen if you're never using KDevelop at all to work on or build its own sources.

I must admit that I have absolute no idea how to write a unittest for this, in part because it's something I rarely do and above all because I fail to understand why this breaks the KDevelop project and no other that I've identified for now. The initial version of the patch would have made it trivial to test both forms in all existing cmake plugin unit tests, but we'd still need a sensible example that uses a build.dir with a symlink on the path. But ideally you'd need a parser unittest for cmake-based projects, no?
Right now the only unittest I can think of is setting up a project as described for KDevelop's source, and detecting parsing errors in it or simply building it.

I'm also not sure how this could break in the future. What this patch does is to make sure that cmake (and make) are called *in the actual directory where they would have been called if it weren't for that symlink*. That should never break anything, as long as QFileInfo::canonicalFilePath() isn't broken.

Thinking aloud: this kind of patch isn't required on OS X because there paths are made canonical just about everywhere by the system. I can enter a build.dir of the form ~/work/symlink2work/build by hand in the cmake project configuration dialog, but not with one of the UI widgets. Similarly, cd ~/work/symlink2work/build ; pwd will always report the actual destination directory, and that's also what the builddir combox drop-down will show.

Let's assume that the KDevelop::Path is used systematically throughout the code (can anyone even hope to vouch for that?) AND that in order to use the stored path one always uses a known set of member functions (toLocalFile() being the one I gotten to know). This patch could then be replaced by a global KDevelop::Path option to make the class return canonical paths systematically (maybe that's what Aleix really had in mind?). In turn that should remove the need to write a specific unittest for the cmake although the fact remains a unittest should exist that tests things in presence of a symbolic link.

FWIW, it's also not like I have more time to spend on this than anyone else here. If this patch hadn't addressed the parser issues I was seeing I would probably have gone back to using 5.1.2 until further notice, to get back to the things I had planned for last week...

rjvbb added a comment.Sep 22 2017, 3:16 PM
This comment was removed by rjvbb.
rjvbb marked an inline comment as done.EditedSep 22 2017, 8:13 PM
diff --git a/kdevplatform/util/path.cpp b/kdevplatform/util/path.cpp
index edcbfbe9adbcdb80b8b51759099aad0cc9e7deb1..b71b6f186f3ba514c2ebae480a06e4fa73a77b6d 100644
--- a/kdevplatform/util/path.cpp
+++ b/kdevplatform/util/path.cpp
@@ -23,6 +23,7 @@
 #include "path.h"
 
 #include <QStringList>
+#include <QFileInfo>
 #include <QDebug>
 
 #include <language/util/kdevhash.h>
@@ -397,13 +398,36 @@ static QVarLengthArray<QString, 16> splitPath(const QString &source)
     return list;
 }
 
+// 
+static QString toCanonicalPath(const QString &path)
+{
+    const QFileInfo info(path);
+    if (info.exists()) {
+        if (info.isDir()) {
+            // the simple case: just resolve all symlinks in the entire path
+            return info.canonicalFilePath();
+        } else {
+            // resolve the symlinks in the path to the file, and reappend the filename
+            return QFileInfo(info.absolutePath()).canonicalFilePath() + QStringLiteral("/") + info.fileName();
+        }
+    }
+    // don't attempt anything on non-existing files, it will fail
+    return path;
+}
+
 void Path::addPath(const QString& path)
 {
     if (path.isEmpty()) {
         return;
     }
 
-    const auto& newData = splitPath(path);
+    // convert local paths to canonical if not adding it to an existing path
+    const bool convertToCanonical = m_data.isEmpty() && path.startsWith(QLatin1Char('/'));
+    // Use QFileInfo::canonicalPath() to resolve any symlinks on the path, but not the
+    // final element (filename). This returns an empty string when path doesn't exist
+    // so we need to check the result.
+    const QString canonicalPath = convertToCanonical ? toCanonicalPath(path) : QString();
+    const auto& newData = splitPath(canonicalPath.isEmpty() ? path : canonicalPath);
     if (newData.isEmpty()) {
         if (m_data.size() == (isRemote() ? 1 : 0)) {
             // this represents the root path, we just turned an invalid path into it
@@ -421,6 +445,17 @@ void Path::addPath(const QString& path)
 
     std::copy(it, newData.end(), std::back_inserter(m_data));
     cleanPath(&m_data, isRemote());
+    // convert the concatenated result to canonical if local
+    // can surely be done in a more efficient manner
+    if (canonicalPath.isEmpty() && isLocalFile()) {
+        Path temp;
+        // using the default Path ctor is the fastest and easiest way
+        // to call ourselves with a fresh temporary m_data instance
+        temp.addPath(toCanonicalPath(generatePathOrUrl(true, true, m_data)));
+        if (!temp.m_data.isEmpty()) {
+            m_data = temp.m_data;
+        }
+    }
 }
 
 Path Path::parent() const

This is a draft, PoC implementation that should make KDevelop::Path work with canonical paths, by converting them on input. It just leaves the final symlink if that is a file, so symlink shortcuts to executables are not replaced with the target.

Making KDevelop::Path purely canonical indeed makes the cmake patch redundant but has an interesting side-effect. The parser now complains about undefined logging category symbols (CMAKE in the cmake plugin), and from the looks of it that is because it finds the kdevplatform/project/debug.h headerfile before it finds the correct debug.h headerfile.

This PoC patch isn't cheap, btw, judging from how much it slows down the test_path unittest. I'd be curious to know how to make it more efficient, esp. how better to handle the 2nd canonicalisation in addPath().

That problem exists with the current 5.2 branch (encountered by me, confirmed by frinring), so it's probably not caused by your patch.

The issue with the debug.h header confusion (for lack of a better word)? My patch of the cmake plugin isn't causing it, I agree. The attached patch of KDevelop::Path is involved somehow, in my case at least but indeed that doesn't mean it's the cause.

I'll try to see if it triggers the issue in other projects too, which don't have symlinks in their paths. Does KDevelop use symlinks in its working directories anywhere?

On a side-note, I must say I'm surprised that KDE projects get away so well with #include "debug.h" supposed to include different headers. That's usually asking for trouble given how you cannot control which -I/-isystem apply to what headers. Ultimately this looks like a build system design issue, not a parser issue.

Francis Herne wrote on 20170922::23:42:27 re: "D7930: Kdevelop CMake plugin : use canonical paths to build.dir"

That problem exists with the current 5.2 branch (encountered by me, confirmed by frinring), so it's probably not caused by your patch.

I just tested this with a fresh build dir for a clone of the KDevelop source tree. I made a project using simple, shortish paths to the source tree with a build.dir directly under the source toplevel directory, and no symlinks in either path.

I'm not seeing the undefined logging category macro issue even if I patch KDevelop::Path to be canonical. But I keep seeing the issue with a canonical KDevelop::Path class when I open my regular KDevelop project using the symlinked build.dir.

Maybe this is just a question of the order in which -I/-isystem directives are added which could somehow be dependent on alphanumerical sorting order of directory names? Is the issue being tracked somewhere?

To come back to the original subject, here's an example of an include statement that could easily go wrong when executed from a working directory that is not canonical but has a symlink in it that changes the directory level:

/home/bertin/work/src/Scratch/KDE/KF5/kk-git/build/kdevplatform/sublime/examples/example2_autogen/EWIEGA46WW/moc_example2main.cpp:108: undefined reference to `Sublime::MainWindow::qt_metacall(QMetaObject::Call, int, void**)'
CMakeFiles/example2.dir/example2_autogen/mocs_compilation.cpp.o: In function `~Example2Main':
/home/bertin/work/src/Scratch/KDE/KF5/kk-git/build/kdevplatform/sublime/examples/example2_autogen/EWIEGA46WW/../../../../../../kdevplatform/sublime/examples/example2main.h:24: undefined reference to `Sublime::MainWindow::~MainWindow()'

(Taken from an error message that's hopefully not relevant here)

You need to provide us with a way to setup a project to trigger this problem, then one can build a unit test out of it.

Yes, I have encountered this problem in the past, and I fixed it. In my case e.g.:

┌milian@milian-kdab2:~
└$ cs kf5/src/extragear/kdevelop/kdevelop/ 
┌milian@milian-kdab2:~/projects/kf5/src/extragear/kdevelop/kdevelop|master=
└$ echo $PWD
/home/milian/projects/kf5/src/extragear/kdevelop/kdevelop
┌milian@milian-kdab2:~/projects/kf5/src/extragear/kdevelop/kdevelop|master=
└$ readlink -f $PWD
/ssd/milian/projects/kf5/src/extragear/kdevelop/kdevelop
┌milian@milian-kdab2:~/projects/kf5/src/extragear/kdevelop/kdevelop|master=
└$ cb
┌milian@milian-kdab2:~/projects/kf5/build/extragear/kdevelop/kdevelop
└$ echo $PWD
/home/milian/projects/kf5/build/extragear/kdevelop/kdevelop
┌milian@milian-kdab2:~/projects/kf5/build/extragear/kdevelop/kdevelop
└$ readlink -f $PWD
/ssd/milian/projects/kf5/build-dbg/extragear/kdevelop/kdevelop
┌milian@milian-kdab2:~/projects/kf5/build/extragear/kdevelop/kdevelop
└$

So, if it breaks for you, how come?

rjvbb added a comment.Sep 26 2017, 1:19 PM

You need to provide us with a way to setup a project to trigger this problem, then one can build a unit test out of it.

See below. If I'm right, you could write a unittest that sets up a project of minimum complexity (but still requiring moc, possibly uic and generating config.h files) in the regular fashion. Then, set up another access path to the build directory using symlinks, one that goes through a different number of subdirs (simply /tmp/link-to-build.dir?). A make inside that shortcut build.dir will likely fail without canonicalisation, and then so will the parser.

So, if it breaks for you, how come?

If I read this correctly, you have separate trees for the source directories and for the builds which seem to live on an (external?) SSD, and symlinks to those trees under your home directory.
It's not easy to read but if I'm not mistaken you get into the build directories via the same number of subdirectories under /ssd and /home, and the names are equal too (except for build vs. build-dbg).

IOW, a relative path to a header like the one shown in my previous comment will work regardless whether you're in the canonical/physical PWD, or in the PWD that includes a symlink. That's not my case, and you may run into problems too when you do

> ln -s /ssd/milian/projects/kf5/build-dbg/extragear/kdevelop/kdevelop /tmp/build.dir
> cd /tmp/build.dir
> pwd
/tmp/build.dir
> make

Here's my current understanding, somewhat deeper than a few days ago. Simplifying the paths a little bit, I have

source: /home/bertin/src/kf5/kdevelop-git

work dir:
/opt/local/var/build/tree1/kf5-kdevelop/work

containing
/opt/local/var/build/tree1/kf5-kdevelop/work/kdevelop -> /home/bertin/src/kf5/kdevelop-git
/opt/local/var/build/tree1/kf5-kdevelop/work/build : configured for /opt/local/var/build/tree1/kf5-kdevelop/work/kdevelop

and my shortcut
/home/bertin/src/kf5/kdevelop-work -> /opt/local/var/build/tree1/kf5-kdevelop/work

So yes, my source directory accessed via a symlink too, but I don't think that is relevant for the issue (note though I'm not proposing to canonicalise the source directory even if doing it via KDevelop::Path didn't reveal any effects).

in a nutshell, my configure+build script does the equivalent of

source.dir=/opt/local/var/build/tree1/kf5-kdevelop/work/kdevelop
build.dir=/opt/local/var/build/tree1/kf5-kdevelop/work/build

if configuring ;then
    (cd ${build.dir} && cmake $OPTIONS ${source.dir})
fi
if building ;then
    (cd ${build.dir} && make)
fi

As I said, I rarely if ever build from inside KDevelop, and even my "official" cmake runs are made through those scripts in order to be certain that everything is set up as I want. I think my issue is above all related to the initial build steps where moc is being run, and possibly also the direct generation of certain headerfiles which aren't present after just running cmake. It will make a difference if you try to access these from /home/bertin/src/kf5/kdevelop-work/build or /opt/local/var/build/tree1/kf5-kdevelop/work/build, as long as they're generated and searched for using relative paths and those paths aren't updated as a function of that work directory.

When all is said and done, /home/bertin/src/kf5/kdevelop-work/build is not the same path as /opt/local/var/build/tree1/kf5-kdevelop/work/build, despite the fact that they both point to the same location. The crucial difference is in the recursion level under the root. City-block distance I think one could call it. (Insert some relevant quip about the roads to Rome here.)

Note that I never had any issues as long as I could prevent KDevelop from invoking cmake directly by ensuring I generate my own compile_commands.json file (when not doing builds, that is).

I realise one could say "just don't use KDevelop that way" but I'd hope to think we agree that it shouldn't be possible to break things with something which in the end is so trivial. It corresponds to using KDevelop to working on packages that have to build through some build/distribution system (think Gentoo Prefix or pkgsrc) so it shouldn't be a complete fringe use case.

rjvbb updated this revision to Diff 20244.Oct 2 2017, 1:07 PM

Don't use QFileInfo::toCanonicalFilePath() without checking first if the file/folder exists.
Without this check, opening a project after trashing its build dir will cause the initial cmake configuration dialog to open instead of rebuilding the build dir silently.

rjvbb set the repository for this revision to R32 KDevelop.Oct 2 2017, 1:09 PM
rjvbb added a comment.Oct 7 2017, 1:29 PM

I've kept the request to add a unittest to this in the back of my mind. I cannot think of any feasible way to do this.

  • The canonicalisation itself uses an external function
  • if it makes the cmake project import fail this will be caught by existing unittests

In addition:

  • it is not the import of projects affected by the use of a build.dir with symlink(s) in its path that goes awry: the project is imported correctly (and faithfully enough to reproduce a symptom that exists outside of KDevelop)
  • the issue manifests itself during parsing but is not the result of an issue in how KDevelop uses libclang to parse code

In short, KDevelop triggers something that could be considered an issue in CMake and/or in Qt utilities like whatever generates the moc source files. The only thing KDevelop does wrong here is the fact that it doesn't protect the user against this situation.

The patch would undoubtedly be simpler if canonicalisation were done at the input level (i.e. when the user specifies a build directory) rather than on the fly each time the build.dir is obtained from the settings store. Doing this will also inform the user directly (and immediately, when the dir is obtained using a file dialog) what directory is used. I didn't dare to propose that change initially because of potential side-effects I cannot foresee but maybe such side-effects simply don't exist.

FWIW, some form of canonicalisation already happens when picking a build.dir. Otherwise one should get an error when selecting a build.dir path with symlinks in it for a project that was already configured for the canonical version of that path.

mwolff added a comment.Oct 8 2017, 7:18 PM

The test should replicate the situation and show it fail before this patch and then work with this patch. Why is this not possible to do?

rjvbb added a comment.Oct 8 2017, 8:00 PM
The test should replicate the situation and show it fail before this patch and then work with this patch. Why is this not possible to do?

This would mean one would have to write a test that

  • sets up full-blown KDevelop session containing a project that's known to fail under the conditions to test and configure it for those conditions (meaning kdevelop itself)
  • launch the session
  • detect if parsing fails

Please note that I didn't say it's impossible. I said I don't see a *feasible* way to do it and (implicitly) that I don't think it's appropriate enough to figure out how.

I have almost no experience writing unittests and as you know I don't have any formal training as a software developer, so I don't know how to explain with better terminology why I think the issue at hand is not something you can check for with a simple automated test. It also doesn't help that I have only encountered this with KDevelop *after* its reorganisation. Yet I've been using the kind of set-up with symlinks for years.

What I can try to do is to clean up my .kdev4 project file and provide a script that handles setting up the directories and symlinks.

mwolff added a comment.Oct 9 2017, 8:43 PM
In D7930#153476, @rjvbb wrote:
The test should replicate the situation and show it fail before this patch and then work with this patch. Why is this not possible to do?

This would mean one would have to write a test that

  • sets up full-blown KDevelop session containing a project that's known to fail under the conditions to test and configure it for those conditions (meaning kdevelop itself)
  • launch the session
  • detect if parsing fails

No, that is not true. You only need to set up some folder structure with symlinks similar to what you have. Then you can load it into the cmake manager and then verify that all paths are "correct", i.e. canonicalized if that is what we want.

rjvbb added a comment.Oct 10 2017, 8:52 AM

Ah. So basically you're saying we should write our own unittest for QFileInfo::canonicalFilePath()?

rjvbb added a comment.Oct 10 2017, 9:04 AM

Sorry, something went wrong.

In addition to that previous remark, what you're describing also sounds like it could be a relatively minor addition to an existing test of the cmake import procedure. Which test would that be, kdevprojectopen or test_cmakemanager?

Do we agree that this will not test whether or not the underlying issue occurs, so won't provide any "proof" that canonical paths have to be used?

mwolff requested changes to this revision.Oct 19 2017, 12:00 PM

No, it should not just be a test around QFileInfo::canonicalFilePath, that is done already in Qt upstream for us. What I want is test(s) for the CMakeManager code you are changing. I.e. take CMakeManager::buildDirectory as an example. Extend the test_cmakemanager (i.e. add another test method there). Then play with symlink structure in a QTemporaryDir to replicate your erroneous setup. Finally call CMakeManager::buildDirectory and ensure it returns "the right thing". Similarly you can call the cmakeutils functions and check their return values for sanity.

This test should fail before you apply this patch, and then pass with this patch.

plugins/cmake/cmakeutils.h
101

When should this function be used instead of the above? Shouldn't the above always return the canonical version?

207

same as above

This revision now requires changes to proceed.Oct 19 2017, 12:00 PM
rjvbb added a comment.Oct 24 2017, 6:50 AM

Then play with symlink structure in a QTemporaryDir to replicate your erroneous setup. Finally call CMakeManager::buildDirectory and ensure it returns "the right thing"

I'll try to figure out how in due time. I'm still convinced that this will just be a contrived, "circumvolitious" way of testing QFileInfo::canonicalFilePath(), possibly combined with Qt's function for changing the working directory. If I read you correctly the test will be looking at a black box where it inputs a path with symlinks and then compares the output with the result of running canonicalFilePath() on that same input (hardwired or not). Except that the box isn't that black when testing with CmakeManager::canonicalBuildDirectory() because we *know* that it calls the QFileInfo function inside.

Same reasoning with a CMakeManaer::buildDirectory() that always returns canonicalised paths, if you think it should do that. I don't disagree with that idea btw (would make the patch simpler but probably also make its effects visible and irreversible as if on an OS where canonicalisation occurs somewhere in the system).

rjvbb updated this revision to Diff 42097.Sep 21 2018, 3:30 PM

Refactored for the 5.3 branch + completed.

An example where this is required:

mkdir -p work/src/KDE/KF5
cd work/src/KDE
git clone kde:kdevelop kdevelop-git
cd KF5
ln -s kdevelop5-git ../kdevelop
(cd kdevelop5-git ; git checkout 5.3)
mkdir kdevelop5-git/build
# import kdevelop5-git into KDevelop,
# configure with kdevelop5-git/build as build dir
# (outside of KDevelop:)
(cd kdevelop5-git/build/plugins/astyle ; make -w kdevastyle_autogen)

For me this will fail on an include of configpage.h because it is looked for via a relative path that tries to access ~/work/src/KDE/kdevelop but fails because it is executed in ~/work/src/KDE/KF5/kdevelop5-git/build instead of ~/work/src/KDE/kdevelop/build. These paths refer to the same location on disk but differ in the number of path segments, as mentioned above.

rjvbb set the repository for this revision to R32 KDevelop.Sep 21 2018, 3:32 PM