Clang Plugin: Report some problems from included files
AcceptedPublic

Authored by thomassc on Jan 13 2019, 2:08 PM.

Details

Reviewers
mwolff
Group Reviewers
KDevelop
Summary

Consider the following piece of faulty code:

struct A {};

template <typename T>
T AddObjects(const T& a, const T& b) {

return a + b;

}

int main(int /*argc*/, char* /*argv*/[]) {

A a, b;
AddObjects(a, b);

return 0;

}

This fails since struct A does not have an operator +. The actual error could be either in function AddObjects()
(since it adds two objects that should not be added) or in the call to AddObjects (since it might be called
with a type that it does not support). For the code above, KDevelop will however only report an error inside the template
function. Now, consider that the template function is in an external library, and the fault is it being called with
wrong arguments. This way, no error will be visible to the user at all, despite the error being within the user's code.
This is for example commonly the case when using the "Eigen" linear algebra library and trying to add incompatible matrices.

The diff proposed here creates ClangProblems for certain diagnostics that are located in included files, if there is
a child diagnostic in the current file that ends on "requested here". This would error-highlight the call in the example
above when the template function is in an external header.

In addition, the existing code which creates ClangProblems already included missing-include errors in the error list of the parsed file with the intention
to make them visible to the user. However, it seems that this was not actually the case, since the problem reporter plugin
by default filters those errors out, since their finalLocation is in another file. Thus, the proposed diff also treats those
errors in a similar way as the "requested here" errors to make them show up.

Note 1: In case the template function and the call are located in the same file, KDevelop will still only report the error
within the template function with this patch applied. Similarly, if there is a chain of multiple template functions within
the same file that call each other and finally cause an error in an external template function, the "requested here" error
highlight will only be generated for the first one and not for the others.

Note 2: While testing, I noticed some inconsistencies in which errors are detected upon different actions: pressing F5,
modifying the source file, or modifying the header. Not sure what the cause for this is. Sometimes the errors in
included files only seem to be reported for the include files themselves and not for the source files.

Test Plan

Did short manual testing. Trying to add incompatible Eigen matrices is reported as an error now and underlined red.

Diff Detail

Repository
R32 KDevelop
Lint
Lint Skipped
Unit
Unit Tests Skipped
thomassc created this revision.Jan 13 2019, 2:08 PM
Restricted Application added a project: KDevelop. · View Herald TranscriptJan 13 2019, 2:08 PM
Restricted Application added a subscriber: kdevelop-devel. · View Herald Transcript
thomassc requested review of this revision.Jan 13 2019, 2:08 PM
mwolff requested changes to this revision.Jan 15 2019, 1:48 PM
mwolff added a subscriber: mwolff.

can you please also add tests for this feature? Have a look at test_duchain.cpp and in there e.g. TestDUChain::testInclude for how to create multiple files and have them include each other.

Generally, I'm very much in favor of the direction this patch is going. This issue has been bugging me for a long time but I never got around to fixing it!

Regarding Note 1: I think that sounds fine. Just make sure to add tests for all these scenarios and then we can decide to change this behavior later on, if need be.

Regarding Note 2: Could this be a result of the "don't reparse if file didn't change" logic? I can believe that there are issues left that result in inconsistencies, when only a header was changed, or only an include file... Anyhow, I hope that's unrelated to this patch?

plugins/clang/duchain/parsesession.cpp
457

put this into a helper function to reduce the complexity here

486

similarly, create a helper function for this chunk

This revision now requires changes to proceed.Jan 15 2019, 1:48 PM
thomassc updated this revision to Diff 49961.Jan 20 2019, 7:30 PM

Make helper functions and add tests, as requested by Milian.

pino added a subscriber: pino.Jan 20 2019, 7:35 PM
pino added inline comments.
plugins/clang/duchain/parsesession.cpp
213

OOC, does clang support translations of its messages? if so, would this check be problematic?

251

no string puzzles please:
https://techbase.kde.org/Development/Tutorials/Localization/i18n_Mistakes#Pitfall_.232:_Word_Puzzles
so i18n("Requested here: %1", problem->description())

262

ditto

I moved the code into helper functions and added an initial version of the tests. The tests for having the function and the call in the same file, and for having a chain of template functions in the same file include QEXPECT_FAIL(). In principle it would be easy to generate additional problems for these cases, but then a single actual problem would be represented by multiple problems in the same file. Maybe it would be cleaner to make a single problem have multiple ranges.

Could this be a result of the "don't reparse if file didn't change" logic? I can believe that there are issues left that result in inconsistencies, when only a header was changed, or only an include file... Anyhow, I hope that's unrelated to this patch?

Yes, that should be unrelated to this patch, it will only become more noticeable.

Unfortunately, upon more testing, I noticed that some of the existing tests in the clang plugin's test_duchain.cpp become very flaky with this patch. This does not seem to be caused by the actual behavior change of the patch, but by the addition of the new test(s). I'll resume debugging and address Pino's comments later when I'll have more time again.

without digging into the patch myself (I'll have to see whether I can find time for this), could we solve more of the open TODOs by reporting problems for *all tail* child diagnostics that point at the current file? And then reference the "parent" from them somehow? I.e. I believe currently, we have something like this (bare with me, it was quite some time since I worked on this):

  • error in X
    • child in Y
    • child in Z

Previous to this patch, we only report the "error in X" and show the children beneath. This change will also make it such that we also show the "error in X" problem for the "child in Z" location. Maybe we could just do that for *all* tail child diagnostics, i.e. j == numChildDiagnostics - 1?

plugins/clang/duchain/clangproblem.cpp
99

= default;

plugins/clang/duchain/parsesession.cpp
251

can't we use the original description of the child diagnostic?

plugins/clang/tests/test_duchain.cpp
2003

why is that? i.e. can you elaborate on this - do you know already what would be required to fix this too?

mwolff requested changes to this revision.Jan 23 2019, 8:16 PM
This revision now requires changes to proceed.Jan 23 2019, 8:16 PM

To some extent that problem is going to disappear as soon as concepts are more widely used. GCC ≥ 6 already supports them and I expect Clang to follow suit — there is already a prototype.

plugins/clang/duchain/parsesession.cpp
213

Not yet, but it seems they want to.

239–240

Wouldn't it be easier to relax that filtering?

mwolff added inline comments.Jan 26 2019, 1:29 PM
plugins/clang/duchain/parsesession.cpp
239–240

that's not so easy either, since it would require to recursively iterate through all imported contexts in the current file and then find ones that are associated with the current one. I.e. that would basically mean we always "show imported", which can become slow. So I would prefer to do it here, as suggested by this patch

This new patch (hopefully) addresses the test flakiness that I observed before:
https://phabricator.kde.org/D18567

This should go in before adding the new tests here.

ping? I'd really like to see this getting integrated, could you please fix the small issues pointed out by pino and me and resubmit?

thomassc updated this revision to Diff 52481.EditedFeb 24 2019, 7:35 PM

Address comments. Now, problems are created for each "requested here" child diagnostic that refers to the current document. This may be more than just the last child diagnostic, since there may be multiple ones referring to the current document.

I used a hack to apply i18n() in cases where the replacement for the %1 parameter is only known later, e.g.: i18n("In included file: %1", QLatin1String("%1")). Is there a better solution? When leaving the replacement away, the function appended an error message to the string. Alternatively, if clang doesn't translate its messages yet, maybe this should also be left in English for consistency?

I wouldn't use the original description of the child diagnostic as-is, at least not without further marking the problem somehow as referring to another file. Otherwise, the diagnostic message may look unrelated to the line of code where it points to, which may be confusing. Prepending the message with something like "Requested here: " seemed like the easiest way to signal that it eventually relates to another line of code.

The unit tests now all pass (given that https://phabricator.kde.org/D18567 is applied).

Edit: Sorry for not using arc. It is not installed on the system I am using currently, and I hope to use the new gitlab instance for further PRs, such that I hopefully won't need it anymore.

mwolff requested changes to this revision.Mar 10 2019, 7:20 PM

nice, this is getting better! some suggestions on how to improve the code quality, and then some potential issues I can think of - please fix or document why they aren't an issue

cheers

plugins/clang/duchain/parsesession.cpp
205

make -> create

206

pass a KLocalizedString here

212

auto

233

once using KLS, use descriptionTemplate.subs(problem->description()).toString()

238

make -> create

256

once using KLS, use ki18n("Requested here: %1") here

505

this isn't correct anymore:

When we run into this code path with, say, 5 problems, we would reserve with 5 and then fill the cache. When we then get back here the next time with 0 problems, we would reserve 0, but the size would still be 5!

the easiest solution I can come up with for now is to resize to output_index at the end... but I kind of dislike that we need that to begin with :( can you come up with something better? otherwise, please at least introduce a lambda helper that wraps the "lookup in cache and reuse or insert new problem"

541

use ki18n too

are we sure that this external problem will always have a location within the current file? couldn't it happen that we encounter completely unrelated errors? I would assume that e.g. in the following scenario, we may end up creating unhelpful issues?

A > B > C

if B has an error due to C, then A shouldn't show anything (except when we "show imports")

plugins/clang/tests/test_duchain.cpp
2070

QStringLiteral(R"(
#include "%1"
...
").arg(header.url().str());

2088

auto

2142

see above

2167

auto

This revision now requires changes to proceed.Mar 10 2019, 7:20 PM
thomassc marked 14 inline comments as done.Sat, Apr 6, 2:50 PM
thomassc added inline comments.
plugins/clang/duchain/parsesession.cpp
505

To be honest, I am not sure how that cache is supposed to work. I did some testing with debug output added to ParseSession::problemsForFile(), and my impression is that it is currently broken. The behavior seems to be the following:

  • When editing a source file, the cache always starts out empty. problemsForFile() is only called once for that source file. The cache is thus not used.
  • When editing a header file, problemsForFile() is called first for that header file and then for some source file which is probably supposed to include that header (but did not always actually do that in my tests). The cache is empty at the start of the first call, so it is not used there. At the start of the second call, it contains the elements added by the header, but the function is now called on the source file, so the cached elements are wrongly used for it. So, the only use of the cache is a wrong use.

In case these observations are true and I didn't miss anything, could the cache be simply removed?

541

This should only happen in case of missing includes. In this case, it is intentional to show these errors since they may cause other errors later. Other types of problems with location outside of the current file are filtered out above:

// missing-include problems are so severe in clang that we always propagate
// them to this document, to ensure that the user will see the error.
if (diagnosticFile != file && ClangDiagnosticEvaluator::diagnosticType(diagnostic) != ClangDiagnosticEvaluator::IncludeFileNotFoundProblem) {
    continue;
}
thomassc updated this revision to Diff 55563.Sat, Apr 6, 2:52 PM

Address Milian's comments; remove m_diagnosticsCache.

mwolff requested changes to this revision.Sat, Apr 13, 4:49 PM

Hey Thomas, please don't remove the cache. See f2a6941e086cdf506c8fb1798c52982bff43792d for why this was introduced. Your tests don't include other files, so probably that's why you didn't see any effect of the cache?

This revision now requires changes to proceed.Sat, Apr 13, 4:49 PM

I re-tested the behavior on files from an actual project and it showed the same behavior when editing files as described in my last post, regardless of how many files are included from the files that are edited. I noticed that the behavior is different when re-opening files: in this case, problemsForFile() is actually called for a larger set of files (although from only looking at this set of files, it seemed a bit random and it was not clear to me how it is determined, I'll have to dig into the source code for this at some point ...).

In any case, the diagnostics cache seems to be clearly broken to me at the moment. The issue is that the ParseSession class and the diagnostics cache is related to a translation unit, but problemsForFile() is called for a specific file of that translation unit. So, if problemsForFile() is called for file A first and then for file B, the problems returned for file B will wrongly re-use cached entries that were created for file A before, while the actual problems might be completely different for both files. Thus, as a concrete issue, if a header file is edited and then the user switches to the source file that was used within the translation unit for determining the header's problems, then KDevelop might show a wrong set of problems for that source file as a result (until the source file is edited again).

I haven't seen a case where the cache has been used correctly. So far I have only seen problemsForFile() being called once for a file from a translation unit. In this case, no correct re-use of Problems is possible on the file level. The incorrect re-use does currently speed up the function, but not in a desirable way. A place where valid re-use would be possible are the IncludeFileNotFoundProblem that are propagated to each document that includes the file with the error. But for that, the cache must be limited to only such problems, which has not been the case.

I am sorry, I think that I misunderstood how the cache works. Please disregard my previous message. I am wondering where the problem that I've been seeing then comes from, seems like it has a different origin.

I re-tested the behavior on files from an actual project and it showed the same behavior when editing files as described in my last post, regardless of how many files are included from the files that are edited. I noticed that the behavior is different when re-opening files: in this case, problemsForFile() is actually called for a larger set of files (although from only looking at this set of files, it seemed a bit random and it was not clear to me how it is determined, I'll have to dig into the source code for this at some point ...).

Upon reopening a file, we should only update the problems of changed files that got updated. Otherwise we should only grab the TU for the main .cpp file and attach it, such that we can do code completion.

In any case, the diagnostics cache seems to be clearly broken to me at the moment. The issue is that the ParseSession class and the diagnostics cache is related to a translation unit, but problemsForFile() is called for a specific file of that translation unit. So, if problemsForFile() is called for file A first and then for file B, the problems returned for file B will wrongly re-use cached entries that were created for file A before, while the actual problems might be completely different for both files.

Yes and no. The problems should be reused, it's the whole idea of the cache. Note how clang_getNumDiagnostics is per-TU and thus applies to all files encountered in this TU. If we thus happen to use the same diagnostic, identified by its index, in two files, we can easily reuse the same problem.

Thus, as a concrete issue, if a header file is edited and then the user switches to the source file that was used within the translation unit for determining the header's problems, then KDevelop might show a wrong set of problems for that source file as a result (until the source file is edited again).

Really, does this happen? Note how in both cases, the .cpp file should be used for parsing, and since that happens to include the header file, it will update that one too.

I haven't seen a case where the cache has been used correctly. So far I have only seen problemsForFile() being called once for a file from a translation unit. In this case, no correct re-use of Problems is possible on the file level. The incorrect re-use does currently speed up the function, but not in a desirable way. A place where valid re-use would be possible are the IncludeFileNotFoundProblem that are propagated to each document that includes the file with the error. But for that, the cache must be limited to only such problems, which has not been the case.

Apparently you've figured out that this statement is wrong :)

thomassc updated this revision to Diff 56336.Mon, Apr 15, 10:59 PM

Re-add m_diagnosticsCache with a hopefully correct caching implementation. Differing from the original situation, the new function createExternalProblem() adapts the problems it creates to the specific file they are inserted in. The cache thus only caches the original problems as created by ClangDiagnosticEvaluator::createProblem(). createExternalProblem() makes copies of these with a new copy constructor added to ClangProblem.

Upon reopening a file, we should only update the problems of changed files that got updated. Otherwise we should only grab the TU for the main .cpp file and attach it, such that we can do code completion.

If I understand this correctly, then perhaps something does not behave as it should there, since in some cases, a lot of files' problems seem to get updated when re-opening a file, including lots of system headers that certainly haven't changed. In other cases, no calls to ParseSession::problemsForFile() are made. It seems that the first case can be triggered by making changes to a header, closing it without saving, and then re-opening it.

Apparently you've figured out that this statement is wrong :)

Yes, I am really sorry here, as I apparently got confused thoroughly. I think that I first introduced a bug to the cache myself that caused the described behavior, and then I observed a similar symptom with the original code (which however came from another bug), which lead me to strongly believe that this same issue that I had introduced must have been present in the original version as well ... this should really not happen. The other bug is that sometimes, KDevelop reports that include guards are missing in a header, when there is actually a #pragma once there. I don't know yet why this happens, but it is possibly unrelated here.

Upon reopening a file, we should only update the problems of changed files that got updated. Otherwise we should only grab the TU for the main .cpp file and attach it, such that we can do code completion.

If I understand this correctly, then perhaps something does not behave as it should there, since in some cases, a lot of files' problems seem to get updated when re-opening a file, including lots of system headers that certainly haven't changed. In other cases, no calls to ParseSession::problemsForFile() are made. It seems that the first case can be triggered by making changes to a header, closing it without saving, and then re-opening it.

I'll have to try that myself once I find the time.

Apparently you've figured out that this statement is wrong :)

Yes, I am really sorry here, as I apparently got confused thoroughly. I think that I first introduced a bug to the cache myself that caused the described behavior, and then I observed a similar symptom with the original code (which however came from another bug), which lead me to strongly believe that this same issue that I had introduced must have been present in the original version as well ... this should really not happen.

Yeah, our code base is somewhat convoluted in some places, esp. once we try to make things faster we usually end up introducing such strange corner cases. I wouldn't be surprised if there are issues lying dormant here!

The other bug is that sometimes, KDevelop reports that include guards are missing in a header, when there is actually a #pragma once there. I don't know yet why this happens, but it is possibly unrelated here.

Yes, I've seen this issue. It would be awesome if you can figure that one out - it's been around for a very long time. I always thought it's libclang that erroneously says that the header isn't guarded while it actually is. You say: maybe we associate a warning from a different header and that gives us this spurious warning? Maybe! Please investigate, if possible (in a separate patch)

mwolff accepted this revision.Tue, Apr 16, 1:36 PM

patch lgtm now, many thanks!

This revision is now accepted and ready to land.Tue, Apr 16, 1:36 PM