Normalize line endings when creating ParseJob from background file.

Authored by flherne on May 3 2017, 8:11 PM.



When the file being parsed is open in the editor, ParseJob::readContents() uses KTextEditor::Document::text() to read the contents. At some point within KTextEditor, the line endings are normalized - even if the document uses \r or \r\n, the returned content is separated by \n only.

Otherwise, the contents are read directly from the file. Until this patch, the line endings aren't normalized in this case; the ParseJob contents differ depending on whether the file is open in the editor.

Several places in kdev-python assume that lines can be separated by, and end with, \n. Old-style Macintosh endings (\r) break this assumption, in the worst case causing a crash.

BUG: 378827

Using a regex in each instance would be a pain, and one code path normalizes line endings already, so I think the best option is to do so for background files also.

TODO: Enforce final newline?

Test Plan

Tested loading files with each of the three line-ending styles. ParseJob content is now the same regardless of whether the file is opened for editing.

Diff Detail

R33 KDevPlatform
Lint Skipped
Unit Tests Skipped
flherne created this revision.May 3 2017, 8:11 PM
Restricted Application added a subscriber: kdevelop-devel. · View Herald TranscriptMay 3 2017, 8:11 PM
brauch edited edge metadata.May 3 2017, 8:16 PM

I think this is pretty much the sane solution to the problem ... still, we might want to turn it off for languages that don't need it, it seems like a non-negligible overhead if you consider a whole project. Maybe introduce a new normalizeLineEndings() member function which is called explicitly by the parse job ...?

flherne added a comment.EditedMay 3 2017, 9:53 PM

I think it's needed for all backends - if you grep for '\n', there are many similar uses in clang, qmljs, etc. It mostly works for Windows line endings, because nothing cares about the extra carriage returns, but old Mac-style ones seem likely to break everything.

mwolff added a subscriber: mwolff.May 4 2017, 1:48 PM

ugh. I see it's a necessary evil. But I think it's actually going to be faster to either hand roll this replacement or use a QRegularExpression, because your code is now scanning the contents twice. Once should be enough. And a greedy pattern like "\\r\\n??" should be quite fast.

Split those two lines into a normalizeNewlines(QByteArray&) function or something, so we can optimize it and unit test it later.

flherne updated this revision to Diff 14472.May 13 2017, 4:31 PM

QByteArray doesn't support regexes. Could be done in one pass with a loop and a bit of state-tracking, but I'd rather do the obvious thing for a 5.1 fix.

Factored it out as Nicolás suggested and added a TODO.

mwolff requested changes to this revision.May 14 2017, 10:33 AM

Sorry, but I'm against pushing it as-is. Doing the optimization isn't hard, you are more or less prematurely pessimizing it here. And if you want to put this into anything but master, having a proper unit test is a must anyways I'd say.

So, please if at all possible at least write a unit test. And then use something like this instead of the two replace calls (untested):

QByteArray normalizeLineEndings(QByteArray text)
    for (int i = 0, s = text.size(); i < s; ++i) {
        if (text[i] != '\r') {
        } else if (i + 1 < s && text[i + 1] == '\n') {
            text.remove(i, 1);
        } else {
            text[i] = '\n';
    return text;

Note that by making it a free function in some util header, we can properly unit test it.

82 ↗(On Diff #14472)

put the { on its own line

This revision now requires changes to proceed.May 14 2017, 10:33 AM
flherne updated this revision to Diff 17164.Jul 25 2017, 9:34 AM
flherne edited edge metadata.

Revised, with Milian's non-pessimal code and some tests.

kfunk added a subscriber: kfunk.Jul 25 2017, 11:07 AM



Style: No need for else if after continue.


if (expr1) {

if (expr2) {
mwolff accepted this revision.Jul 25 2017, 4:24 PM

what kevin said can be applied, otherwise lgtm

This revision is now accepted and ready to land.Jul 25 2017, 4:24 PM
This revision was automatically updated to reflect the committed changes.