Escape bytes that are larger than or equal to 127 in config files
ClosedPublic

Authored by vandenoever on Dec 18 2018, 12:56 PM.

Details

Summary

UserBase tells me that KDE configuration files are encoded in UTF-8.
https://userbase.kde.org/KDE_System_Administration/Configuration_Files

In practice some *rc files have bytes outside that encoding. In my home directory I found two files that are not valid UTF-8.

I searched with

find ~/.config/ -name '*rc' -exec file {} +

which gives these exceptions:

akonadiconsolerc: Non-ISO extended-ASCII text, with very long lines
kmail2rc: Non-ISO extended-ASCII text, with very long lines

In kmail2rc, the offending fields are
[AttachmentView]/State
[CollectionFolderView]/HeaderState

Both are QByteArray values saved from QHeaderView::saveState().

In the instance I found, the offending bytes were 0x81 and 0x84.

akonadiconsolerc had way more of these values. All seem related to saving widget state and hence probably QByteArrays.

The written QByteArray values look very strange. The bytes in the non-printable ASCII range are written as \xXX but the values above 127 are written literally.

The encoding is done by KConfigIniBackend::stringToPrintable. It does not currently escape bytes that are larger than or equal to 127.

Diff Detail

Repository
R237 KConfig
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
vandenoever created this revision.Dec 18 2018, 12:56 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptDec 18 2018, 12:56 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
vandenoever requested review of this revision.Dec 18 2018, 12:56 PM
vandenoever edited the summary of this revision. (Show Details)Dec 18 2018, 12:57 PM
vandenoever edited the summary of this revision. (Show Details)Dec 18 2018, 1:09 PM
vandenoever retitled this revision from Escape bytes that larger than 127 to Escape bytes that are larger than 127 in config files.Dec 18 2018, 3:07 PM

+1 from me, although a unittest would be good. Especially in order to check that this then gets decoded properly when reading back from the kconfig file...

vandenoever edited the summary of this revision. (Show Details)

Added a unit test and changed the code so that 0x7f (DEL) is also
escaped.

vandenoever retitled this revision from Escape bytes that are larger than 127 in config files to Escape bytes that are larger than or equal to 127 in config files.Dec 18 2018, 6:50 PM
vandenoever edited the summary of this revision. (Show Details)

Add a check for reading in the unit test.

apol accepted this revision.Dec 18 2018, 7:06 PM
apol added a subscriber: apol.

LGTM

This revision is now accepted and ready to land.Dec 18 2018, 7:06 PM
This revision was automatically updated to reflect the committed changes.

Thanks!

autotests/kconfigtest.cpp
1726

Remove (or at least degrade to qDebug, this is not a warning)

1739

The expected value without trailing newline could have been put into a constant, shared between Windows and non-Windows code paths, to avoid all this duplication ;-)

vandenoever marked 2 inline comments as done.Dec 18 2018, 11:37 PM

I pushed two encore commits fixing the two raised issues.

aacid added a subscriber: aacid.Dec 28 2018, 6:10 PM

You broke the tests with this change.

vandenoever added a comment.EditedDec 28 2018, 9:57 PM

Can you give some more details? I see no problem on Jenkins. https://build.kde.org/job/Frameworks/job/kconfig/

Since my commits there was one more patch, but it's not related. https://cgit.kde.org/kconfig.git/commit/?id=d90a37b5a1d892d7b0ff7cc3b56c8a6e8c4bfe1a

Can you give some more details? I see no problem on Jenkins. https://build.kde.org/job/Frameworks/job/kconfig/

You need to look harder ;) Those yellow bubbles mean tests are failing

https://build.kde.org/job/Frameworks/job/kconfig/job/kf5-qt5%20SUSEQt5.11/

Build #3 was fine, build #4 broke the tests

Build #4 was triggered by this patch https://build.kde.org/job/Frameworks/job/kconfig/job/kf5-qt5%20SUSEQt5.11/4/

Clearer now?

Found it. This is the failing test.

FAIL!  : KConfigTest::testEncoding() Compared values are not the same
   Actual   (lines.first())                    : "[UTF-8:\\xc3\\xb6l]\n"
   Expected (QByteArray("[UTF-8:\xc3\xb6l]\n")): "[UTF-8:\xC3\xB6l]\n"
   Loc: [/home/jenkins/workspace/Frameworks/kconfig/kf5-qt5 SUSEQt5.11/autotests/kconfigtest.cpp(463)]

The test writes a config file with this group header:

[UTF-8:öl]

The above patch breaks that. Instead it outpus the escaped bytes:

[UTF-8:\xc3\xb6]

I'm working on a fix now.

Here is a patch that solves the problem:

diff --git a/src/core/kconfigini.cpp b/src/core/kconfigini.cpp
index 39e5936..b674973 100644
--- a/src/core/kconfigini.cpp
+++ b/src/core/kconfigini.cpp
@@ -673,7 +673,12 @@ QByteArray KConfigIniBackend::stringToPrintable(const QByteArray &aString, Strin
         switch (s[i]) {
         default:
             // The \n, \t, \r cases (all < 32) are handled below; we can ignore them here
-            if (((unsigned char)s[i]) < 32 || ((unsigned char)s[i]) >= 127) {
+            if (((unsigned char)s[i]) < 32) {
+                goto doEscape;
+            }
+            // GroupString and KeyString should be valid UTF-8, but ValueString
+            // can be a bytearray with non-UTF-8 bytes that should be escaped.
+            if (type == ValueString && ((unsigned char)s[i]) >= 127) {
                 goto doEscape;
             }
             *data++ = s[i];

Can you submit that patch using Phabricator?