[kioslave/file] Add a codec for legacy filenames
ClosedPublic

Authored by cfeck on Jan 10 2019, 5:40 PM.

Details

Summary

UNIX filenames can contain any bytes (except \0 and /). Qt's QFile::decodeName() calls QString::fromLocal8Bit(), assuming that all filesystems use the system's locale encoding. For filenames that have been created with a different encoding, and have not yet been converted (e.g. using convmv), this creates non-reversible U+FFFD (REPLACEMENT CHARACTER) code points in the filenames.

For example, some old-style archives might not contain any information about the encoding of the filenames, and even today archivers extract them without trying to convert to the locale's encoding.

While full support for those filenames is not needed, Dolphin should at least be able to delete, rename, and move those files. Since all actual (local) file handling is done inside the file kioslave, patching Dolphin will not help.

This code is a near verbatim copy of the code we had in kdelibs, written by Szókovács Róbert. Only minor adaptions to Qt5 were done. It decodes invalid bytes as U+10FExx from Plane 16 (Supplementary Private Use Area-B) to be able to encode them later.

Dolphin could detect filenames with those characters, and either mark them (by color or overlay icon), or even automatically offer to rename them.

CCBUG: 204768
CCBUG: 165044

Test Plan

touch "/tmp/test-"$'\377'".txt"
dolphin /tmp

Copying and deleting a test file worked with this code, failed without.

Diff Detail

Repository
R241 KIO
Lint
Lint Skipped
Unit
Unit Tests Skipped
cfeck created this revision.Jan 10 2019, 5:40 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptJan 10 2019, 5:40 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
cfeck requested review of this revision.Jan 10 2019, 5:40 PM
ngraham added a subscriber: ngraham.

I tried with my script that tried to reproduce that "kio skips files" issue
https://raw.githubusercontent.com/n3rdopolis/otherstuff/master/kiocopy
(changing the TEST_INVALID_UNICODE_CHARS=0 line to TEST_INVALID_UNICODE_CHARS=1 )
and then letting it run.
Unfortunately, I get "The file protocol died unexpectedly

cfeck updated this revision to Diff 49684.Jan 17 2019, 12:45 AM

Fix worst case estimate. We need up to two UTF-16 code units for each byte.

Awesome! For what I can input, base file operations work. I can copy, delete, rename files, and testing, and diffing the trees with my kiocopy test script, it copies them perfectly. However, for example, some of the files I cannot open with kate

cfeck added a comment.Jan 17 2019, 4:11 AM

It's possible that Kate doesn't use KIO for local files, but falls back to QFile. It _would_ be possible somehow (e.g. via the QPA plugins) to inject the hack into all Qt applications, but I suggest to improve Dolphin in a way that it shows an error message or even a rename dialog when trying to open or execute such a file.

cfeck added a comment.Jan 17 2019, 4:12 AM

Thanks for testing, btw.

Wow, it's nice that this is fixable, I thought it wasn't.

I can't really review the code itself though, too clueless about utf8 details. Thiago might be able to review this.

Hi Christoph,

I just wanted to say thanks for working on this. It will be great when I can interact with these files again via KDE and its suite of apps.

The encode/decode functions were already reviewed for kdelibs4. It's the remaining code that needs review.

cfeck retitled this revision from [WIP/RFC] [kioslave/file] Add a codec for legacy filenames to [kioslave/file] Add a codec for legacy filenames.May 16 2019, 10:55 PM
cfeck edited the summary of this revision. (Show Details)
dfaure accepted this revision.May 17 2019, 7:19 PM
dfaure added inline comments.
src/ioslaves/file/legacycodec.cpp
118

[it would be so much more readable to move the assignment to the previous line...]

src/ioslaves/file/legacycodec.h
21

this "1" here is unusual :-)

36

remove the extra ';', some compilers warn about that.

This revision is now accepted and ready to land.May 17 2019, 7:19 PM
This revision was automatically updated to reflect the committed changes.