handle non-ASCII encodings of file names in tar archives
ClosedPublic

Authored by ibragimovrinat on Aug 7 2018, 3:54 PM.

Details

Summary

BUG: 266141

As UTF-8 may use multiple bytes per character, it's required to measure encoded file name length in bytes, not just in characters. It's possible to accidentally truncate file name if its name in characters is shorter that 100, but in bytes — longer than 100. Also, checksum calculation must be performed on unsigned bytes. Otherwise bytes in range 0x80-0xff when promoted to int become negative.

Diff Detail

Repository
R243 KArchive
Lint
Lint Skipped
Unit
Unit Tests Skipped
ibragimovrinat created this revision.Aug 7 2018, 3:54 PM
Restricted Application added a project: Frameworks. · View Herald TranscriptAug 7 2018, 3:54 PM
Restricted Application added a subscriber: kde-frameworks-devel. · View Herald Transcript
ibragimovrinat requested review of this revision.Aug 7 2018, 3:54 PM
cfeck added a subscriber: cfeck.Aug 7 2018, 4:09 PM
cfeck added inline comments.
src/ktar.cpp
777

Does the spec say > 99 or > 100? The comment and the code should match.

Thanks for proposing a patch. for the bug @ibragimovrinat
Myself I only contributed once in former times to karchive, all memories lost, not sure if I have time to dive into that codebase again, despite @broulik here poking me :)

One thing though for sure which would be wanted here is hardening the code by having some unit tests covering ascii-only and non-ascii file names. So whoever might review this patch surely would like to see some samples tested somewhere below autotests/ :)

Sorry, I just added you as I looked at who committed to that repo most, to ensure this review doesn't go unnoticed for the lack of reviewers :)

dfaure added a comment.Aug 8 2018, 8:08 AM

Looks good, but indeed a unittest addition would be very welcome.

Two links that might be useful when writing the unit test:
https://community.kde.org/Guidelines_and_HOWTOs/UnitTests
https://wiki.qt.io/Writing_Unit_Tests
Also, there are already a few unit tests in this repo that you could take a look at.

Thank you for your patch!

ibragimovrinat added inline comments.Aug 9 2018, 4:33 PM
src/ktar.cpp
777

The comment is right in some sense, 100 bytes means 99 bytes of name and 1 byte of '\0'.

I don't know for sure if there is a (single) spec actually. At least GNU tar (http://www.gnu.org/software/tar/manual/html_node/Standard.html) mentions that although name[100] is a 100-byte field, "The name, linkname, magic, uname, and gname are null-terminated character strings." So, there could be string of at most 99 bytes, plus a zero-terminator. However, there is a special case in this source file where 100-byte but non-zero-terminated names are handled (search for "bug:101472" string in the source). Maybe, there are such tar files in the wild.

How should I change the comment?

Added couple of tests. One tests that file name's not truncated, and the other tests that checksum is calculated in a correct way.

dfaure requested changes to this revision.Aug 12 2018, 9:13 AM

Great job, thanks for the unittest.

Just a few minor adjustments and this is good to go in.

autotests/karchivetest.cpp
776

This should be QString::fromUtf8(), because QStringLiteral doesn't support UTF-8 on Windows. I *think* it also doesn't support multiline literals, so make it a single line.

Alternatively, just use QString("...") like you do further below, that supports utf8 and multiline literals.

798

QCOMPARE(listing.count(), 1);

This revision now requires changes to proceed.Aug 12 2018, 9:13 AM

Updated diff to use QString everywhere instead of QStringLiteral and QLatin1String. As those are in tests, performance penalty shouldn't matter. Also using QCOMPARE for comparisons.

ibragimovrinat marked 2 inline comments as done.Aug 12 2018, 5:47 PM
dfaure accepted this revision.Aug 13 2018, 1:06 PM

Excellent, thanks for the contribution.

This revision is now accepted and ready to land.Aug 13 2018, 1:06 PM

Someone still needs to land this in the karchive repo, no? :) @ibragimovrinat I assume this is your first contribution, so you cannot yet push commits yourself, right? For that we will need an email address for the authorship of this patch in the commit, next to your name "Rinat Ibragimov". Which address can we use?

(Only saw this while doing house-keeoing on my phab list, as before not enough clue about katchive currently, so needs someone else to land this, some of the reviewers should be able.

aacid closed this revision.Aug 28 2018, 9:50 PM
aacid added a subscriber: aacid.Aug 28 2018, 9:58 PM

@ibragimovrinat KArchiveTest::testTarShortNonASCIINames is failing both in CI [1] and for me locally

23:54:20 FAIL! : KArchiveTest::testTarShortNonASCIINames() Compared values are not the same
23:54:20 Actual (listing.count()): 0
23:54:20 Expected (1) : 1
23:54:20 Loc: [/home/jenkins/workspace/Frameworks karchive kf5-qt5 SUSEQt5.9/autotests/karchivetest.cpp(815)]

Can you have a look please?

[1] https://build.kde.org/job/Frameworks%20karchive%20kf5-qt5%20SUSEQt5.9/30/console

aacid added a comment.Aug 28 2018, 9:59 PM

@ibragimovrinat ahhhhhhhhh my fault, arc didn't create the tar.gz file correctly from the diff

@ibragimovrinat Actually can you please send me that tarball file to aacid@kde.org ? I don't seem to be able to recreate that from the diff file you attached.

@aacid, I've just sent file to the e-mail. I also attached the same file to this message: