add a string suffix to test data and use for unicode testing of taglibwriter
ClosedPublic

Authored by astippich on Sep 23 2018, 5:39 PM.

Details

Summary

Append a unicode character to all test strings such that
unicode characters are tested.

Test Plan

tests pass

Diff Detail

Repository
R286 KFileMetaData
Branch
taglib_write_unicode
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3312
Build 3330: arc lint + arc unit
astippich created this revision.Sep 23 2018, 5:39 PM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptSep 23 2018, 5:39 PM
Restricted Application added subscribers: Baloo, kde-frameworks-devel. · View Herald Transcript
astippich requested review of this revision.Sep 23 2018, 5:39 PM
svuorela added inline comments.
autotests/taglibwritertest.cpp
66

I suggest checking that the last bytes of all these extracted things is the actual actual utf8 bytes, so that if someone compiles or saves this file in a broken encoding that the testing doesn't fail.

80

Given stringSuffix is the same for all tests, is it needed to have as a test data thing ?

bruns added inline comments.Sep 23 2018, 6:51 PM
autotests/taglibwritertest.cpp
66

?
The file/tags are written inside this test.

80

My idea here was to check each format twice, once with a simple latin1/ascii string (stringsuffix = "") and a second time with some unicode chars (e.g. "€", probably some more from other code blocks).

This allows to differentiate if only unicode tags are broken.

svuorela added inline comments.Sep 23 2018, 6:56 PM
autotests/taglibwritertest.cpp
66

yeah. given you write and read it, if somehow it gets encoded e.g. as iso-8859-15 rather than utf8, the euro sign would be encoded as 0xa4 rather than 0x20ac.

As you write and read it in the same sequence, there is a possibiliyt for this to pass when it shouldn't.

Unfortunately roundtripping the files with bad editors can make this happen. Especially on windows.

astippich added inline comments.Sep 23 2018, 7:03 PM
autotests/taglibwritertest.cpp
66

To be sure I understand correctly, using stringSuffix.toUTF8() is what you would like to see here?

80

That makes more sense of course, will update later to a unicode and a none unicode test

svuorela added inline comments.Sep 23 2018, 7:25 PM
autotests/taglibwritertest.cpp
66

Something along those lines, yes please, to compare with.

bruns added inline comments.Sep 23 2018, 9:08 PM
autotests/taglibwritertest.cpp
66

This should be covered by the read tests. The read tests use binary artifacts.

smithjd added inline comments.
autotests/taglibwritertest.cpp
60–62

Is wrapping in a QString necessary?

astippich updated this revision to Diff 42264.Sep 24 2018, 6:12 PM
  • implement feedback
astippich updated this revision to Diff 42270.Sep 24 2018, 7:58 PM
  • actually use temporary string variable for unicode
astippich added inline comments.Sep 24 2018, 8:00 PM
autotests/taglibwritertest.cpp
60–62

It does not compile otherwise.

bruns added inline comments.Sep 25 2018, 10:35 PM
autotests/taglibwritertest.cpp
60–62

I don't think it is a good idea to do a stringSuffix.toUtf8().fromUtf8() round trip here for each test case and for each tag.

73

If safeguarding against bad editors is really necessary, better do it here.
Alternatively, you can use C++11 unicode string literals, e.g.
QString(u"€")
If you save that one as e.g. latin1 and try to compile it, gcc reports:

error: converting to execution character set: Invalid or incomplete multibyte or wide character
     QString a{u"�"};
bruns requested changes to this revision.Sep 27 2018, 10:15 PM
This revision now requires changes to proceed.Sep 27 2018, 10:15 PM
astippich added inline comments.Sep 28 2018, 8:17 PM
autotests/taglibwritertest.cpp
73

Can this give failures on windows similar to D14122?

bruns added inline comments.Sep 28 2018, 8:50 PM
autotests/taglibwritertest.cpp
73

Yes, probably.
Although, this gives also a pointer to a different approach:

// Add some unicode characters, use codepoints to avoid any issues with 
// source encoding: "€µ"
static const QChar data[4] = { 0x20ac, 0xb5 };
QString unicodeTestStringSuffix(data, 2);
astippich updated this revision to Diff 42525.Sep 28 2018, 9:00 PM
  • use suggested test pattern
bruns added inline comments.Sep 28 2018, 10:10 PM
autotests/taglibwritertest.cpp
152

^ unicodeTestStringSuffix ?

astippich updated this revision to Diff 42541.Sep 29 2018, 9:11 AM
  • fix after rebase
astippich marked 5 inline comments as done.Sep 29 2018, 9:12 AM
bruns accepted this revision.EditedOct 9 2018, 10:12 PM

Can you fix up the grammar in the Summary?

"Append some unicode characters to each test string such that unicode characters can be tested."

Otherwise good to go ...

autotests/taglibwritertest.cpp
75

can you change this to data[2] ...
Facepalm myself ...

This revision is now accepted and ready to land.Oct 9 2018, 10:12 PM
astippich edited the summary of this revision. (Show Details)Oct 10 2018, 4:32 PM
astippich marked an inline comment as done.
astippich added inline comments.
autotests/taglibwritertest.cpp
75

I should have spotted this myself :)

This revision was automatically updated to reflect the committed changes.
astippich marked an inline comment as done.