icalformat: remove QByteArray->QString->QByteArray roundtrip.
ClosedPublic

Authored by dfaure on Oct 30 2016, 11:40 AM.

Details

Summary

also removes a useless trimmed() call.

Test Plan

I've been using an ical resource with this change for a few months.

Diff Detail

Repository
R172 KCalendar Core
Branch
Applications/17.12
Lint
No Linters Available
Unit
No Unit Test Coverage
dfaure updated this revision to Diff 7754.Oct 30 2016, 11:40 AM
dfaure retitled this revision from to icalformat: remove QByteArray->QString->QByteArray roundtrip..
dfaure updated this object.
dfaure edited the test plan for this revision. (Show Details)
dfaure added reviewers: djarvie, smartins.
dfaure added a subscriber: KDE PIM.
Restricted Application added a project: KDE PIM. · View Herald TranscriptOct 30 2016, 11:40 AM
Restricted Application added a subscriber: kde-pim. · View Herald Transcript
knauss added a subscriber: knauss.Oct 30 2016, 1:05 PM

This roundtrip may makes a difference if you have non UTF-8 encoded files- did you check against those? I know that I added those roundtrips at other places, because otherwise the string weren't formatted correctly.

Removing the trimmed function means, that for files with only spaces, you do not return true, but will push this to fromRawString function. Is this function aware of those files? Do we will get the same result aka true?

It would be nice if you add tests for this, to make sure nothing is changed.

You'll have to explain to me how a textstream with ts.setCodec("UTF-8"); could possibly handle non-utf8 input.

AFAIK it cannot, therefore the objection is moot.

OK for keeping the trimmed(), I hadn't noticed the isEmpty() check below.

Okay I was wrong - it is other way round.

QFile.readAll() is using QTextCodec::codecForLocale() for reading the file, that means, that if the user has a strange locale the file is read with this locale.
If we want to define the encoding we need to use QTextStream ( see Qt documentation for QFile ).

I believe you are wrong. QFile::readAll() doesn't use any codec at all. It reads 8-bit data, straight into a QByteArray, without any conversion.

In D3203#59579, @dfaure wrote:

I believe you are wrong. QFile::readAll() doesn't use any codec at all. It reads 8-bit data, straight into a QByteArray, without any conversion.

Okay yes, this discription is more correct - But still, if we don't do the roundtrip through QTextStream, we don't make sure, that we have utf-8 encoded QByteArray afterwards and libical do not care about the encoding.

See it like this: now if fails to convert a file to UTF-8 at line 94/95 afterwards, it may fail in the middle of something, because now everything can expect that the calendar data is UTF-8 encoded. With your patch this isn't true anymore.

I know this is nippicking and it is all fine for 99% of the users, because nearly everyone has UTF-8 as default file encoding, but still you can't be sure.

Well, this time I believe you are right, but it seems like a high price to pay for the 99% cases where everything is fine (a useless roundtrip conversion), just to detect earlier a possible 1% of broken files (which are broken anyway). And actually the error message might be better after this patch than before (where invalid data just gets mangled by the utf8 encoding/decoding)... (depends on what libical does).

I do not believe in making everything slower just to change the error handling for an invalid case (and not necessarily for the better).

I can accept the argument that a unittest for what happens to invalid non-utf8 data should be added. This runs the risk that I lose motivation for this patch though ;-)

I have fixed issues with strange locales/encodings for kolab-client and there I learn, if you make sense to fail as early as possible :)

I do not believe in making everything slower just to change the error handling for an invalid case (and not necessarily for the better).

Well is it really that much slower?A file should be reread once in some hours/minutes.
I am not happy with the error handling he have at the moment (none). But I see the potential of you patch to add nonvalid data to akonadi and than somewhere in korganzier this can crash and than tell the user, oh that was the wrong file encoding?

UTF-8 encoded text can only be present in iCal property values and not property names (otherwise it's not a valid iCal file and that's up to libical to deal with without crashing). libical itself does not attempt to parse unstructured property values (like Summary or Description), it just returns the string to the caller as a char*. What KCalCore does is it calls QString::fromUtf8() on this char*, which internally calls QUtf8::convertToUnicode(). QTextStream, when used with the "UTF8" codec (i.e. QUtf8Codec), internally uses the same method (QUtf8::convertToUnicode()) to convert the text.

Therefore I don't think there's any benefit in running the entire iCal string through QUtf8::convertToUnicode() if we then do it for each of the property values again when converting them to QString. So +1 to David's patch, just re-add the trimmed() call please.

dfaure updated this revision to Diff 25636.Jan 19 2018, 12:53 PM

re-add trimmed()

dfaure updated this revision to Diff 25637.Jan 19 2018, 12:58 PM

and now it compiles

dvratil accepted this revision.Jan 19 2018, 12:59 PM
This revision is now accepted and ready to land.Jan 19 2018, 12:59 PM
dfaure closed this revision.Jan 19 2018, 1:06 PM