Adapt to QTextCodec changes
Open, Needs Triage

Description

QTextCodec in its current form does not exist in Qt6 (actually it exists, but as part of the qt5compat module).

As a replacement Qt6 has QStringConverter: https://doc-snapshots.qt.io/qt6-dev/qstringconverter.html

The good news is that stuff like textstream.setCodec(QTextCodec::codecForName("UTF-8")); is no longer needed in Qt6 since QTextStream defaults to utf-8.

The bad news is that QStringConverter does not support most of the codecs that QTextCodec supports. It supports only UTF-8/16/32 and Latin1. See https://bugreports.qt.io/browse/QTBUG-75665 for context. In some places, e.g. Kate, we allow the user to choose from a wide range of codecs. https://bugreports.qt.io/browse/QTBUG-86437 indicates that support for other codecs may come back in a future Qt6 release in some form.

There seems to be no equivalent of QTextCodec::codecForLocale in Qt6

QTextCodec is part of the API of KCharsets, which offers additional API around it, e.g. resolving alternative names. Other than that it's not used in public frameworks API, but it is used internally quite a bit.

Sonnet makes use of QTextCodec internally:

  • One usage is removed by https://invent.kde.org/frameworks/sonnet/-/merge_requests/18.
  • The hunspell plugin uses it to convert from QString to the dict's encoding. All dicts on my system seem to be either UTF-8 or ISO8859-1 (ag -G "\.aff$" "^SET"); if we can assume that, the QString or QStringConverter API should be enough. It also uses QTextCodec::codecForLocale() as a fallback if the speller's encoding is not found.
  • The hspell plugin requires iso8859-8-i, which is not covered by QStringConverter.
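
To check the "all dicts are UTF-8 or ISO8859-1" assumption programmatically, something along these lines could work. This is only a sketch in plain standard C++, not Sonnet or Hunspell code: affEncoding() and coveredByQStringConverter() are hypothetical helper names, and the only thing taken from the task is that the encoding is declared on a SET line of the .aff file.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns the name from the first "SET <name>" line
// of a Hunspell .aff file's contents, or an empty string if there is none.
std::string affEncoding(std::string affContents)
{
    // Drop a UTF-8 byte order mark so it does not hide a SET on line one.
    if (affContents.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        affContents.erase(0, 3);
    }
    std::istringstream in(affContents);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string keyword;
        std::string name;
        // operator>> stops at whitespace, so a trailing '\r' is ignored.
        if (fields >> keyword >> name && keyword == "SET") {
            return name;
        }
    }
    return {};
}

// Hypothetical helper: true for the two dictionary encodings the task
// description expects QString/QStringConverter to handle.
bool coveredByQStringConverter(std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return name == "UTF-8" || name == "ISO8859-1";
}
```

A caller would read each .aff file into a string, pass it to affEncoding(), and only take the QStringConverter path when coveredByQStringConverter() returns true. What to do otherwise is exactly the open question in this task.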

KTextEditor uses QTextCodec. Kate should be able to open and save files with all kinds of encodings.

KIO uses QTextCodec internally:

KBookmarks uses QTextCodec/KCharsets in the netscape bookmark importer for converting from QTextCodec::codecForLocale() or UTF-8. Given that Netscape isn't too relevant these days, the whole thing can maybe get killed?

KFileMetaData's plaintextextractor uses QTextCodec::codecForLocale and assumes that the processed text is in that locale. Maybe it should use KEncodingProber to try to find the right one?

KCodecAction uses QTextCodec::codecForLocale() as a fallback when the mib isn't found

nicolasfella moved this task from Backlog to Needs Input on the KF6 board.
nicolasfella updated the task description. Feb 24 2021, 4:59 PM
dfaure added a subscriber: dfaure. Feb 25 2021, 1:19 PM

Just a thought: should we fork QTextCodec as KTextCodec and make our lives simpler in the process?

I think it is still alive at the moment in the Qt5Compat module:

https://code.qt.io/cgit/qt/qt5compat.git/tree/src/core5/codecs?h=6.1

Therefore I would tend to say at least for the internal uses, in e.g. KTextEditor, we can first depend on this and if it goes away or the improvements in core aren't in the way we want, we can still fork it afterwards?

What slightly worries me is the behavior change of QTextStream. In Qt5 it uses QTextCodec::codecForLocale() by default (which is UTF-8 for me, but I have no idea how common other values are). If a Qt5 app implicitly saves some data using a non-UTF-8 codec, we're in for a nasty surprise when the app is ported to Qt6 and starts reading the same data as UTF-8.
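
To make the failure mode concrete: the same text saved as Latin-1 is usually not even well-formed UTF-8. The sketch below is plain standard C++ and does not use any Qt API. isValidUtf8() is a hypothetical helper, shown only to illustrate how a ported app could detect such legacy data before decoding it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Qt API): returns true if `bytes` is well-formed
// UTF-8. Rejects truncated sequences, overlong forms, surrogates and code
// points above U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b0 < 0x80) {
            len = 1; cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {
            len = 2; cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {
            len = 3; cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {
            len = 4; cp = b0 & 0x07;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + len > n) {
            return false; // sequence is cut off at the end of the data
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) {
                return false; // expected a continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800)
            || (len == 4 && cp < 0x10000)) {
            return false; // overlong encoding
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false; // outside Unicode range or a surrogate
        }
        i += len;
    }
    return true;
}
```

For example, "café" written as Latin-1 ends in the single byte 0xE9, which this check rejects, while the UTF-8 form ends in 0xC3 0xA9 and passes. Note the check is one-sided: pure-ASCII Latin-1 data passes, and some Latin-1 byte pairs happen to form valid UTF-8, so passing does not prove the file was written as UTF-8.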

The locale encoding is always UTF-8 on Unix and some "more local" locale (latin1, koi8-r, etc) on Windows.
So what you described can only happen on Windows. Still bad, of course, for reading existing text files. But the user base isn't exactly as big as on Unix, for KDE software. This is just very surprising overall for Qt on Windows. In the JIRA task you linked, Thiago initially said "QTextStream needs the ability to select between UTF-8 and the locale's 8-bit charset". But you seem to say this didn't happen?

The locale encoding is always UTF-8 on Unix and some "more local" locale (latin1, koi8-r, etc) on Windows.

Is it always the case? Even when I use a locale like zh_CN.gb18030? That would mean https://invent.kde.org/frameworks/kauth/-/blob/master/src/kauthhelpersupport.cpp#L76 is useless
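
One way to check what the C library itself reports for a given locale is nl_langinfo(CODESET). The sketch below is POSIX-only standard C/C++, not a Qt API, and it says nothing about what Qt 6 assumes internally. localeCodeset() is a hypothetical helper name.

```cpp
#include <clocale>
#include <langinfo.h> // POSIX only, not available on Windows
#include <string>

// Hypothetical helper: returns the character set name the C library reports
// for the locale configured in the environment (LANG, LC_ALL, LC_CTYPE).
// Side effect: switches the process' LC_CTYPE locale to the environment's.
std::string localeCodeset()
{
    // An empty locale name means "take it from the environment". If that
    // fails, the previous locale (normally "C") stays in effect.
    std::setlocale(LC_CTYPE, "");
    const char *codeset = nl_langinfo(CODESET);
    return codeset ? std::string(codeset) : std::string();
}
```

Running a small program that prints localeCodeset() under different LANG values would show whether the C library reports something other than UTF-8 for a locale like the one mentioned above, provided that locale is actually installed on the system.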

I agree that there is no rush to fork, since stuff in compat libs usually stays available until the next major version release, so qt5compat will probably still be there until Qt 7 (same story with kdelibs4support, right? :))

In the JIRA task you linked, Thiago initially said "QTextStream needs the ability to select between UTF-8 and the locale's 8-bit charset". But you seem to say this didn't happen?

I didn't see it indicated in the API docs, but I haven't looked at the code, so I don't really know

dfaure moved this task from Needs Input to Waiting on Qt Changes on the KF6 board. Apr 3 2021, 1:59 PM

From building against Qt6: one implication of QTextCodec being moved to the Qt5 compat module is that QTextStream::setCodec() is gone in Qt6, replaced by QTextStream::setEncoding(), which doesn't support the same range of encodings that QTextCodec does.

So even if we "fork" QTextCodec into e.g. KCoreAddons, QTextStream won't have QTextCodec support in Qt6. (Maybe that was obvious already for others, but I've only realised this now :-)).

We might need to use QTextCodec in ark to fix this bug: https://bugs.kde.org/show_bug.cgi?id=393901#c12

Basically we would need something like this:

diff --git a/plugins/libzipplugin/libzipplugin.cpp b/plugins/libzipplugin/libzipplugin.cpp
index e9cb49f2..dc6b0ee0 100644
--- a/plugins/libzipplugin/libzipplugin.cpp
+++ b/plugins/libzipplugin/libzipplugin.cpp
@@ -24,6 +24,7 @@
 #include <utime.h>
 #include <zlib.h>
 #include <memory>
+#include <QTextCodec>
 
 K_PLUGIN_CLASS_WITH_JSON(LibzipPlugin, "kerfuffle_libzip.json")
 
@@ -723,7 +724,10 @@ bool LibzipPlugin::extractEntry(zip_t *archive, const QString &entry, const QStr
                 }
                 setPassword(query.password());
 
-                if (zip_set_default_password(archive, password().toUtf8().constData())) {
+                QString pwd = password();
+                QTextCodec *codec = QTextCodec::codecForName("windows-1250");
+                QByteArray encodedString = codec->fromUnicode(pwd);
+                if (zip_set_default_password(archive, encodedString.data())) {
                     qCDebug(ARK) << "Failed to set password for:" << entry;
                 }
                 firstTry = false;

Is there any alternative for this job? Can KCodecs convert from UTF-16 to other text encodings?

For now using QTextCodec is fine for that, it's still available in qt5compat

I think from a frameworks POV we can consider this done.

The only usage of QTextCodec I can find in frameworks is internally in KEncodingFileDialog

nicolasfella moved this task from In Progress to Done on the KF6 board. Jun 28 2023, 11:31 AM