KDevelop's GDB pretty printer unit tests fail due to non-UTF-8 default codec
Closed, ResolvedPublic

Description

Disclaimer: I have no real clue about Python & gdb's pretty printer internals, so far I have only scratched the outer shell of the problem.

On CI, KDevelop's GDB pretty printer unit tests fail, but not locally for me (same for others), at least not out of the box. For a start I am only looking at the situation with openSUSE here (the tests fail on BSD & Windows on CI as well, but perhaps for other reasons, and I cannot compare there).

The error message hints at some encoding problem; compare e.g. the first error forwarded from Python:
"$1 = Python Exception <class 'UnicodeEncodeError'> 'ascii' codec can't encode characters in position 4-13: ordinal not in range(128): \n"
(see complete test result)

Judging by the string positions and my current understanding of the code, some Python code trips over the Chinese(?) chars in the string in https://phabricator.kde.org/source/kdevelop/browse/master/plugins/debuggercommon/tests/debuggees/qstring.cpp$4 when it tries to format this for output to the gdb log (not exactly sure, but that is the working theory so far).
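
To illustrate the kind of failure in the message above, here is a minimal Python 3 sketch. It is not the pretty printer's code: the sample string is a placeholder (four ASCII characters followed by ten non-ASCII ones, which is what "position 4-13" suggests), and the helper name `failing_range` is made up for this example.

```python
# Placeholder, not the real test string: 4 ASCII chars + 10 CJK chars.
SAMPLE = "test" + "\u4e2d" * 10


def failing_range(text, codec):
    """Return (first, last) index of the characters `codec` cannot encode,
    or None if the whole string encodes fine."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message prints the inclusive index.
        return exc.start, exc.end - 1
    return None


for codec in ("ascii", "latin-1", "utf-8"):
    print(codec, failing_range(SAMPLE, codec))
```

Under these assumptions, both single-byte codecs give up on the same range of characters, while UTF-8 encodes the whole string.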

Locally I could reproduce that error by setting both the env variables LANG and LC_CTYPE to C (normally they are LANG=de_DE.UTF-8 and LC_CTYPE=de_DE.UTF-8, with LC_ALL not set).
@pprkut could also reproduce the failing tests by just setting LANG to C (on that system LC_CTYPE is an empty string and LC_ALL is not set either).

Which for now opens the questions of what the respective env vars on CI are, and whether they might need some fix-up to have UTF-8 promoted as the system encoding as needed. Or whether there are some other Python settings which influence things and which might need adaption.

So far my braindump here, hoping that you @bcooksley might already be able to give some helping feedback :)

kossebau created this task. Jul 20 2018, 4:02 PM
Restricted Application added a subscriber: sysadmin. Jul 20 2018, 4:02 PM
kossebau updated the task description. Jul 20 2018, 4:07 PM
kossebau added a project: KDevelop.

Looks like our locale was defaulting to POSIX which probably wasn't going to work...

Is this resolved now following those rebuilds @kossebau ?

arrowd added a subscriber: arrowd. Jul 24 2018, 11:25 AM

This also fails for me on FreeBSD, despite setting LANG to ru_RU.UTF-8. The error message seems to be the same:

QDEBUG : QtPrintersTest::testQString() "$1 = Python Exception <type 'exceptions.UnicodeEncodeError'> 'latin-1' codec can't encode characters in position 4-13: ordinal not in range(256): \n"
FAIL!  : QtPrintersTest::testQString() Unexpected Python Exception
   Loc: [/home/arr/projects/kdevelop/plugins/gdb/unittests/test_gdbprinters.cpp(86)]

I'm also interested in how to fix that.

kossebau added a comment. Edited Jul 24 2018, 11:51 AM

So here are the bits I can tell:

That commit sets LANG to en_US.UTF-8. Now at least for GNU worlds (possibly POSIX) LANG only provides a default for the LC_* variables, in case those are not explicitly set (see e.g. documentation at https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html#Locale-Environment-Variables).

So when it comes to char encoding, the rule is: LC_ALL, if set, overrides LC_CTYPE, which in turn falls back to LANG if not set itself. (My experience is only based on gettext-related stuff though; no idea how Python interprets those env vars to decide on the system/output encoding.)

Which means just setting LANG might not have the wanted effect if either LC_ALL or LC_CTYPE specify an encoding.
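
The lookup order described above can be sketched in a few lines of Python 3. This only models the documented precedence rule (with empty values treated as unset); the function name `effective_ctype` is made up for illustration, and whether Python or GDB follow exactly this rule is not verified here.

```python
def effective_ctype(env):
    """Resolve the locale used for character encoding:
    LC_ALL beats LC_CTYPE, which beats LANG. Empty values count as unset."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return name, value
    # Nothing set: the fallback is the "C" (POSIX) locale.
    return None, "C"


# The two local reproductions described in the task:
print(effective_ctype({"LANG": "C", "LC_CTYPE": "C"}))
print(effective_ctype({"LANG": "C", "LC_CTYPE": ""}))
# Setting only LANG does not help while LC_CTYPE still says otherwise:
print(effective_ctype({"LANG": "en_US.UTF-8", "LC_CTYPE": "C"}))
```

Passing `dict(os.environ)` instead of a hand-written dictionary would show which variable decides on a given machine.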

Is this resolved now following those rebuilds @kossebau ?

openSUSE: seems that change was enough, test no longer fails.

FreeBSD: the related test still fails on CI [F], so possibly the setting of LANG there is not enough. @arrowd Can you tell what your other locale related env vars are and perhaps play a bit with them to see what has an effect for you?

F) https://build.kde.org/view/KDevelop/job/KDevelop%20kdevelop%20kf5-qt5%20FreeBSDQt5.10/45/testReport/junit/(root)/TestSuite/test_gdbprinters/

kossebau updated the task description. Jul 24 2018, 11:53 AM

Only relevant env vars are

LANG=ru_RU.UTF-8
LANGUAGE=en_US.UTF-8
MM_CHARSET=UTF-8

Changing them has no effect.

Some other observations:

# python
>>> import locale
>>> locale.getlocale()
(None, None)
>>> print 'Ȥ'
Ȥ

kossebau added a comment. Edited Jul 24 2018, 12:16 PM

@arrowd No LC_ ones set? What is the output of "locale"? Assuming POSIX stuff is still a bit relevant on FreeBSD (cmp. also http://pubs.opengroup.org/onlinepubs/007908799/xbd/locale.html) :)

BTW, LANGUAGE might only be looked at by GNU gettext for deciding on the languages to use for picking translations, not sure if anything else looks at that.
See also https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

Which python version did you test with?

Oh, locale outputs

# locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_ALL=

Python is 2.7.15.
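
For comparing machines, a small report of the encodings Python derives from the environment may be more telling than `locale.getlocale()` alone. The following is a sketch for Python 3 (the session above is Python 2.7, where the details differ); the function name `encoding_report` is made up, and it is only an assumption that the values seen by a standalone interpreter match those seen by the interpreter embedded in GDB.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python 3 derived from the environment."""
    return {
        # Passing None only queries the current LC_CTYPE setting.
        "LC_CTYPE locale": locale.setlocale(locale.LC_CTYPE, None),
        # Encoding the locale machinery suggests for text data.
        "preferred encoding": locale.getpreferredencoding(False),
        # Encoding used when printing; None if stdout is missing.
        "stdout encoding": getattr(sys.stdout, "encoding", None),
        "filesystem encoding": sys.getfilesystemencoding(),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```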

Thanks for the info @arrowd . Does not trigger any further idea with me, still hoping for other people to have some :)

Another thing to try perhaps to get more insight (no clue if relevant):
add

execute("show charset");

to GdbProcess constructor (next to other execute() calls) in plugins/gdb/unittests/test_gdbprinters.cpp and have a look what appears in the log when running the test.

For me it is:

QDEBUG : QtPrintersTest::testQString() "show charset"  =  "The host character set is \"auto; currently UTF-8\".\nThe target character set is \"auto; currently UTF-8\".\nThe target wide character set is \"auto; currently UTF-32\"."

Aha, that indeed gives a clue:

The host character set is \"auto; currently ISO-8859-1\".\nThe target character set is \"auto; currently ISO-8859-1\".\nThe target wide character set is \"auto; currently UTF-32

So, the culprit is GDB?
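
To compare the charsets across machines without going through the unit test, the `show charset` output can be picked apart with a few lines of Python 3. This is only a sketch: the helper names `parse_charsets` and `query_gdb_charsets` are made up, the parser is written against the output format quoted above, and the second function assumes a `gdb` binary on PATH that prints the answer to stdout in batch mode.

```python
import re
import subprocess


def parse_charsets(show_charset_output):
    """Map 'host', 'target' and 'target wide' to the charset GDB reports."""
    pattern = re.compile(
        r'The (host|target wide|target) character set is '
        # Quotes may appear escaped (\") when copied from the test log.
        r'\\?"(?:auto; currently )?([^"\\]+)\\?"'
    )
    return {kind: name for kind, name in pattern.findall(show_charset_output)}


def query_gdb_charsets(gdb="gdb"):
    """Ask a locally installed gdb (assumption: it is on PATH)."""
    result = subprocess.run(
        [gdb, "-batch", "-ex", "show charset"],
        capture_output=True, text=True, check=True,
    )
    return parse_charsets(result.stdout)


sample = (
    'The host character set is "auto; currently ISO-8859-1".\n'
    'The target character set is "auto; currently ISO-8859-1".\n'
    'The target wide character set is "auto; currently UTF-32".'
)
print(parse_charsets(sample))
```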

Now that is a real trace indeed, and yesterday I also noticed that for the normal gdb sessions there is a set of initial setup commands, one of which explicitly sets the charset.
See all the MI commands here: https://phabricator.kde.org/source/kdevelop/browse/master/plugins/gdb/debugsession.cpp$99

So we might simply need to do that very setting here as well for ourselves and this test.
@arrowd Can you try if adding to the same place as above an

execute("set charset UTF-8");

fixes the bug for you?

I think the problem is how FreeBSD packages GDB.

When I type set charset in GDB prompt and then press TAB I get

ISO-8859-1 auto

And typing set charset UTF-8 returns

Undefined item: "UTF-8"

It seems UTF support has to be enabled somehow when building GDB itself.

kossebau renamed this task from KDevelop's GDB pretty printer unit tests fail due to non-UTF-8 default codec (openSUSE) to KDevelop's GDB pretty printer unit tests fail due to non-UTF-8 default codec. Jul 25 2018, 10:19 AM

That sounds like a bigger hurdle then :/

Given that the normal gdb session of the GDB plugin tries to set this charset as well (see above source link to debugsession.cpp), by

addCommand(MI::GdbSet, QStringLiteral("charset UTF-8"));

those commands should then fail as well in some way. Could not see any replies in the log of the few tests on CI I quickly looked at, tests always died before the charset command was actually "SEND" (cmp. log notation).

For the rest, it seems this needs some FreeBSD experts to continue here, so I added the respective tag for a start while stepping back myself for now.

Thanks for your insights. I've opened a PR at FreeBSD Bugzilla about this problem.

devel/gdb package has been updated to support UTF-8 in FreeBSD. With gdb-8.1_5 the test passes.

@bcooksley, you probably need to update the gdb package on the FreeBSD CI.

It'll be up to @tcberner and @adridg as they look after the FreeBSD builders.

Let me build new packages then :)

@arrowd, the pkgs are upgraded, and gdb is at 8.1_5.

Now the test says

QDEBUG : QtPrintersTest::testQString() "break qstring.cpp:5"  =  "Undefined command: \"import\".  Try \"help\"."

Smells like python support in GDB turned off or something like that.

Hm, python is on:

BUNDLED_READLINE: off
DEBUG           : off
GDB_LINK        : on
GUILE           : off
KGDB            : on
PORT_ICONV      : on
PORT_READLINE   : on
PYTHON          : on
SYSTEM_ICONV    : off
TUI             : on

Any update on this?

KDevelop tests on FreeBSD are disabled now due to hangs, but the last test run shows that python support is working OK. The test fails somewhere in the middle, though, but python gets initialized OK.

Thanks for the update.

Based on that last run, is there anything for us to work on here or is the issue in the test something that needs to be fixed there?

Yes, I think the issue is in the test itself. Thanks for poking at the CI.

bcooksley closed this task as Resolved. Aug 13 2018, 12:24 PM
bcooksley claimed this task.

No worries!

Please reopen this if it does end up being a CI problem.