Depending on the locale, Python 3 may try to decode the source as ASCII
when the file is opened in text mode. This fails as soon as the code
contains non-ASCII UTF-8 characters, e.g. copyright symbols.
While it is possible to specify the encoding when reading the file,
this is undesirable for two reasons:
- only a very small part of the source is processed via _read_source,
  so there is no need to decode the complete source and store it as
  string objects
- the clang Cursor.extent.{start,end}.column values count bytes, not
  (possibly multibyte) characters.
Although Python 2 processes sources containing UTF-8 without error
messages, wrong extent boundaries are also an issue.
The practical impact is low, as the issue only manifests if there is a
multibyte character in front of *and* on the same line as the read token.
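
The column mismatch can be illustrated with a minimal sketch. The helper
name slice_by_byte_columns is hypothetical and is not the actual
_read_source implementation; the sketch assumes 1-based byte columns
with an exclusive end column, and a token that sits on a single line.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, for illustration only (not the real _read_source).
# Assumption: columns are 1-based byte offsets and end_col is exclusive.
def slice_by_byte_columns(line_bytes, start_col, end_col):
    """Slice the raw bytes first, then decode only the extracted token."""
    return line_bytes[start_col - 1:end_col - 1].decode("utf-8")


# A line with a two-byte character (U+00A9) in front of the token "foo".
text = u"/* \u00a9 */ int foo;"
raw = text.encode("utf-8")

# In bytes, "foo" occupies columns 14..16, so the extent is (14, 17).
token_from_bytes = slice_by_byte_columns(raw, 14, 17)

# Applying the same columns to the decoded string is off by one character,
# because the copyright symbol is one character but two bytes.
token_from_text = text[14 - 1:17 - 1]
```

Slicing the bytes yields the intended token, while slicing the decoded
text with the same columns yields a shifted result. This is why reading
the file in binary mode and decoding only the extracted range avoids
both the locale-dependent decode error and the offset problem.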