smb: add hack to support spaces in workgroup names
ClosedPublic

Authored by sitter on Mar 3 2020, 1:13 PM.

Details

Summary

workgroup names are, as best I can tell, always still netbios names, which
means they can contain a bunch of characters ordinarily not found in valid
host names, e.g. spaces.
this causes trouble with the IANA SMB URI draft, as used by libsmbc,
since the workgroup would be the host field of the URI when browsing
a workgroup (i.e. filtering hosts that are members of a given workgroup),
because QUrl does not allow invalid hostnames in the host field.

to bypass this problem we now put the workgroup name into the query of the
url as kio-workgroup, should it cause trouble in the host field. SMBUrl
takes this query into account when constructing the url for smbc.
since the latter has uniquely exciting potential for breakage this entire
dance is only done when absolutely necessary and otherwise we continue with
all the same code and behavior as without this commit.
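
A minimal, Qt-free sketch of the fallback described above, for illustration only. The real change works on QUrl and SMBUrl; here `looksLikeValidHost`, `percentEncode` and `workgroupBrowseUrl` are hypothetical helpers, the host check is a crude stand-in for QUrl's validation, and the exact shape of the fallback URL is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for QUrl's host validation: only ASCII letters,
// digits, '-' and '.' are accepted. QUrl's real rules are more involved.
bool looksLikeValidHost(const std::string &name)
{
    if (name.empty()) {
        return false;
    }
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '-' && c != '.') {
            return false;
        }
    }
    return true;
}

// Percent-encode every byte that is not an RFC 3986 unreserved character.
std::string percentEncode(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Keep the workgroup in the host field when that is possible; otherwise
// move it into the kio-workgroup query item (URL shape assumed here).
std::string workgroupBrowseUrl(const std::string &workgroup)
{
    if (looksLikeValidHost(workgroup)) {
        return "smb://" + workgroup + "/";
    }
    return "smb:///?kio-workgroup=" + percentEncode(workgroup);
}
```

With this sketch an ordinary name such as WORKGROUP stays in the host field, and only a name like "MY GROUP" takes the query detour, which mirrors the "only when absolutely necessary" intent of the commit.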

on a side note: the awkward name flexibility seems to not extend to
computer names anymore (supposedly because of LLMNR) so this entire
use case is already very niche as we (and libsmbclient) currently only
support workgroup browsing for NT1 networks, and NT1 is by default not
supported on windows10 or samba.

FIXED-IN: 20.04
BUG: 204423

Test Plan

builds, test passes, can browse workgroup with space in name

Diff Detail

Repository
R320 KIO Extras
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
sitter created this revision. Mar 3 2020, 1:13 PM
Restricted Application added projects: Dolphin, Frameworks. Mar 3 2020, 1:13 PM
Restricted Application added subscribers: kfm-devel, kde-frameworks-devel.
sitter requested review of this revision. Mar 3 2020, 1:13 PM
thiago added a comment. Mar 3 2020, 4:13 PM

Looks good, but testing should be extended. I suggest:

  1. A workgroup with non-US-ASCII characters in the name, if that's permitted
  2. A workgroup with % in the name: the URL should have "%25"
  3. Roundtrip checking. The test only does toSmbcUrl(), so please test the reverse.
  4. Error-checking the conversion from query to hostname: things like %00 (a NUL byte), URL delimiters (':', '/', '?' and '#'), byte sequences that do not encode UTF-8 (like %FF).

I think that you'll find that % works in one direction only. I have no idea what you'll find for #4.

smb/smburl.cpp
131

For the % issue, you may want to pass QUrl::FullyDecoded as the second argument to queryItemValue.
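
To make the % and error-checking points concrete, here is a small Qt-free sketch of decoding a percent-encoded workgroup value with validation. `hexValue` and `decodeWorkgroup` are hypothetical names, and the set of rejected bytes (NUL, slash, backslash, colon) is an assumption chosen for illustration; the sketch does not check that the result is valid UTF-8, so a value like %FF would still pass.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hex digit, or -1 if c is not one.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX sequences in `encoded` into `decoded`.
// Returns false on truncated or non-hex escapes and on bytes that this
// sketch assumes must never appear in a workgroup name.
bool decodeWorkgroup(const std::string &encoded, std::string &decoded)
{
    decoded.clear();
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        char c = encoded[i];
        if (c == '%') {
            if (i + 2 >= encoded.size()) {
                return false; // escape is cut off
            }
            const int hi = hexValue(encoded[i + 1]);
            const int lo = hexValue(encoded[i + 2]);
            if (hi < 0 || lo < 0) {
                return false; // not a hex escape
            }
            c = static_cast<char>(hi * 16 + lo);
            i += 2;
        }
        if (c == '\0' || c == '/' || c == '\\' || c == ':') {
            return false; // assumed-forbidden byte, encoded or literal
        }
        decoded += c;
    }
    return true;
}
```

Decoding "50%25" yields "50%", which is why a literal % has to travel as %25; a bare % followed by non-hex characters is rejected rather than guessed at.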

sitter updated this revision to Diff 76934. Mar 4 2020, 1:40 PM

extend test coverage to % character in wg and umlaut in wg.
I've also changed the construction in browse.cpp to use QUrlQuery so it does not trip over potential hash or question marks in the workgroup

libsmbc is actually incredibly lenient in parsing the input urls, so you can more or less throw anything unencoded at it and it'll work.
that is why the original sambaUrl.toString(PrettyDecoded) call works despite also not caring much about encoding.

@thiago I am not sure I understood points 3 and 4 but I think they aren't really applicable:

  1. libsmbc doesn't give out urls so there is no need to convert the other way around
  2. slash, backslash, colon may not be part of netbios names and by extension the url parsing always works regardless of encoding as the host field cannot be ambiguous with a trailing slash (which we always have)

All that said, perhaps it'd make sense to change everything to QUrl::FullyEncoded to be on the safe side for the future? smbc does know how to deal with complete percent encoding from what I can tell

thiago added a comment. Mar 4 2020, 3:35 PM

Still want to see that round-trip.

smb/autotests/smburltest.cpp
115

Please don't use QUrl's URL-correction in the constructor. This is not a valid URL. Use %25 here.

sitter added a comment. Mar 4 2020, 4:14 PM

Still want to see that round-trip.

But why? Converting an smbcUrl to a QUrl would literally be useless code.

@thiago ^

Are you saying you never convert the URL coming from the library so it can be displayed in KIO/Dolphin? Are you sure you always decompose its parts and recompose a QUrl from it?

Yep. I'm 100% certain of this. The library in fact has no API that returns a complete URL or anything near a complete URL. It's using dirent-inspired API to let us iterate/stat paths and only ever returns paths relative to whatever input it got, from those paths we then compose the actual output URLs again.
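
The composition step described here can be sketched roughly as follows, without Qt or libsmbclient. `encodeSegment` and `composeEntryUrls` are hypothetical helpers, and the entry names stand in for what a dirent-style iteration would return; this is an illustration of the idea, not the code in the repository.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Percent-encode a single path segment, keeping RFC 3986 unreserved bytes.
std::string encodeSegment(const std::string &name)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Build output URLs from the URL we asked for plus the relative entry
// names the listing returned. No URL ever comes back from the library.
std::vector<std::string> composeEntryUrls(const std::string &base,
                                          const std::vector<std::string> &names)
{
    std::string prefix = base;
    if (prefix.empty() || prefix.back() != '/') {
        prefix += '/';
    }
    std::vector<std::string> urls;
    for (const std::string &name : names) {
        if (name == "." || name == "..") {
            continue; // skip self and parent entries
        }
        urls.push_back(prefix + encodeSegment(name));
    }
    return urls;
}
```

Because the output is always derived from the input URL, a reverse conversion from an smbc URL back to a QUrl never comes up in this flow.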

This revision was not accepted when it landed; it landed in state Needs Review. Apr 6 2020, 9:29 AM
This revision was automatically updated to reflect the committed changes.