When generating thumbnails for directory, don't cache the previews for files stored on encrypted filesystems.
CCBUG: 411919
This change is only part of solution and will be added as dependency for the rest of the fix.
Lint Skipped |
Unit Tests Skipped |
I think the patch as is has some structural problems. I think it would be significantly cleaner when:
You can then call the check function once on the directory, at the same place the QDirIterator is instantiated (line 538).
And can you please use arc for uploading the patch, it is really hard to do a review without context.
Use Solid and ask it if the device is on an encrypted device. If Solid does not have such an API, add it.
I tried to research Solid using api.kde.org (https://api.kde.org/frameworks/solid/html/classSolid_1_1Device.html, https://api.kde.org/frameworks/solid/html/classSolid_1_1StorageVolume.html) and looking for usages of both Solid::Device and Solid::StorageVolume in code but I'm not able to get StorageVolume instance for given file/directory. Could someone help me with that?
The good starting point would be the main Solid Tutorial :
https://techbase.kde.org/Development/Tutorials/Solid/Device_Discovery
Solid does not provide straight folder => StorageVolume yet, but I think Solid could have such a utility feature added.
Something like Solid::Device::findByPath(), it would need to canonically and recursively resolves the path parent to pay attention to symlinks.
This would also help D26407
KMountPoint::findByPath has it although it is not perfect. Compared to solid it does not refresh automatically and can't send signals and does not know about encryption.
Could someone help me with that?
Sure. I have been dreaming about such feature for a while.
Solid does not provide straight folder => StorageVolume yet, but I think Solid could have such a utility feature added.
Something like Solid::Device::findByPath(), it would need to canonically and recursively resolves the path parent to pay attention to symlinks.
This would also help D26407
No recursion needed, stat provides the device.
And can you please use arc to upload the patch - it is nearly impossible to do a review with the missing context
Only when the file is not a symlink, if so we need to check the symlink target recursively, that's what I meant.
And can you please use arc to upload the patch - it is nearly impossible to do a review with the missing context
Or pushing it to https://community.kde.org/Infrastructure/GitLab
Haven't checked the code here en detail, but the whole thumbnailing code has to work on the resolved symlinks anyway - for the data, for the thumbnail filename [1], and for the save policy.
[1] https://specifications.freedesktop.org/thumbnail-spec/thumbnail-spec-latest.html#THUMBSAVE
- You need the absolute canonical URI for the original file, as stated in URI RFC 2396. In particular this defines to use three '/' for local 'file:' resources (see example below).
- Calculate the MD5 hash for this URI. Not for the file it points to! This results in a 128bit hash, which is representable by a hexadecimal number in a 32 character long string.
And can you please use arc to upload the patch - it is nearly impossible to do a review with the missing context
Or pushing it to https://community.kde.org/Infrastructure/GitLab
That would even be better, yes.
Ok, so, what I want to do now is to create static method findByPath which is going to return Solid::StorageVolume instance (is there a case in which we can expect something different than StorageVolume?).
Should it be StorageVolume Device::findByPath(QString) or rather StorageVolume StorageVolume::findByPath(QString)?
For implementation itself I want to create structure with mountpoints and StorageVolumes which will be updated if new Device is added/removed and we learn this via Solid notifications.
I am thinking it should either be part of DeviceManagerStorage or separate class similar to DeviceManagerStorage. Not sure which.
I don't know how to get a mountpoint for StorageVolume.
What do you think about it?
I don't think so, Storage Volume is any block device.
This function may fail if the input path is not valid for instance.
Should it be StorageVolume Device::findByPath(QString) or rather StorageVolume StorageVolume::findByPath(QString)?
My preference is StorageVolume Device::findByPath(QString), to keep all entry points to Solid in Device and somewhat similar to the Device::listFrom* functions.
For implementation itself I want to create structure with mountpoints and StorageVolumes which will be updated if new Device is added/removed and we learn this via Solid notifications.
Yeah that's what I imagined as well, but this is already taken care of by the base DeviceManager.
I am thinking it should either be part of DeviceManagerStorage or separate class similar to DeviceManagerStorage. Not sure which.
Either way you can simply wrap the base DeviceManager.
Your method should only be a utility function not adding new Structures, everything is in Solid already.
I don't know how to get a mountpoint for StorageVolume.
You make me realize you should filter StorageVolume (any block device) that are also StorageAccess (anything that can be mounted) which has the mountpoints in filePath().
https://api.kde.org/frameworks/solid/html/classSolid_1_1StorageVolume.html
https://api.kde.org/frameworks/solid/html/classSolid_1_1StorageAccess.html
What do you think about it?
I like it
Ok, so far I have implemented Solid::Device::storageAccessFromPath by talking all StorageAccess devices, going though all of them and and returning proper one.
code:
Solid::Device Solid::Device::storageAccessFromPath(const QString &path) { // TODO check if symlinks are in the path QFileInfo fileInfo = QFileInfo(path); if (!fileInfo.exists()) { //TODO error handling } QSet<QString> checked; //To avoid weird infinete loops checked.insert(fileInfo.path()); while (fileInfo.isSymLink()) { fileInfo = QFileInfo(fileInfo.symLinkTarget()); if (checked.contains(fileInfo.path())) { //TODO error handling } checked.insert(fileInfo.path()); } QDir dir = fileInfo.dir(); QString canonPath = dir.canonicalPath(); QList<Device> list = Solid::Device::listFromType(DeviceInterface::Type::StorageAccess); Device match; int match_length = 0; for (Device device: list) { StorageAccess *storageAccess = device.as<StorageAccess>(); if (canonPath.startsWith(storageAccess->filePath()) && storageAccess->filePath().size() > match_length) { match_length = storageAccess->filePath().size(); match = device; } } return match; }
and in the thumbnail.cpp:
Solid::Device device = Solid::Device::storageAccessFromPath(filePath); if (device.is<Solid::StorageVolume>()) { allowCache = device.as<Solid::StorageVolume>()->usage() != Solid::StorageVolume::UsageType::Encrypted; }
It works, but it might be better to hold tree structure with all StorageAcces devices where position on tree would be determined by mountpoint of device, which would allow us to go straight to the desired StorageAccess without checking all of them.
On the other hand, to do that we would have to slice path into parts and compare those parts with nodes on the tree and also create special structure for that. I'm not sure which approach is better.
Another problem with storing StorageAccess tree is that I looked at DeviceManager classes and it might be too difficult for me, as I don't know Solid well.
Which should it be and do we go about it?
for (Device device: list) { StorageAccess *storageAccess = device.as<StorageAccess>(); if (canonPath.startsWith(storageAccess->filePath()) && storageAccess->filePath().size() > match_length) { match_length = storageAccess->filePath().size(); match = device; }
This search is affected by the same sibling match bug that QStorageInfo was. See https://bugreports.qt.io/browse/QTBUG-49498.
How often do I have to repeat the thumbnailer has to use the canonical path anyway? Please use that.
How often do I have to repeat the thumbnailer has to use the canonical path anyway? Please use that.
storageAccessFromPath converts given path into canonical. I figured it should do so anyway, so I'm not doing it again here.
Ok. Should this be done here in createSubThumbnail or earlier so the whole object uses canonical path?
I made merge request for storageAccessFromPath: https://invent.kde.org/frameworks/solid/-/merge_requests/8
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | No. The m_thumbnailDirDeviceId is set to 0 and only changed if lstat executes properly. Then it takes value of st_dev. I don't know if st_dev can be 0 though and I couldn't find this information in manual either. If it can, we should change it. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | I don't see how stat would succeed and st_dev would be 0, but I could be wrong. Maybe assert that it's not 0, at least? | |
779 | print out the path in the warning (same on line 786) | |
785 | missing space after if s/data/constData/ | |
thumbnail/thumbnail.h | ||
93 | This will break compilation on Windows, I assume? On Unix, is the corresponding include present in this file? The lack of context makes reviewing difficult indeed. Any chance to move this to gitlab? (see invent.kde.org) |
thumbnail/thumbnail.h | ||
---|---|---|
93 | On Windows this check is assumed false. So in case encrypted file shares filesystem with thumbnails directory, it can take longer as those won't be cached, but it will work. |
thumbnail/thumbnail.cpp | ||
---|---|---|
781 | Please elaborate, I don't see what the problem is. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | If st_dev cannot be 0 on successful call of lstat, I don't think there's need to change anything. m_thumbnailDirDeviceId will remain zero only if lstat returns error and as long it's zero sharesFilesystemWithThumbRoot will return false. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | I agree with your second sentence. I never said otherwise. It will work, *if* indeed we are both right that st_dev would never be 0 when stat() succeeds. | |
thumbnail/thumbnail.h | ||
93 | Which check? This is a member variable, and I'm not sure the dev_t type will be known at all on Windows. |
thumbnail/thumbnail.h | ||
---|---|---|
93 | Oh, I missed it, because types.h it's not explicitly included. I also removed lstat usagen on Windows. Thanks for pointing that out. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | How about I set it to -1 at start and check for that instead? I checked definitions and it looks like dev_st is unsigned, but it shouldn't be a problem. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | -1 isn't exactly a great value for an unsigned int :-) Or rather, it's not great enough (in the "greater than" sense) :-) But yeah, 0xFFFFFFFF is a good "not set yet" value for a dev_t, to be safe against the possibility of 0. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | "-1 isn't exactly a great value for an unsigned int :-)" I believe "-1" guaranties unsigned to be set to its maximum value, due to how type conversion works, but I could be wrong. Do you know if size of dev_st is fixed or depends on architecture? I'd like to define const rather than use literals for checks, but I don't know if I can simply define it as 0xFFFFFFFF or if it's more complicated. |
thumbnail/thumbnail.cpp | ||
---|---|---|
776 | unsigned(-1) is guaranteed to be added to the modulo value until it becomes positive. That is, it is (UINT_MAX + 1) - 1 Note that this will generate a warning with some compilers about change of sign, so either make the cast explicit or simply use ~0U. As for what type it is... that's why we have std::numeric_limits. |
Using special value instead of 0 as unset m_thumbnailDirDeviceId.
Changing ifdef checks for Q_OS_WIN for consistency.
Moving ifdev check into sharesFilesystemWithThumbRoot, to reduce number of preprocessor directives.
@marcingu - have you even verified this works? - I am quite sure it does not, neither for fuse.encfs, fuse.cryfs (as used by Vaults), nor for any LUKS encrypted devices.
Second, I have asked for a full context diff, or even better moving this to invent.kde.org, but @marcingu keeps ignoring this.
So a full NACK from me.
Sorry about that. It got lost under all suggestions. I'll post the next change to invent.kde.org and link it here.
I double checked and I believe there's space for discussion with more involved developers/designers.
The code behaves in expected way (thumbnails are not saved), but not because usage for those devices is UsageType::Encrypted, but rather they cannot be treated as StorageVolume (line :728).
This isn't my area of expertise and I don't know what behavior should be expected from different kinds of StorageAccess devices.
Maybe if we got broader information about it, we might prepare some tests as well, so we don't run into problems with this code in the future.
Ping!
I'm remanding about question early, because I could do much more work if I get to do it on weekend.
Question:
This code won't save thumbnail for file on any device that isn't StorageVolume or is StorageVolume with usage UsageType::Encrypded.
Is this fine?
Should we take something else into consideration?
Do we want that feature tested to avoid regression in the future?
The whole block can never return true, so it should just be removed, along with all its dependencies.
Is this fine?
Should we take something else into consideration?
Do we want that feature tested to avoid regression in the future?
This code duplicates functionality already present in the thumbnailer code in KIO core. It can be replaced by a trivial "CacheThumbnail" flag provided by the caller.
I tested it once more and it returns true when it should, as expected. What makes you think it doesn't?
Is this fine?
Should we take something else into consideration?
Do we want that feature tested to avoid regression in the future?This code duplicates functionality already present in the thumbnailer code in KIO core. It can be replaced by a trivial "CacheThumbnail" flag provided by the caller.
I wasn't able to prevent catching of directory thumbnails from KIO. The fact that files used for making a preview can be on different storage, makes it extra tricky.
But I'm new the project, so if you do know how to make it simpler I'm all ears.
Even worse, its almost random:
udi = '/org/kde/fstab///pebbles/foo:/mnt' parent = '/org/kde/fstab' (string) vendor = 'pebbles' (string) product = 'foo:/mnt' (string) description = 'foo:/mnt on pebbles' (string) icon = 'network-server' (string) StorageAccess.accessible = false (bool) StorageAccess.filePath = '/mnt' (string) StorageAccess.ignored = false (bool) NetworkShare.type = 'Cifs' (0x2) (enum) NetworkShare.url = 'smb://pebbles/foo:/mnt' (string)
if (device.is<Solid::StorageVolume>()) -> false, though it should be cached
udi = '/org/freedesktop/UDisks2/block_devices/dm_2d2' parent = '/' (string) vendor = '' (string) product = '' (string) description = '100,0 GiB Hard Drive' (string) icon = 'drive-harddisk-root' (string) Block.major = 254 (0xfe) (int) Block.minor = 2 (0x2) (int) Block.device = '/dev/dm-2' (string) StorageAccess.accessible = true (bool) StorageAccess.filePath = '/' (string) StorageAccess.ignored = true (bool) StorageVolume.ignored = false (bool) StorageVolume.usage = 'FileSystem' (0x2) (enum) StorageVolume.fsType = 'btrfs' (string) StorageVolume.label = '' (string) StorageVolume.uuid = '5832ebfa-bf02-40d2-bdc7-90403b207b62' (string) StorageVolume.size = 107374182400 (0x1900000000) (qulonglong)
This is an LUKS encrypted volume so should not be cached ...
BTW, have you checked if the thumbnails are still generated for non-removable but encrypted filesystems? My whole system is encrypted (except for /boot), so it would be a performance loss if no thumbnails were ever cached.
Ok, so once more, let's go back to my question, but in more detail.
Currently I get device as StorageAccess, because storage volumes don't include all of the mount points, such as mounted encfs encrypted data.
This could be a problem as StorageAccess doesn't have an information about encryption (as far as I can tell) . As mentioned before, from lack of better solution, I assume false for storage devices that aren't StorageVolume, but it's not ideal.
For the rest of devices I was relaying on usage from StorageVolume, but as it was pointed out, it holds FileSystem for LUKS encrypted volume, so it's either a bug in Solid or I'm using it wrong and I should get information about encryption elsewhere. I wasn't able to find anything else that could help though.
When it comes to NetworkShare, I tested it with ssh and MTP. KIO won't even attempt to make previews for directories, so catching is out of question. I don't think we should change this behavior as a part of this story, but it would be good for code to work for them anyway, because we might decide to make those previews in the future or use
the same mechanism for catching rest of the files.
Do we want to store caches for NetworkShare devices and external storage? Currently we do, with exception of directories on NetworkShare and I see no need to change it, but it might be good to make sure.
Are there any other corner cases we're forgetting?
I also have a personal request for @bruns . Your responses tend to be laconic, often missing suggested solution for given issue and sometimes even pointing out what the problem is. This leads to me wasting a lot of time to find out something that would take you few moments to communicate.
I'm willing to spend as much time as it takes to make this code as good as possible and I'm thankful for all responses so far, but you need to remember I'm only starting work with KDE code, so things that are obvious to you, often won't be as clear to me.
Non-removable – not. Currently I do cache encrypted files when they are on the same device as thumbnails directory (~/.cache/thumbnails/). If you think we should cover some different cases as well, it's open for discussion.
!PING.
I need help from someone with good understanding of Solid to continue.
I'm don't know how to determinate if StorageAccess device is encrypted or not. I wanted to use StorageVolume::usage, but it's not available for all types of devices and doesn't equal encrypted for LUKS encrypted volumes.
!PING.
How do we check if file access is encrypted using Solid? Do we need new property/method in StorageAccess?
At a glance the problem here is actually that depending on the setup the mounted volume isn't necessarily the encrypted volume. e.g. if you make a LUKS encrypted btrfs on /dev/sda1 then sda1 is type LUKS encrypted, but to actually use it that device gets decrypted and mapped into a different name /dev/mapper/luks-123 which is type btrfs and not encrypted. It is only the mapped one that gets mounted which is why the encryption context gets lost along the way.
I'm afraid this will need some adjustment in solid because currently we don't carry that information anywhere that I can see. It is however in the data of udiks2 behind the scenes (dbus property org.freedesktop.UDisks2.Block.CryptoBackingDevice which is a dbuspath of the encrypted backing object) so it should be readily available, just needs putting somewhere in solid. This is also represented in the sysfs through a slave relationship between the two block devices, but I'm guessing using that over udiks2 isn't nearly as portable.
Where to put it I don't really know, probably a new method on one of the interface classes CryptoBackingUDI which returns the UDI representing the backing device.
On an entirely unrelated note I for one would simplify the sharesFilesystemWithThumbRoot to check if the thumb root is encrypted and always cache the thumb if that is the case. Whether or not the devices are the same seems unnecessarily nitpicky so long as the results are encrypted if the originals were. Otherwise you run into awkward cases where a users has a system drive and a data drive, both encrypted, but because they are different we effectively disable all thumbnail caching for the data drive.
Could this be delegated onto someone who knows Solid or should I try figuring it out by myself?
On an entirely unrelated note I for one would simplify the sharesFilesystemWithThumbRoot to check if the thumb root is encrypted and always cache the thumb if that is the case. Whether or not the devices are the same seems unnecessarily nitpicky so long as the results are encrypted if the originals were. Otherwise you run into awkward cases where a users has a system drive and a data drive, both encrypted, but because they are different we effectively disable all thumbnail caching for the data drive.
I was thinking about that, but decided against it on the slim chance that person having access to encrypted home might not necessarily have same access to a vault or other volume that got mounted on the system, by for example different user of the same machine.
Could this be delegated onto someone who knows Solid or should I try figuring it out by myself?
Alas, I'm not sure there is anyone who feels responsible enough for solid, so you'll probably have to give it a go yourself.
I think I have the Solid part done, but as I don't know this code well, I'd be grateful if someone more advanced on the subject checked it.
https://invent.kde.org/frameworks/solid/-/merge_requests/19
Unfortunately, I wasn't able to test it on LUKS device, because I had problems mounting it.
Is this unblocked now that https://invent.kde.org/frameworks/solid/-/merge_requests/19 has been merged?
I need to make some small changes. I'll create new merge request on invent.kde.org and link it here.
Here's the new merge request: https://invent.kde.org/network/kio-extras/-/merge_requests/75