sanitizer: Improve device listing
ClosedPublic

Authored by michaelh on Mar 18 2018, 11:33 AM.

Details

Summary

Print more info about devices

Test Plan

Example:

$ baloodb devices
Listing database contents...
! device:41 indexed-items:40
+ device:43 indexed-items:76 fstype:cifs fsname://circulans/VideoPartition mount:/media/circulans/videopart
+ device:42 indexed-items:3627 fstype:fusectl fsname:fusectl mount:/sys/fs/fuse/connections
+ device:22 indexed-items:3 fstype:tmpfs fsname:tmpfs mount:/run
! device:2098 indexed-items:1450
! device:2086 indexed-items:220
! device:2069 indexed-items:1533
! device:2064 indexed-items:99
! device:2066 indexed-items:11664
+ device:2053 indexed-items:6369 fstype:ext4 fsname:/dev/sda5 mount:/home
+ device:2049 indexed-items:22 fstype:ext4 fsname:/dev/sda1 mount:/
Found 11 matching items
Elapsed: 7.26369 secs

Diff Detail

Repository
R293 Baloo
Branch
sanitize-devices (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
michaelh created this revision.Mar 18 2018, 11:33 AM
Restricted Application added projects: Frameworks, Baloo. · View Herald TranscriptMar 18 2018, 11:33 AM
michaelh requested review of this revision.Mar 18 2018, 11:33 AM
michaelh updated this revision to Diff 29827.Mar 18 2018, 11:55 AM
michaelh edited the test plan for this revision. (Show Details)
  • Fix typo
michaelh updated this revision to Diff 29981.Mar 20 2018, 9:42 AM

Rebase on master

Could you please make your tab spacing conform to the style of the
codebase? This should also be fixed in baloodb.

Could you please make your tab spacing conform to the style of the
codebase? This should also be fixed in baloodb.

baloo follows Kdelibs Coding Style that is 4 spaces per tab. Please specify the places this code (or baloodb) deviates from the rules.

Could you please make your tab spacing conform to the style of the
codebase? This should also be fixed in baloodb.

If you browse your recently committed code in Diffusion, you'll notice that your linespaces can be copy and pasted. This doesn't match the current style where the tabs are spaces and the linespaces don't contain tab spaces and thus won't be copyable.

baloo follows Kdelibs Coding Style that is 4 spaces per tab. Please specify the places this code (or baloodb) deviates from the rules.

If you browse your recently committed code in Diffusion, you'll notice that your linespaces can be copy and pasted. This doesn't match the current style where the tabs are spaces and the linespaces don't contain tab spaces and thus won't be copyable.

I see. Revisited my KDevelop settings. Should be fixed now.

michaelh updated this revision to Diff 30514.Mar 25 2018, 2:05 PM
  • Remove trailing whitespace
michaelh updated this revision to Diff 30714.Mar 27 2018, 1:14 PM
  • Reuse fileinfo list
bruns added a subscriber: bruns.Apr 5 2018, 9:42 PM
bruns added inline comments.
src/engine/databasesanitizer.cpp
253 ↗(On Diff #29827)

You can check if your output goes to a tty or is redirected:
https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/global/qlogging.cpp#n263

255 ↗(On Diff #29827)

print as hex(major):hex(minor) ?

src/engine/fsutils.cpp
111 ↗(On Diff #29827)

Use of Solid would be preferred.

src/engine/fsutils.h
51 ↗(On Diff #29827)

No need for explicitly calling the default constructor, this is only needed for POD types (e.g. quint64).

michaelh added inline comments.Apr 5 2018, 11:13 PM
src/engine/databasesanitizer.cpp
253 ↗(On Diff #29827)

That's great, thank you. I'll add some color to stdout in another patch.

255 ↗(On Diff #29827)

Good idea was my first thought, but after trying, I must say it does not add really much info and makes the output harder to read.
The decimal is needed, because this function is mainly for selecting a deviceId to be used with 'clean', 'list' and upcoming 'repair' commands. I'd prefer to just print the decimal.

src/engine/fsutils.cpp
111 ↗(On Diff #29827)

Sure? We would have to add Solid as a dependency for libKF5BalooEngine.so.
databasesanitizer currently is experimental not exported by default (You cannot see this because I haven't rebase yet.).
I did not use Solid, because I don't know it well and it seemed easier to copy and adapt FSUtils::getDirectoryFileSystem. I'm not objecting to use Solid though.

The true reason for 'fsutils`'s existence is to disable CoW on btrfs. Can Solid do that? If so we can drop fsutils.

michaelh updated this revision to Diff 31438.Apr 5 2018, 11:14 PM
  • Apply some sugested changes
michaelh added inline comments.Apr 5 2018, 11:35 PM
src/engine/databasesanitizer.cpp
253 ↗(On Diff #29827)

Hey, that's an IIFE! Never seen it in C++ before.

bruns added inline comments.Apr 9 2018, 2:10 PM
src/engine/databasesanitizer.cpp
159 ↗(On Diff #29827)

I can not come up with a reason to use a MultiHash here in the first place - all that is needed is the deviceId and the file count per device.

You can use a QMap<DeviceId, Count> here, and use

usedDevices[info.deviceId] += 1;

(operator[] default-constructs the ValueType, which for e.g. int is guranteed by Qt to be 0).

This reduces the effort for counting the items per device from (D + 3) * F to F.
(D: number of devices, F: number of files).
Currently you walk the infos list once to create the MultiHash, and then walk the Multihash once for uniqueKeys(), and once for each device when calling usedDevices.value(). Cumulated effort for .count() is F.

In your case, this would reduce the time for counting by a factor of 14. You also save the memory for creating a temporary MultiHash with ~20000 items.

src/engine/fsutils.cpp
111 ↗(On Diff #29827)

The question is if this really belongs in the library then.

IMHO the whole printDevices() function should be moved into baloodb.cpp, The DatabaseSanitizer should export either createList or a function returning QVector<DeviceId, Count>. Going from the DeviceId to the DeviceInfo should happen in baloodb.

michaelh marked 4 inline comments as done.Apr 9 2018, 4:53 PM
michaelh added inline comments.
src/engine/databasesanitizer.cpp
159 ↗(On Diff #29827)

Thanks!

src/engine/fsutils.cpp
111 ↗(On Diff #29827)

I agree completely and followed your suggestion. Currently I'm stuck, though.
Using Solid to obtain the accessibilty info of volumes and network shares, it seems only Solid::Block provides the major and minor properties needed to map to deviceId. But network shares don't implement Block. :-/
Any ideas? Maybe that is the reason why BalooEngine isn't using Solid?

michaelh added inline comments.Apr 9 2018, 5:50 PM
src/engine/fsutils.cpp
111 ↗(On Diff #29827)

While implementing your "I can not come up with a reason ..." suggestion, I realized, the 'clean' command also needs DeviceInfo to exclude unmounted devices.

bruns added inline comments.Apr 9 2018, 7:14 PM
src/engine/fsutils.cpp
111 ↗(On Diff #29827)

Which is good, as the device id is completely bogus, and changes dependent on mount order.

Also the inode is not generally stable for at least smb, as not every server supports inodes exported from the server, and can not when several server filesystems are exported as a single share.

baloo on network shares will likely never work.

michaelh updated this revision to Diff 31773.Apr 9 2018, 8:52 PM
  • Optimize device list creation
michaelh added inline comments.Apr 9 2018, 9:16 PM
src/engine/databasesanitizer.cpp
166 ↗(On Diff #29827)

Will try to use QStorageInfo

bruns added inline comments.Apr 10 2018, 2:47 AM
src/engine/databasesanitizer.cpp
164 ↗(On Diff #29827)

instead of checking and filling it in here, just iterate over usedDevices after you have finished iterating over infos.

michaelh updated this revision to Diff 31791.Apr 10 2018, 9:07 AM
  • Optimize device listing even more
michaelh updated this revision to Diff 31972.Apr 12 2018, 2:17 PM
  • (Prematurely) rebased on origin/master From now on -DBUILD_EXPERIMENTAL=on is needed to build
  • Remove unnecessary includes
bruns added a subscriber: ngraham.Apr 12 2018, 5:47 PM

Looks good to me so far. If there are any issues, we can fix it up later IMHO.
@ngraham as you reviewed other parts of the stack, can you do this one as well and accept?

src/engine/databasesanitizer.cpp
259 ↗(On Diff #29827)

Hm how about device:2052 [0:804] or device:2052 [0000:0804]?

I'm afraid you guys' code writing abilities have far surpassed my code review abilities at this point!

Arrgh these deviceIds! Just realized this:
When logged in as root on tty1

$ baloodb devices 
Listing database contents...
+ device:43 indexed-items:3658 fstype:tmpfs fsname:tmpfs mount:/run/user/0
+ device:2049 indexed-items:2 fstype:ext4 fsname:/dev/sda1 mount:/

After logging out from tty1 and re-mounting smb share:

$ baloodb devices 
Listing database contents...
+ device:43 indexed-items:3658 fstype:cifs fsname://circulans/VideoPartition mount:/media/circulans/videopart
+ device:2049 indexed-items:2 fstype:ext4 fsname:/dev/sda1 mount:/

Btw, dolphin also gets a little confused by this, showing the mounted share as unmounted in Places

src/engine/databasesanitizer.cpp
259 ↗(On Diff #29827)

That somehow sneaked back in. Your suggestion is good. I'll use it.

bruns accepted this revision.Apr 12 2018, 7:06 PM

Yes, thats exactly what I meant - device ids are not stable ...
But anyway, thats unrelated to this revision.

This revision is now accepted and ready to land.Apr 12 2018, 7:06 PM
michaelh updated this revision to Diff 31996.Apr 12 2018, 7:19 PM
  • Consider tmpfs as not mounted
  • Bring back major/minor print out
michaelh added inline comments.Apr 12 2018, 7:21 PM
src/engine/experimental/databasesanitizer.cpp
143 ↗(On Diff #31996)

Better safe than sorry here. Otherwise all items of share might be removed with baloodb clean.

michaelh added inline comments.Apr 12 2018, 7:33 PM
src/engine/experimental/databasesanitizer.cpp
144 ↗(On Diff #31996)

Will be removed downstream

This revision was automatically updated to reflect the committed changes.
This comment was removed by michaelh.
bruns added inline comments.Apr 13 2018, 2:01 AM
src/engine/experimental/databasesanitizer.cpp
135 ↗(On Diff #32016)

Completely overlooked this one - should have used QStorageInfo::mountedVolumes() here instead.
http://doc.qt.io/qt-5/qstorageinfo.html#mountedVolumes