Support for xattrs on kio copy/move
Needs RevisionPublic

Authored by tmarshall on Dec 27 2018, 9:38 AM.

Details

Reviewers
dfaure
chinmoyr
bruns
cochise
Group Reviewers
Frameworks
Summary

This patch adds support to KIO to preserve xattr of files and directories on copy or move.
Working for copy/move including overwrite and rename cases.
Not working for "write into" when coping folders have name conflict. I'm against adding to this user case as the name of the action don't imply overwrite.

Tested only on Linux.

FEATURE: 116617

Diff Detail

Repository
R241 KIO
Branch
arcpatch-D17816
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20245
Build 20263: arc lint + arc unit
There are a very large number of changes, so older changes are hidden. Show Older Changes
cochise added inline comments.Jan 5 2019, 5:53 PM
autotests/jobtest.cpp
533

On linux, the command line bin are list/get/setfattr, on BSD list/get/setxattr, on mac are xattr -l/-p/-w, so this part will need a lot of ifdefs too, to work on many platforms.

dfaure added a comment.Jan 5 2019, 6:44 PM

I'll be happy to review a patch for QSaveFile to preserve ACLs and xattrs.

bruns added a subscriber: bruns.Jan 6 2019, 1:16 AM
bruns added inline comments.
src/core/copyjob.cpp
2192

you can (typically) avoid the double (syscall, i.e. expensive) by preallocating the array and only reallocating if you get errno == ERANGE. Dito for getxattr.

2214

... always ...

2218

There should probably be a if (!xattrkey.startsWith("user.")) continue; here.

2219

getxattr is called on the same file again and again, thus it is much more efficient to use fgetxattr.

cochise updated this revision to Diff 48959.Jan 8 2019, 1:13 PM
cochise marked an inline comment as done.

All use cases working, added a new KIO to copy xattr

Working on copy and move with or without overwrite for calls of
KIO::copy, KIO::copyAs and KIO::file_copy of files and directories.
Due the last problems with the method to copy xattrs being private, I
created a new KIO::copy_xattr. Not very async, but the operation should
be fast enough.
Some more work on the tests, removed the ones that don't make sense for
the case and added checks to preent breackage if xattrs ar not supported.

cochise updated this revision to Diff 48960.Jan 8 2019, 1:21 PM

For some reason the added kio was not included

bruns added inline comments.Jan 8 2019, 7:00 PM
src/core/copyxattrjob.cpp
115

indentation

src/core/copyxattrjob.h
34

"... to another."

60

There are no params

cfeck added a subscriber: cfeck.Jan 9 2019, 6:15 AM
cfeck added inline comments.
src/core/filecopyjob.cpp
519

This waits (i.e. spawns a separate event loop) until the job is finished. Should use a subjob.

cfeck added inline comments.Jan 9 2019, 6:17 AM
src/core/copyjob.cpp
1119

Here, too.

cochise updated this revision to Diff 49487.Jan 14 2019, 9:22 PM
cochise marked 3 inline comments as done.

Use subjobs on file_copy and other small fixes

Not using subjobs in KIO::copy, as it breakes the tests.

cfeck added a comment.Jan 17 2019, 3:48 AM

I am still not fond of having a local event loop in KIO. The whole point of KIO is that it should work asynchronously.

If the tests fail, could they be adapted? Or is the reason why the tests fail unfixable?

cochise marked an inline comment as done.Jan 17 2019, 12:59 PM
This comment was removed by cochise.
src/core/copyjob.cpp
1119

I tried various ways to call this as a subjob, but all of them leads to breakage of non xattr related tests. Maybe some major changes to the class are needed.

But the call can be asynchronous with little effort. All tests still pass if start() is called, instead of exec().

2192

In theory, xattrs have unlimited size on some filesystems. Ext limits it to a fs block (4 Kb on most end user setups). Allocating full 4 kb is a overkill, as most of xattr ae small.

But as most of files don't have xattr, and the function will return here, we have to pay the cost of the second call only if we will use it. If we preallocate memory, we have to pay a cost for every file.

2214

On Linux, at least. Each item gets a \0 terminator and the list itself too. Not tested on other systems.

2218

Doing some research and test...

Currently, the security, system, trusted, and user extended attribute classes are defined as described below. Additional classes may be added in the future.

If a copy a file with kioclient as user, any attribute outside user. is lost and I get a warning for the ones I'm unable to preserve, but if I copy as root, all are preserved.

cochise added a comment.EditedJan 17 2019, 12:59 PM

I'm not sure why the tests fail, and the tests failing are unrelated to xattrs. I think there is some problem in queuing the subjobs and ensuring they all have finished.
After the copy job is finished, the copyLocalDirectory test can't find the file inside the copied test directory. Manually testing recursive copy I can't find problems, but the autotest fail. So, I think some parts of the copy job are executed after the copyjob finishes if I add a subjob.
As I said, we can use start, instead of exec, to be asynchronous, but using a subjob seems to need some refactor, maybe adding a new state for the xattr subjob.

Can we ship it, or using a subjob on KIO::copy is mandatory here?

Using exec() from a job implementation is a big no no, it creates an unexpected re-entrancy, source of all sorts of problems. This cannot be shipped as is.

See KCompositeJob's documentation for how to add a subjob - and/or see DirectorySizeJob for a simple example.

Where are we with this? Would be really nice to finally get this bug fixed for our users. :)

My vacancies ended and I'm making the annual planning for my classes, so I will be busy for at least two weeks, but I plan to work on the subjob bug as soon my schedule allows it.

Thanks, that would be appreciated. This looks like it's sooooo close...

bruns added a comment.Mar 5 2019, 3:25 PM

Does this work correctly when the source is a symlink? - getxattr vs lgetxattr.

src/core/copyxattrjob.cpp
135

if errno=ERANGE, the list has grown after you called listxattr(file, nullptr, 0). This is completetely valid, see man 2 listxattr:

But, bear in mind that there is a possibility that the set of extended attributes may change between the two calls, so that it is still necessary to check the return status from the second call.

Same applies for getxattr.

Also, you can cut the number of syscalls by half by using an opportunistic buffer size. Also saves you from allocating a buffer for each attribute value. As said, you have to check for "ERANGE" anyway.

Does this work correctly when the source is a symlink? - getxattr vs lgetxattr.

Did anyone get to check?

@cochise, @bruns, @dfaure can we make a push to get this in? It's sad to see a patch that fixes an important bug come so close and not quite get merged!

bruns added a comment.Jun 11 2019, 5:05 PM

There are several comments not addressed yet ...

src/core/copyxattrjob.cpp
126

qAsConst(m_keyList)

131

listxattr is wrong here ...

bruns requested changes to this revision.Jun 11 2019, 5:05 PM
This revision now requires changes to proceed.Jun 11 2019, 5:05 PM

@cochise, can you work on fixing those so we can get this in? It would be a shame for all your hard work to be abandoned!

@cochise are you planning to resume work on this? If not, we may have to find someone else to finish it up, because I'd really like to get it in.

arrowd added a subscriber: arrowd.Nov 1 2019, 9:38 AM
tmarshall added inline comments.Dec 23 2019, 9:58 AM
src/core/copyxattrjob.cpp
68

"don'" -> "don't"

It seems like the original author has disappeared, so if you would like to take over this patch, @tmarshall, please feel free. You would click on the "Add Action" combobox above the comment field and select "Commandeer Revision" Then you can check out the branch locally with arc patch D17816, or click on the "Download Raw Diff" item in the hamburger menu on the top of the screen.

tmarshall commandeered this revision.Dec 24 2019, 6:02 AM
tmarshall added a reviewer: cochise.
tmarshall marked 9 inline comments as done.Dec 24 2019, 6:09 AM
tmarshall marked 6 inline comments as done.Dec 24 2019, 6:24 AM
tmarshall marked 7 inline comments as done.Dec 24 2019, 6:49 AM
tmarshall updated this revision to Diff 72131.Dec 24 2019, 6:53 AM
  • Addressed comments
tmarshall marked 4 inline comments as done.Dec 24 2019, 7:04 AM
tmarshall marked 9 inline comments as done.Dec 24 2019, 8:31 AM
dfaure requested changes to this revision.Dec 24 2019, 8:51 AM
dfaure added inline comments.
autotests/jobtest.cpp
535

QVERIFY(....) around all calls to waitForFinished

(repeats)

537

This entire QHash is duplicated as is between both methods, right?

Extract it into a function that returns the QHash (by value).

556

Whenever you see QVERIFY(a==b) it should in fact be QCOMPARE(a, b) so that we can see the value on failure (this isn't gtest).

If needed, cast to int.

[repeats]

660

always use QVERIFY2(job->exec(), qPrintable(job->errorString()));

(repeats)

755–767

Please extract a function from those 12 lines, which are duplicated.

769

const

771

const

src/core/copyjob.cpp
1119

That sounds racy; the subjob might not be finished by the time the main job is finished, if you just "start and forget".

I cannot accept this commit with an exec() in here. The easy and modern way to make this async is just a connect and a lambda (which contains the rest of the code here).

I have to wonder though: do we have a fast way to determine whether we actually need the additional subjob? (to avoid making this slower on systems -- or files -- without xattr)

Hmm, or a much bigger architectural question: why doesn't kio_file copy the xattr as part of FileProtocol::copy(), as it already does with e.g. permissions and ACLs, instead of doing this in a separate job?

This revision now requires changes to proceed.Dec 24 2019, 8:51 AM
tmarshall added inline comments.Dec 24 2019, 9:38 AM
src/core/copyjob.cpp
1119

Please help me understand for a moment. I'm not the original author of this patch and am trying to bring it to completion. I just need to be caught up to speed. In reading this piece of code over, I wasn't entirely sure of the author's intent. It seems like he wants to create a new job which copies the xattrs and then run it asynchronously. Why is the exec call bad? What does a lambda do that the exec call does not?

In terms of determining when the job is actually required, one could test to see if the file has xattrs or indeed if the system has xattrs support. We could surround the invokation of the xattrs copy job with an #ifdef HAVE_SYS_XATTR_H, for example.

tmarshall updated this revision to Diff 72135.Dec 24 2019, 9:42 AM
  • Addressed comments, changed xattr system calls to use file descriptors where possible
tmarshall marked 7 inline comments as done.Dec 24 2019, 10:20 AM
tmarshall updated this revision to Diff 72144.Dec 24 2019, 10:26 AM
  • Added helper functions to clean up tests
tmarshall added inline comments.Dec 24 2019, 10:32 AM
src/core/copyjob.cpp
1119

And do we want this to be sync or async? That much isn't clear to me.

tmarshall added inline comments.Dec 24 2019, 7:26 PM
src/core/copyjob.cpp
1119

Thanks to @ahmad78 for chatting with me about this in irc.

Everything works for me if I replace the exec call with a start call. Currently the result of the exec call is thrown away as the job itself does the error handling and there isn't much to do if the xattrs can't be copied for whatever reason.

As for why this is a different job, that was a design decision by the original author. It seems like a decent enough way forward to me as it leans on the side of modularity. The applications of xattrs are so broad that it's hard to say what they will be used for in the future or what the desired functionality might be. Having a separate job will allow for more flexibility regarding xattrs going forward.

In short, I don't really think it hurts to have a separate job and there is certainly the potential that it comes in handy.

Many thanks to getting this live again, @tmarshall . A year ago my vacations ended and I can't get this ready to ship.

Everything works for me if I replace the exec call with a start call. Currently the result of the exec call is thrown away as the job itself does the error handling and there isn't much to do if the xattrs can't be copied for whatever reason.

If I recall correctly, start() and exec() works to me on manual tests, but start() broke the automated tests. As the maintainer vetoed a sync solution, but the async one cant pass the tests, the patch stalled.

If all tests are passing to you with a async call, I think it's shipabble.

anthonyfieroni added inline comments.
src/core/copyxattrjob.cpp
98–111

indentation

156–158

valuelen there is no vallen

dfaure requested changes to this revision.Dec 31 2019, 9:24 AM
dfaure added inline comments.
autotests/jobtest.cpp
529

prepend static

626

prepend static

638
if (command.isEmpty())
    qWarning() << "ERROR: setfattr/setextattr/xattr not found";

to make it easier to debug than just this method returning empty.

650

QSKIP("This test requires setfattr")

Otherwise it will look like a "PASS" in the test output.

753

QSKIP

src/core/copyjob.cpp
1119

I don't mind that errors don't propagate up here, that's fine, we certainly don't want the copy to fail if attributes can't be copied.

I do mind that this uses exec() (Qt eventloop re-entrancy is a nasty source of bugs), and I do mind a fire-and-forget subjob (if that's what start() here does -- unless the subjob gets registered?).

I have provided the solution already, connect the subjob's result to a lambda that contains the rest of the code in this method [but see below].

I just realized another problem: this is at the wrong layer. It should be in FileCopyJob, not in the high-level CopyJob (which lists directories, creates directories, etc. and delegates the task of copying one file to FileCopyJob). FileCopyJob is also used directly in many places (when a single file has to be copied) so if you want that to preserve xattr (rethorical question, I'm assuming you do) this should go down to FileCopyJob.

About this being a different job: actually, one doesn't prevent the other. Look at permissions. We have ChmodJob for setting permissions, but when FileCopyJob finds that the kioslave supports copy() (see DirectCopyJob on the app side) then copy() takes care of copying permissions too. FileCopyJob only calls ChmodJob for the special case of renaming and because the FileCopyJob API actually allows to change permissions, this doesn't apply to xattr.

When the kioslave doesn't support copy() and it falls back to "data pump" mode (get() + put()), FileCopyJob passes the permissions to the put() job. *That* is the case where it should follow that with a KIO::copy_xattr subjob, unless xattr can be passed by metadata to the put job?

I believe the same idea should be done for xattr (kio_file's copy() should copy xattr all by itself), to minimize roundtrips between the app and the slave. People complain that copying files with KIO is too slow, let's not make it worse.

src/core/filecopyjob.cpp
519

That's what should happen in copy() IMHO. But that's an optimization, I can accept that being done later.

And if you want to support other cases than file->file, like desktop, trash, smb, etc. then this should also be done when the put job finishes.

What's for sure is that doing it both in CopyJob and in FileCopyJob is overkill since the former uses the latter :-)

This revision now requires changes to proceed.Dec 31 2019, 9:24 AM