Deevad's data loss when COMPOSITE COPY
Closed, ResolvedPublic

dkazakov created this task.Mar 15 2016, 8:12 AM
Restricted Application added a subscriber: woltherav. · View Herald TranscriptMar 15 2016, 8:12 AM
fazek added a subscriber: fazek.Mar 19 2016, 2:17 PM
fazek claimed this task.Mar 19 2016, 2:50 PM
fazek added a comment.Mar 19 2016, 2:57 PM

I'm trying to solve this. My suspect the is the atomic int ref() - deref(), it is not safe enough (see the KisTileData::deref() function as an example), since it is theoretically possible for a different thread to ref() a tile between the deref() function and the actual erasing (freeTileData()). But this is very unlikely, so perhaps there are other problems too.

rempt added a subscriber: rempt.Mar 19 2016, 3:17 PM

Hadn't Dmitry figured out that this was caused by the eraser blending mode issue?

fazek added a comment.Mar 19 2016, 3:32 PM

I don't know but I see some problems in the kis_tile_... part too. Maybe not related to this bug and maybe it causes problems only once in a decade, but still.

fazek added a comment.Mar 20 2016, 2:21 PM

So my opinion is this: the KisTileData::ref() and deref() functions are not thread safe. I made a test and it seems these are actually callled from the tile creator thread and from other threads too (probably that's why the atomic ref() used here). A worst case scenario could be like this:

1. thread1 creates tile data:            refCount= 0
2. thread1 calls ref():                  refCount= 1
3. thread2 gets tile data address
4. thread1 calls deref();                refCount= 0
5. thread1 deletes tile data
6. thread2 calls ref():                  ????

There can be other interesting cases, what is important: the access to the tile data and the refCount change should be atomic together, otherwise sometimes the program can access a deleted tile. If the tile remains in the memory and still accessible, perhaps thread2 will delete it again after its deref(). It's a real mess, I think.

Now I'm thinking on a solution using QSharedPointers instead of the ref() - deref() calls. QSharedPointers are thread safe.

rempt added a comment.Mar 20 2016, 2:29 PM

I actually think that Tyson Tan has a bug report that points in this direction, too, but I cannot find it right now.

Committed my solution into master.

Hm, I think that this commit, as it touches the most core part of Krita really should have gone through review. Raghukamath reports he's got trouble loading kra files now.

dkazakov added a comment.EditedMar 21 2016, 12:16 PM

Commit

https://phabricator.kde.org/rKRITAbb523df64b47096eb0d787ee861aef59c82b713e

is unrelated to this bug. The bug happens due to the painting in Composite Copy mode. No tiles are actually "lost".

fazek added a comment.Mar 21 2016, 1:26 PM

@dkazakov: the commit was related to the multi-thread access of the tiles. The behaviour of the Composite Copy bug looks similar to this bug, but of course there can be another reason. Sorry if it's not related, I move it to a different topic then.

@rempt: I found a new locking problem and already added a patch. I do believe some exotic errors are caused by this bug so it's worth to change. The removing of the unused tiles is more asynchronous now so there can be some bugs.

rempt added a comment.Mar 21 2016, 1:40 PM

@fazek : stuff that touches the tiles system must go through review, it's extremely sensitive code.

fazek removed fazek as the assignee of this task.Mar 24 2016, 8:01 PM
dkazakov closed this task as Resolved.Apr 4 2016, 8:44 AM
dkazakov claimed this task.

The fix for the Eraser mode is now in master, so let's treat it as fixed