Description
Details
Related Objects
I'm trying to solve this. My suspect the is the atomic int ref() - deref(), it is not safe enough (see the KisTileData::deref() function as an example), since it is theoretically possible for a different thread to ref() a tile between the deref() function and the actual erasing (freeTileData()). But this is very unlikely, so perhaps there are other problems too.
I don't know but I see some problems in the kis_tile_... part too. Maybe not related to this bug and maybe it causes problems only once in a decade, but still.
So my opinion is this: the KisTileData::ref() and deref() functions are not thread safe. I made a test and it seems these are actually callled from the tile creator thread and from other threads too (probably that's why the atomic ref() used here). A worst case scenario could be like this:
1. thread1 creates tile data: refCount= 0 2. thread1 calls ref(): refCount= 1 3. thread2 gets tile data address 4. thread1 calls deref(); refCount= 0 5. thread1 deletes tile data 6. thread2 calls ref(): ????
There can be other interesting cases, what is important: the access to the tile data and the refCount change should be atomic together, otherwise sometimes the program can access a deleted tile. If the tile remains in the memory and still accessible, perhaps thread2 will delete it again after its deref(). It's a real mess, I think.
Now I'm thinking on a solution using QSharedPointers instead of the ref() - deref() calls. QSharedPointers are thread safe.
I actually think that Tyson Tan has a bug report that points in this direction, too, but I cannot find it right now.
Hm, I think that this commit, as it touches the most core part of Krita really should have gone through review. Raghukamath reports he's got trouble loading kra files now.
Commit
https://phabricator.kde.org/rKRITAbb523df64b47096eb0d787ee861aef59c82b713e
is unrelated to this bug. The bug happens due to the painting in Composite Copy mode. No tiles are actually "lost".
@dkazakov: the commit was related to the multi-thread access of the tiles. The behaviour of the Composite Copy bug looks similar to this bug, but of course there can be another reason. Sorry if it's not related, I move it to a different topic then.
@rempt: I found a new locking problem and already added a patch. I do believe some exotic errors are caused by this bug so it's worth to change. The removing of the unused tiles is more asynchronous now so there can be some bugs.
@fazek : stuff that touches the tiles system must go through review, it's extremely sensitive code.