Separate concerns of IncidenceTreeModel
Open, Needs TriagePublic

Description

Separate tree handling code from Q*Model* code in IncidenceTreeModel

IncidenceTreeModel is quite complex and responsible for many things at once.
It handles all the QAbstractItemModel stuff, but also cares for the Todo nodes to form a tree structure, creates and deletes those nodes, sorts them topologically etc.
The implementation is complex enough to make several issues hard to spot:

  • Lexicographical ordering is used in two places, and lexicographical sorting is in fact implemented twice. Both have quadratic running time, but should be at most amortized n log n. They also works fundamentally differently, although they could have simply used the same algorithm: Reversing an order in which parents are processed before children makes sure that children are processed before parents. The sorting is quite unreliable, as it depends on a depth attribute that can easily get out of sync, and it does not handle hierarchies with missing intermediates.
  • IncidenceTreeModel relies on begin/endResetModel() too much, resulting in the selection getting lost every time something in the underlying model changes.
  • UIDs are assumed to be unique, but Akonadi doesn't seem to make sure that they are. For example, IncidenceTreeModel can crash easily on iCal file resources with duplicate UIDs.
  • Cycles in RELATED-TOs are not detected.
  • Reparenting can break the view.

Several of these tasks can be extracted to (or implemented in) separate classes.
I'd like to do the following:

  1. Implement a class that stores items the UID of which is already known. This allows to greatly simplify the tree-ifying code by never exposing it to duplicate UIDs.
  2. Extract the code that handles the tree structure on the items to a class. The implementation has to be tailored towards QAbstractItemModels. For example, before reparenting an existing node of the tree, we first need to ask the tree handling class about for the row of the node after moving. That is because we have to announce such moves via begin*Rows().
  3. Come up with an efficient way to break cycles in the VTODOs' RELATED-TO hierarchy. This can be handled either by the tree-ifying class, or by a class on its own. The former would make the tree-ifying class quite complex again, but the latter might introduce some redundancies.

Steps 1. and 3. make the model more robust towards skewed source models.
I guess that breaking cycles is to much to ask Akonadi for; I have no clue if it is Akonadi's responsibility to ensure UIDs that are actually unique.
Step 2. is meant as a mere refactoring step, but I think it also prevents some of the crashes I encountered while toying around with simple iCal files.
Of course, IncidenceTreeModel should then make use of those classes and simplified in the process.

Separate classes for the above task should also improve testability.

I already have working code that does the above, except for the cycle-breaking part. It needs polishing, though. This task's task is to make sure that I don't waste any more time in case my patch wouldn't be accepted anyway (thanks Dan for the hint).

Related Objects

dkurz created this task.Feb 5 2018, 11:27 AM
knauss added a subscriber: knauss.Feb 5 2018, 11:50 AM

Akonadi can't change the UID (if you meen the UUID from ical), cause this is needed to be exactly the same when you do meeting etc. where you send eventes around.
That's also why you need to access the original UUID at some parts.
One other issue that breaks the unique UUID: you have access to calendars to persons that are also in the same meeting and this meeting has the same UUID obviously.

In the end no view should ever expect that UUIDs are uniquly availalable outside one calendar.

on iCal file resources with duplicate UIDs.

well this is not allowed from the iCal standard. Do you have a valid test for that? 'Cause one iCal file is one calendar and within a calendar UUIDs need to be unique.

dkurz added a comment.Feb 5 2018, 12:57 PM

Akonadi can't change the UID (if you meen the UUID from ical), cause this is needed to be exactly the same when you do meeting etc. where you send eventes around.

I didn't propose to change any UIDs (or at least I didn't intend to). But if there are problems with UIDs, we need some way to deal with them. This could involve ignoring some offending items, or presenting them to the user and asking for action.

That's also why you need to access the original UUID at some parts.
One other issue that breaks the unique UUID: you have access to calendars to persons that are also in the same meeting and this meeting has the same UUID obviously.

In the end no view should ever expect that UUIDs are uniquly availalable outside one calendar.

The current implementation does not distinguish between items from different calendars. I don't know if the source model has some mechanism to make UIDs unique if more than one calendar is involved. I assume that it doesn't. What is a sensible approach to deal with multiple occurences of the same TODO in different calendars? Displaying it multiple times would probably be confusing.

well this is not allowed from the iCal standard. Do you have a valid test for that? 'Cause one iCal file is one calendar and within a calendar UUIDs need to be unique.

I already guessed that this is forbidden, but it still should not crash KOrganizer. According to the robustness principle, we should certainly never produce duplicate UIDs, but we have to be able to handle them even within one calendar. My approach is to ignore all but one item with the same UID, and if this item is removed from the source model, we present another one with the same UID.

dvratil added a subscriber: dvratil.EditedFeb 5 2018, 1:09 PM

@knauss: You may not have duplicate UID in the same iCal file, but in KOrganizer it is easy to have two different calendars were UID is duplicate (easy: just copy-paste an event from one iCal to another). This is exactly the same problem as we have with Kolab and event sharing-loop that leads to two different calendars having the same instance of an event, thus the same UID in both.

@dkurz: Regarding making this code generic and re-usable for mail threading: I'm not sure. Threading in KMail is highly customizable (including no-threading option) and the Model in MessageList is heavily optimized for mail threading and does a lot of assumptions about how threading works. The threading happens in several passes with various threading conditions considered in each pass, interrupts to keep the application responsive etc. I think if your code is generic enough, all the threading could be handled by it as well, but I'm really concerned by performance. It's one thing to build a tree of couple hundred todos (which are mostly a flat-list anyway) and thousands of emails with deep threads.

Anyway, thanks for writing this up, it's not just about getting rejected (very unlikely to happen!), but it also allows us to discuss things like this and helps all of us to keep track of what's happening in PIM.

Edit: and also thank you for looking into this! IncidenceTreeModel is indeed a little beast :)

knauss added a comment.Feb 5 2018, 1:19 PM
In T7891#126572, @dkurz wrote:

Akonadi can't change the UID (if you meen the UUID from ical), cause this is needed to be exactly the same when you do meeting etc. where you send eventes around.

I didn't propose to change any UIDs (or at least I didn't intend to). But if there are problems with UIDs, we need some way to deal with them. This could involve ignoring some offending items, or presenting them to the user and asking for action.

( you asked " I have no clue if it is Akonadi's responsibility to ensure UIDs that are actually unique.")

so akonadi can't help here. Akonadi is not responsible of the data it caches.

That's also why you need to access the original UUID at some parts.
One other issue that breaks the unique UUID: you have access to calendars to persons that are also in the same meeting and this meeting has the same UUID obviously.

In the end no view should ever expect that UUIDs are uniquly availalable outside one calendar.

The current implementation does not distinguish between items from different calendars. I don't know if the source model has some mechanism to make UIDs unique if more than one calendar is involved. I assume that it doesn't. What is a sensible approach to deal with multiple occurences of the same TODO in different calendars? Displaying it multiple times would probably be confusing.

I know - That's why i created D587. But this was only a very hackerish way to solve this issue. As kolab wasn't focusing anymore on kdepim, this patch was forgotten and never was updated...
Especically for event/todo handling there is a lot of things written for kolab:
https://git.kolab.org/diffusion/KP/
https://git.kolab.org/diffusion/KPL/

the kolab/integration/4.13.0 branch.

Feel free to look what was done there. Some patches are not clean to go upstream- but you may get ideas...

well this is not allowed from the iCal standard. Do you have a valid test for that? 'Cause one iCal file is one calendar and within a calendar UUIDs need to be unique.

I already guessed that this is forbidden, but it still should not crash KOrganizer. According to the robustness principle, we should certainly never produce duplicate UIDs, but we have to be able to handle them even within one calendar. My approach is to ignore all but one item with the same UID, and if this item is removed from the source model, we present another one with the same UID.

crashes shouldn't happen: yes. But hiding randomly events with same UUID don't help either. Why you think, that the first event is that one the user wants to see? it doesn't make sense at all. In the end KOrgnazier needs to handle the case that same UUIDs are available. I think the best approch sofar is to use pointers to incidentence and not using the UUID at all on the code.

@dvratil : lol - that was my example, why Akonadi can't crate a realy unique UUID :D

dkurz added a comment.Feb 5 2018, 1:21 PM

For everyone except @dvratil: Dan's comment is referring to my following thoughts:

While working on this, I also came across the threading code of messagelib's messagelist model. Although I do not understand it yet, it seems to me that there are some similarities. Both that model and IncidenceTreeModel act on a flat source model of Akonadi items, and both impose a tree structure ("reparenting proxy"?). I wonder if my work on IncidenceTreeModel could also simplify messagelist's Model.

In that case, I'd like to port my current implementation ("class TodoTreeManager"), which is based on a binary relation on UIDs, to something more general ("class ItemTree") which would then model a binary relation on Akonadi items. However, I've not been able to figure out yet if Akonadi items can be stored persistently in a datastructure (think QModelIndex vs. QPersistentModelIndex). Probably need to dive deeper into Akonadi's model that is used as IncidenceTreeModel's source model, and specifically which guarantees it makes.

@dkurz: Regarding making this code generic and re-usable for mail threading: I'm not sure. Threading in KMail is highly customizable (including no-threading option) and the Model in MessageList is heavily optimized for mail threading and does a lot of assumptions about how threading works. The threading happens in several passes with various threading conditions considered in each pass, interrupts to keep the application responsive etc. I think if your code is generic enough, all the threading could be handled by it as well, but I'm really concerned by performance. It's one thing to build a tree of couple hundred todos (which are mostly a flat-list anyway) and thousands of emails with deep threads.

@dvratil: I do not intend to replace the threading itself. As I see it, it consists of two steps:

  1. Threading (which is actually many steps), or defining a binary relation on messages: Find a set of statements "Message xyz is parent of message abc", and
  2. Using that relation to build a tree datastructure.

In my opinion, these tasks should be separated, and ideally separately tested. My TodoTreeManager only does the latter.

dkurz added a comment.Feb 5 2018, 1:30 PM

I think the best approch sofar is to use pointers to incidentence and not using the UUID at all on the code.

That's roughly what I would do for a tree of Akonadi items (class ItemTree above). Problem is: There is no relation between TODOs to begin with. The relation is only defined via UID and RELATED-TO, so we have to choose somehow in case of multiple parent candidates. Assume we have

  • UID:A
  • UID:A
  • UID:B, RELATED-TO:A

Even if we display both A's, which one should have a child B? The first? The second? A randomly selected one? Both? None?

Here's a compromise between what I do currently and your objection: We pick one A to be part of the tree structure. Every other A is put somewhere else internally, but still presented to users of the IncidenceTreeModel. This way, we would not hide any items from the user, but the "ItemTree" would not have to deal with duplicate UIDs.

knauss added a comment.Feb 5 2018, 1:34 PM
In T7891#126577, @dkurz wrote:

For everyone except @dvratil: Dan's comment is referring to my following thoughts:

While working on this, I also came across the threading code of messagelib's messagelist model. Although I do not understand it yet, it seems to me that there are some similarities. Both that model and IncidenceTreeModel act on a flat source model of Akonadi items, and both impose a tree structure ("reparenting proxy"?). I wonder if my work on IncidenceTreeModel could also simplify messagelist's Model.

In that case, I'd like to port my current implementation ("class TodoTreeManager"), which is based on a binary relation on UIDs, to something more general ("class ItemTree") which would then model a binary relation on Akonadi items. However, I've not been able to figure out yet if Akonadi items can be stored persistently in a datastructure (think QModelIndex vs. QPersistentModelIndex). Probably need to dive deeper into Akonadi's model that is used as IncidenceTreeModel's source model, and specifically which guarantees it makes.

Please do not make more code depend on Akonadi items. Akonadi is a implementation detail and we want to make sure, that we will be able to replace Akonadi, (yeah it is long term goal). The idea is to have only a small layer that interacts with Akonadi and everything else is independend of Akonadi items. That's why you shouldn't need to dive into Akonadi at all :D I know some parts of KOrgnaizer are still using Akonadi items... That's is surely also a goal to reach to get rid of AKonadi items inside KOrganzier...

@dkurz: Regarding making this code generic and re-usable for mail threading: I'm not sure. Threading in KMail is highly customizable (including no-threading option) and the Model in MessageList is heavily optimized for mail threading and does a lot of assumptions about how threading works. The threading happens in several passes with various threading conditions considered in each pass, interrupts to keep the application responsive etc. I think if your code is generic enough, all the threading could be handled by it as well, but I'm really concerned by performance. It's one thing to build a tree of couple hundred todos (which are mostly a flat-list anyway) and thousands of emails with deep threads.

@dvratil: I do not intend to replace the threading itself. As I see it, it consists of two steps:

  1. Threading (which is actually many steps), or defining a binary relation on messages: Find a set of statements "Message xyz is parent of message abc", and
  2. Using that relation to build a tree datastructure.

    In my opinion, these tasks should be separated, and ideally separately tested. My TodoTreeManager only does the latter.
dkurz added a comment.Feb 17 2018, 6:51 AM

I fought my way through some messagelist code yesterday, mostly model(_h)?.* and item(_h)?.* I think they solve (many problems at once and) some of the problems that IncidenceTreeModel solves, too, but totally differently. I have a vague design in mind right now that might be used by both, but before I continue working it out any further, I have a few questions.

  • Is there any kind of benchmark for things like threading, grouping or simply populating of a messagelist? I saw claims like "this speedup here only works in 30% of the cases" or "this change gave me a 20% boost in population time". I thought "this quadratic code could be linear" in some places, but I'd like to test that. I could subscribe to some kernel mailing list and wait a few weeks, but that would be quite time-consuming.
  • My current vision involves as many QPersistentModelIndices as there are items in the resulting tree. Problem is, I cannot estimate its performance hit. I searched the internets for claims about QPMI performance, but didn't find any.
  • Where would I put code that would be used by both messagelib and eventviews? Most of the classes wouldn't be aware of Akonadi or even anything KDE related, and work with QModelIndices instead. Users of those classes would require one or two Strategy classes; those would go to messagelib and eventviews, so I'm only interested in where to put the general purpose flat-QAIM reparenting classes.
dkurz added a comment.Feb 17 2018, 6:59 AM

A solution for the multiple-instances-of-the-same-UID problem might be this: For every UID, there is a unique model index. However, we add a way to ask them for their containing calendars/collections. For example, we could add some ContainingCollections role, or an additional column, that reports a vector of collections. A view could then decide to

  • display an array of symbols/colored rectangles, one for each collection that contains the task, and
  • report conflicting information in multiple instances of the same event (e.g., the same task in two calendars, but with different parent tasks) in some way.

What do you think?

In T7891#128706, @dkurz wrote:

I fought my way through some messagelist code yesterday, mostly model(_h)?.* and item(_h)?.* I think they solve (many problems at once and) some of the problems that IncidenceTreeModel solves, too, but totally differently.

The entire MessageList::Core::Model and ::Item code is heavily optimized for the one and only case of threading emails. There is little separation between the storage (the tree of Items) and the actual threading, but once again: the performance gain here is very important.

  • Is there any kind of benchmark for things like threading, grouping or simply populating of a messagelist? I saw claims like "this speedup here only works in 30% of the cases" or "this change gave me a 20% boost in population time". I thought "this quadratic code could be linear" in some places, but I'd like to test that. I could subscribe to some kernel mailing list and wait a few weeks, but that would be quite time-consuming.

Usually you just put a lot of QElapsedTimers into the code to relevant places and use a large folder with complex threads to test it on (I usually use my fedora-devel archive and we also have a very large lkml folder archived from one other community member that we could probably use) and just measure how long it takes to open the folder and complete threading.

  • My current vision involves as many QPersistentModelIndices as there are items in the resulting tree. Problem is, I cannot estimate its performance hit. I searched the internets for claims about QPMI performance, but didn't find any.

Each QAIM has a QHash<QModelIndex, QPersistentModelIndexData>. Any change in the model means that the hash needs to be updated. For example QAIMPrivate::rowsAboutToBeInserted() iterates over the entire hash and looks for QMIs that will be moved by the insert and stores them in a list to be updated after the insert. For prepending a new row at the beginning of the model,the code will end up copying all QMIs from the hash to a vector and then in QAIMPrivate::rowsInserted() it will iterate over the vector again and update all the affected QPMIDs in the QAIM's QHash. Similar for all other changes. Having a QPMI for each row in the model means potentially a *lot* of QPMIs, thus a large hash and a lot of copying and iterating. That's one of the reasons why the MessageList::Core::Model is avoiding the use of QPMIs.

  • Where would I put code that would be used by both messagelib and eventviews? Most of the classes wouldn't be aware of Akonadi or even anything KDE related, and work with QModelIndices instead. Users of those classes would require one or two Strategy classes; those would go to messagelib and eventviews, so I'm only interested in where to put the general purpose flat-QAIM reparenting classes.

libkdepim sounds like a good candidate, it's used by both eventview and messagelib.

In T7891#128707, @dkurz wrote:

A solution for the multiple-instances-of-the-same-UID problem might be this: For every UID, there is a unique model index. However, we add a way to ask them for their containing calendars/collections. For example, we could add some ContainingCollections role, or an additional column, that reports a vector of collections. A view could then decide to

  • display an array of symbols/colored rectangles, one for each collection that contains the task, and
  • report conflicting information in multiple instances of the same event (e.g., the same task in two calendars, but with different parent tasks) in some way.

    What do you think?

You can't be sure that two instances of the same UID are actually identical events. They should be displayed as two different events and should be each uniquely identifiable and selectable.

To take this even further, calendaring *viewer* code should not get affected by duplicated UIDs at all. The iCal standard forbids that, but we also synchronize events from for example Google, which has nothing to do with iCal :)

It is OK to rely on Akonadi ID even in the higher-level, which is an absolutely unique identifier. Yes, there is a plan to have a proper separation between logic and Akonadi, but that's so far in the future that we should approach it practically and not hinder any development because of it (but of course we try to avoid leaking Akonadi into new high-level code if possible).