This is the result of three weeks of experimenting with how to get rid of glFinish() on OS X, and it is the best (and only workable) solution I could come up with.
The glFinish() call is still in place in KisOpenGLCanvas2::paintGL, but it is usually no longer executed (only sporadically, about once per minute, and during startup).
The logic extends KisOpenGLCanvas2::isBusy so that calling glFinish() is usually unnecessary. It does this by tracking the texture upload status with an Apple-specific call.
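To illustrate the idea, here is a minimal sketch of such polling logic. It is not the actual patch: the class UploadTracker, its methods, and the uploadFinished callback are hypothetical names. In a real canvas the callback would wrap the Apple-specific texture status query (possibly something like glTestObjectAPPLE from the GL_APPLE_fence extension, but that is an assumption), and the blocking fallback would be glFinish().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <utility>

// Hypothetical helper: remembers which textures still have uploads in flight.
class UploadTracker {
public:
    // Returns true once the upload for the given texture id has completed.
    // Assumption: in real code this wraps a non-blocking, Apple-specific query.
    using StatusQuery = std::function<bool(std::uint32_t)>;

    explicit UploadTracker(StatusQuery uploadFinished)
        : m_uploadFinished(std::move(uploadFinished)) {
        assert(m_uploadFinished && "a status query callback is required");
    }

    // Call right after issuing a texture upload.
    void markUploaded(std::uint32_t textureId) { m_pending.insert(textureId); }

    // Non-blocking poll: forget finished textures, report whether any remain.
    bool isBusy() {
        for (auto it = m_pending.begin(); it != m_pending.end();) {
            if (m_uploadFinished(*it)) {
                it = m_pending.erase(it);
            } else {
                ++it;
            }
        }
        return !m_pending.empty();
    }

    // Paint-time guard: block only if uploads are still pending.
    // Returns true if the blocking fallback had to be used.
    template <typename BlockingFinish>
    bool finishIfNeeded(BlockingFinish blockingFinish) {
        if (!isBusy()) {
            return false;
        }
        blockingFinish();   // stands in for glFinish()
        m_pending.clear();  // after a full finish nothing is in flight
        return true;
    }

private:
    StatusQuery m_uploadFinished;
    std::set<std::uint32_t> m_pending;
};
```

With this shape, the caller can postpone painting while isBusy() returns true, and the guard in the paint path only blocks in the rare case where a paint arrives while uploads are still pending.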
A major problem is that I cannot verify this solution, as I can no longer reproduce the original problems that occurred when glFinish() was removed altogether. I did see the problems sporadically two weeks ago, but I can no longer see them on any of the three Macs I test with. I also tried bisecting git back to August without success. I still believe there is a problem, so I would advise against removing glFinish() altogether at this point. I wonder if we could seed this to other Mac users.
As for the benefit: glFinish() currently accounts for about 10% of the total load, and the main thread as a whole for about 50%. That means about 20% of the main thread's time is spent blocked inside glFinish() on OS X, which is a problem for event consumption.
Other things I tried and abandoned:
(1) Blocking texture upload altogether if an update event has been scheduled via Qt, so that paintGL will never be in a wait state. This added complex, brittle logic and sporadically failed to upload textures for reasons I do not understand.
(2) Using a standard OpenGL fence for detecting when the texture upload is finished. This did not work, as the fence never got signaled, regardless of whether glFlush() was called before or after the fence.
(3) Using double buffering (i.e. two textures for each tile). This sounds like a charming idea at first, until you realize that you need to accumulate changes: if you have two double-buffered textures T1 and T2, the changes that arrive for T1 later need to be reapplied to T2, in addition to the changes for T2. The logic gets complex very quickly.
(4) Moving the OpenGL rendering to a separate thread. This is basically not possible as long as Qt hogs the OpenGL contexts for its compositing of widgets in an unpredictable manner. I also investigated moving the whole OpenGL context into its own QOpenGLWindow, but that had lots of other problems.
(5) Moving the fence completion detection logic to a separate thread to avoid polling. This is only possible on platforms that support multithreading with OpenGL and shared contexts. You would need a shared OpenGL context that shares fences with the main context. You would also need to keep a second copy of the whole polling logic for platforms that do not support this. After two days of investigating, it seemed like a whole lot of effort for very little benefit.
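To make the accumulation problem from item (3) more concrete, here is a small, purely illustrative model. It is not code from any of the abandoned attempts: DoubleBufferedTile and Change are hypothetical names, and plain vectors stand in for the two GL textures T1 and T2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one change applied to a texture.
struct Change {
    std::size_t index;
    int value;
};

// Hypothetical double-buffered tile; the vectors stand in for textures T1 and T2.
class DoubleBufferedTile {
public:
    explicit DoubleBufferedTile(std::size_t size) {
        m_textures[0].assign(size, 0);
        m_textures[1].assign(size, 0);
    }

    // Write a change into the back texture and remember it, because the
    // front texture has not seen it yet.
    void update(const Change &c) {
        assert(c.index < back().size());
        back()[c.index] = c.value;
        m_missedByFront.push_back(c);
    }

    // Swap roles. The new back texture (the old front) first has to catch up
    // on every change it missed; this replay is the extra bookkeeping.
    void swap() {
        m_front = 1 - m_front;
        for (const Change &c : m_missedByFront) {
            back()[c.index] = c.value;
        }
        m_missedByFront.clear();
    }

    const std::vector<int> &front() const { return m_textures[m_front]; }

private:
    std::vector<int> &back() { return m_textures[1 - m_front]; }

    std::vector<int> m_textures[2];
    int m_front = 0;
    std::vector<Change> m_missedByFront;
};
```

Without the replay in swap(), the second swap would display a texture that never received the first change. In a real canvas the replay itself is another texture upload, so it would need the same synchronization that double buffering was meant to avoid.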