One of our goals for 2020 should be to get animation audio ready for non-experimental support on all supported platforms.
In the event that we can't get QtMultimedia to work, I think we should focus on using a simple, cross-platform, low-level audio API instead of one focused on media playback. As of now, the two best options in that category seem to be SDL2 and PortAudio. Both are relatively low-level and simple, and both allow us to pass a simple buffer of audio samples to an audio device using either "pull" (asynchronous callback) or "push" (synchronous blocking function) I/O methods.
QtMultimedia
What we're already using, and part of Qt. Has options for both high-level (QMediaPlayer) and low-level audio (QAudioDecoder + QIODevice).
Apparently we've got issues between AppImage and GStreamer on Linux; we need to see whether those can be fixed, or whether we can run an alternative backend, before committing to a library switch.
We should also consider switching from the high-level API to the low-level one, as it would give us better control over exactly how we pass audio to the device and open up greater possibilities (mixing, stretching, DSP, etc.).
SDL2
SDL2 is a bigger library and contains a lot of functionality that we don't need, but its subsystems can be initialized and used individually, allowing us to use it for audio file I/O and streaming only (https://wiki.libsdl.org/SDL_Init). SDL2 supports Windows, macOS, and Linux, as well as Android and iOS.
SDL uses the zlib license.
PortAudio
PortAudio is a smaller library focused only on providing a simple, cross-platform interface for streaming audio: opening audio streams and pulling/pushing sample buffers to them. We would probably need to use Qt for audio file I/O. While PortAudio supports Windows, macOS, and Linux, another potential drawback is its lack of support for iOS and Android, platforms which we don't fully support yet but which are potential areas of expansion.
PortAudio uses the MIT license.
Other Considerations
Q: "Pull" vs "Push" streaming?
Many low-level audio libraries allow for "pull" and "push" methods of streaming. In the "pull" method, the library invokes an asynchronous callback function whenever the audio stream is ready to receive a new buffer of samples, while with the "push" method audio buffers are provided to a stream using a synchronous/blocking function call.
For our needs, which require us to synchronize audio buffers to animation frames both in and out of order and at uneven intervals (when scrubbing), I think the "push" method might be a better fit. Each time we change frames, via scrubbing or playback, we would queue a buffer of N stereo audio samples. (Where N is secondsPerAnimationFrame * audioSampleRate.)
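A minimal sketch of that push scheme in C++ (the function names and the interleaved-stereo layout here are assumptions for illustration, not existing Krita code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// N for one animation frame: secondsPerAnimationFrame * audioSampleRate,
// i.e. audioSampleRate / animationFps. Names are hypothetical.
std::size_t samplesPerAnimationFrame(int audioSampleRate, int animationFps) {
    return static_cast<std::size_t>(audioSampleRate) / animationFps;
}

// Slice one interleaved-stereo (L/R) buffer for animation frame `frameIndex`
// out of a fully decoded track, clamped at the track's end. This buffer is
// what we would queue to the audio device on each frame change.
std::vector<float> bufferForFrame(const std::vector<float>& stereoTrack,
                                  std::size_t frameIndex,
                                  int audioSampleRate, int animationFps) {
    const std::size_t n = samplesPerAnimationFrame(audioSampleRate, animationFps);
    const std::size_t begin = std::min(frameIndex * n * 2, stereoTrack.size());
    const std::size_t end = std::min(begin + n * 2, stereoTrack.size());
    return std::vector<float>(stereoTrack.begin() + begin,
                              stereoTrack.begin() + end);
}
```

Each such buffer would then be handed to the device with a blocking/queueing write, e.g. SDL_QueueAudio in SDL2 or Pa_WriteStream in PortAudio.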
Q: Latency Compensation?
There is always some latency involved in audio playback, which varies depending on the efficiency of the OS's host API. In order to keep our visual animation in sync with our audio, it may become necessary to compensate by adding L milliseconds of delay to the video playback. (Where L is determined by the one-way trip latency of a buffer of N samples through a given host API, and would probably end up being well under 1s.)
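The buffer-related part of L follows directly from the buffer size and sample rate; the remaining fixed host-API overhead would have to be measured or queried per platform. A hypothetical helper:

```cpp
#include <cstddef>

// One-way latency, in milliseconds, of a buffer of `bufferSamples` sample
// frames at `sampleRate`, plus whatever fixed overhead the host API adds.
// Illustrative sketch; the overhead parameter is an assumed input.
double latencyCompensationMs(std::size_t bufferSamples, int sampleRate,
                             double hostApiOverheadMs) {
    return (1000.0 * bufferSamples) / sampleRate + hostApiOverheadMs;
}
```

For example, a 480-sample buffer at 48 kHz contributes 10 ms, so even generous host-API overhead keeps L far below a second.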
Q: Waveform Visualizer?
It would be really helpful to see some kind of visual preview of the audio track on the Timeline Docker, so that the animator can see exactly where the transients and peaks lie in relation to the framerate of their animation. We could render this once, when an audio file is opened.
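Rendering that preview once amounts to a simple peak reduction over the decoded samples, one peak value per animation frame. A hypothetical sketch:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduce a mono sample buffer to one absolute peak per animation frame,
// suitable for drawing a small waveform strip on the Timeline Docker.
// Names are illustrative, not existing Krita API.
std::vector<float> peaksPerFrame(const std::vector<float>& samples,
                                 std::size_t samplesPerFrame) {
    std::vector<float> peaks;
    for (std::size_t i = 0; i < samples.size(); i += samplesPerFrame) {
        const std::size_t end = std::min(i + samplesPerFrame, samples.size());
        float peak = 0.0f;
        for (std::size_t j = i; j < end; ++j)
            peak = std::max(peak, std::fabs(samples[j]));
        peaks.push_back(peak);
    }
    return peaks;
}
```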
Q: Playback Speed?
We should preserve the user's ability to play back their animation at different speeds, with the audio stretched to match.
Right now, audio shifts pitch with speed ("repitch"); it's worth considering pitch-preserving time-stretching options as well.
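For context, repitch playback is just naive resampling: reading the source faster shortens it and raises every frequency proportionally (2x speed is up an octave). Pitch-preserving stretching needs a real time-stretch algorithm (e.g. a phase vocoder) instead. An illustrative sketch of the naive version, not Krita code:

```cpp
#include <cstddef>
#include <vector>

// Naive speed change by nearest-sample resampling (no interpolation).
// Playing the result at the original sample rate yields repitched audio:
// duration scales by 1/speed and pitch scales by speed.
std::vector<float> resampleForSpeed(const std::vector<float>& in, double speed) {
    std::vector<float> out;
    for (double pos = 0.0; pos < in.size(); pos += speed)
        out.push_back(in[static_cast<std::size_t>(pos)]);
    return out;
}
```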
Q: User Interface?
We need to take a hard look at how an audio-synced animation workflow feels in Krita.
Everything from adding an audio track, to navigating the timeline, to placing keyframes in time with the sound should feel good and easy to use.