In this development diary, I provide an update on progress made on Koto in the second half of March 2021!
My goals for the second half of March were centered around finally playing back our local music, fleshing out the playback engine, and hooking various parts up to the UX components. Substantial progress has been made on those fronts, and it is almost in a state where I can actively dogfood it (as in, use it myself).
During the second half of March, my objective was to start building out the KotoPlaybackEngine with functionality for loading new tracks from our file system and our global HashTable of KotoTracks, playing them back, seeking (jumping to a different part of a song, for example), manipulating volume, and more. In my opinion, using GStreamer to accomplish this was a no-brainer. GStreamer is one of the most well-known and actively developed open source multimedia frameworks. It provides a versatile model for the creation, management, and manipulation of audio and video pipelines, in addition to an abundance of plugins for features / functionality such as:
- Audio amplification
- Audio playback rate manipulation
- 10 band and 64 band equalization functionality
- Interleaving (mixing) multiple audio streams
- HLS livestream sinks and demuxers
To start out, I leveraged GStreamer’s playbin, a GstElement capable of automatic file recognition and demuxing, buffering, and volume management. This playbin is provided the local URI to the KotoTrack’s file, immediately buffering it until it is ready to be played back and communicating various relevant messages over its Bus, for example:
- When the duration of the audio stream loaded is determined / has changed.
- When the file loading is complete, so we know when to start playback.
- When we have reached the end of the audio stream (like the end of a song).
- Various state changes such as playing, paused, and pending.
This pipeline bus messaging provided an avenue for setting, tracking, and updating various state in our new KotoPlaybackEngine class, as well as emitting useful signals to allow for communication with any parts of the UI that may find new state relevant. Some places we use it are:
- Updating the Playerbar’s GtkScale widget value and maximum based on our progress in a track and its duration.
- When a track is paused or starts playing, updating Koto’s Playerbar play / pause button iconography.
- When we reach the end of the track, not only should we get the next one in the current playlist, but also communicate the new track details to the Playerbar.
I was very quickly able to get basic playback up and running with this. The current path is:
- Clicking on an album in the KotoAlbumView for a given artist generates an ephemeral (temporary) playlist and sets it as our current playlist.
- This class emits a change to all listeners, notifying them that we now have a new current playlist.
- KotoPlaybackEngine has a listener for this change via koto_playback_engine_current_playlist_changed, which will immediately get the “next” (first) song in the KotoPlaylist and start its playback via our set track method.

This was pretty exciting to get working the first time, but having to listen to the same song over and over again while testing an album can get old pretty fast, so I immediately hooked our koto_playback_engine_forwards method (and its backwards counterpart) into the set track method, tied them into the user experience, and voila: we were able to go back and forward in the current playlist.
This did pose an interesting problem, however: the duration sometimes would not immediately update, even after playback started. Using my track tick timer as a reference point (this timer emits a signal every 100ms while a track is playing, ensuring our progress bar updates accordingly), I did roughly the same for the duration, though with a slower tick rate. It usually works, though there still seem to be some files it can’t quite figure out the duration for. More dogfooding and testing is needed there for sure.
Performing audio seeking is done through our koto_playback_engine_set_position, which calls GStreamer’s “simple” seek method to quickly set the position relative to the start of an audio stream. Not only do we use it for setting the position when we are scrubbing through our position scrollbar (or at least trying to, I will get into that later), but also for setting it back to zero when we have song repeating enabled.
Yep, song repeating is implemented already as well, in case you really wanted to make sure everyone knew Rick Astley was never gonna give you up and never let you down.
As I talked about in Dev Diary 4, one of my biggest frustrations with basically all shuffle functionality in audio players is the fact that none of them are truly random. Some players seem to pre-seed playlists so they end up having a “random” yet actually predictable order (I get this a lot with various Spotify playlists), or they can end up playing the same songs multiple times (looking at you, YouTube). I addressed this by never having a pre-seeded value for playlists, and we keep track of our played songs in the KotoPlaylist itself. Overkill? Maybe. Worth not being frustrated by the feeling of being able to predict the future when I in fact cannot? Absolutely. This has already been put to the test.
Volume management is pretty self-explanatory. Click the button, move the slider, volume changes. Probably needs no further explanation.
I just have to get this rant off my chest, because I imagine I am a fairly early adopter of GTK4, and it has some considerable teething pains / regressions compared to GTK3, primarily around its Event Controller and pre-built gestures.
You know how earlier I said “trying” when it comes to scrubbing through our GtkScale (which builds on GtkRange)? Well, let us dive into that a bit.
When playing back content in any sort of player, a progress bar will update as you watch or listen to it, which is useful for indicating your position relative to the start and end of the content. If you click somewhere on the progress bar, user experience conventions inform you that it should jump to that position you clicked. If you press and hold with left click, then drag the thumb, you expect it to stop updating the progress bar and put the “thumb” underneath your cursor. Once you let go of it, that is where it should stay and it should (re-)start playback from that position.
To handle that in GTK land, you can either use signal blockers or some sort of logic to prevent updating the value from the respective function (in our case, koto_playerbar_handle_tick_track skips updating the progress if we have an is_progressbar_seeking boolean set to TRUE).
To do this in GTK3, you would just connect to the button_press_event and button_release_event signals on the GtkScale. On press, you set the variable to TRUE; on release, you seek based on the new position, then set it back to FALSE. Easy. No song-and-dance. At least in my experience, it just worked.
In GTK4, you no longer have those events / signals as part of GtkWidget. Instead, you solely have a “controller” model and some default “gestures”. The closest parallels to that prior model would be, at least you would think, GtkGestureClick and GtkGestureLongPress. Click is for handling a single mouse button or touch point; LongPress is for pressing and holding (typically touch).
Except neither of those do it. When pressing to drag the thumb, GtkGestureClick will emit pressed like it should. However, it also creates a timeout set to the value of GTK’s gtk-double-click-time, which by default is 400ms. Did not release within 400 milliseconds? Too bad. The stopped signal is emitted regardless. Now you do not really know when the button is actually released, because the stopped signal has already fired, and unpaired-release is not any help either.
Okay, that is fine. After all, it is meant for short clicks and double clicks. Clearly if we wanted something longer, that is what long press is for!
Nope. With GtkGestureLongPress, the only thing you have control over is the “delay” factor. If the press exceeds the delay, then pressed gets emitted even if you are still pressing. Moved away from the area you first “pressed”? The cancelled signal will get emitted. And even when it does not emit cancelled, you can only set the delay to a maximum factor of 2.0, which is multiplied by the gtk-long-press-time of 500ms. So a maximum of one second. That is no help either.
Fine. It is a GtkRange after all, and that is basically a fancy horizontal scrollbar, right? So surely I could just use the GtkEventControllerScroll event controller and its scroll-end signal to know when we stopped scrolling the range!
You guessed it: the answer is no.
Okay, if it is none of those, then I guess you are just dragging the thumb, right? So maybe it is a GtkGestureDrag and you can just handle drag-end, right? No.
What about the primitive GtkGesture signals, like end, that are leveraged in the other gestures? Well, end gets called when stopped does, so that is not beneficial either.
I think you get the point. I did too. So right now the scale is just gonna spaz out a bit until I either write a custom event controller, leverage GtkGesture myself, or do something like have the progressbar stop updating when you move your mouse into its area and start again when it leaves. That will not help any touch usage though, which is slightly frustrating, even if touch is not something I am catering towards to begin with. Some form of weird, obtrusive re-working it will be. We will get there in the end.
This was all pretty late into a stream though, so I thought I would stop wasting further time on it for now and get back to fancier things. Maybe someone can propose a magical solution that solves all my problems (plz).
Since I was still in the land of the Playback Engine and not quite ready to leave it, I started work on MPRIS support. For those not familiar with the specification, the “Media Player Remote Interfacing Specification” provides a standardized interface for communicating media player state and manipulating it. If you have ever used Budgie’s Raven to change songs or play / pause, with fancy album art and all, that is all communication with an MPRIS-supporting media player.
To facilitate this, the media player creates a D-Bus “server” to serve clients (in my case Budgie) and respond to their method / API requests. As an example, a player must implement methods for going to the previous or next track, toggling playback (PlayPause), explicitly pausing / playing / stopping playback, etc. Beyond this, the org.mpris.MediaPlayer2.Player D-Bus interface must expose a read-only property called “Metadata”, providing any supported metadata it can to the client, such as a “url” to artwork, track information, artist / album info, and more. This can then be leveraged by the client, for example to create a fancy user experience in the best desktop environment on Linux (giggles) for controlling it.
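For reference, the core of that interface boils down to D-Bus introspection XML along these lines (trimmed to the members mentioned above; the full org.mpris.MediaPlayer2.Player interface in the MPRIS specification has more methods, signals, and properties):

```xml
<node>
  <interface name="org.mpris.MediaPlayer2.Player">
    <method name="Next"/>
    <method name="Previous"/>
    <method name="Play"/>
    <method name="Pause"/>
    <method name="PlayPause"/>
    <method name="Stop"/>
    <!-- Metadata is a dict of string keys to variants, with keys like
         "xesam:title", "xesam:artist", and "mpris:artUrl" -->
    <property name="Metadata" type="a{sv}" access="read"/>
  </interface>
</node>
```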
There is still considerable work left on supporting this in Koto, but I expect it to be implemented over the next couple of weeks, at which point I think I can probably start actively dogfooding Koto, playing music from it while working on it.
During the first half of April, my goals are:
- Implementing MPRIS and MediaKey support.
- Polishing more of the playerbar design like the track info.
- Start working on graphical playlist management (was intending on doing this during second half of March, but more time was spent on the playback engine).
The latter is a fairly large item that I imagine will easily push into the second half of April. It provides the backbone of a considerable amount of the user experience, even if it is transparent to the user.
All development streams happen on my Twitch every Tuesday and Thursday from 12pm-5pm GMT+3 / EEST (Eastern European Summer Time). Remember the daylight savings time change.
If you miss these streams, I upload all of them to Odysee so be sure to check them out! I am part of Odysee’s Viewer Rewards Program, so if you have an Odysee account, you can now get a daily watch reward of a bit of LBRY Credits (LBC) when you watch my videos, and I get some too!