Dev Diary 1: Koto - Foundations

January 25, 2021

In this development diary, I lay out some fundamentals of Koto, a new in-development audio manager for Linux.

#Background

I have been daily driving Linux since 2008 when I first started using Ubuntu 8.04. Like everyone, my preferences for various applications have changed throughout the years. Where Totem used to be my preferred video player, GNOME MPV / Celluloid eventually took its place. OpenOffice was dethroned by LibreOffice. The myriad of editors and IDEs was eventually distilled into Visual Studio Code. Pidgin used to be the hub of all things instant messaging (Facebook Messenger, Google Talk (remember that thing?), IRC, Skype, and more); it has now been largely replaced by silos and platform-specific clients.

#Streaming

For audio players, my journey has been akin to a game of Frogger, jumping from app to app and trying to dodge the various discontinuations of them, as well as of music streaming / music purchasing services. In the last decade, we have seen a significant shift in how we consume media, moving from an age of binders full of CDs to iPods capable of holding hundreds of songs to being able to stream almost any song to our devices in a few clicks. The ability to stream music is not without its downsides. I am not going to spell out the obvious and talk about the requirement to have a network connection, which billions around the world still do not have access to. However, I would like to highlight a couple of other significant downsides of streaming services.

#Access is Ephemeral

Ephemeral access applies to numerous aspects of streaming services. For starters, there is no guarantee that the service you are streaming from is going to exist the next day. Maybe it gets bought out by another company or gets shut down because it just was not making enough profit for some big shot investors. All those playlists you have collected could be gone the next day, with no means of exporting the data if there was no open API or applications that supported it. All those songs you discovered and "liked" are suddenly forgotten unless you happened to mirror those to other services or bought the music. We have seen this with Grooveshark, Zune Marketplace becoming Xbox Music and then Groove Music, Ubuntu One Music, Rdio, and Google Play Music morphing into YouTube Music. Honestly, I could go on. Here is a section in the Terms of Service of YouTube where they explain exactly that:
"YouTube is constantly changing and improving the Service. We may also need to alter or discontinue the Service, or any part of it, in order to make performance or security improvements, change functionality and features, make changes to comply with law, or prevent illegal activities on or abuse of our systems. These changes may affect all users, some users or even an individual user. Whenever reasonably possible, we will provide notice when we discontinue or make material changes to our Service that will have an adverse impact on the use of our Service. However, you understand and agree that there will be times when we make such changes without notice, such as where we feel we need to take action to improve the security and operability of our Service, prevent abuse, or comply with legal requirements."
If the service is yet another prong of a mega-corp like Apple, Google, Microsoft, etc. and you happened to even possibly violate their Terms of Service for a completely different product they offer, say goodbye to your account and your access to every product in their portfolio:
"YouTube may suspend or terminate your access, your Google account, or your Google account’s access to all or part of the Service if (a) you materially or repeatedly breach this Agreement; (b) we are required to do so to comply with a legal requirement or a court order; or (c) we believe there has been conduct that creates (or could create) liability or harm to any user, other third party, YouTube or our Affiliates."
But hey, you are a good lady or gent. You give the mega-corp money every month and keep on the straight and narrow. Does that mean the music you listen to is sticking around? Not at all. Music licensing can and does expire: one company can decide the other is paying too little, or the other thinks they are paying too much. The music is gone, you lost access to it, and chances are it just disappears and you may have no idea what the song was to begin with. Here is a screenshot of my "Good Music" playlist on YouTube where that is exactly the case. Access is ephemeral and should be treated as a luxury.

#Privacy

Privacy Policies can be helpful in communicating the what, where, when, why, and how of data collection, usage, and sharing. Whether you read them or not before you sign up to a service, you agree to them. That data collection enables these companies to curate their content portfolio to you, offering you suggestions for new music and targeted ads from third parties. As part of Spotify's Privacy Policy, they state:
"We work with advertising partners to enable us to customize the advertising content you may receive on the Spotify Service. These partners help us deliver more relevant ads and promotional messages to you, which may include interest based advertising (also known as online behavioral advertising), contextual advertising, and generic advertising on the Spotify Service. We and our advertising partners may process certain personal data to help Spotify understand your interests or preferences so that we can deliver advertisements that are more relevant to you."
In the event Spotify is acquired or is even in negotiation to sell, they have the right to share your personal information as well:
"We will share your personal data in those cases where we sell or negotiate to sell our business to a buyer or prospective buyer."
None of this should come as a surprise. Spotify has made billions of euros annually off hundreds of millions of users. That is a lot of data, a lot of advertisements, and a metric butt load of partners with access to that information. I will be the first to admit that music discovery is one of the biggest reasons I have used streaming services like Spotify. My library of music would not be half as broad as it is now if it was not for it. That being said, in my opinion it does not justify the quantity of personal information these companies are capable of collecting, and relying on their "good will" to not abuse what they have or broaden it under the guise of prior consent is not reasonable. We are all trying to find the right balance for ourselves on what data we are okay with sharing. For myself, taking back ownership of "my music" and my data is one of the biggest reasons I am reducing my reliance on music streaming.

#App Preferences

Before I start getting into examples and talking about my preferences for apps, I want to state for the record that I have nothing personal against any of the developers of the apps I am referencing. Whether or not the application is still developed, I am happy that I was able to experience them at all. Some of my reasons for not wanting to use various applications may be more minor than others, but regardless it all came down to "nothing checked all of my boxes, so I am building one myself". Welcome to open source.

#Building for the Desktop

I use Linux on "modern home computing" devices every single day. These devices, being my laptop and desktop, have what I could only describe as a luxurious amount of screen real estate. Across the three monitors on my desktop, I have a total resolution of 7360x2160. On my laptop, I have a 1920x1080 screen. So I am sure you can understand my frustration when I launch a music player only to be greeted with an experience that is clearly designed for mobile and just happens to scale up. This was my experience with Lollypop, an application that has an over-reliance on a sidebar, with a user experience that, when scaled down, turns said sidebar into the only thing you see. This is a side-effect of the developer choosing to use libhandy, a library developed by Purism to "help with developing UI for mobile devices using GTK/GNOME". This mobile-oriented library is one of the biggest drivers for me to develop new desktop-focused Linux applications.
Building on this, there is too much of an emphasis on artist and album art, which in my experience actually made it harder to find the artist I was looking for quickly and introduced weird side-effects, such as using completely inaccurate artwork for Solus, seemingly from an external service called "FANART.TV". There is a balance to be had between showing artwork and the ease of browsing artists quickly, something I think GNOME Music actually strikes well in its Artists view.
Many applications (Lollypop included) use a Headerbar as the primary means of controlling playback, leaving very little room to actually grab and move the window. The location of these playback controls feels unnatural coming from an experience where the primary means of application switching is either alt+tab or the bottom panel in Budgie, which places my mouse in a location with minimal travel to the desired playback controls.
If you are used to using applications like Spotify or Rhythmbox with the Alternate Toolbar plugin (which has shipped by default in Solus for years and is one of the best ways to make Rhythmbox feel modern), it is going to feel even more unnatural. My views are:
  1. Headerbars should not be abused. Per GNOME's own Human Interface Guidelines, "ensure that there is room for a headerbar to be dragged". Stuffing as much functionality and control as possible into the Headerbar is not going to make for a good user experience.
  2. Full HD (1920x1080) makes up over 20% of the global market. There is a limit to how scaled down an experience you should cater to, and this is especially true for a modern home computing operating system like Solus. As time goes on, you are less likely to see devices shipping with low resolution screens, aside from Chromebooks and laptops on the extreme lower end (like some Lenovo IdeaPads). Focus your attention more on higher resolutions, leveraging the real estate for functionality and using negative space to drive focus towards various elements of the user experience.
  3. The priority when browsing music is artists, followed by albums, ending with individual songs. If you need to find an album or song without remembering the artist, search should be leveraged instead of dedicated album and song views. For playlists, the priority should be chronological, with most recent additions first.
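The playlist ordering in the third point could be sketched roughly as follows. This is only an illustration under my own assumptions: the `PlaylistEntry` struct, its field names, and `playlist_sort_newest_first` are hypothetical and not part of any Koto code.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical playlist entry; field names are illustrative only. */
typedef struct {
    const char *title;
    time_t added_at; /* when the entry was added to the playlist */
} PlaylistEntry;

/* qsort comparator: newer entries sort before older ones. */
static int compare_newest_first(const void *a, const void *b) {
    const PlaylistEntry *ea = a;
    const PlaylistEntry *eb = b;
    if (ea->added_at > eb->added_at) return -1;
    if (ea->added_at < eb->added_at) return 1;
    return 0;
}

/* Sort a playlist in place so the most recent additions come first. */
void playlist_sort_newest_first(PlaylistEntry *entries, size_t count) {
    assert(entries != NULL || count == 0);
    if (count > 1) {
        qsort(entries, count, sizeof entries[0], compare_newest_first);
    }
}
```

Note that `qsort` is not guaranteed to be stable, so entries added at the exact same time may end up in either order.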

#Limited to Music

One of my biggest problems with music players is exactly that: they are typically oriented towards playing music. Duh, I know, it is in the name. However, I am not looking for just a player for my music. I do not want three pieces of software for handling audiobooks, music, and podcasts. They are all indexable media files, with similar requirements and some format-specific features. Rather than building out a completely different application, why can I not just have my audiobooks know my last known position and let me pick up where I left off? Same goes for podcasts. Instead of re-implementing OAuth and misc. API key / token support over and over in individual apps (even when using liboauth), why not have a part of a single codebase that I can leverage to support multiple services and integrate from there?
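To illustrate the "pick up where I left off" idea, here is a minimal sketch of one record type shared by all three kinds of media. Everything in it (`MediaKind`, `MediaItem`, the function names, and the rule that music always restarts from the beginning) is an assumption made for the example, not a description of how Koto will actually store this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical media kinds handled by a single audio manager. */
typedef enum { MEDIA_AUDIOBOOK, MEDIA_MUSIC, MEDIA_PODCAST } MediaKind;

/* Hypothetical indexed item; all three kinds share the same record. */
typedef struct {
    MediaKind kind;
    double duration_secs;
    double last_position_secs; /* where playback last stopped */
} MediaItem;

/* Assumption: only long-form content resumes; songs start over. */
static bool media_kind_resumes(MediaKind kind) {
    return kind == MEDIA_AUDIOBOOK || kind == MEDIA_PODCAST;
}

/* Remember the playback position, clamped to the item's duration. */
void media_item_record_position(MediaItem *item, double position_secs) {
    assert(item != NULL);
    if (position_secs < 0.0) position_secs = 0.0;
    if (position_secs > item->duration_secs) position_secs = item->duration_secs;
    item->last_position_secs = position_secs;
}

/* Position to start from the next time the item is played. */
double media_item_start_position(const MediaItem *item) {
    assert(item != NULL);
    if (!media_kind_resumes(item->kind)) return 0.0;
    /* A finished item restarts rather than resuming at its very end. */
    if (item->last_position_secs >= item->duration_secs) return 0.0;
    return item->last_position_secs;
}
```

The point of the shared record is that the format-specific behavior is a small rule on top of common data, rather than a reason for a separate application.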

#Multi-Device Support

A huge benefit to streaming platforms is they offer a centralized storage of content that can be accessed across multiple devices. You just log in and done, you can start streaming music. I elaborated earlier on the fact that access is ephemeral, so you cannot guarantee that you will continue to have that level of convenience on a day-to-day basis. Fortunately, portable device synchronization has existed for around 20 years now, with Windows Media Player 7 being one of the first (if not the first) to offer it in 2000. In fact, Windows Media Player supported tons of "power user" features that are harder to come by in the age of music streaming clients, such as:
  • Optional automatic transcoding to formats supported by the plugged-in device, which was pretty useful in the late 1990s and early 2000s when just about every company you have heard of started producing a portable player, typically with vendor-specific proprietary formats or compression (Sony had one called ATRAC, as an example).
  • "Auto Sync" introduced as far back as 2004 enabled you to specify with granularity the music you would want to synchronize, such as high rated or new music. Newer releases offered the ability to copy from the portable device back to the PC in question. You could even sync between multiple PCs.
Unfortunately, device sync support is becoming rarer in audio managers / players these days. GNOME Music seemingly lacks any form of device synchronization. Some have arbitrary restrictions, like elementary Music's inability to sync multiple playlists. Melody / PlayMyMusic, another "designed for elementary" app, supports MTP but lacks libimobiledevice integration to support newer iOS devices like the iPod Touch. It is no wonder that if you make it harder for people to quickly and even automatically synchronize between their PCs and mobile devices, they are going to gravitate towards cloud services instead. It is time to bring back good synchronization options. Rhythmbox does it quite well with drag-and-drop support but it should be more effortless than that.

#Sharing is Caring

One of my favorite things to do is share new songs I like on social media, whether I discovered them myself or they were curated for me. Not necessarily every day mind you, but enough that being able to share the artist and song and possibly search for a reference on another service right from my audio manager would be beneficial. Most music players have integrated support for Last.fm or Libre.fm "scrobbling", which is "a way to send information about the music a user is listening to", with Last.fm supporting various cloud services like Pandora and Soundcloud via browser extensions. Some have forged their own path for sharing activity, like Spotify and their Friend Activity panel. Music is part of our culture and this is our modern form of passing around CDs and mix-tapes.

#Technologies and Other Nitpicks

Honestly, some of my biggest issues with some music players on Linux are the underlying languages or libraries they use. Generally speaking, I am against using software which leverages Granite or libhandy. I do not want to imply either of those are necessarily bad libraries. I think they both actually add great value, such as:
  • Granite makes building welcome screens and in-app notifications easy.
  • libhandy makes building preference windows and pages painless. Its HdyPreferenceGroup/Page/Row/Window, HdySqueezer and HdyViewSwitcher are widgets that frankly should have been implemented for GTK4, yet were omitted in such a "next-generation" toolkit that even GNOME developers end up using libhandy in their applications.
However, at the end of the day, Granite is designed for elementary OS and libhandy's aim is to "help with developing UI for mobile devices". I am not building an application that is specific to an operating system in this case, I am building a Linux application. I am not building an application that caters towards a handful (pun intended) of devices running PureOS, I am building an application for modern home computing devices. If you want to build an application that re-uses major UX components across mobile and desktop, then libhandy is there for you. If you're building for elementary OS (no shame in that), Granite is the de facto standard on their platform, use it. I'm doing neither, so neither makes sense for my use case.
Some applications are written in Python, such as Lollypop. Others, namely GNOME Music, are written in JavaScript. Both of these are interpreted languages. JavaScript lacks a static type system (for that you need TypeScript and its compiler tooling). GJS uses ESR versions of SpiderMonkey, which means it can lag behind in adopting new ECMAScript features until a new GJS moves to a new mozjs, while leaving performance improvements largely in the hands of the version of SpiderMonkey that gets used. Python is in many ways type safe. However, it is dynamically typed, so names can be rebound to values of different types and type errors only surface at runtime. In my years of experience as a maintainer of an operating system, and unfortunately of tooling that is written in Python, I have learned that there is just a certain level of risk involved when using an interpreted language versus a compiled, linked one, so I am prone to avoid software written in Python when I can. They are not bad languages, just not my cup of tea.
Some other nitpicks are simpler but collectively annoy me enough to drive me to make my own audio manager:
  • Some applications such as GNOME Music and Melody (PlayMyMusic) lack separate volume control, which is trivial to implement (I can say that with some authority as the person that wrote the per-application volume management in Budgie's Raven).
  • Most applications lack search experiences that offer filtering.
  • GNOME Music requires the use of the Tracker indexer, which has been known for high CPU usage during certain indexing operations.
  • When closing elementary Music, the application does not in fact close, but rather keeps running in the background and the music keeps playing. To be fair, this is "by design" per their HIG, however it runs counter to the user experience of most applications on any other operating system. When we had elementary Music ("Noise") in the Solus repository, I ended up patching it to not behave that way by default.
  • elementary Music lacks gapless playback, which some artists design music around, with elegant fading from one track into the next.
  • Melody lacks an equalizer and has no integration with Last.fm or MusicBrainz.
Unfortunately over the years, many music players / audio managers have come and gone.
  • Amarok's last release was in 2018 and development has more-or-less come to a stop over the last 5 years. They have a new maintainer, which is fantastic, however by-and-large KDE folks have moved to Elisa. Hopefully we'll see a proper revival of Amarok though to provide KDE folks more options!
  • Banshee's last release was in 2014.
  • Clementine Music Player's last release was in 2016.
  • Melody has not seen any active development in a year; new commits are primarily translations.
  • Songbird was discontinued in 2013 and its fork Nightingale only lived for a year more.
  • Tomahawk stopped seeing development in 2017 with its last stable release occurring in 2015.
As time goes on, it is possible we will see more discontinuations of music players / audio managers. Not to be some doomsayer, but the doors are closing and it is time to make a new one.

#Koto

Koto is my desired amalgamation of audiobook software, music players, and podcast managers. It is intended to be a unification of the benefits of both streaming services and local content, as well as a bridge where necessary. It is designed for and caters to the desktop. Development and focus may be opinionated, however we will actively use user polling to determine what features are highest priority and need the most (or least) attention. In fact, throughout this section, I will be referencing the first "Gaining Insight" poll I did with all of you via Patreon!

#Focus / Pillars

The primary pillars / focus of Koto are:
  • Exceptional local and remote search capabilities.
  • First-class portable device and multi-PC synchronization.
  • Management and playback of audiobooks, music, and podcasts.

#Supercharging Your Local Experience

While the primary focus will always be on local content, providing bridges to third-party services will be important to migrating away from cloud services and/or supercharging local content consumption.
For audiobooks: Audible purchase downloading will be integrated, as well as the LibriVox public domain audiobook platform.
For music: The intent is to support a myriad of Spotify APIs (which unfortunately requires the user to have Spotify Premium) to help facilitate a migration off the platform, such as exporting playlist information to help rebuild your personal profile in a private context. MusicBrainz will be supported for ID3 / ID3v2 metadata generation and management for music. Other services which provide an API may also be supported in time. Internet radio stations will be supported, for example SomaFM.
For podcasts: We will leverage the iTunes Store / Search API to help you discover podcasts, in addition to natively supporting podcast Atom feeds.

#Home

Koto will feature a Home dashboard that shows what you recently listened to. If you were listening to an audiobook and switched over to music, you will be able to pick up where you left off. Jump right into listening to your morning podcasts and see what is upcoming (episodes you have not listened to, the next chapter of an audiobook, etc.). Once automatically generated playlists are implemented, those will have a place on the Home screen as well!

#Device Sync and Casting

Koto will feature a unified device synchronization section, with both "global" and per-device sync configurations. You will be able to choose specific audiobooks, artists, and podcasts to synchronize to portable devices. If you use multiple PCs, in the future we want to provide the ability for Koto to synchronize against a centralized self-hosted backend such as Nextcloud, automatically updating its index when there are filesystem modifications to the centralized backend store. I also want to bring Koto to the forefront of the modern smart home, eventually featuring integration with third-party smart speakers, TVs, and more to enable you to easily cast and share content to those devices! I was surprised by the number of users in our polling that have used casting capabilities to cast music, so I look forward to bringing Koto front-and-center for enabling that.
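One way the "global" and per-device configurations could interact is for each device to either inherit the global setting or override it. The sketch below assumes exactly that model; the type and function names (`SyncChoice`, `SyncDefaults`, `DeviceSyncConfig`, `sync_effective_for_device`) are hypothetical, and the real design may well be more granular (per artist, per podcast, and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A per-device setting either inherits the global value or overrides it. */
typedef enum { SYNC_INHERIT, SYNC_OFF, SYNC_ON } SyncChoice;

/* Hypothetical global defaults: which kinds of content sync at all. */
typedef struct {
    bool audiobooks;
    bool music;
    bool podcasts;
} SyncDefaults;

/* Hypothetical per-device configuration layered on top of the defaults. */
typedef struct {
    const char *device_name;
    SyncChoice audiobooks;
    SyncChoice music;
    SyncChoice podcasts;
} DeviceSyncConfig;

static bool resolve_choice(SyncChoice choice, bool global_default) {
    if (choice == SYNC_INHERIT) return global_default;
    return choice == SYNC_ON;
}

/* Compute what actually syncs to a device; NULL means "no overrides". */
SyncDefaults sync_effective_for_device(const SyncDefaults *global,
                                       const DeviceSyncConfig *device) {
    assert(global != NULL);
    SyncDefaults effective = *global;
    if (device != NULL) {
        effective.audiobooks = resolve_choice(device->audiobooks, global->audiobooks);
        effective.music = resolve_choice(device->music, global->music);
        effective.podcasts = resolve_choice(device->podcasts, global->podcasts);
    }
    return effective;
}
```

With this shape, changing a global default automatically applies to every device that has not explicitly opted in or out.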

#Music

Library
Koto's Local Library will offer a hybrid list and grid layout similar to other music players like GNOME Music. Artists will be provided in a vertical list, allowing you to quickly scroll to the artist you want. When you have an artist selected, we will present a view showing off the artwork of each album, alongside its songs. If you have multiple albums from an artist, we can optionally show a "jump banner" to jump you to the part of the view with that album and its songs. If you are a bit more of a traditionalist, Koto will provide an option to have your content sorted as a table instead of our modern view.
Playlists
Playlists will come in three forms:
  1. A self-curated, persistent playlist that lasts across play sessions.
  2. A session-specific playlist that can be used as a temporary queue.
  3. Automatically generated playlists based on your specifications.
Playlists are not limited to music. You will also be able to add audiobook chapters and podcast episodes, making them perfect for a morning or late night routine. Speaking of which, you will be able to designate one or more playlists to be recommended for different times of the day, and leverage automatically generated playlists to specify artists, albums, songs, or episodes of a variety of podcasts to make sure you can keep up with fresh content. For the over half of individuals polled that periodically or frequently curate their playlists, I am sure this is something that will be appreciated.
Other Options and Functionality
Music is not mastered for our speakers, headphones or earbuds. But that does not mean we cannot provide an equalizer to make it sound incredible on them. Two out of three individuals polled answered that they have used in-app audio equalizers in the past, and for music it makes perfect sense.
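The time-of-day recommendation for playlists could be sketched as a simple hour window, with the one subtlety being a late night window that wraps past midnight. The `PlaylistSchedule` struct and `playlist_recommended_at` function are hypothetical names for this example, and hour-level granularity is my own simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical schedule: recommend a playlist during [start_hour, end_hour).
 * Hours are 0-23. If start_hour > end_hour the window wraps past midnight.
 * Assumption: start_hour == end_hour is treated as an empty window. */
typedef struct {
    const char *playlist_name;
    int start_hour;
    int end_hour;
} PlaylistSchedule;

/* Returns true if the playlist should be recommended at the given hour. */
bool playlist_recommended_at(const PlaylistSchedule *schedule, int hour) {
    assert(schedule != NULL);
    assert(hour >= 0 && hour <= 23);
    if (schedule->start_hour <= schedule->end_hour) {
        /* Ordinary window within a single day, e.g. 6 to 10. */
        return hour >= schedule->start_hour && hour < schedule->end_hour;
    }
    /* Wrapping window, e.g. 22 to 2 covers 22, 23, 0 and 1. */
    return hour >= schedule->start_hour || hour < schedule->end_hour;
}
```

A morning routine playlist might use 6 to 10, while a late night one might use 22 to 2.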

#Search

Koto will feature an exceptional unified search experience that will prioritize locally indexed audiobooks, music, and podcasts, with support for searching mounted portable devices. For connected external services, Koto can perform a deferred search as well. Search will also play a key part in discovery, an area we will explore heavily throughout early development. However, more thought is required on how to ensure content discovery is done in a manner that respects your privacy and keeps the focus on maintaining local content.
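As a rough illustration of "local first" ranking, the sketch below orders results by where they came from (local index, then mounted devices, then deferred remote results) and only then by a relevance score. The `SearchSource` and `SearchResult` types, the integer `score`, and `search_results_rank` are all assumptions for the example; how relevance is actually computed is out of scope here.

```c
#include <assert.h>
#include <stdlib.h>

/* Lower value means higher priority when ranking results. */
typedef enum {
    SOURCE_LOCAL = 0,  /* locally indexed content */
    SOURCE_DEVICE = 1, /* mounted portable devices */
    SOURCE_REMOTE = 2  /* deferred results from connected services */
} SearchSource;

/* Hypothetical search hit; score is an illustrative relevance value. */
typedef struct {
    const char *title;
    SearchSource source;
    int score;
} SearchResult;

/* Order by source priority first, then by descending score. */
static int compare_results(const void *a, const void *b) {
    const SearchResult *ra = a;
    const SearchResult *rb = b;
    if (ra->source != rb->source) return ra->source < rb->source ? -1 : 1;
    if (ra->score != rb->score) return ra->score > rb->score ? -1 : 1;
    return 0;
}

/* Rank results in place so local content always appears first. */
void search_results_rank(SearchResult *results, size_t count) {
    assert(results != NULL || count == 0);
    if (count > 1) {
        qsort(results, count, sizeof results[0], compare_results);
    }
}
```

Because remote results arrive later in a deferred search, they could be appended and re-ranked without ever displacing local hits from the top.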

#Technologies

Throughout early development, I will be developing prototypes using both the Enlightenment Foundation Libraries (EFL) and the latest generation of GTK, GTK4. Depending on the solution used, we will either use GStreamer directly or go through an intermediate library such as Emotion / EFLPlayer for multimedia playback. Regardless of the solution, Koto will be written in C, with a borderline obsessive focus on memory profiling and optimization through tooling like Sysprof and Valgrind. For the backing data store, we will be investigating both PostgreSQL and SQLite (leaning more towards SQLite).

#Priorities

The "Gaining Insight: Audio Management and Listening Habits" polling was very informative, giving me a clear sense of priorities for Koto development. Naturally, building a solid, format-agnostic platform for audiobooks, music, and podcasts will be the highest priority before moving on to user-focused feature sets.

#Breakdown

Audiobooks and Podcasts
Feedback on whether an individual would be more likely to listen to audiobooks if they were directly integrated into Koto was about middle-of-the-road, with a little over 1/3 saying they would, 1/3 saying they would not, and another roughly 1/3 being undecided. My interpretation of this data is that it should not be the highest priority item, however it should be one of the primary focuses after the bulk of music-related functionality is implemented. Podcasts tell a different story, however. Over half of individuals polled (58.1% precisely) said they would be more likely to listen to podcasts if they were integrated into Koto. My interpretation of this data is that it should be one of the highest priority items after the bulk of music-related functionality is implemented.

Content Discovery and Purchasing
Over half of individuals polled (54.8%) have purchased music through an audio player in the past. 75% of individuals polled stated that they would be more likely to store music locally if discovery services and purchasing were integrated. This is pretty promising to me, as someone that wants to migrate away from streaming services as well. I cannot speak to where in the priority list this will go yet. I still need to explore third-party platforms that offer both purchasing capabilities and APIs to facilitate it, the security implications of that, as well as the privacy implications and concerns of content discovery mechanisms. Ideally I would like to build or support an open platform that allows folks to provide their recommendations for artists and leverage scrobbling capabilities to know what artists we should recommend, but more thought needs to be put into this. Clearly folks find music discovery to be important. If you have any ideas, I would love to hear from you.

Device Synchronization
71.4% of individuals polled stated they would be more likely to store and play music locally if it could be more easily synchronized between devices. To me, accomplishing this would significantly reduce the benefits of multi-device streaming services and allow us to not only take back our data, but curate our own experience as well. So this is a high priority item.

Gapless Transitions
66.7% of individuals polled stated that they do not have gapless transitions enabled. This is one of those items that may be addressed when implementing the playback engine, but is less of a priority.

Lyrics
According to polling, the majority of users rarely or infrequently use song lyric support. This will be a very low priority item.

Self-Hosted Streaming Integration
Something I was especially surprised by was the response to how important self-hosted streaming solution integration is. Less than 10% of users indicated that it was somewhat important to very important. This will be a low priority item.

Tag Editing
In-app ID3 / ID3v2 tag editing for songs swung more towards being less important. This is a medium priority item. I am considering it more important despite the polling, since ID3 support plays a big part in correcting metadata mistakes and ensuring browsing experiences and search remain usable.

#Rough Order

A rough order of priorities for Koto is as follows:
High Priority
  1. Building an agnostic multimedia indexer and playback engine, with support for choosing which directories hold audiobooks, podcasts, and music. By default, Audiobooks and Podcasts will be outside XDG_MUSIC_DIR, directly in your home directory, since they are not "music". However, you will be able to put audiobooks and podcasts in their respective dedicated directories inside XDG_MUSIC_DIR or other arbitrary locations should you desire.
  2. Modern browsing view for music.
  3. A multitude of functions in the playback engine for equalizing, playback speed, and volume.
  4. All things playlists
  5. Device synchronization and granular controls.
  6. Search with local support
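The default directory layout from the first item above could look something like this sketch. It takes the home and music directories as plain strings so it stays dependency-free; in a GLib-based application the music directory could come from `g_get_user_special_dir(G_USER_DIRECTORY_MUSIC)`. The folder names "Audiobooks" and "Podcasts", the `LibraryKind` enum, and `library_default_dir` are my own placeholders, not confirmed Koto behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical content kinds that each get a default directory. */
typedef enum { LIBRARY_AUDIOBOOKS, LIBRARY_MUSIC, LIBRARY_PODCASTS } LibraryKind;

/* Write the default directory for a kind of content into out.
 * Music uses music_dir (the XDG music directory); audiobooks and podcasts
 * default to folders directly under home_dir. The folder names here are
 * assumptions. Returns 0 on success, -1 if the path would not fit. */
int library_default_dir(LibraryKind kind, const char *home_dir,
                        const char *music_dir, char *out, size_t out_size) {
    assert(home_dir != NULL && music_dir != NULL);
    assert(out != NULL && out_size > 0);

    /* Ignore a single trailing slash on the home directory. */
    size_t home_len = strlen(home_dir);
    if (home_len > 1 && home_dir[home_len - 1] == '/') home_len--;

    int written;
    switch (kind) {
    case LIBRARY_MUSIC:
        written = snprintf(out, out_size, "%s", music_dir);
        break;
    case LIBRARY_AUDIOBOOKS:
        written = snprintf(out, out_size, "%.*s/Audiobooks", (int)home_len, home_dir);
        break;
    case LIBRARY_PODCASTS:
        written = snprintf(out, out_size, "%.*s/Podcasts", (int)home_len, home_dir);
        break;
    default:
        return -1;
    }
    /* snprintf reports the length it wanted; detect truncation. */
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

A user-chosen location would simply replace the default returned here, which is what allows audiobooks and podcasts to live inside the music directory or anywhere else.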
Discovery capabilities are also somewhere in high priority. Purchasing is TBD.
Medium Priority
  1. Podcasts with iTunes Store API and generic Atom feed support.
  2. Audiobooks with LibriVox support
  3. Search with remote support
  4. ID3 / ID3v2 Tag Editing
Low Priority
  1. Gapless playback
  2. Internet Radio Stations
  3. Self-Hosted Streaming Integration
  4. Song Lyrics
Looking to the Future: Smart Home integration such as Philips Hue, Google Cast, Smart Assistant integration, and more. Some of these may be provided at an OS-level (spoiler?), others at an application level.