Building the Next Generation of Personalized Themed Playlists

Claire Dorman
Algorithm and Blues
8 min read · May 23, 2018


Claire Dorman, Oriol Nieto, and Erik Schmidt

Music is one of the most powerful and personal human art forms, and our connection to it evolves with our constantly changing circumstances and tastes. For instance, within a single day, we may start with a morning workout that requires a particular energy level, and end it with an evening study session that calls for a specific genre. As a conduit of emotion and culture, music can be naturally organized by mood or genre, and optimized for activity. But crafting the perfect playlist for a given theme can be an arduous task. Additionally, the time and effort needed to keep playlists fresh from week to week are well beyond what the average listener can afford.

This week we launched Personalized Soundtracks, a collection of themed playlists that are automatically created for each Pandora Premium listener. There are dozens of available themes spanning a wide range of moods, activities, and genres such as “Energy,” “Party,” or “Dubstep.” Each playlist is personalized to the listener, both in terms of the selection of the playlist themes and the songs contained within. New playlists are delivered weekly and evolve in synchrony with your musical preferences.

We chose to create algorithmically generated personalized playlists because they are the natural way to marry the novel interactive features of Pandora Premium’s ad-free, on-demand listening with the sophisticated, highly personalized, lean-back listening experience that Pandora’s data science team has perfected over the last decade plus.

Generating millions of engaging playlists that refresh weekly is no easy task, and we had several challenges to overcome in bringing this product to fruition. We decided that:

  1. Each playlist should have an informative name. Instead of a broad, generic “Your Personal Soundtrack,” we created focused playlist experiences around single moods, activities, and genres. Naming an algorithmically generated playlist is complex and risky as it immediately conveys a promise to the listener that every song in that playlist is inspired by the named theme.
  2. Each playlist theme should be relevant to the listener and the tracks within should be entirely personalized selections. For instance, you should only receive a hip hop soundtrack if you are interested in hip hop. Furthermore, based on personal preferences, a given hip hop playlist could skew towards Atlanta-style rap for one listener, whereas another listener's could skew entirely towards West Coast rap.
  3. Playlists should be a major driver in music discovery. The playlists should have a balance of novel and familiar music from a wide range of artists.
  4. We should be able to refresh each playlist weekly with additional high quality, personalized, on-theme music.

In the rest of this post, we will describe how we model the music themes, model the listener to select relevant themes and songs, and balance novelty and familiarity in song selection.

Understanding the Musical Universe: Learning the Moods and Genres of Songs

Delivering a highly-targeted listener experience such as a personalized mood or genre playlist requires deep understanding of each piece of music. Pandora is in a unique position to accomplish this because of our proprietary Music Genome Project (MGP), the largest, richest music content database in the world. Over the last 18 years, Pandora's in-house music analysis team has hand-labeled a subset of the 60 million tracks in our catalog with over 400 specific attributes that span genre, vocals, emotions, and instrumentation, as well as deeper musicological features that describe in detail the key, mode, tempo, and compositional elements, among many others. We supplement this dataset with third party metadata, which contains a smaller set of tracks annotated by experts with a breadth of interesting mood, activity, and genre tags.

While expert content analysis provides a concrete ground truth, the musical universe is much larger than the tracks contained in the MGP. In order to scale to the tens of millions of tracks that exist, we need to leverage powerful machine learning techniques.

This dataset, with its large number of songs whose themes (genre or mood labels) are known, lends itself naturally to a supervised machine learning problem. We can use methods such as deep neural networks and gradient boosted trees to learn the relationship between a song's properties and its themes, and then use that relationship to predict theme labels for every song in our catalog.

As an example, in Figure 1 we show hundreds of songs plotted in a 2D space annotated with two different moods (Aggressive and Calm). These two sets of songs are clearly separable in this space, and our models automatically learn these relationships across our full set of themes. Thus, by exploiting our unique MGP data and third party sources via multimodal supervised machine learning, we are able to generate theme predictions for our full catalog.

Figure 1: 2D projection (using t-SNE) of high-dimensional song embeddings with two hand-labeled moods.

In Figure 2 we illustrate the process of obtaining theme predictions for each song. We leverage not only expert labels from the MGP and other metadata sources, but also the enormous trove of Pandora user interaction data (billions of thumbs, skips, hours spent listening, etc.) gathered over the past decade plus.

Figure 2: Pipeline to predict song-level themes from user feedback, the MGP, and third party metadata.

There are many ways to approach a supervised learning problem like this. We take a two-step approach. First, we construct a matrix that encodes user-item preferences. For example, this matrix could have one row per song and one column per user, with each entry recording how many times the listener has heard that song, or whether they thumbed it up or down. The matrix is sparse; most listeners have never heard most songs. We then employ a state-of-the-art variant of weighted matrix factorization [1], which takes into account the confidence level of each matrix entry and yields an embedding (dense vector) representation of each song that encodes the entire interaction history of users with that song. An embedding is simply a point in space, and the goal of factorization is to place users near the items they enjoy. Figure 1 depicts a subset of these song embeddings projected down onto a 2D space using t-SNE [2].
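
To make the factorization step concrete, here is a minimal sketch, not our production system, of confidence-weighted alternating least squares in the style of Hu et al. [1], run on a toy songs-by-listeners play-count matrix; the matrix values, the number of factors, the confidence scaling, and the regularization strength are all placeholder assumptions.

```python
import numpy as np

def weighted_als(play_counts, n_factors=8, alpha=40.0, reg=0.1, n_iters=15):
    """Confidence-weighted matrix factorization in the style of Hu et al. [1]
    on a songs x listeners play-count matrix. Returns song and listener embeddings."""
    n_songs, n_listeners = play_counts.shape
    preference = (play_counts > 0).astype(float)  # did the listener ever play the song?
    confidence = 1.0 + alpha * play_counts        # more plays -> higher confidence
    rng = np.random.default_rng(0)
    songs = rng.normal(scale=0.01, size=(n_songs, n_factors))
    listeners = rng.normal(scale=0.01, size=(n_listeners, n_factors))
    ridge = reg * np.eye(n_factors)

    def solve_side(fixed, conf, pref):
        # One half of the alternating update, solved row by row.
        out = np.empty((conf.shape[0], n_factors))
        for i in range(conf.shape[0]):
            Ci = np.diag(conf[i])
            A = fixed.T @ Ci @ fixed + ridge
            b = fixed.T @ Ci @ pref[i]
            out[i] = np.linalg.solve(A, b)
        return out

    for _ in range(n_iters):
        songs = solve_side(listeners, confidence, preference)
        listeners = solve_side(songs, confidence.T, preference.T)
    return songs, listeners

# Toy example: 6 songs, 4 listeners, sparse play counts.
counts = np.array([[3, 0, 0, 1],
                   [0, 5, 2, 0],
                   [1, 0, 0, 4],
                   [0, 2, 7, 0],
                   [2, 0, 0, 0],
                   [0, 0, 1, 3]], dtype=float)
song_embeddings, listener_embeddings = weighted_als(counts)
```

The key idea is that a zero entry is treated as weak evidence of disinterest rather than as missing data, with confidence in each entry growing with the number of plays.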

In the second stage, these song embeddings become the input to the supervised machine learning techniques that exploit the MGP and third party data, as explained above. The final output is a set of mood, genre, and activity tags for each song. Using these embedded song representations as model inputs is novel in itself, and in internal experiments by Pandora's Music Information Retrieval team it has proven far superior to using other representations such as raw audio or metadata.
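
As a rough illustration of this second stage, one could train a binary gradient boosted classifier per theme on top of the song embeddings; the model family, feature dimensions, and theme names below are illustrative assumptions rather than our production configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)

# Song embeddings from the factorization step plus expert labels for a subset
# of songs (1 = the theme applies, 0 = it does not). All values here are
# synthetic placeholders.
labeled_embeddings = rng.normal(size=(500, 32))
labels = {"Aggressive": rng.integers(0, 2, 500),
          "Calm": rng.integers(0, 2, 500)}

# One binary gradient boosted model per theme.
theme_models = {}
for theme, y in labels.items():
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
    model.fit(labeled_embeddings, y)
    theme_models[theme] = model

# Score every song in the (much larger) catalog for every theme.
catalog_embeddings = rng.normal(size=(10_000, 32))
theme_scores = {theme: m.predict_proba(catalog_embeddings)[:, 1]
                for theme, m in theme_models.items()}
```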

A downside to this approach is that the embedding vector, and thus the predicted theme labels, can be of low quality or missing altogether for artists with an extremely small number of spins on Pandora (i.e., long-tail artists). To address this, we use multimodal convolutional neural networks trained on raw audio content and other metadata sources to predict the embeddings [3]. These models also rely on powerful estimates of the MGP attributes obtained with deep learning and other supervised methods [4,5]. This allows us to scale the pipeline to the entire Pandora catalog of tens of millions of tracks.
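
A hedged sketch of that fallback, assuming mel-spectrogram inputs and a simple regression onto the collaborative-filtering embeddings; the architecture, layer sizes, and training loop are illustrative only.

```python
import torch
import torch.nn as nn

class AudioToEmbedding(nn.Module):
    """Toy convolutional network mapping a mel-spectrogram patch to the
    collaborative-filtering embedding space (layer sizes are illustrative)."""
    def __init__(self, embed_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, spectrogram):  # (batch, 1, n_mels, n_frames)
        return self.fc(self.conv(spectrogram).flatten(1))

# Train by regressing onto the embeddings of well-spun songs, then use the
# network to predict embeddings for long-tail tracks (synthetic data here).
model = AudioToEmbedding()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
spectrograms = torch.randn(64, 1, 96, 128)   # placeholder mel-spectrograms
target_embeddings = torch.randn(64, 32)      # embeddings from the factorization step
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(spectrograms), target_embeddings)
    loss.backward()
    optimizer.step()
```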

Modeling Listener Preferences and Generating Delightful Playlists

Now that we understand the moods and genres of the content in our catalog, the next challenge is to connect the dots between the music and each unique listener. There are two tasks here: to identify the best playlist themes (Energy, Rainy Day, Classic Country, Cajun, etc.) for a listener, and then to choose the best songs for each listener-theme combination.

To choose themes, we compute an exponentially weighted moving average over each listener's listening history to quantify the affinity between that listener and every theme. Higher-affinity themes are prioritized for creation and delivery.
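
A minimal sketch of this affinity computation, assuming weekly snapshots of the fraction of a listener's spins that matched each theme; the decay constant and the data layout are assumptions for illustration.

```python
def theme_affinities(weekly_theme_fractions, decay=0.8):
    """Exponentially weighted moving average of a listener's per-theme listening.

    weekly_theme_fractions: list of {theme: fraction of that week's spins},
    ordered oldest to newest. Returns {theme: affinity score}.
    """
    affinities = {}
    for week in weekly_theme_fractions:
        for theme in set(affinities) | set(week):
            previous = affinities.get(theme, 0.0)
            affinities[theme] = decay * previous + (1 - decay) * week.get(theme, 0.0)
    return affinities

history = [{"Hip Hop": 0.6, "Energy": 0.2},
           {"Hip Hop": 0.5, "Dubstep": 0.3},
           {"Hip Hop": 0.7, "Energy": 0.1}]
print(sorted(theme_affinities(history).items(), key=lambda kv: -kv[1]))
```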

Once the music collection has been automatically tagged for each available theme and the listeners are assigned themes based on their preferences, we choose songs for each listener-theme combination (see Figure 3). We employ the ensemble of over 75 song recommendation techniques that Pandora has developed over the last decade to produce the potential set of personalized tracks that will appear on these playlists. These techniques span collaborative filtering, collective intelligence, and content-based techniques such as similarity based on the MGP attributes or convolutional neural networks on raw audio. To be chosen for a playlist, a song must be recommended by the ensemble and fit both the listener’s musical tastes and the theme.

Figure 3: Ensembling diagram for computing listener-song-theme scores.

The final set of recommendations is filtered according to the predicted song score for the given theme and listener, which ensures that each playlist contains the right songs for its theme.
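
The hedged sketch below shows one way this selection could be expressed; the thresholds and the way listener and theme scores are combined are illustrative assumptions, not our production logic.

```python
def playlist_candidates(ensemble_scores, theme_scores, theme,
                        min_listener_score=0.5, min_theme_score=0.7):
    """Keep songs that the recommendation ensemble scores highly for this
    listener and that our models tag strongly with the playlist's theme.
    Thresholds and the combined score are illustrative."""
    candidates = []
    for song_id, listener_score in ensemble_scores.items():
        theme_score = theme_scores.get(song_id, {}).get(theme, 0.0)
        if listener_score >= min_listener_score and theme_score >= min_theme_score:
            candidates.append((song_id, listener_score * theme_score))
    return sorted(candidates, key=lambda pair: -pair[1])
```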

We further ensure theme coherence by exploiting data from the MGP. Through it, we can require that, for example, songs in the Happy soundtrack are tagged with “joyful lyrics” or “major key.” Our expert music analysts, who created the MGP, designed individual filters based on combinations of MGP genes for each of the themes.
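
A toy version of such a filter, using the "Happy" example above; the rule structure and the attribute names are illustrative.

```python
# Hypothetical MGP-gene filter: a song passes a theme if it carries at least
# one of the listed attribute tags.
THEME_FILTERS = {
    "Happy": {"joyful lyrics", "major key"},
}

def passes_theme_filter(song_mgp_tags, theme):
    required_any = THEME_FILTERS.get(theme)
    return required_any is None or bool(required_any & set(song_mgp_tags))

print(passes_theme_filter({"major key", "acoustic guitar"}, "Happy"))  # True
print(passes_theme_filter({"minor key", "gloomy lyrics"}, "Happy"))    # False
```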

After choosing the pool of songs relevant to both the listener and the theme, we apply post-weighting to adjust the balance of novelty (songs or artists that are new to the listener) and familiarity (known favorites), to ensure a wide range of artists within each playlist, and to keep the songs fresh week over week. Finally, we sample the playlist's tracklist according to these adjusted scores.
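
The sketch below illustrates one way the post-weighting and sampling could work, with a novelty boost, a per-artist cap, and weighted sampling; the specific weights and caps are assumptions for illustration only.

```python
import numpy as np

def sample_playlist(candidates, heard_before, novelty_boost=1.3,
                    per_artist_cap=2, length=20, seed=0):
    """Re-weight candidate (song_id, artist, score) tuples to favor novelty,
    cap songs per artist for variety, and sample the final tracklist.
    Assumes positive scores; all weights and caps are illustrative."""
    rng = np.random.default_rng(seed)
    weights = np.array([score * (novelty_boost if song not in heard_before else 1.0)
                        for song, _, score in candidates])
    order = rng.choice(len(candidates), size=len(candidates), replace=False,
                       p=weights / weights.sum())
    playlist, artist_counts = [], {}
    for idx in order:
        song, artist, _ = candidates[idx]
        if artist_counts.get(artist, 0) < per_artist_cap:
            playlist.append(song)
            artist_counts[artist] = artist_counts.get(artist, 0) + 1
        if len(playlist) == length:
            break
    return playlist
```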

Listen for yourself!

Find these playlists in the "Browse" section under "Featured Playlists" on the mobile app. If you love a playlist, share it with your friends, who can listen even if they are not Premium subscribers; they'll just watch a quick 15-second ad first. And don't forget to come back each week to hear refreshed playlists and new themes!

Acknowledgments

This feature was the result of a true cross-functional effort across the company. In particular, this post also highlights work by Devon Bryant, Mohitdeep Singh, Taylor Kirch, Keki Burjorjee, Patrick Marchwiak, Evan Paul, Matthew Prockup, and Amelia Kim-Nybakke.

References

  1. Hu, Y., Koren, Y., & Volinsky, C. Collaborative Filtering for Implicit Feedback Datasets. Proceedings of the IEEE International Conference on Data Mining (ICDM), 263–272, 2008.
  2. van der Maaten, L., & Hinton, G. Visualizing Data Using t-SNE. Journal of Machine Learning Research, 9, 2579–2605, 2008.
  3. Oramas, S., Nieto, O., Barbieri, F., & Serra, X. Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features. Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, 2017.
  4. Prockup, M., Ehmann, A. F., Gouyon, F., Schmidt, E. M., & Kim, Y. E. Modeling Musical Rhythm at Scale Using the Music Genome Project. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.
  5. Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., & Serra, X. End-to-end Learning for Music Audio Tagging at Scale. Machine Learning for Audio Signal Processing Workshop at NIPS, Long Beach, CA, USA, 2017.
