How to use R to remove audiobooks from your Spotify liked songs

I walk through my process of using R’s purrr, spotifyr and httr to remove unwanted content from my Spotify liked songs playlist.

Looking for the code? You can find it in this GitLab Repo!

I’m not a good Spotify citizen: I can listen to the same Spotify-curated “This is..” artist playlist for weeks and I rarely venture out to discover new music. And because I’m too lazy to curate and maintain my own playlists, I often lose track of songs / artists I enjoyed listening to at some point.

So I was over the moon when I realized a couple of weeks ago that there was a “Liked Songs” playlist¹ that I could “fill” by simply “hearting” a song. And even better, the playlist already contained over 1400 songs that I apparently had added … somehow.

I instantly added some more songs and started listening to the playlist while coding at the CorrelAid website. It was working - until my flow was interrupted by a narrator voice reading something… What?? I opened my Spotify app and unliked the “song”. But it happened again and again - apparently three whole audiobooks - each with > 40 tracks - had found their way into my “Liked Songs”.

I tried to solve the problem using the app: I liked and unliked the audiobooks but nothing worked - the “songs” did not disappear from my “Liked Songs” playlist. So of course, instead of “unliking” >200 songs by hand in the app, I decided to use my programming skills and the Spotify Web API to (semi-)automate the problem away.

First, of course, I loaded some packages. As always with R, there’s already an excellent API wrapper package for the Spotify Web API, the spotifyr 📦.

# spotify api package
library(spotifyr)

# usual suspects
library(dplyr)
library(purrr)
library(readr)
# for my own custom function to remove songs from playlist
library(httr)
library(usethis)
library(glue)

Get “Liked Songs”

First, I used the spotifyr 📦 to get the liked songs playlist from Spotify. To do so, I followed the instructions from the GitHub README to create an app and obtain the client id and client secret. I stored them in a local .Renviron file:

SPOTIFY_CLIENT_ID="myclientid"
SPOTIFY_CLIENT_SECRET="myclientsecret"

I loaded the contents of the .Renviron file with the base R function readRenviron and used spotifyr to obtain the access token:

readRenviron(".Renviron")
access_token <- spotifyr::get_spotify_access_token()
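
Before requesting a token, it can help to confirm that both variables were actually picked up. The following is a minimal sketch: the helper name credentials_available is made up for illustration and is not part of spotifyr.

```r
# Hypothetical helper (not part of spotifyr): checks that both
# environment variables are set to a non-empty value.
credentials_available <- function() {
  vars <- c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
  values <- Sys.getenv(vars, unset = "")
  all(nzchar(values))
}

# print a hint instead of failing later with a less obvious error
if (!credentials_available()) {
  message("Spotify credentials not found. Check the .Renviron file.")
}
```

If the message shows up, the .Renviron file is probably not in the working directory that readRenviron was pointed at.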

API Limits & Pagination

The access token is the “key” to interact with the Spotify API, so I was good to go. The spotifyr package thankfully offers a function for almost every endpoint of the Spotify API, so spotifyr::get_my_saved_tracks exists.

Unfortunately, most endpoints do not return all items at once when called, but only up to a certain limit. In the case of spotifyr::get_my_saved_tracks, the API can only return a maximum of 50 tracks in response to a call. To work around this limit, I made use of the offset parameter. The offset tells the API “where to start” with returning the next 50 items. From the Spotify documentation:

limit: Optional. The maximum number of objects to return. Default: 20. Minimum: 1. Maximum: 50.

offset: Optional. The index of the first object to return. Default: 0 (i.e., the first object). Use with limit to get the next set of objects.

(source)

So instead of making one “big” call to get all saved tracks, I needed to make several smaller calls, while increasing the offset until I had “reached” the total number of tracks.

Conceptually:

call 1: offset = 0, limit = 50 -> gets tracks 1-50
call 2: offset = 50, limit = 50 -> gets tracks 51-100
call 3: offset = 100, limit = 50 -> gets tracks 101-150
continue until the offset reaches or exceeds the total number of liked tracks

In the context of APIs, such a pattern is called pagination.
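
To make the pattern concrete, here is a small sketch that runs the steps above as a plain loop against a mock instead of the real API. Both fake_library and fetch_page are invented stand-ins for this illustration, not Spotify or spotifyr functionality.

```r
# Mock "API": a vector of 1516 fake track ids stands in for the liked songs.
fake_library <- sprintf("track_%04d", 1:1516)

# Hypothetical stand-in for one API call: returns at most `limit` items,
# skipping the first `offset` items (the offset is 0-based).
fetch_page <- function(offset, limit = 50) {
  if (offset >= length(fake_library)) return(character(0))
  last <- min(offset + limit, length(fake_library))
  fake_library[(offset + 1):last]
}

total <- length(fake_library)
offset <- 0
collected <- character(0)
n_calls <- 0

# keep requesting pages until the offset has reached the total
while (offset < total) {
  collected <- c(collected, fetch_page(offset))
  offset <- offset + 50
  n_calls <- n_calls + 1
}

length(collected) # 1516
n_calls           # 31
```

The last call starts at offset 1500 and returns only the remaining 16 items.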

To implement the pagination I needed to know the total number of tracks in the “liked songs” playlist. I could get it from the API by making a call to the me/tracks endpoint using the get_my_saved_tracks function with the include_meta_info parameter set to TRUE. This returned a total number of saved tracks of 1516.

# get total number of saved tracks and calculate the offsets (can only get 50 tracks with a call)
meta <- spotifyr::get_my_saved_tracks(limit = 50, offset = 0, include_meta_info = TRUE)
total <- meta$total # total number of saved tracks
total # 1516

Now, I could’ve implemented the conceptual pagination pattern using a while or repeat loop - after all, the “continue until” bullet point totally reads like it implies one. However, I decided to use a functional programming solution instead. Why? While while (haha!) loops are totally a-okay, writing functions forces me to think more about my code. You can read more about this in Advanced R.²

Functional approach

For my “functional approach” to work, I needed to calculate the vector of offsets ahead of time. In order to do so, I made use of the seq function which “generate[s] regular sequences”.

offsets <- seq(0, total + 50, 50)
offsets

Because I couldn’t run the API call above for knitting this blog post (I’ve already deleted the relevant tracks), here’s a mockup with the total number of tracks hardcoded:

# for the blog post
total_fake <- 1516
offsets_fake <- seq(0, total_fake + 50, 50)
offsets_fake

 [1]    0   50  100  150  200  250  300  350  400  450  500  550  600
[14]  650  700  750  800  850  900  950 1000 1050 1100 1150 1200 1250
[27] 1300 1350 1400 1450 1500 1550

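Strictly speaking, adding 50 to the total number of tracks is a safety margin rather than a requirement: without it, the sequence would stop at 1500, which is already the offset of the last page (tracks 1501-1516). The extra offset 1550 only costs one additional call that should simply return no tracks.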
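
A quick way to check how many calls are really needed is to compare the sequence lengths with ceiling(total / 50). In this sketch, offsets_exact is a hypothetical alternative that I did not use in my script; it is only there to show where the last page with content starts.

```r
total <- 1516 # hardcoded, as in the mockup above
limit <- 50

# number of pages needed to cover all tracks
n_pages <- ceiling(total / limit)
n_pages # 31

offsets_padded <- seq(0, total + limit, by = limit) # as used in this post
offsets_exact  <- seq(0, total - 1, by = limit)     # stops at the last non-empty page

length(offsets_padded) # 32
length(offsets_exact)  # 31
max(offsets_exact)     # 1500
```

Using total - 1 as the upper bound also avoids an empty page when the total happens to be an exact multiple of 50.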

I then defined a simple wrapper function that takes an offset as a parameter and feeds it to the get_my_saved_tracks function from the spotifyr package. I also added some simple logging. I mapped the function over my offsets vector using map_dfr, which does two things:

It takes each offset from the offsets vector and feeds it to my get_chunk function, essentially working like a for loop and implementing our “conceptual” pagination from above.
It binds all the returned data frames together into one big data frame. A simple map would have returned a list of data frames instead.

# define function to get 50 saved tracks depending on offset 
get_chunk <- function(offset) {
  new <- spotifyr::get_my_saved_tracks(limit = 50, offset = offset, include_meta_info = FALSE)
  usethis::ui_done(glue::glue("got from offset: {offset}"))
  return(new)
}

# map over offsets, bind to dataframe 
all_tracks <- purrr::map_dfr(offsets, get_chunk)
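
The difference between the two return shapes can be seen without touching the API. The following sketch uses base R only, as a rough analogue: lapply plays the role of map, and do.call(rbind, ...) plays the role of the row-binding that map_dfr adds. get_chunk_mock is a made-up stand-in for get_chunk.

```r
# Hypothetical stand-in for get_chunk(): returns a small data frame per offset
# instead of calling the Spotify API.
get_chunk_mock <- function(offset) {
  data.frame(offset = offset, track = paste0("track_", offset + 1:3))
}

offsets_demo <- c(0, 50, 100)

# like purrr::map(): one data frame per offset, collected in a list
chunks <- lapply(offsets_demo, get_chunk_mock)
class(chunks)  # "list"
length(chunks) # 3

# like purrr::map_dfr(): the same chunks row-bound into one data frame
all_chunks <- do.call(rbind, chunks)
nrow(all_chunks) # 9
```

This is only an approximation: as far as I know, map_dfr binds rows with dplyr, which is more forgiving than rbind when the chunks do not share exactly the same columns.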

Finally, I wrote the data to disk to make sure that I could use them for this blog post 😉.

# write to disk
readr::write_rds(all_tracks, "saved_tracks.rds")

Find the audiobooks 🔎

Here is a glimpse at the data:

all_tracks <- readr::read_rds("saved_tracks.rds")
glimpse(all_tracks)

Rows: 1,516
Columns: 30
$ added_at                           <chr> "2020-08-11T14:00:47Z", …
$ track.artists                      <list> [<data.frame[2 x 6]>, <…
$ track.available_markets            <list> [<"AD", "AE", "AL", "AR…
$ track.disc_number                  <int> 1, 1, 1, 1, 1, 1, 1, 1, …
$ track.duration_ms                  <int> 296213, 186101, 178666, …
$ track.explicit                     <lgl> FALSE, FALSE, FALSE, FAL…
$ track.href                         <chr> "https://api.spotify.com…
$ track.id                           <chr> "6L5iRhYgVPaEFqmGaVxWrN"…
$ track.is_local                     <lgl> FALSE, FALSE, FALSE, FAL…
$ track.name                         <chr> "Хочу перемен", "Cliff's…
$ track.popularity                   <int> 41, 26, 0, 83, 73, 21, 2…
$ track.preview_url                  <chr> "https://p.scdn.co/mp3-p…
$ track.track_number                 <int> 1, 3, 9, 1, 1, 17, 3, 18…
$ track.type                         <chr> "track", "track", "track…
$ track.uri                          <chr> "spotify:track:6L5iRhYgV…
$ track.album.album_type             <chr> "album", "single", "albu…
$ track.album.artists                <list> [<data.frame[2 x 6]>, <…
$ track.album.available_markets      <list> [<"AD", "AE", "AL", "AR…
$ track.album.href                   <chr> "https://api.spotify.com…
$ track.album.id                     <chr> "7trila5XMOsUUkcujWqzcn"…
$ track.album.images                 <list> [<data.frame[3 x 3]>, <…
$ track.album.name                   <chr> "Виктор Цой 55 (Выпуск в…
$ track.album.release_date           <chr> "2017-06-21", "2016-03-2…
$ track.album.release_date_precision <chr> "day", "day", "day", "da…
$ track.album.total_tracks           <int> 55, 5, 27, 1, 2, 23, 5, …
$ track.album.type                   <chr> "album", "album", "album…
$ track.album.uri                    <chr> "spotify:album:7trila5XM…
$ track.album.external_urls.spotify  <chr> "https://open.spotify.co…
$ track.external_ids.isrc            <chr> "FR59R1744876", "USAT216…
$ track.external_urls.spotify        <chr> "https://open.spotify.co…

track.type or track.album.type seemed like an obvious choice to find out which tracks belonged to an audiobook. However:

print(table(all_tracks$track.type))


track 
 1516

table(all_tracks$track.album.type)


album 
 1516

Unfortunately, it seems like the data did not offer an indicator for whether the track belonged to an audiobook or not - in Spotify’s eyes, there’s no difference between music albums and audiobooks.

Hence, it was time for a good old heuristic: I decided to look at the album lengths because usually, audiobooks are quite long compared to “normal” music albums. Tidyverse to the rescue:

# group by album, sort by duration 
all_tracks_by_album <- all_tracks %>% 
  group_by(track.album.id, track.album.name) %>%  # group by album id and album name (only the id would be necessary, but i wanted to keep both)
  summarize(total_duration_album =  sum(track.duration_ms)) %>% # sum up all the tracks
  arrange(desc(total_duration_album)) # sort descending 

# determine what are the audiobooks by looking at the data
# the longest albums should be the audiobooks 
knitr::kable(head(all_tracks_by_album, 10))

track.album.id	track.album.name	total_duration_album
2Hso705hbz70g2ywUyBSXK	Über uns der Himmel, unter uns das Meer (Gekürzte Lesung)	29832918
6DBCctTaza5w2rWrkK1I1D	Inferno	25746930
00fshMmQEnqmP8Gja8aEe4	Das Joshua-Profil	23468946
1kLscSc6HEAonyvwbZO3XK	Love Actually	11707091
1716XPsNUeHok477AtTRhX	Best of Classical - Die 50 größten Werke der Klassik	11127462
7xl50xr9NDkd3i2kBbzsNZ	Stadium Arcadium	9103257
3CBMpoI2vZlKXs3wgnNWGn	20 The Greatest Hits	8786053
4jytUDY4LPrvwkReW4S2gE	Greatest Hits 1992-2010 Es asì	8052950
2OXv5X4J2y9CQ7eVSNEHad	Greatest Hits 1992-2010 E da qui	8040528
3dVI5svXoD3X3HR2Y4P1qt	Projekt Seerosenteich (Live - Deluxe Version)	7171975

I instantly recognized the first three entries as the annoying audiobooks that had kept popping up in my “Liked songs” playlist. 🎉
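
Milliseconds are hard to judge at a glance, so converting the top durations to hours makes the gap more obvious. This is a small sketch with the values copied from the table above; the short labels are my own abbreviations of the album names.

```r
# durations (in milliseconds) of the four longest albums from the table above
durations_ms <- c(
  himmel_meer   = 29832918,
  inferno       = 25746930,
  joshua_profil = 23468946,
  love_actually = 11707091
)

# 1 hour = 60 * 60 * 1000 milliseconds
durations_h <- round(durations_ms / (60 * 60 * 1000), 2)
durations_h
```

The three audiobooks each run for more than six hours, while the longest music album stays a little above three. A fixed cutoff somewhere in between could replace the manual look at the table, but that would be an assumption that happens to fit my library, not a general rule.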

Remove the audiobooks from the Liked Songs

To remove the tracks from the audiobooks, I needed all their IDs. First, I extracted the album ids from the audiobooks:

# select the audiobooks / the n longest albums and extract the ids
audiobooks_id <- all_tracks_by_album %>% 
  head(3) %>% # from the manual investigation, i had three audiobooks
  pull(track.album.id)
audiobooks_id

[1] "2Hso705hbz70g2ywUyBSXK" "6DBCctTaza5w2rWrkK1I1D"
[3] "00fshMmQEnqmP8Gja8aEe4"

Then, I filtered the original all_tracks data frame for those albums to get all the track IDs that I wanted to delete:

# filter tracks belonging to audiobooks and extract the ids we need to delete
to_delete <- all_tracks %>% 
  filter(track.album.id %in% audiobooks_id)
to_delete_ids <- to_delete$track.id # extract the ids
length(to_delete_ids)

[1] 327

Now, the only thing left was the actual deletion of the tracks from my “Liked Songs” playlist. Unfortunately, this is not in the scope of spotifyr, so I had a look at the relevant API docs, took inspiration from the spotifyr source code (for the authentication part) and implemented a small function quick-and-dirty style - without any error handling or retry mechanisms 🙈:

# define function to delete ids (limited to 50 at a time) -> not part of spotifyr
# cf https://developer.spotify.com/documentation/web-api/reference/library/remove-tracks-user/
delete_ids <- function(ids) {
  httr::DELETE(
    "https://api.spotify.com/v1/me/tracks",
    config = httr::config(token = spotifyr::get_spotify_authorization_code()),
    # the endpoint expects the ids as one comma-separated string
    query = list(ids = paste0(ids, collapse = ","))
  )
}
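
For anyone who wants to be a bit more careful than I was, a more defensive variant could look roughly like the sketch below. It is not what I ran: the name delete_ids_safely and the token argument are assumptions for this example, and the retry count of three is arbitrary. It relies on httr::RETRY to repeat failed requests and on httr::stop_for_status to turn HTTP errors into R errors.

```r
# Sketch of a more defensive variant (hypothetical name, not part of spotifyr).
# `token` is expected to be the result of
# spotifyr::get_spotify_authorization_code().
delete_ids_safely <- function(ids, token) {
  # the endpoint accepts at most 50 ids per request
  if (length(ids) == 0 || length(ids) > 50) {
    stop("`ids` must contain between 1 and 50 track ids.")
  }
  res <- httr::RETRY(
    "DELETE",
    "https://api.spotify.com/v1/me/tracks",
    config = httr::config(token = token),
    query = list(ids = paste0(ids, collapse = ",")),
    times = 3 # give up after three attempts
  )
  httr::stop_for_status(res) # turn HTTP errors (4xx / 5xx) into R errors
  invisible(res)
}
```

The input check runs before any request is sent, so an oversized chunk fails fast instead of producing an API error.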

Because this endpoint was limited as well, I had to use some dark stackoverflow magic to split the to_delete_ids vector with 327 track ids into 7 chunks of at most 50 ids each (six chunks of 50 and one of 27):

# can only delete 50 at once, so split
del_groups <- split(to_delete_ids, ceiling(seq_along(to_delete_ids) / 50 )) # from https://stackoverflow.com/a/3321659 
str(del_groups)

List of 7
 $ 1: chr [1:50] "0r8CnP1ri7Op1K6pYBAIIS" "04cWxUJpQNmQzPx3oerRIe" "0TcwSjGcRLP0qANZ0pn5S2" "0K0UOpucV0mUMEVpvVioqI" ...
 $ 2: chr [1:50] "4999R4NWDhX4dxHuexgRQk" "1b5t5yfZL0gtw7kBO37Cag" "3E46vLZaOoiGVGwLEnfqae" "44qSTlrLcvZpZL5bipSU6g" ...
 $ 3: chr [1:50] "5dcDqtSNzKmi7X6leDTGji" "7HTLLCS0GuEFt6mZksBNPK" "7ddOMjwAggaPDArUtwjbgz" "5DANy9Hla7MaWtJQpdVPVI" ...
 $ 4: chr [1:50] "2vcGDeYkUJej4R7hUkUgYd" "4SqR4H9THJNTB0JQMcipwy" "2O6MlAfSk6I070p0zvV7qr" "2kM5gjsLeaeHZFTlDsYqBC" ...
 $ 5: chr [1:50] "0qWX4kYBRQYr0HjDAIgIHh" "0Nvzj7ma1dDrxREoYv5cpb" "0XaoLn8FXHGb1fhytPAtcl" "0mpjzZA3jjzHOvvIewzixs" ...
 $ 6: chr [1:50] "4IiEy7SnLy8jVwaxNHExsU" "1kcPZTzNfOp4vCa4fvWHJa" "1ZikjFqoNWhPjRrJBwyBPU" "2sQoA08e4hezq0rbpRaFqf" ...
 $ 7: chr [1:27] "4gwPvHVH7Rvz6nZ52CTio5" "5Mmn5wr79RVIlXjSR43Tep" "6E6gHBhv9t8wvdssiTzzmb" "5hSjrtbCWIWeLFmkLwNfOf" ...
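
The “magic” is less dark than it looks. Here is the same trick wrapped in a small helper and applied to dummy ids; chunk_vector and ids_demo are names I made up for this sketch.

```r
# Hypothetical helper: split a vector into consecutive chunks
# of at most `size` elements.
chunk_vector <- function(x, size) {
  # seq_along(x) / size maps positions 1..size to values in (0, 1],
  # positions size+1..2*size to values in (1, 2], and so on;
  # ceiling() turns these into the group labels 1, 2, 3, ...
  split(x, ceiling(seq_along(x) / size))
}

ids_demo <- paste0("id_", 1:327) # same length as to_delete_ids
chunks_demo <- chunk_vector(ids_demo, 50)

length(chunks_demo)          # 7
unname(lengths(chunks_demo)) # 50 50 50 50 50 50 27
```

Because split keeps the original order within each group, the chunks are consecutive slices of the input vector.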

Finally, I used map to apply my function to the chunks:

# apply function
del_groups %>% 
  purrr::map(delete_ids)

Thankfully, it worked out of the box: those annoying audiobook tracks were gone from my “Liked Songs” playlist and I was a happy coder again 🙏

What’s next?

I definitely want to “optimize” my Liked Songs playlist even further. For example, there are a lot of complete albums in it, which are artifacts from liking whole albums instead of individual songs. Ideally, I would like to have access to the stats on how often I listened to each song, so that I could just pick out the songs that made me like the albums. However, it seems like there is no way to access those stats because of GDPR. So I might end up building a sort of interactive CLI with usethis which allows me to quickly accept or reject songs from the playlist. Maybe I can even integrate this with the player so that I could listen to a song for a few seconds before making my decision.

Or…I might become a lazy Spotify citizen again and drop this whole project 🤷 🙈

In either case…until next time: Keep coding ❤️

The code

The complete code is here. I adapted it slightly for this blog post but it should (hopefully) work.

Footnotes

¹ Someone please give an “Introduction to Spotify” course…I need it.
² I have a blog post draft about this topic…I hope I get around to publishing it at some point!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://gitlab.com/friep/blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".