I walk through my process of using R’s purrr, spotifyr and httr to remove unwanted content from my Spotify liked songs playlist.
Looking for the code? You can find it in this GitLab Repo!
I’m not a good Spotify citizen: I can listen to the same Spotify-curated “This is..” artist playlist for weeks and I rarely venture out to discover new music. And because I’m too lazy to curate and maintain my own playlists, I often lose track of songs / artists I enjoyed listening to at some point.
So I was over the moon when I realized a couple of weeks ago that there was a “Liked Songs” playlist that I could “fill” by simply “hearting” a song. And even better, the playlist already contained over 1400 songs that I apparently had added … somehow.
I instantly added some more songs and started listening to the playlist while coding at the CorrelAid website. It was working - until my flow was interrupted by a narrator voice reading something… What?? I opened my Spotify app and unliked the “song”. But it happened again and again - apparently three whole audiobooks - each with > 40 tracks - had found their way into my “Liked Songs”.
I tried to solve the problem using the app: I liked and unliked the audiobooks but nothing worked - the “songs” did not disappear from my “Liked Songs” playlist. So of course, instead of “unliking” >200 songs by hand in the app, I decided to use my programming skills and the Spotify Web API to (semi-)automate the problem away.
First, of course, I loaded some packages. As always with R, there’s already an excellent API wrapper package for the Spotify Web API: spotifyr 📦.
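The package-loading chunk itself is not shown here. A minimal sketch, assuming the set of packages that the snippets further down call (the list is inferred from those snippets, so treat it as an assumption):

```r
# Packages the later snippets appear to rely on (assumed from the code in this post)
needed <- c("dplyr", "purrr", "spotifyr", "httr", "readr", "usethis", "glue", "knitr")

# Report anything that is not installed yet instead of failing halfway through
missing <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) {
  message("Please install: ", paste(missing, collapse = ", "))
}

# dplyr is attached because its verbs and the pipe are used without a prefix below;
# the other packages are called with the pkg::function() notation
if (!"dplyr" %in% missing) library(dplyr)
```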
Next, I used spotifyr to get the “Liked Songs” playlist from Spotify. To do so, I followed the instructions from the GitHub README to create an app and obtain the client id and client secret. I stored them in a local .Renviron file:
SPOTIFY_CLIENT_ID="myclientid"
SPOTIFY_CLIENT_SECRET="myclientsecret"
I loaded the contents of the .Renviron file with the base R function readRenviron and used spotifyr to obtain the access token:
readRenviron(".Renviron")
access_token <- spotifyr::get_spotify_access_token()
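If the variable names in .Renviron are misspelled, the token request fails with a rather unhelpful message. A small sanity check can catch that earlier. This is only a sketch: check_spotify_env is a made-up helper name, and it assumes the two variable names shown above.

```r
# Hypothetical helper: stop early if the credentials are not available
check_spotify_env <- function(vars = c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")) {
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage, after readRenviron(".Renviron"):
# check_spotify_env()
# access_token <- spotifyr::get_spotify_access_token()
```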
The access token is the “key” to interact with the Spotify API, so I was good to go. Thankfully, the spotifyr package offers a function for almost every endpoint of the Spotify API, so spotifyr::get_my_saved_tracks exists.
Unfortunately, most endpoints do not return all items at once when called, but only up to a certain limit. In the case of spotifyr::get_my_saved_tracks, the API can only return a maximum of 50 tracks in response to a call. To work around this limit, I made use of the offset parameter. The offset tells the API “where to start” with returning the next 50 items. From the Spotify documentation:
limit: Optional. The maximum number of objects to return. Default: 20. Minimum: 1. Maximum: 50.
offset: Optional. The index of the first object to return. Default: 0 (i.e., the first object). Use with limit to get the next set of objects.
(source)
So instead of making one “big” call to get all saved tracks, I needed to make several smaller calls, while increasing the offset until I had “reached” the total number of tracks.
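Conceptually:
- Start with an offset of 0 and request the first 50 tracks.
- Increase the offset by 50 and request the next 50 tracks.
- Continue until the offset has reached the total number of tracks.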
In the context of APIs, such a pattern is called pagination.
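To make the pattern concrete, here is a minimal sketch with a while loop. It does not talk to Spotify: fake_api is a made-up stand-in that returns at most `limit` items starting at a zero-based `offset`, and the total of 120 is arbitrary.

```r
# Stand-in for the API: returns at most `limit` items starting at `offset`
# (fake_api and its data are made up for illustration)
fake_api <- function(offset, limit = 50, total = 120) {
  ids <- seq_len(total)
  ids[ids > offset & ids <= offset + limit]
}

total <- 120
limit <- 50
offset <- 0
collected <- integer(0)

while (offset < total) {
  page <- fake_api(offset, limit, total)  # one "small" call
  collected <- c(collected, page)         # keep what came back
  offset <- offset + limit                # move the window to the next page
}

length(collected)
```

Three calls (offsets 0, 50 and 100) are enough to collect all 120 items in this toy setup.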
To implement the pagination, I needed to know the total number of tracks in the “Liked Songs” playlist. I could get it from the API by making a call to the me/tracks endpoint using the get_my_saved_tracks function with the include_meta_info parameter set to TRUE. This returned a total of 1516 saved tracks.
# get total number of saved tracks and calculate the offsets (can only get 50 tracks with a call)
meta <- spotifyr::get_my_saved_tracks(limit = 50, offset = 0, include_meta_info = TRUE)
total <- meta$total # total number of saved tracks
total # 1516
Now, I could’ve implemented the conceptual pagination pattern using a while or repeat loop - after all, the “continue until” bullet point totally reads like it implies a while / until loop. However, I decided to use a functional programming solution instead. Why? While while (haha!) loops are totally a-okay, writing functions forces me to think more about my code. You can read more about this in Advanced R.
For my “functional approach” to work, I needed to calculate the vector of offsets ahead of time. To do so, I made use of the seq function, which “generate[s] regular sequences”.
offsets <- seq(0, total + 50, 50)
offsets
Because I couldn’t run the API call above for knitting this blog post (I’ve already deleted the relevant tracks), here’s a mockup with the total number of tracks hardcoded:
# for the blog post
total_fake <- 1516
offsets_fake <- seq(0, total_fake + 50, 50)
offsets_fake
[1] 0 50 100 150 200 250 300 350 400 450 500 550 600
[14] 650 700 750 800 850 900 950 1000 1050 1100 1150 1200 1250
[27] 1300 1350 1400 1450 1500 1550
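I added 50 to the total number of tracks to make sure the sequence would not stop short. Strictly speaking, this is not required: offsets are zero-based, so the call with offset 1500 already returns tracks 1501 to 1516, and the extra call with offset 1550 simply returns no tracks. seq(0, total - 1, 50) would have been enough.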
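To double-check which offsets are actually needed, here is a small sketch that lists, for each offset, the range of tracks a call with limit = 50 would cover. It assumes zero-based offsets, as described in the Spotify documentation quoted above; page_ranges is a made-up helper name.

```r
# Hypothetical helper: which tracks does each call cover?
page_ranges <- function(total, limit = 50) {
  stopifnot(total > 0, limit > 0)
  offsets <- seq(0, total - 1, by = limit)
  data.frame(
    offset = offsets,
    first_track = offsets + 1,                 # offsets are zero-based
    last_track = pmin(offsets + limit, total)  # the last page may be shorter
  )
}

ranges <- page_ranges(1516)
nrow(ranges)     # number of calls needed
tail(ranges, 2)  # the last two pages
```

Under that assumption, 31 calls cover all 1516 tracks, and the last one (offset 1500) covers tracks 1501 to 1516.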
I then defined a simple wrapper function that takes an offset as a parameter and feeds it to the get_my_saved_tracks function from the spotifyr package. I also added some simple logging. I mapped the function over my offsets vector using map_dfr. This function does two things:
- It applies the get_chunk function to each element of the offsets vector, essentially working like a for loop and implementing our “conceptual” pagination from above.
- It binds the results together. If I had used map, the return value would’ve been a list of data frames. In contrast to map, map_dfr binds all the data frames together into one big data frame.
# define function to get 50 saved tracks depending on offset
get_chunk <- function(offset) {
  new <- spotifyr::get_my_saved_tracks(limit = 50, offset = offset, include_meta_info = FALSE)
  usethis::ui_done(glue::glue("got from offset: {offset}"))
  return(new)
}
# map over offsets, bind to dataframe
all_tracks <- purrr::map_dfr(offsets, get_chunk)
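A side note for newer purrr versions: since purrr 1.0.0, map_dfr is marked as superseded in favour of map followed by list_rbind. The sketch below shows the same "map, then bind" idea with a made-up stand-in for get_chunk (fake_chunk and its columns are invented, and it does not call Spotify).

```r
library(purrr)

# Stand-in for get_chunk(): returns a small data frame per offset
# (fake_chunk and its columns are made up for illustration)
fake_chunk <- function(offset, limit = 50, total = 120) {
  idx <- seq_len(total)
  idx <- idx[idx > offset & idx <= offset + limit]
  data.frame(position = idx, offset = rep(offset, length(idx)))
}

offsets <- seq(0, 119, by = 50)

# map() returns a list of data frames; list_rbind() stacks them into one
pages <- map(offsets, fake_chunk)
all_rows <- list_rbind(pages)

nrow(all_rows)
```

With the toy total of 120, the three pages are stacked into one data frame of 120 rows.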
Finally, I wrote the data to disk to make sure that I could use them for this blog post 😉.
# write to disk
readr::write_rds(all_tracks, "saved_tracks.rds")
Here is a glimpse at the data:
Rows: 1,516
Columns: 30
$ added_at <chr> "2020-08-11T14:00:47Z", …
$ track.artists <list> [<data.frame[2 x 6]>, <…
$ track.available_markets <list> [<"AD", "AE", "AL", "AR…
$ track.disc_number <int> 1, 1, 1, 1, 1, 1, 1, 1, …
$ track.duration_ms <int> 296213, 186101, 178666, …
$ track.explicit <lgl> FALSE, FALSE, FALSE, FAL…
$ track.href <chr> "https://api.spotify.com…
$ track.id <chr> "6L5iRhYgVPaEFqmGaVxWrN"…
$ track.is_local <lgl> FALSE, FALSE, FALSE, FAL…
$ track.name <chr> "Хочу перемен", "Cliff's…
$ track.popularity <int> 41, 26, 0, 83, 73, 21, 2…
$ track.preview_url <chr> "https://p.scdn.co/mp3-p…
$ track.track_number <int> 1, 3, 9, 1, 1, 17, 3, 18…
$ track.type <chr> "track", "track", "track…
$ track.uri <chr> "spotify:track:6L5iRhYgV…
$ track.album.album_type <chr> "album", "single", "albu…
$ track.album.artists <list> [<data.frame[2 x 6]>, <…
$ track.album.available_markets <list> [<"AD", "AE", "AL", "AR…
$ track.album.href <chr> "https://api.spotify.com…
$ track.album.id <chr> "7trila5XMOsUUkcujWqzcn"…
$ track.album.images <list> [<data.frame[3 x 3]>, <…
$ track.album.name <chr> "Виктор Цой 55 (Выпуск в…
$ track.album.release_date <chr> "2017-06-21", "2016-03-2…
$ track.album.release_date_precision <chr> "day", "day", "day", "da…
$ track.album.total_tracks <int> 55, 5, 27, 1, 2, 23, 5, …
$ track.album.type <chr> "album", "album", "album…
$ track.album.uri <chr> "spotify:album:7trila5XM…
$ track.album.external_urls.spotify <chr> "https://open.spotify.co…
$ track.external_ids.isrc <chr> "FR59R1744876", "USAT216…
$ track.external_urls.spotify <chr> "https://open.spotify.co…
track.type or track.album.type seemed like an obvious choice to find out which tracks belonged to an audiobook. However, neither of them helped: unfortunately, it seems like the data did not offer an indicator for whether a track belonged to an audiobook or not - in Spotify’s eyes, there’s no difference between music albums and audiobooks.
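The kind of check behind this conclusion could look like the following sketch. It uses base R and a made-up toy data frame modelled on the glimpse above, not my actual data:

```r
# Toy data mimicking the two type columns (values are made up)
toy_tracks <- data.frame(
  track.type = c("track", "track", "track"),
  track.album.type = c("album", "album", "album"),
  track.album.name = c("A music album", "An audiobook", "An audiobook")
)

# Cross-tabulate both type columns:
# a single cell means they cannot tell music and audiobooks apart
type_counts <- table(toy_tracks$track.type, toy_tracks$track.album.type)
type_counts
```

If every row lands in the same cell, as in this toy example, the columns carry no information about audiobooks.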
Hence, it was time for a good old heuristic: I decided to look at the album lengths because audiobooks are usually quite long compared to “normal” music albums. Tidyverse to the rescue:
# group by album, sort by duration
all_tracks_by_album <- all_tracks %>%
  # group by album id and album name (only the id would be necessary, but I wanted to keep both)
  group_by(track.album.id, track.album.name) %>%
  summarize(total_duration_album = sum(track.duration_ms)) %>% # sum up all the tracks
  arrange(desc(total_duration_album)) # sort descending
# determine what are the audiobooks by looking at the data
# the longest albums should be the audiobooks
knitr::kable(head(all_tracks_by_album, 10))
track.album.id | track.album.name | total_duration_album |
---|---|---|
2Hso705hbz70g2ywUyBSXK | Über uns der Himmel, unter uns das Meer (Gekürzte Lesung) | 29832918 |
6DBCctTaza5w2rWrkK1I1D | Inferno | 25746930 |
00fshMmQEnqmP8Gja8aEe4 | Das Joshua-Profil | 23468946 |
1kLscSc6HEAonyvwbZO3XK | Love Actually | 11707091 |
1716XPsNUeHok477AtTRhX | Best of Classical - Die 50 größten Werke der Klassik | 11127462 |
7xl50xr9NDkd3i2kBbzsNZ | Stadium Arcadium | 9103257 |
3CBMpoI2vZlKXs3wgnNWGn | 20 The Greatest Hits | 8786053 |
4jytUDY4LPrvwkReW4S2gE | Greatest Hits 1992-2010 Es asì | 8052950 |
2OXv5X4J2y9CQ7eVSNEHad | Greatest Hits 1992-2010 E da qui | 8040528 |
3dVI5svXoD3X3HR2Y4P1qt | Projekt Seerosenteich (Live - Deluxe Version) | 7171975 |
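The durations are in milliseconds, which makes the gap hard to see at a glance. A quick conversion to hours for the four longest albums from the table above (the values are copied from the table; the album ids serve as names):

```r
# Durations of the four longest albums from the table above, in milliseconds
durations_ms <- c(
  "2Hso705hbz70g2ywUyBSXK" = 29832918,
  "6DBCctTaza5w2rWrkK1I1D" = 25746930,
  "00fshMmQEnqmP8Gja8aEe4" = 23468946,
  "1kLscSc6HEAonyvwbZO3XK" = 11707091
)

# 1 hour = 60 * 60 * 1000 ms
durations_h <- round(durations_ms / (60 * 60 * 1000), 2)
durations_h
```

The first three albums run for roughly 8.29, 7.15 and 6.52 hours, while the fourth one comes in at about 3.25 hours.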
I instantly recognized the first three entries as the annoying audiobooks that had kept popping up in my “Liked Songs” playlist. 🎉
To remove the tracks from the audiobooks, I needed all their IDs. First, I extracted the album ids from the audiobooks:
# select the audiobooks / the n longest albums and extract the ids
audiobooks_id <- all_tracks_by_album %>%
  head(3) %>% # from the manual investigation, I had three audiobooks
  pull(track.album.id)
audiobooks_id
[1] "2Hso705hbz70g2ywUyBSXK" "6DBCctTaza5w2rWrkK1I1D"
[3] "00fshMmQEnqmP8Gja8aEe4"
Then, I filtered the original all_tracks data frame for those albums to get all the track IDs that I wanted to delete:
# filter tracks belonging to audiobooks and extract the ids we need to delete
to_delete <- all_tracks %>%
  filter(track.album.id %in% audiobooks_id)
to_delete_ids <- to_delete$track.id # extract the ids
length(to_delete_ids)
[1] 327
Now, the only thing left was the actual deletion of the tracks from my “Liked Songs” playlist. Unfortunately, this is not in the scope of spotifyr, so I had a look at the relevant API docs, took inspiration from the spotifyr source code (for the authentication part) and implemented a small function quick-and-dirty style - without any error handling or retry mechanisms 🙈:
# define function to delete ids (limited to 50 at a time) -> not part of spotifyr
# cf https://developer.spotify.com/documentation/web-api/reference/library/remove-tracks-user/
delete_ids <- function(ids) {
  httr::DELETE(
    "https://api.spotify.com/v1/me/tracks",
    config = httr::config(token = spotifyr::get_spotify_authorization_code()),
    query = list(ids = paste0(ids, collapse = ","))
  )
}
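For completeness, a slightly more defensive variant could look like the sketch below. It is not what I ran: delete_ids_checked is a hypothetical name, the input check relies on the 50-ids-per-call limit mentioned above, and httr::stop_for_status is used to turn HTTP errors into R errors.

```r
# Sketch of a more defensive variant (delete_ids_checked is a hypothetical name)
delete_ids_checked <- function(ids, token = spotifyr::get_spotify_authorization_code()) {
  # the endpoint accepts at most 50 ids per request, so fail before calling it
  stopifnot(is.character(ids), length(ids) >= 1, length(ids) <= 50)

  response <- httr::DELETE(
    "https://api.spotify.com/v1/me/tracks",
    config = httr::config(token = token),
    query = list(ids = paste0(ids, collapse = ","))
  )

  # turn HTTP errors (4xx / 5xx) into R errors instead of silently returning them
  httr::stop_for_status(response)
  invisible(response)
}
```

Because the token argument is only evaluated when it is first used, the input check runs before any authentication or network traffic happens.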
Because this endpoint is limited to 50 ids per call as well, I had to use some dark Stack Overflow magic to split the to_delete_ids vector with 327 track ids into 7 chunks of at most 50 ids each:
# can only delete 50 at once, so split
del_groups <- split(to_delete_ids, ceiling(seq_along(to_delete_ids) / 50 )) # from https://stackoverflow.com/a/3321659
str(del_groups)
List of 7
$ 1: chr [1:50] "0r8CnP1ri7Op1K6pYBAIIS" "04cWxUJpQNmQzPx3oerRIe" "0TcwSjGcRLP0qANZ0pn5S2" "0K0UOpucV0mUMEVpvVioqI" ...
$ 2: chr [1:50] "4999R4NWDhX4dxHuexgRQk" "1b5t5yfZL0gtw7kBO37Cag" "3E46vLZaOoiGVGwLEnfqae" "44qSTlrLcvZpZL5bipSU6g" ...
$ 3: chr [1:50] "5dcDqtSNzKmi7X6leDTGji" "7HTLLCS0GuEFt6mZksBNPK" "7ddOMjwAggaPDArUtwjbgz" "5DANy9Hla7MaWtJQpdVPVI" ...
$ 4: chr [1:50] "2vcGDeYkUJej4R7hUkUgYd" "4SqR4H9THJNTB0JQMcipwy" "2O6MlAfSk6I070p0zvV7qr" "2kM5gjsLeaeHZFTlDsYqBC" ...
$ 5: chr [1:50] "0qWX4kYBRQYr0HjDAIgIHh" "0Nvzj7ma1dDrxREoYv5cpb" "0XaoLn8FXHGb1fhytPAtcl" "0mpjzZA3jjzHOvvIewzixs" ...
$ 6: chr [1:50] "4IiEy7SnLy8jVwaxNHExsU" "1kcPZTzNfOp4vCa4fvWHJa" "1ZikjFqoNWhPjRrJBwyBPU" "2sQoA08e4hezq0rbpRaFqf" ...
$ 7: chr [1:27] "4gwPvHVH7Rvz6nZ52CTio5" "5Mmn5wr79RVIlXjSR43Tep" "6E6gHBhv9t8wvdssiTzzmb" "5hSjrtbCWIWeLFmkLwNfOf" ...
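The “magic” is less dark than it looks. Here is the same trick on a tiny example with seven made-up ids and a chunk size of 3:

```r
ids <- letters[1:7]  # seven made-up ids
chunk_size <- 3

# seq_along(ids)               -> 1 2 3 4 5 6 7
# seq_along(ids) / chunk_size  -> 0.33 0.67 1 1.33 1.67 2 2.33
# ceiling(...)                 -> 1 1 1 2 2 2 3   (the chunk number of each id)
chunk_number <- ceiling(seq_along(ids) / chunk_size)

# split() groups the ids by their chunk number
chunks <- split(ids, chunk_number)
lengths(chunks)
```

The toy example yields chunks of size 3, 3 and 1. The same arithmetic explains the output above: ceiling(327 / 50) is 7, and the last chunk holds the remaining 327 - 6 * 50 = 27 ids.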
Finally, I used map to apply my function to the chunks:
# apply function
del_groups %>%
  purrr::map(delete_ids)
Thankfully, it worked out of the box: those annoying audiobook tracks were gone from my “Liked Songs” playlist and I was a happy coder again 🙏
I definitely want to “optimize” my Liked Songs playlist even further. For example, there are a lot of complete albums in it which are artifacts from liking whole albums instead of individual songs. Ideally, I would like to have access to the stats on how often I listened to each song so that I could just pick out the songs that made me like the albums. However, it seems like there is no way to access those stats because of GDPR. So I might end up building a sort of interactive CLI with usethis which allows me to quickly accept or reject songs from the playlist. Maybe I can even integrate this with the player so that I could listen to the song for some seconds before making my decision.
Or…I might become a lazy Spotify citizen again and drop this whole project 🤷 🙈
In either case…until next time: Keep coding ❤️
The complete code is here. I adapted it slightly for this blog post but it should (hopefully) work.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://gitlab.com/friep/blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".