Machine Learning Fall 2020 Project
Revanth Tiruveedhi, Abhishek Mattipalli, Shehruz Khan, Rahul Patel, Jesse Du
Link to YouTube video: https://www.youtube.com/embed/VEZtcpYOgpE
Music has become a staple of daily life. During the pandemic especially, music platforms have seen a large increase in subscriptions, which shows how much people love their music [4]. Platforms such as Spotify, Apple Music, and SoundCloud facilitate access to music and have recommendation systems in place that aren't completely transparent to the listener. For example, when you play through a playlist on Spotify, songs it deems similar to your playlist get played, and it also suggests songs to add while you are creating a playlist. While we've been recommended great music on these platforms, there have been times when we've been unsatisfied as well.
Specifically, we have trouble finding new songs because current music recommendation systems tend to recommend similar artists who are already quite popular. While there's nothing wrong with recommending similar popular songs, it would be nice to be recommended similar new songs that we probably have not heard. This led us to the idea of a model that predicts the genre of an input song.
We look to predict a song’s genre and recommend new songs from that genre or new closely related songs based on quantifiable features.
Our project requires a set of songs and their respective features. To elaborate, the dataset of songs and characteristic features must span a variety of genres. As stated in the earlier sections, the goal is to take an arbitrary song 'X' and poll it against the other data points to determine which genre the song is associated with. From there, the algorithm will be able to deliver and recommend the songs most similar to song 'X'.
Spotify's platform generates genre-based playlists for its audience to discover new music in categories such as Classical, Country, Hip-Hop, Rap, and more. To collect songs from these different genres, we isolated the top and upcoming playlists generated by Spotify. Together, we collected a total of 10 unique genres and took note of the songs from each playlist:
Each playlist has approximately 70 – 150 songs, and the dataset consists of a total of 964 unique songs organized by genre.
While exploring ways to accrue characteristic information on music, we came across a service offered by Spotify's developer tools that returns Audio Features [6] for a particular track. Given a track, the API returns quantitative information about it; the chart below defines the various features.
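For concreteness, here is a minimal sketch of how the Audio Features endpoint can be queried with the spotipy client library; spotipy is our choice of client for illustration rather than something the project mandates, and the credentials and track ID are placeholders.

```python
# Minimal sketch of querying Spotify's Audio Features endpoint via the
# spotipy client library (credentials and track ID are placeholders).
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

track_id = "TRACK_ID"                        # any Spotify track ID/URI
features = sp.audio_features([track_id])[0]  # dict of quantitative features
print(features["danceability"], features["energy"], features["acousticness"])
```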
In terms of cleaning the data, some features, such as year and date added, were insignificant to our dataset: all of the songs were added recently, so these fields would not add any significant benefit to our recommendation system. One data point that was altered, however, was the genre of the song. Taking a deeper look at the system, Spotify assigns the genre of a song based on the top genre of the song's artist. For example, because Travis Scott is known for his contributions to rap and his top genre is rap, any song he makes is categorized as rap regardless of its true genre. Therefore, to account for this erroneous labeling of individual songs on Spotify's part, each song's genre was replaced with the genre category of the playlist it was derived from.
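A sketch of this cleaning step, assuming one CSV per genre playlist (the file names, column names, and exact genre list are hypothetical):

```python
# Sketch of the cleaning step: drop unused fields and relabel each song
# with the genre of the playlist it came from (file names hypothetical).
import pandas as pd

# the 10 playlist genres (exact list assumed for illustration)
genres = ["classical", "country", "rap", "metal", "rock",
          "pop", "indie", "folk & acoustic", "r&b", "dance & electronic"]

frames = []
for genre in genres:
    df = pd.read_csv(f"{genre}_playlist.csv")
    df["genre"] = genre            # override Spotify's artist-based label
    frames.append(df)

songs = pd.concat(frames, ignore_index=True)
songs = songs.drop(columns=["year", "date_added"], errors="ignore")
songs = songs.drop_duplicates()    # keep only unique songs
```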
Doing some further research, we found that Spotify offers an interactive web application [7] that can scrape an input playlist based on its playlist ID (Spotify Playlist URI) and return each song's data features. Using this tool, we input each of the 10 genre-specific playlists to gather the features pertinent to this project.
Initially, we thought it would be good to explore our dataset further. We first created a correlation matrix to show the linear relationships between our features. For example, it is evident that acousticness and energy have a strong negative relationship, while loudness and energy have a strong positive relationship. This was expected, as acoustic generally means no amplification, so songs with high acousticness should have low energy. However, it was slightly surprising to see moderate to weak correlations between so many features, which implies that there is not a great deal of overlap in what each feature captures.
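The matrix can be reproduced with pandas' built-in correlation; the feature column names below are assumptions based on Spotify's audio features, and `songs` is the cleaned DataFrame from the collection step.

```python
# Correlation matrix over the audio features (column names assumed).
import matplotlib.pyplot as plt

feature_cols = ["danceability", "energy", "loudness", "speechiness",
                "acousticness", "instrumentalness", "liveness",
                "valence", "tempo", "duration_ms"]
corr = songs[feature_cols].corr()            # pairwise Pearson correlations

plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.xticks(range(len(feature_cols)), feature_cols, rotation=90)
plt.yticks(range(len(feature_cols)), feature_cols)
plt.colorbar(label="Pearson correlation")
plt.tight_layout()
plt.show()
```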
Then, we thought it would be useful to look at each specific feature and see how it differs across genres. We averaged each feature on a genre-by-genre basis and created scatterplots for each feature, as shown. This helped us see differences between genres in a visual format. For example, classical music frequently appears as an outlier for most features, which makes sense because classical music is so different from music of other genres. Hip-hop was also a prominent outlier in speechiness, which makes sense because hip-hop songs are lyrically dominant and often contain a lot of words spoken very quickly. These findings led us to implement PCA, as we wanted to visualize differences and similarities between the "average" song in each genre and songs as a whole.
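A compact version of that averaging step, continuing from the `songs` DataFrame and `feature_cols` list above:

```python
# Genre-by-genre feature averages, one scatterplot per feature.
import matplotlib.pyplot as plt

genre_means = songs.groupby("genre")[feature_cols].mean()

for col in feature_cols:
    plt.figure()
    plt.scatter(genre_means.index, genre_means[col])
    plt.xticks(rotation=45, ha="right")
    plt.ylabel(f"mean {col}")
    plt.tight_layout()
plt.show()
```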
We decided to use PCA on our dataset because it's very tough to visualize features in a 10-dimensional space. Our goal was not to reduce our feature set for further use, but purely to view our feature space reduced to 2 dimensions. Because of the varying ranges of the features, we scaled them with StandardScaler, which standardizes each feature by removing its mean and scaling it to unit variance. We then used sklearn's PCA implementation to fit a 2-component basis to our dataset, with genre set aside as the label. After this, we thought it would be valuable to plot the "average" song in each genre (the average principal components within a genre) as well as every song in our dataset.
For the plot titled "2D PCA Visualization for Genre Averages", we took the "average" song for each genre by averaging each of the two principal components within the genre. For the plot titled "2D PCA Visualization", we included every song's principal-component breakdown and its associated genre on a single plot. Both plots were created with matplotlib.
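A sketch of this PCA step, under the same column-name assumptions as before; the scaling, projection, and both plots follow the process just described.

```python
# Standardize the 10 features, project onto 2 principal components,
# then plot every song and each genre's "average" song.
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(songs[feature_cols])
pcs = PCA(n_components=2).fit_transform(X)
songs["pc1"], songs["pc2"] = pcs[:, 0], pcs[:, 1]

# "2D PCA Visualization": every song, colored by genre
for genre, grp in songs.groupby("genre"):
    plt.scatter(grp["pc1"], grp["pc2"], label=genre, s=10)
plt.legend(fontsize="x-small")
plt.title("2D PCA Visualization")
plt.show()

# "2D PCA Visualization for Genre Averages": mean components per genre
centers = songs.groupby("genre")[["pc1", "pc2"]].mean()
plt.scatter(centers["pc1"], centers["pc2"])
for genre, row in centers.iterrows():
    plt.annotate(genre, (row["pc1"], row["pc2"]))
plt.title("2D PCA Visualization for Genre Averages")
plt.show()
```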
After this, we decided to look at the explained variance ratio for different numbers of principal components and graphed this using matplotlib again. We primarily plotted this to see if dimensionality reduction would be useful to us going forward, but as we touch on in the results section, we concluded that it ultimately wouldn’t be too useful.
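sklearn exposes this curve directly through explained_variance_ratio_; continuing from the standardized matrix X above:

```python
# Cumulative explained variance for 1 through 10 principal components.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

evr = PCA(n_components=10).fit(X).explained_variance_ratio_
plt.plot(range(1, 11), np.cumsum(evr), marker="o")
plt.xlabel("Number of principal components")
plt.ylabel("Cumulative explained variance ratio")
plt.show()
```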
For the plot where we averaged the two principal components within a genre, we plot 10 points, one for each genre. From this plot, it's evident that the average classical and metal songs are quite different from the average song of any other genre. For the rest of the genres, there does not appear to be a significant separation in the feature space between them. This was a little concerning at first, as we thought it would be difficult to predict a genre if the average songs of several genres were fairly similar.
In the second plot, we plotted every single song with its associated genre. It's evident again that classical music is much different from the rest of the songs in our dataset, and metal is quite different to some extent. However, there is a large cluster of points containing a moshpit of different genres. This again was a bit worrisome, as it suggests there is something beyond a song's features that dictates genre and that is tough to pick up on. However, a big part of our project is not only predicting the genre but also recommending songs that are close on a feature basis.
Explained variance ratio: $\mathrm{EVR}_i = \lambda_i \big/ \sum_{j=1}^{10} \lambda_j$, where $\lambda_i$ is the variance captured by the $i$-th principal component.
After implementing PCA with 2 components, we wanted to see whether dimensionality reduction would be beneficial. With 2 components, the transformed dataset captures roughly 50% of the variance, which isn't a great amount but gives some credibility to our 2-dimensional plots. The curve starts to flatten around 8 components; however, because 10 features are not much more computationally expensive, we decided to keep all 10 features going forward. Now that we have analyzed the relationships between features and genres, our next step is to build the part of our model that predicts an input song's genre.
For our initial test, we collected data on the Astroworld album in the same fashion as the overall dataset, and attempted to showcase initial findings for genre prediction using the first method explained above. To detail the process, we computed the Euclidean distance between each song on Astroworld and the average feature vector of each genre, and assigned each song the genre whose center was the smallest distance away. After running through this process, we ended up with the following results.
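A minimal sketch of this closest-genre-center method, assuming the `songs` DataFrame and `feature_cols` from earlier:

```python
# Closest-genre-center method: label a song with the genre whose mean
# feature vector is nearest in Euclidean distance.
import numpy as np

centers = songs.groupby("genre")[feature_cols].mean()

def closest_genre(track_features):
    """track_features: the 10 audio features of an arbitrary song."""
    dists = np.linalg.norm(centers.values - np.asarray(track_features), axis=1)
    return centers.index[int(np.argmin(dists))]
```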
Spotify lists each of these songs as rap because the album's artist, Travis Scott, primarily makes rap music. The results seem to show a somewhat close relationship between the song features and each song's true genre (based on our own listening experience and our general experience classifying music by ear). While SICKO MODE and STARGAZING are not necessarily best described as metal from a qualitative standpoint, the results were promising enough that we wanted to test our closest-genre-center method against albums from different genres.
We decided to use a country album, If I Know Me by Morgan Wallen, and a classical album, Beethoven: Triple Concerto & Symphony No. 7 (Live) by Yo-Yo Ma, to see how our closest genre center method would predict the genre of each song in these albums.
For the country album, our method classified each song within four main genres: indie, rock, folk & acoustic, and country. From our qualitative listening, these genres each span a wide range of styles and overlap considerably, so it makes sense to see a mix of them within one album. This album and Astroworld highlight how an album's songs can differ greatly on a quantitative basis yet still be classified under one genre due to factors outside the scope of our data.
For the classical album, our method classified every song as classical. As opposed to Astroworld and If I Know Me, our method assigned the same genre to every song. This hints at the quantitative homogeneity of classical music, as well as how different classical music is from every other genre in our main dataset; often, for example, there are no vocals in classical music. We then hypothesized that a genre that is homogeneous on a quantitative basis can be classified better by our model than genres like those on the other two albums we looked at.
Next, we wanted to experiment with our other method of predicting a song's genre: the nearest-neighbors approach. The thought process is that for an arbitrary input song, it may be better to consider the n closest songs when predicting its genre. For example, if a genre spans a wide range of songs on a quantitative basis, then the center of that genre might not be representative of it. In this way, nearest neighbors gives us a less biased measure than the mean of a genre used in our genre-center approach.
As an exploratory step, we graphed how many songs of each genre our main dataset contains. If we had many songs from one genre relative to another, our model could be unbalanced: the more songs a genre has, the more likely it is to contain outliers that could skew recommendations. From this graph, it is evident that R&B and dance & electronic do not have many songs relative to the other genres, so there might not be enough representation of these genres when predicting. However, for the most part we have about 80 – 120 songs per genre, which we considered good enough to move forward.
We used KNeighborsClassifier from sklearn to create all iterations of our KNN models. First, we simply split our main dataset into training and test sets, with genre as the dependent variable, and achieved a weighted accuracy of 0.57 as printed by our classification report. Taking a closer look, we had the highest precision with classical and metal and the lowest with R&B, which was 0. We hypothesize that this is due to the very small number of R&B songs relative to the other genres, and we wanted to improve our weighted accuracy by tuning our parameters.
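A sketch of that baseline, with the split proportion and random seed as assumptions:

```python
# Baseline KNN on the raw features; genre is the dependent variable.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(
    songs[feature_cols], songs["genre"], test_size=0.25, random_state=42)

knn = KNeighborsClassifier().fit(X_train, y_train)
print(classification_report(y_test, knn.predict(X_test)))  # weighted avg ~0.57
```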
To tune our model, we chose three parameters: the number of neighbors, the leaf size, and the p parameter of the distance metric. After using GridSearchCV to tune these, one interesting takeaway was that the search selected Manhattan rather than Euclidean distance. With the tuned model, we achieved a weighted accuracy of 0.63, a moderate improvement that we were happy with. We achieved a precision of 1 for classical, metal, and pop, and again 0 for R&B, which points to the scarcity of R&B songs. Next, we tried StandardScaler to see if we could improve our accuracy further.
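The tuning step might look like the following; the exact parameter grids are assumptions.

```python
# GridSearchCV over neighbors, leaf size, and the Minkowski p parameter
# (p=1 is Manhattan distance, p=2 Euclidean).
from sklearn.model_selection import GridSearchCV

param_grid = {"n_neighbors": list(range(1, 31)),
              "leaf_size": [10, 20, 30, 40, 50],
              "p": [1, 2]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

tuned_knn = search.best_estimator_   # the search selected p=1 (Manhattan)
print(search.best_params_)
```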
Using StandardScaler, we achieved a weighted average accuracy of 0.65, a very small improvement over our original tuned KNN model. We again used GridSearchCV to tune this model and see if we could squeeze out any further improvement, but we got a weighted average accuracy of 0.65 again. We therefore decided it was not worth using StandardScaler, as the improvement was very slight and would not contribute a great deal to our results.
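One way to express the scaled variant is to wrap StandardScaler and the classifier in a Pipeline so the scaler is fit only on the training folds during the search; a sketch:

```python
# Scaled variant: StandardScaler + KNN in one Pipeline, tuned the same way.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])
grid = {"knn__n_neighbors": list(range(1, 31)), "knn__p": [1, 2]}
scaled_search = GridSearchCV(pipe, grid, cv=5).fit(X_train, y_train)
```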
Going forward, we use our original tuned KNN model as our go-to when predicting the genre of an input song or album. Although our weighted accuracy might not seem high and R&B is not represented at all, we think this points to qualitative aspects of a song's genre that the model cannot capture.
First, as a sanity check, we wanted to make sure that our KNN model accurately classified the classical album from Yo-Yo Ma that we used earlier. Our tuned KNN model predicted the genre of every song on this album as classical, which gave us a good first indication to continue forward.
Then we predicted the genre of each song on Astroworld again, this time using our tuned KNN model; the results are below. Many songs were now predicted with a different genre than under our genre-center method. From a qualitative standpoint, we agree with the majority of these changes, as we are all big fans of this album. Some changes we would like to point out are that SICKO MODE and STOP TRYING TO BE GOD are no longer predicted to be metal, which is good, as these songs are definitely nowhere near metal. We want to emphasize that these are qualitative judgments that we are trying to reconcile with a quantitative system. Now that we can predict genres with our KNN model, we want to recommend songs in the same genre as the predicted one.
To reiterate the previous section, we will be using the hyperparameter-tuned version of the KNN model. Given an input song, the KNN algorithm returns the genre the song most closely aligns with, and this will be our assigned genre for every input going forward.
This section discusses the process we chose for recommending songs given the genre predicted by the tuned KNN model. Given an input song 'X', we use our KNN model to output a genre. Within our entire dataset, we then keep only the songs in that genre, so that we recommend songs in the same genre the input song was predicted to be. Finally, using Euclidean distance (via sklearn's euclidean_distances), we take the 10 songs closest to the input song within that genre. The output is 10 songs similar to the input song.
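Putting the procedure together (a sketch; `tuned_knn`, `songs`, and `feature_cols` are the objects built earlier):

```python
# Recommend the 10 nearest songs, restricted to the predicted genre.
from sklearn.metrics.pairwise import euclidean_distances

def recommend(track_features, n=10):
    genre = tuned_knn.predict([track_features])[0]      # predicted genre
    pool = songs[songs["genre"] == genre]               # keep that genre only
    dists = euclidean_distances(pool[feature_cols], [track_features]).ravel()
    return pool.assign(distance=dists).nsmallest(n, "distance")
```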
Classical -
To show this in practice, we display the results for a song from the classical album (Yo-Yo Ma - Triple Concerto in C Major, Op. 56: 1). We picked a classical song because we found the KNN accuracy on the classical albums to be strong, and we wanted to present an accurate portrayal of a song in the recommendation system. The results are as shown:
At a high level, the recommendations from this KNN model seem effective at pairing closely related songs in terms of the features we submitted. Based on the closeness of features under Euclidean distance, the classical input returned closely aligned music in the same genre.
Astroworld -
With Astroworld, as discussed earlier, the labeling of songs through the KNN model is hit-or-miss. While some songs were well captured by the pop and hip-hop genres according to our intuition and self-classification, others did not fit as well. For example, the song "SKELETONS" was classified as rock, and anyone who listens to the song can tell it is not a "rock" song. We show two cases below: one representing a "good" genre assignment and one a "bad" assignment.
With the “SKELETONS” assignment as Rock genre:
With the “ASTROTHUNDER” assignment as Hip-Hop genre:
Reflecting on these assignments and the resulting recommended songs, our team can confirm that the recommendations for "SKELETONS" contain significantly more inconsistencies, while those for "ASTROTHUNDER" contain far fewer. Something to note is that the genre assignments of the recommended songs themselves are sometimes inaccurate, so the recommended songs are not always the best categorical fit.
We think that our model performs well on very distinct genres such as classical or metal, which indicates how quantitatively similar songs can be across the remaining genres. With classical and metal there are obvious differences that can be quantified, but between other, more similar genres the differences are qualitative ones that listeners pick up on.
More broadly, we think that a song is much more than its quantifiable features: listening to a song is an experience that is described far more easily with thoughts and feelings than with numbers.
Initially, our goal with our project proposal was to create clusters of songs based on features of the songs rather than focus on their genre. We thought that songs of the same genre were often very different, so it would be good to find clusters of songs. This section will include a discussion on the progress we made on that initial goal, and why we decided to pivot.
We used a dataset from Kaggle listing top songs from 2010-2019 that included the same 10 features as our new dataset. We thought it would be good to visualize this data, and perhaps cut down on the number of dimensions, so we experimented with both PCA and t-SNE. With PCA, we used StandardScaler on the dataset and fit a 2-component PCA. The results were slightly skewed, as the top genre in the dataset, "dance pop", had more than 300 songs while the second most populous genre had only 60. We then plotted a random sample of the principal components of 15 songs from each of the top three genres, and saw no real pattern in the principal components within songs of the same genre. This was highlighted when we plotted the principal components of two genres we expected to be very different, "dance pop" and "british soul": there was sizable overlap between their principal components and no real discernible pattern for either genre.
We then plotted the explained variance for 1 through 10 principal components and found that two principal components capture only about 40% of the total explained variance, which isn't great. What we took from this is that 10 components would not be very computationally expensive, so we did not have to worry about dimensionality reduction. We also used t-SNE to further test our hypothesis, and the plot below shows that the t-SNE visualization was still random and not really dependent on genre.
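The t-SNE plot came from sklearn's implementation; a sketch, with `X_kaggle` standing in for the standardized Kaggle feature matrix and the perplexity an assumed setting:

```python
# t-SNE embedding of the Kaggle top-songs features (X_kaggle is the
# standardized feature matrix; perplexity is an assumption).
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_kaggle)
plt.scatter(emb[:, 0], emb[:, 1], s=10)
plt.title("t-SNE of top songs 2010-2019")
plt.show()
```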
Our next step was to create clusters based on the features of our dataset, rather than relying on genre. We decided DBSCAN would be best, as we wanted the ability to tune our model without relying only on centroid-based measurements, and got to work hand-tuning it. We used the NearestNeighbors algorithm in sklearn to tune our epsilon parameter, with the 10th-nearest neighbor as our benchmark. We found a knee-point detection package in Python called "kneed" that finds the elbow of the nearest-neighbor plot, and used it to determine that our epsilon should roughly equal 3. We then tuned the minPts parameter by hand, and found that essentially any value of minPts placed the vast majority of the dataset in a single cluster, with the remaining two or three clusters being much smaller and containing only a few songs. This was a tough problem to overcome, and after consulting with a TA, we concluded it would be best to pivot by compiling a dataset of our own and shifting the focus of our project.
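A sketch of that tuning process, using the same `X_kaggle` matrix; the kneed call mirrors the elbow-finding step described above, and the minPts value is a hand-chosen example.

```python
# Pick epsilon from the elbow of the 10th-nearest-neighbor distances
# (via the "kneed" package), then run DBSCAN with a hand-chosen minPts.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN
from kneed import KneeLocator

dists, _ = NearestNeighbors(n_neighbors=10).fit(X_kaggle).kneighbors(X_kaggle)
k_dist = np.sort(dists[:, -1])                     # distance to 10th neighbor
knee = KneeLocator(range(len(k_dist)), k_dist,
                   curve="convex", direction="increasing").knee
eps = k_dist[knee]                                 # came out near 3 for us

labels = DBSCAN(eps=eps, min_samples=10).fit_predict(X_kaggle)  # minPts hand-tuned
```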
We then decided to continue the predictive portion of our original idea, but this time focusing on genre as well. Thinking that features could substitute for a song's genre was a bit naive, since two songs with very similar feature profiles can intuitively belong to completely different genres.
Regarding future steps, we touched on them in our methods section. Our goal is a model that takes an input song, predicts its genre, and recommends new songs from that genre or very similar songs on a feature basis. We think this would be very useful in our own lives and our friends' lives, as we all experience the struggle of listening to the same couple of songs on repeat.
[1] https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/
[2] https://www.kaggle.com/leonardopena/top-spotify-songs-from-20102019-by-year
[3] https://kwellesly.github.io/ML4Anime/
[5] https://github.gatech.edu/pages/bpatil7/music.generation/#ref2
[6] https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/
[7] http://organizeyourmusic.playlistmachinery.com/