How to
Discover Songs.
By Michael Gingras, Kat Bau, and Nikhil Saraf

Our project uses k-means clustering, a statistical modeling technique, to show how similar songs are to one another. By clustering a song together with an existing base playlist of new finds, we predict whether a person will like it based on criteria the user chooses. We use the Spotify API to pull information about songs and recommend songs to users through interaction: they can change the dimensions used and the number of clusters in the k-means clustering, and they can add their own playlists or songs by logging in above. Users who do not want to add their own music can still learn about the k-means clustering process through the interactive web page and browse music by hovering over the existing data. If you do not have a Spotify account, you can use one created for this visualization. The username is 3300testaccount and the password is csinfo3300.

Step 1: Dimensions

Plot our base playlist against two dimensions! The default dimensions are Energy and Danceability. To choose your own dimensions, deselect the existing ones and choose two new ones to the right!

Danceability: Suitability for dancing
Energy: Intensity and activity
Acousticness: How acoustic the track is
Speechiness: Amount of spoken words
Instrumentalness: Lack of vocals
Valence: Musical positiveness conveyed by the track
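
To make the plotting step concrete, here is a minimal sketch of how two chosen dimensions might turn each song into a point on the graph. The track records, their values, and the helper name `select_dimensions` are all hypothetical; the sketch only assumes that each feature is stored as a number between 0 and 1 under a lowercase key matching the names above.

```python
# Hypothetical track records. The keys mirror the dimensions listed above,
# and the values are made up for illustration.
tracks = [
    {"name": "Song A", "danceability": 0.81, "energy": 0.72, "valence": 0.64},
    {"name": "Song B", "danceability": 0.35, "energy": 0.20, "valence": 0.15},
    {"name": "Song C", "danceability": 0.60, "energy": 0.90, "valence": 0.48},
]


def select_dimensions(tracks, x_dim="energy", y_dim="danceability"):
    """Return one (x, y) point per track for the two chosen dimensions."""
    return [(t[x_dim], t[y_dim]) for t in tracks]


# Default dimensions: Energy on the x-axis, Danceability on the y-axis.
points = select_dimensions(tracks)
print(points)
```

Choosing different dimensions only changes the two keys passed in, for example `select_dimensions(tracks, "valence", "energy")`.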

Step 2: K Clusters

Choose the number of clusters you would like the songs grouped into. Our default value is 4 clusters. To choose your own number of clusters, use the drop-down menu to the right, above the graph!

Continue Scrolling

Creating clusters

Clustering, or cluster analysis, groups a set of objects so that objects in the same cluster are more similar to each other than to objects in other clusters. K-means clustering is a popular clustering method in which data is partitioned into k clusters, and each data point belongs to the cluster with the nearest mean. That mean, called the “centroid”, serves as the “prototype” of the cluster. K-means clustering works by repeating two major steps. First, each point is assigned to its nearest cluster centroid. Second, each centroid is recomputed by averaging the points that were assigned to it. These two steps are repeated until the algorithm “converges” and the centroids stop moving.
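
The two steps above can be sketched in a few lines of Python. This is an illustrative version, not the code behind the visualization: the function names, the optional `init` parameter, the random choice of starting centroids, and the handling of empty clusters are all assumptions made for the sketch.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, init=None, max_iters=100, seed=0):
    """Cluster 2D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen points unless init is given.
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(max_iters):
        # Step 1: assign every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 2: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    # Final labels always match the centroids that are returned.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up (energy, danceability) points forming two loose groups.
points = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.9, 0.9), (0.8, 0.9), (0.9, 0.8)]
centroids, labels = k_means(points, k=2, init=[(0.1, 0.1), (0.9, 0.9)])
print(centroids, labels)
```

The result can depend on where the centroids start, which is one reason it is worth experimenting with different numbers of clusters and dimensions.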


Nudging the centroids

After the points are assigned, the centroids usually need to be nudged. Each data point is assigned to its nearest centroid, and each centroid is then recomputed, or "nudged", to the average of the data points in its cluster.
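
The nudge described above can be written as a short formula. The symbols are notation introduced here for illustration: C_j is the set of points currently assigned to cluster j, and \mu_j is that cluster's centroid. The three example points are made up.

```latex
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x

% Worked example: three songs in cluster C_j, as (energy, danceability) pairs
\mu_j = \frac{(0.2, 0.4) + (0.4, 0.8) + (0.6, 0.6)}{3} = (0.4, 0.6)
```

If the centroid was previously somewhere else, it moves to (0.4, 0.6), and the points are then reassigned against the new position.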


Testing the data

To determine your music taste, we need to classify your music into the defined clusters. If you logged in, your playlists should be shown below. Click one to add the entire playlist to the graph. If you would prefer to add individual songs, use the search bar. Either option will work, but the more data you enter, the better the results will be.
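
Classifying a song into an existing cluster can be sketched as finding its nearest centroid. The centroids and song values below are made up, and `classify` is a hypothetical helper name; the sketch assumes the songs use the same two dimensions the clusters were built on.

```python
import math


def classify(song, centroids):
    """Return the index of the centroid nearest to song (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(song, centroids[i]))


# Hypothetical centroids from an earlier k-means run on (energy, danceability).
centroids = [(0.2, 0.3), (0.8, 0.7)]
# Hypothetical songs from a user's playlist, on the same two dimensions.
user_songs = [(0.75, 0.80), (0.15, 0.25), (0.90, 0.60)]

user_clusters = [classify(s, centroids) for s in user_songs]
print(user_clusters)  # [1, 0, 1]
```

With more songs, the spread of cluster indices gives a fuller picture of which clusters a listener gravitates toward.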



Recommending songs

Songs you like tend to fall in the same cluster. Songs from the base playlist that share a cluster with your songs are therefore good candidates to recommend.
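
One possible way to turn this idea into a recommendation rule is sketched below. It is an assumption for illustration, not necessarily how the visualization chooses songs: it finds the cluster that holds most of the user's songs and suggests base-playlist songs from that cluster. The function name `recommend`, the `limit` parameter, and all data are hypothetical.

```python
from collections import Counter


def recommend(user_clusters, base_songs, base_clusters, limit=5):
    """Suggest base-playlist songs from the cluster holding most of the user's songs."""
    if not user_clusters:
        return []  # nothing to go on until the user adds music
    favorite, _ = Counter(user_clusters).most_common(1)[0]
    picks = [s for s, c in zip(base_songs, base_clusters) if c == favorite]
    return picks[:limit]


# Hypothetical base playlist with the cluster index of each song.
base_songs = ["Song A", "Song B", "Song C", "Song D"]
base_clusters = [1, 0, 1, 0]
# Hypothetical cluster indices of the songs a user added.
user_clusters = [1, 0, 1]

print(recommend(user_clusters, base_songs, base_clusters))  # ['Song A', 'Song C']
```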