My Soundtrack
January 2021

Music plays a large role in my life, so I decided to combine my two worlds, music and data, and use predictive modeling in Python to analyze my tastes and build a recommendation system based on my favorite tracks.

I built a training data set by pulling the audio features for every song in my Spotify library, labeling each of my favorite songs with a “1” and the rest with a “0”. The test data set consisted of songs from three Spotify playlists that I regularly listen to. After exploring the data further, I chose a random forest model to output, for each track in the test data, the likelihood that I would add it to my favorites.
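To make the idea concrete, here is a minimal from-scratch sketch of a random forest that scores playlist tracks by how likely they are to become favorites. It is not the code from the project: the feature values and track names are made up, the three columns are assumed to be the danceability, energy, and valence audio features, and the helper names (`fit_forest`, `favorite_probability`) are hypothetical. It uses only the standard library so it runs anywhere; a real project would more likely use a library implementation.

```python
import random


def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Return (impurity, feature, threshold) for the best split, or None."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """A leaf is the fraction of favorites; an inner node is a tuple."""
    if depth == max_depth or len(set(labels)) == 1:
        return sum(labels) / len(labels)
    n_feat = len(rows[0])
    # Each split only looks at a random subset of the features.
    features = rng.sample(range(n_feat), max(1, int(n_feat ** 0.5)))
    split = best_split(rows, labels, features)
    if split is None:
        return sum(labels) / len(labels)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth),
    )


def tree_predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the library."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def favorite_probability(forest, row):
    """Average the per-tree estimates into one likelihood."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)


# Made-up library: (danceability, energy, valence), label 1 = favorite.
library = [
    ([0.80, 0.90, 0.70], 1), ([0.75, 0.85, 0.60], 1),
    ([0.90, 0.80, 0.80], 1), ([0.85, 0.95, 0.65], 1),
    ([0.30, 0.20, 0.40], 0), ([0.25, 0.35, 0.30], 0),
    ([0.40, 0.30, 0.20], 0), ([0.20, 0.25, 0.35], 0),
]
# Made-up playlist tracks to score.
playlist = {"upbeat track": [0.82, 0.88, 0.70], "mellow track": [0.28, 0.22, 0.33]}

forest = fit_forest([x for x, _ in library], [y for _, y in library])
scores = {name: favorite_probability(forest, feats) for name, feats in playlist.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Sorting the playlist by the averaged score gives a simple recommendation list, with the tracks most similar to the labeled favorites at the top.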

Read about the full process in my blog.

Titanic Survival Predictions
December 2020

About a week ago, I took an online Python for Machine Learning course through General Assembly. In that class, we were tasked with predicting the survival of passengers aboard the Titanic. The original challenge was posted on Kaggle: we were given a training data set with the passengers' actual survival outcomes, and a test set without survival data for which we had to provide predictions. After cleaning the data, I used two supervised classification models, K-Nearest Neighbors and Random Forest, to generate the predictions.

My model scored 73% accuracy.
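As a rough illustration of how a K-Nearest Neighbors classifier makes a prediction and how an accuracy score is computed, here is a minimal standard-library sketch. It is not the course solution: the passenger rows are made up, the three columns are assumed to be already-cleaned and scaled versions of passenger class, sex, and age, and the function names `knn_predict` and `accuracy` are hypothetical. Accuracy is measured here on a small held-out set, which is an assumption about the evaluation setup.

```python
from collections import Counter
from math import dist


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict by majority vote among the k closest training rows."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(predictions, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Made-up, pre-scaled rows: (class / 3, is_female, age / 80); label 1 = survived.
train_rows = [
    (0.33, 1.0, 0.36), (0.33, 1.0, 0.44), (0.67, 1.0, 0.30),
    (1.00, 0.0, 0.28), (1.00, 0.0, 0.40), (0.67, 0.0, 0.55),
]
train_labels = [1, 1, 1, 0, 0, 0]

# Made-up held-out passengers with known outcomes.
holdout_rows = [(0.33, 1.0, 0.40), (1.00, 0.0, 0.33)]
holdout_labels = [1, 0]

predictions = [knn_predict(train_rows, train_labels, row) for row in holdout_rows]
score = accuracy(predictions, holdout_labels)
print(f"accuracy: {score:.0%}")
```

Because the vote depends on distances, the features need to be on comparable scales, which is one reason the cleaning step matters before fitting a model like this.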

2016 Presidential Election Analysis
March 2018

The culmination of the Data Mining class I took at UCSB was an exploratory statistical analysis of the 2016 presidential election, carried out in the statistical programming language R.

  1. Cleaned voting data by removing N/A values, sorting into federal/state/county levels, and ensuring consistency
  2. Plotted the voting data on a U.S. county map to visualize which counties voted for which candidate, in order to determine what to further investigate
  3. Inspected county voting outcomes on the basis of average poverty, to observe whether there was a significant difference in poverty levels between Trump and Clinton voters
  4. Ran PCA, random forest, and clustering to determine which factors most influenced the 2016 voting results
