My Soundtrack
January 2021
Music is very important to me and plays a large role in my life, so I decided to combine my two worlds and use predictive modeling in Python to analyze my music tastes and create a recommendation system based on my favorite tracks.
I curated a training data set by pulling the audio features for all the songs in my Spotify library; I marked each of my favorite songs with a “1” and the rest with a “0”. The test data set consisted of songs from three Spotify playlists that I regularly listen to. After exploring the data further, I decided to use a random forest model to output, for each track in the test data, the likelihood that I would add it to my favorites.
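The scoring step can be sketched roughly as follows. This is a minimal illustration, not the original code: it assumes scikit-learn, uses randomly generated numbers in place of real Spotify data, and the four feature columns and the labeling rule are made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative subset of audio features, all scaled to 0..1 here (assumption).
feature_names = ["danceability", "energy", "valence", "tempo"]

# Synthetic "library": 200 tracks, label 1 = favorite, 0 = everything else.
# The labeling rule below is invented purely so the model has a pattern to learn.
X_train = rng.random((200, len(feature_names)))
y_train = (X_train[:, 1] + X_train[:, 2] > 1.0).astype(int)

# Synthetic "playlist" tracks to score.
X_test = rng.random((10, len(feature_names)))

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(favorite).
favorite_prob = model.predict_proba(X_test)[:, 1]

# Rank the playlist tracks from most to least likely favorite.
ranking = np.argsort(favorite_prob)[::-1]
for idx in ranking[:3]:
    print(f"track {idx}: estimated favorite probability {favorite_prob[idx]:.2f}")
```

Using `predict_proba` rather than `predict` keeps the output as a likelihood per track, which is what makes it usable as a ranked recommendation list.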
Read about the full process in my blog.
Titanic Survival Predictions
December 2020
About a week ago, I took an online Python for Machine Learning course through General Assembly. In that class, we were tasked with predicting the survival of passengers aboard the Titanic. The original challenge was posted on Kaggle: we were given a training data set with the actual survival data of passengers, and a test set without survival data, for which we had to provide predictions. After cleaning the data, I used two supervised classification models, K-Nearest Neighbors and Random Forest, to generate the predictions. My model scored 73% accuracy.
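Comparing the two classifiers can be sketched like this. It is a hedged illustration rather than the course solution: it assumes scikit-learn, generates synthetic passengers instead of loading the Kaggle files, and the three columns and the survival rule are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-ins for cleaned passenger columns (assumption).
pclass = rng.integers(1, 4, n)   # ticket class 1..3
sex = rng.integers(0, 2, n)      # encoded as 0/1
age = rng.uniform(1, 70, n)
X = np.column_stack([pclass, sex, age])

# Made-up survival rule plus noise, only so the models have signal to learn.
score = 1.5 * sex - 0.5 * pclass - 0.01 * age + rng.normal(0, 0.5, n)
y = (score > np.median(score)).astype(int)

# Hold out part of the labeled data to estimate accuracy.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    # KNN is distance-based, so features are standardized first.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

accuracy = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    accuracy[name] = clf.score(X_val, y_val)  # fraction of correct predictions
    print(f"{name}: validation accuracy {accuracy[name]:.2f}")
```

Because the Kaggle test set has no survival labels, a held-out validation split like this is one way to estimate accuracy before submitting.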
2016 Presidential Election Analysis
March 2018
The culmination of the Data Mining class I took at UCSB was an exploratory statistical analysis of the 2016 presidential election, carried out in the statistical programming language R.
- Cleaned voting data by removing N/A values, sorting into federal/state/county levels, and ensuring consistency
- Plotted the voting data on a U.S. county map to visualize which counties voted for which candidate, in order to determine what to further investigate
- Inspected county voting outcomes on the basis of average poverty, to observe whether there was a significant difference in poverty levels between Trump and Clinton voters
- Ran PCA, random forest, and clustering to determine the most influential factors in the 2016 voting results
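The PCA step in the last bullet can be sketched as follows. The original analysis was done in R; this is a Python illustration using NumPy on synthetic data, and the three columns (poverty, income, unemployment) are hypothetical stand-ins for county-level variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties = 300

# Synthetic, correlated county-level features (assumption, not real census data).
poverty = rng.normal(15, 5, n_counties)
income = 60 - 1.5 * poverty + rng.normal(0, 3, n_counties)
unemployment = 0.3 * poverty + rng.normal(0, 1, n_counties)
X = np.column_stack([poverty, income, unemployment])

# Standardize each column so no feature dominates because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of total variance captured by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)

# Coordinates of each county in principal-component space.
scores = Z @ Vt.T

for i, ratio in enumerate(explained_variance_ratio, start=1):
    print(f"PC{i}: {ratio:.1%} of variance, loadings {np.round(Vt[i - 1], 2)}")
```

The loadings in each row of `Vt` show how strongly each original feature contributes to a component, which is one way to read off which variables carry most of the variation.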