Hello! You've found your way to my GitHub portfolio.
Here you'll find a bunch of Data Science projects that showcase my skills!
Environmental Sound Classification
- Investigating Different Spectrograms and Audio Augmentation Methods on Convolutional Learning: Medium, GitHub
- Investigated how four types of audio spectrogram affect training: Mel Spectrograms, Mel Frequency Cepstral Coefficients, Tempograms, and Chromagrams.
- Investigated stacking spectrograms on top of each other to produce 2D, 3D, and 4D images for training.
- Investigated how different audio augmentations, and different proportions of augmented audio in the training set, affect training.
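As a rough illustration of augmenting only a proportion of the training set, here is a minimal sketch. The two augmentations (circular time shift and additive Gaussian noise) and the names `time_shift`, `add_noise`, and `augment_fraction` are assumptions chosen for the example, not necessarily what the project used.

```python
import random


def time_shift(samples, shift):
    """Circularly shift a waveform by `shift` samples."""
    if not samples:
        return list(samples)
    shift %= len(samples)
    return list(samples[-shift:]) + list(samples[:-shift]) if shift else list(samples)


def add_noise(samples, noise_std, rng):
    """Add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def augment_fraction(clips, fraction, seed=0):
    """Return the original clips plus augmented copies of a random `fraction` of them.

    Assumes every clip is a non-empty list of float samples.
    """
    rng = random.Random(seed)
    n_aug = round(fraction * len(clips))
    chosen = rng.sample(range(len(clips)), n_aug)
    augmented = []
    for i in chosen:
        clip = clips[i]
        shifted = time_shift(clip, rng.randrange(len(clip)))
        augmented.append(add_noise(shifted, noise_std=0.01, rng=rng))
    return list(clips) + augmented
```

Calling `augment_fraction(clips, 0.5)` keeps every original clip and appends augmented copies of half of them, so the proportion can be varied as a single experiment parameter.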
Pokemon vs Sklearn: Predicting 50,000 Battles with 8 Different Classifiers
- Engineered features from a dataset of 800 Pokemon and 50,000 battles to predict the winner using 8 different classifiers
- Classifiers tested: Logistic Regression, Decision Tree, Random Forest, XGBoost, Gaussian Naive Bayes, Support Vector Machines, Stochastic Gradient Descent Classifier, K-Nearest Neighbor Classifier
- Used Hyperopt to tune the hyperparameters of the best-performing classifier (XGBoost), reaching 98.5% test accuracy.
Artwork Style Prediction: Using Vision Transformers with Shifted Patch Tokenization and Locality Self Attention
- Classified 7,000+ art pieces into five classes: drawing, painting, iconography, engraving, and sculpture.
- Used a Vision Transformer adapted for training on small datasets by implementing shifted patch tokenization and locality self attention, which force the attention module to pay more attention to inter-token relations.
- Reached 82% test accuracy with this method.
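To show how locality self attention pushes attention toward inter-token relations, here is a minimal pure-Python sketch of its two ingredients: masking the diagonal of the score matrix and dividing by a temperature. The function name `locality_self_attention` is hypothetical, and the temperature is a fixed argument here, whereas in the technique it is a learned parameter.

```python
import math


def locality_self_attention(scores, temperature):
    """Row-wise softmax over attention scores with the diagonal masked out.

    scores: square list of lists of raw query-key similarities (at least 2x2).
    temperature: positive scalar (learned during training in practice; fixed here).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Mask the self-token score so each token must attend to other tokens.
        logits = [
            -math.inf if i == j else scores[i][j] / temperature
            for j in range(n)
        ]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Each row of the result sums to 1 and has a zero on the diagonal, so no token can spend attention on itself.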
Data Visualization in R (various projects)
- Projects:
  - Age differences between male and female Olympic gymnasts who did or did not earn a medal, and how the age distribution changed over the years.
  - Is GPA related to student income, the father's educational level, or the student's perception of what an ideal diet is? Visualization and ANOVA.
  - What are the differences between taxa in expected gestation length, litter size, age of conception of the mother and father, and weight? PCA.
- Homework assignments:
  - Chicken weights vs type of feed
  - Highway fuel economy versus number of cylinders in cars, and the distribution of each car's city fuel economy by class and type of drive train, with boxplots and ridgelines
  - Popularity of college majors time series (growth vs decline) and Texas housing data pie charts
  - Linear trendlines for animal vore types (carnivore, herbivore, etc.), weight vs amount of sleep
Abstract Data Types/Data Structures (Private Repo, see projects document for access)
- Impossible Hangman: Full game implementation. It is impossible to win because, after each guess, the computer keeps whichever word list leaves the largest number of possible words.
- Impossible Boggle: Full game implementation with Tries and Depth First Search. Guess words in an N by N grid of letters; the computer prints out all possible words.
- Treaps: Self-balancing Binary Search Tree that maintains the BST property on keys and the heap property on randomly assigned priorities.
- B+ Trees: Implementation of a B+ tree, an m-ary tree well suited to storing large amounts of data on disk because traversing it minimizes the number of disk reads.
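Since the repo is private, here is a minimal sketch of the treap idea described above, not the repo's actual code. It assumes a max-heap on priorities and ignores duplicate keys; `Node`, `insert`, `inorder`, and `is_heap` are names chosen for this example.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_right(node):
    """Lift the left child above `node`, preserving key order."""
    left = node.left
    node.left, left.right = left.right, node
    return left


def rotate_left(node):
    """Lift the right child above `node`, preserving key order."""
    right = node.right
    node.right, right.left = right.left, node
    return right


def insert(node, key, rng=random):
    """Insert `key` as in a BST, then rotate to restore the max-heap on priorities."""
    if node is None:
        return Node(key, rng.random())
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node  # duplicate keys are ignored


def inorder(node):
    """Keys in sorted order if the BST property holds."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def is_heap(node):
    """True if no child has a higher priority than its parent."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and child.priority > node.priority:
            return False
    return is_heap(node.left) and is_heap(node.right)
```

Because priorities are random, the expected depth stays logarithmic in the number of keys without any explicit balancing bookkeeping.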
Tableau Sales Insights Dashboard
- Used SQL to extract, transform, and load data from a database of sales transactions, customer accounts, markets, and products for a fictional online store, AlitQ.
- Revenue analysis: revenue by market, sales quantity by market, top 5 products and customers, and revenue by year.
- Profit analysis: Profit by market, profit trend, customers table, customer type (E-commerce vs brick and mortar).
- Fully interactive: filter by market, year, quarter, month, customer type, etc.
COVID-19 OWID Dataset SQL and Tableau Analysis
- Used MySQL to create views from the COVID-19 dataset supplied by Our World in Data.
- Built an interactive dashboard in Tableau to visualize the results.
- Metrics shown: fully vaccinated % by country over time, highest infection rate, total cases by country over time, deaths by continent, and global numbers.
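A view for a metric like highest infection rate could look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere; the table name `covid`, the view name, the column names, and the toy rows are all assumptions for illustration, not real figures or the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified table; the toy rows below are made up.
conn.execute(
    "CREATE TABLE covid (location TEXT, date TEXT, population INTEGER, total_cases INTEGER)"
)
conn.executemany(
    "INSERT INTO covid VALUES (?, ?, ?, ?)",
    [
        ("A", "2021-01-01", 1000, 10),
        ("A", "2021-02-01", 1000, 50),
        ("B", "2021-01-01", 2000, 20),
        ("B", "2021-02-01", 2000, 40),
    ],
)

# Peak cumulative case count per location as a percentage of its population.
conn.execute(
    """
    CREATE VIEW highest_infection_rate AS
    SELECT location,
           MAX(total_cases) * 100.0 / population AS percent_infected
    FROM covid
    GROUP BY location, population
    """
)

rows = conn.execute(
    "SELECT location, percent_infected FROM highest_infection_rate "
    "ORDER BY percent_infected DESC"
).fetchall()
```

A dashboard tool can then read from the view directly instead of repeating the aggregation in every chart.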
Data Cleaning Exercise - Nashville Housing Market Dataset
- Straightforward data cleaning exercise on data from the Nashville housing market.
ArcGIS Map
- Used ArcGIS to generate a map of Hurricane Evacuation Routes and Median Household Income by ZIP code in Leon County.
- Are our current hurricane evacuation measures underserving low-income communities? Decide for yourself on my ArcGIS web app.
Alteryx Exercise
- Used Alteryx to clean a dataset of Sales Opportunities
Musical Genre Classification
- Used Convolutional Neural Networks in TensorFlow to predict musical genres from audio files with multilabel classification.
- Created and cleaned a dataset of 9,000+ ethically sourced audio files using the Spotify, Deezer and StreamRip APIs.
- Studied the effects of audio data augmentation, sample length, type of spectrogram, and use of delta features as channels embedded in the image on test and validation accuracy.
- Made a cool little video showing off what this model can do on some of my favorite songs.
School Projects and Assignments
- Double majored in Scientific Computing and Statistics, graduated Spring of '22 from Florida State University. Go noles!
- 2018: Computational Thinking, Introduction to Scientific Computing
- 2019: Discrete Algorithms, Programming for Scientific Applications, Symbolic and Numerical Computations
- 2020: Continuous Algorithms I and II, Data Mining, Introduction to Deep Learning
- 2021: Computational Evolutionary Biology
- 2022: Applied Machine Learning, High Performance Computing