Welcome to my portfolio. Below are a few of my recent computer science projects. Together they give a brief snapshot of my coding skills, as well as insight into some of my interests within the field.
Detecting AI Generated Text
As a personal project, I trained models to differentiate AI-generated text from essays written by humans. The purpose of this project was to get more hands-on experience with natural language processing and to learn more about an application of machine learning that is very important today and will remain so in the future. After experimenting with various vectorizers and models, this is the approach I settled on:
In Python, I removed URLs, punctuation, HTML, and emojis from the dataset using the re module.
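A minimal sketch of this kind of cleanup, using only the standard re module. The patterns and the function name clean_text are illustrative assumptions rather than the exact expressions used in the project, and the emoji range covers only some common Unicode blocks.

```python
import re

# Illustrative patterns (assumptions, not the project's exact expressions).
HTML_RE = re.compile(r"<[^>]+>")                   # HTML tags such as <p> or </a>
URL_RE = re.compile(r"https?://\S+|www\.\S+")      # bare URLs
# Covers common emoji blocks only; full emoji coverage is broader.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
PUNCT_RE = re.compile(r"[^\w\s]")                  # anything that is not a word char or space


def clean_text(text: str) -> str:
    """Strip HTML tags, URLs, emojis, and punctuation, then collapse whitespace."""
    text = HTML_RE.sub(" ", text)   # tags first, so URLs inside attributes go with them
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, clean_text("<p>Hello, world! See https://example.com/page?x=1 now.</p>") returns "Hello world See now".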
After splitting the data, I vectorized the texts using scikit-learn’s TF-IDF vectorizer with n-grams of length 1 to 3. While including unigrams may seem like it would just add noise to the data, their inclusion captures the overuse of common words such as ‘the’ in AI-generated text.
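To illustrate what an n-gram range of 1 to 3 covers, here is a small pure-Python sketch that lists the word n-grams of a sentence. It is not the scikit-learn implementation; in scikit-learn the same range is requested through the ngram_range=(1, 3) parameter of TfidfVectorizer. The helper name word_ngrams and the whitespace tokenization are assumptions made for illustration.

```python
def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> list[str]:
    """Return all word n-grams with n_min <= n <= n_max, shortest first."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams
```

Calling word_ngrams("the cat sat") yields the three unigrams, the two bigrams "the cat" and "cat sat", and the single trigram "the cat sat". Each of these becomes one feature column whose value is its TF-IDF weight.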
I trained the model using support vector classification with a radial basis function (RBF) kernel. I avoided a polynomial kernel because of its time complexity, and I had no reason to believe a linear or sigmoid kernel would be better at capturing trends in this specific dataset.
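For reference, the RBF kernel scores the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The sketch below computes it in plain Python; the function name and the default gamma of 1.0 are assumptions, since the value of gamma used in the project is not stated.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF similarity: 1.0 for identical vectors, approaching 0 as they move apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))  # squared Euclidean distance
    return math.exp(-gamma * sq_dist)
```

Because the similarity depends only on distance, the resulting decision boundary can curve around clusters of similar documents, which a linear kernel cannot do.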
I also tuned hyperparameters for models with linear and RBF kernels, using a grid search to find the optimal C-value. For the RBF kernel the best value happened to be 0.5, which gave me an F1 score of 0.964, a final result I was very satisfied with. Shown on the left are the results of a grid search for a linear kernel with C-values from 1 to 5.
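Since the F1 score is the metric reported here, a short sketch of how it is computed for a binary task may help. The function name binary_f1 and the choice of 1 as the positive label are assumptions; scikit-learn provides its own f1_score for real use.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A grid search simply repeats training for each candidate C-value and keeps the one with the best validation score under a metric like this.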
Dating App Backbone
In a computer science class, my final project was to code a dating app in C++. I completed it, but with a standard trie instead of a radix trie (shown left). A radix trie is a space-optimized trie that stores data in compressed form, merging each chain of single-child nodes into one edge rather than storing keys uniformly by single character. Keys are compared part by part at each step instead of all at once at the beginning. The program worked, and given the size of my dataset there was no practical need for a radix trie, but I was not satisfied. I came back to the project later to implement a working radix trie as I had originally envisioned.
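The idea can be sketched compactly. The project itself was written in C++; the Python version below is only an illustration of the data structure, and the class and method names are hypothetical. Each edge carries a multi-character label, and an edge is split only when a new key diverges partway through it.

```python
def _common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class RadixNode:
    def __init__(self):
        self.children = {}   # edge label (one or more characters) -> RadixNode
        self.is_key = False  # True if a stored key ends at this node


class RadixTrie:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key: str) -> None:
        node = self.root
        while key:
            # At most one edge can share a prefix with the remaining key.
            match = None
            for label in node.children:
                n = _common_prefix_len(label, key)
                if n > 0:
                    match = (label, n)
                    break
            if match is None:
                # No overlap: store the whole remainder on one new edge.
                leaf = RadixNode()
                node.children[key] = leaf
                node = leaf
                break
            label, n = match
            child = node.children[label]
            if n < len(label):
                # Partial overlap: split the edge at the divergence point.
                mid = RadixNode()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, key = child, key[n:]
        node.is_key = True

    def contains(self, key: str) -> bool:
        node = self.root
        while key:
            for label, child in node.children.items():
                if key.startswith(label):
                    node, key = child, key[len(label):]
                    break
            else:
                return False  # no edge matches the remaining key
        return node.is_key
```

Inserting "romane" and then "romanus" leaves a single edge "roman" under the root, with child edges "e" and "us", instead of the eight single-character nodes a standard trie would need.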
Music Genre Classification
In a machine learning class, I had an open-ended project for which I decided to train a model to classify music by genre. To begin, I did some standard preprocessing and prepped my data for multiclass classification using NumPy and re.
After experimenting with k-nearest neighbors, Gaussian Naive Bayes, XGBoost’s classifier, and a decision tree classifier, I decided to create my own convolutional neural network.
I built the model shown on the left using TensorFlow. Its first dense layer has 512 neurons, and the dimensionality decreases with each subsequent layer. Each dense layer uses the Rectified Linear Unit (ReLU) activation function, which introduces non-linearity so that the model can capture complex patterns. After each dense layer is a 20% dropout layer to prevent overfitting. The final layer is the output layer, which uses a softmax activation function to classify the data into the ten categories.
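The three building blocks named above can be written out in a few lines of plain Python. This is a sketch of the math only, not the TensorFlow model itself; the function names are illustrative, and the dropout shown is the inverted form applied during training.

```python
import math
import random


def relu(values):
    """Rectified Linear Unit: negative activations become zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate=0.2, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With ten output units, softmax returns ten probabilities, and the predicted genre is the index of the largest one.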
I trained this model for 300 epochs using the Adam optimizer, a stochastic gradient descent method, and sparse categorical cross-entropy loss. The trained model correctly classified 93% of songs in the test dataset into one of ten genres.
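As a sketch of what this loss and the accuracy figure measure, the functions below compute both from integer labels and predicted probability rows. The function names and the small epsilon guard are assumptions for illustration; TensorFlow supplies its own implementations.

```python
import math


def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class).

    `labels` are integer class indices, hence "sparse" (no one-hot encoding).
    """
    eps = 1e-12  # guard against log(0)
    losses = [-math.log(max(p[y], eps)) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


def accuracy(labels, probs):
    """Fraction of rows whose highest-probability class equals the true label."""
    correct = 0
    for y, p in zip(labels, probs):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax
        if predicted == y:
            correct += 1
    return correct / len(labels)
```

The loss shrinks toward zero as the model puts more probability on the correct genre, while accuracy only checks whether the top guess is right.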
Super Peach Sisters
Here, I was given the visual and audio components of a game in the style of Super Mario Bros. and tasked with completing it in C++.
The key to this project is the hierarchical class and subclass structure. All game objects are organized under a general Actor class, and objects that behave similarly are grouped under shared subclasses so that common behavior is written only once. I defined the interactions and behavior for all of these classes, as well as world generation, handling of the input stream, and behind-the-scenes elements of the game.
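The shape of such a hierarchy can be sketched briefly. The game itself was written in C++; this Python outline is only an illustration, and every name other than Actor (Obstacle, Enemy, do_something, blocks_movement, tick) is hypothetical.

```python
class Actor:
    """Base class for every game object."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.alive = True

    def blocks_movement(self):
        return False  # default shared by most actors

    def do_something(self):
        """Called once per game tick; each subclass supplies its own behavior."""
        raise NotImplementedError


class Obstacle(Actor):  # hypothetical group: static objects such as blocks
    def blocks_movement(self):
        return True

    def do_something(self):
        pass  # static objects do nothing each tick


class Enemy(Actor):  # hypothetical group: objects that walk back and forth
    def __init__(self, x, y, direction=1):
        super().__init__(x, y)
        self.direction = direction

    def do_something(self):
        self.x += self.direction


def tick(actors):
    """Advance every living actor one step and drop the dead ones."""
    for actor in actors:
        if actor.alive:
            actor.do_something()
    return [actor for actor in actors if actor.alive]
```

The game loop only ever talks to the Actor interface, so adding a new kind of object means adding one subclass rather than touching the loop.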
While this project lacks a single especially complicated concept, it showcases a different kind of coding from my other projects, this time with a very concrete goal, which I achieved.