Below are some data science and machine learning projects from my academic work and personal studies. Feel free to visit my GitHub page as well.

Implementation of Transformer

My from-scratch implementation of the transformer architecture, built with the help of several different resources. It implements the “vanilla” transformer described in the paper Attention Is All You Need.
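As a rough illustration of the core operation in the vanilla transformer, here is a minimal sketch of scaled dot-product attention in plain Python. It is not taken from the repo; the function names `softmax` and `scaled_dot_product_attention` are illustrative assumptions, and it leaves out masking, multiple heads, and learned projections.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Illustrative helper (hypothetical name): single head, no masking.
    """
    d_k = len(keys[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output row is a weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output
```

When every key is identical, the attention weights are uniform, so each output row is simply the mean of the value vectors. That makes a handy sanity check for a from-scratch implementation.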

GitHub Repo

Yelp Reviews Q&A Bot

My mom runs a small service-oriented business, and she often reads Yelp reviews to understand how customers feel about her business. To help her with this, I built a GPT-based question-answering bot that retrieves relevant information in response to conversational queries.

GitHub Repo

Gun Violence in the United States

This project tries to determine the causes of gun deaths in the United States at the county level. I use various demographic, economic, and educational features to predict the number of deaths from gun violence for a county in a given year. The resulting model is then used to infer the most salient factors contributing to gun violence in the US.

GitHub Repo

Video Similarity Search Engine

This is the capstone project for the MS Data Science program I am finishing at the University of Washington. Using various featurization methods, we convert videos into feature vectors, which are then queried using k-nearest neighbor search.
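The query step can be sketched as a brute-force k-nearest neighbor search over precomputed feature vectors. This is a minimal sketch under assumptions: the Euclidean metric, the `index` dictionary layout, and the function names are illustrative choices, not necessarily what the project uses.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_neighbors(query, index, k):
    """Return the ids of the k vectors in `index` closest to `query`.

    `index` is assumed to map a video id to its feature vector.
    Brute force: every stored vector is compared against the query.
    """
    ranked = sorted(
        index,
        key=lambda video_id: euclidean_distance(query, index[video_id]),
    )
    return ranked[:k]


# Toy index with made-up ids and 2-dimensional features.
toy_index = {
    "video_a": [0.0, 0.0],
    "video_b": [1.0, 1.0],
    "video_c": [5.0, 5.0],
}
nearest = k_nearest_neighbors([0.9, 0.9], toy_index, k=2)
```

Brute-force search is linear in the number of stored videos per query, which is fine for small collections. Larger collections usually call for an approximate nearest neighbor index instead.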

GitHub Repo

Flight Delays and Twitter Sentiment Analysis

Analysis of a flight delays dataset from the US Department of Transportation alongside Twitter sentiment toward large US airlines.

GitHub Repo

Implementation of Linear Support Vector Machine

My implementation of a linear support vector machine classifier in the style of scikit-learn. It supports both binary classification and multi-class classification, the latter using one-vs-one and one-vs-rest methods.
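To illustrate how binary linear classifiers are combined for multi-class prediction, here is a minimal sketch of the one-vs-rest and one-vs-one decision rules. It assumes each binary model is already trained and stored as a `(weights, bias)` pair. The function names and dictionary layouts are hypothetical and do not reflect the repo's actual API.

```python
from collections import Counter


def decision(model, x):
    """Signed score w . x + b of one linear binary classifier."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_one_vs_rest(models, x):
    """models: {cls: (weights, bias)}, each trained as cls vs. the rest.

    The class whose classifier gives the highest score wins.
    """
    return max(models, key=lambda cls: decision(models[cls], x))


def predict_one_vs_one(models, x):
    """models: {(cls_a, cls_b): (weights, bias)}, one per class pair.

    A positive score votes for cls_a, otherwise for cls_b;
    the class with the most votes wins.
    """
    votes = Counter()
    for (cls_a, cls_b), model in models.items():
        votes[cls_a if decision(model, x) > 0 else cls_b] += 1
    return votes.most_common(1)[0][0]


# Hand-set weights for three toy classes in 2D (illustrative only).
ovr_models = {
    "a": ([1.0, 0.0], -2.5),   # fires when x0 is large
    "b": ([-1.0, 0.0], -2.5),  # fires when x0 is very negative
    "c": ([0.0, 1.0], -3.0),   # fires when x1 is large
}
ovo_models = {
    ("a", "b"): ([1.0, 0.0], 0.0),
    ("a", "c"): ([1.0, -1.0], 0.0),
    ("b", "c"): ([-1.0, -1.0], 0.0),
}
```

One-vs-rest needs one classifier per class, while one-vs-one needs one per pair of classes, so the latter trains more models but each on a smaller subset of the data.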

GitHub Repo

Tools: Python (numpy, pandas, scikit-learn, matplotlib)

News Articles Recommender and Analyzer

The news articles recommender and analyzer performs topic modeling with Latent Dirichlet Allocation to recommend the best-matching national news articles to the user.

In addition to making recommendations, it tags each article with topics and outputs a word cloud visualization.

A link to a working UI is in progress… for now, the picture links to the GitHub repo!

GitHub Repo

Data Source: Kaggle

Tools: Python (nltk, guidedlda)

UFO Sightings Visualization/Dashboard

Visualization of UFO sightings in the United States over the past 70 years. The data comes from Kaggle and contains information about ~80,000 UFO sightings around the world.

Data Source: Kaggle

Tools: Tableau, Python (folium)