Projects
Below are some data science and machine learning projects from coursework and personal study. Also feel free to visit my GitHub page.
Implementation of Transformer
My from-scratch implementation of the transformer architecture, built with the help of several different resources. It implements the "vanilla" transformer described in the Attention Is All You Need paper.
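The core operation of that paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Below is a minimal NumPy sketch of it with an optional causal mask, as used in decoder self-attention. This is an illustration, not the code from the repository. The function names `scaled_dot_product_attention` and `causal_mask` are my own.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: (..., seq_q, d_k), k: (..., seq_k, d_k), v: (..., seq_k, d_v)
    mask: boolean array broadcastable to (..., seq_q, seq_k); True = may attend.
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a huge negative score, so softmax sends them to 0.
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def causal_mask(n):
    # Lower-triangular mask: position i may attend only to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, model dimension 8
out, w = scaled_dot_product_attention(x, x, x, mask=causal_mask(4))
print(out.shape)  # (4, 8)
```

In the full model this function runs once per head inside multi-head attention, with Q, K, and V produced by learned linear projections of the token embeddings.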
Yelp Reviews Q&A Bot
My mom runs a small, service-oriented business, and she would often read Yelp reviews to understand how customers felt about her business. To help her with this, I built a GPT-based question-answering bot that retrieves relevant information from the reviews in response to conversational queries.
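A bot like this typically works in two steps: retrieve the reviews most relevant to a question, then pass them to the language model as context. The sketch below shows only that first step, using plain bag-of-words cosine similarity as a stand-in for whatever retrieval the real bot uses (an embedding model is more common). It stops at building the prompt and does not call any GPT API. The helpers `retrieve` and `build_prompt` and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, reviews, top_k=2):
    """Return up to top_k reviews with nonzero similarity to the query (hypothetical helper)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r))), r) for r in reviews]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:top_k] if score > 0]

def build_prompt(question, context):
    # The retrieved reviews become the context the language model answers from.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these customer reviews:\n{joined}\n\nQuestion: {question}"

# Made-up reviews for illustration only.
reviews = [
    "The staff was friendly and the appointment started on time.",
    "Parking was hard to find, but the service was great.",
    "Prices are a bit high for what you get.",
]
question = "Is parking easy to find?"
print(build_prompt(question, retrieve(question, reviews, top_k=1)))
```

Grounding the model in retrieved reviews keeps its answers tied to what customers actually wrote.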
Gun Violence in the United States
This project tries to determine the causes of gun deaths in the United States at the county level. I use various demographic, economic, and educational features to predict the number of deaths from gun violence in a county in a given year. The resulting model is then used to infer the most salient factors contributing to gun violence in the US.
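One common way to read "most salient factors" off a fitted model is to standardize the features and compare the magnitudes of the linear regression coefficients. The sketch below assumes a plain least-squares model, which may differ from the project's actual model (count outcomes are often fit with Poisson-type regression instead). The feature names and data are synthetic and hypothetical, not the county dataset.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Fit OLS on z-scored features and return one coefficient per feature."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(Xz)), Xz])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]  # drop the intercept

def rank_features(names, coefs):
    # Larger |coefficient| on standardized inputs = larger effect per standard deviation.
    order = np.argsort(-np.abs(coefs))
    return [(names[i], float(coefs[i])) for i in order]

# Synthetic, illustrative data only (hypothetical feature names).
rng = np.random.default_rng(42)
names = ["median_income", "unemployment_rate", "pct_high_school_grad"]
X = rng.normal(size=(500, 3)) * [10_000, 2, 5] + [50_000, 5, 85]
y = -0.0005 * X[:, 0] + 4.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=500)
ranked = rank_features(names, standardized_coefficients(X, y))
print(ranked)
```

Standardizing matters here because raw coefficients depend on units: income in dollars and unemployment in percentage points are not directly comparable until both are expressed per standard deviation. These coefficients measure association, not causation.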
Video Similarity Search Engine
This is the capstone project for the MS Data Science program I am finishing up at the University of Washington. Using various featurization methods, we convert videos into feature vectors, which are then queried with k-nearest neighbor search.
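Once every video is a fixed-length vector, the search step reduces to finding the stored vectors closest to a query vector. Below is a minimal brute-force sketch using cosine similarity. The choice of cosine similarity and the random "video features" are assumptions for illustration. The featurization itself is not shown.

```python
import numpy as np

def knn_search(index, query, k=3):
    """Return indices and cosine similarities of the k rows of `index` closest to `query`."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = index_n @ query_n          # one similarity per stored video
    top = np.argsort(-sims)[:k]       # highest similarity first
    return top, sims[top]

rng = np.random.default_rng(0)
video_features = rng.normal(size=(100, 64))             # 100 videos, 64-dim features (synthetic)
query = video_features[17] + 0.01 * rng.normal(size=64)  # a near-duplicate of video 17
ids, scores = knn_search(video_features, query, k=3)
print(ids, scores)
```

This exact scan costs one dot product per stored video. For a large library, an approximate nearest-neighbor index would usually replace it.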
Flight Delays and Twitter Sentiment Analysis
Analysis of a flight delays dataset from the US Department of Transportation, together with Twitter sentiment toward large US airlines.
Implementation of Linear Support Vector Machine
My implementation of a linear support vector machine classifier in the style of scikit-learn. It performs both binary classification and multi-class classification, using either the one-vs-one or the one-vs-rest method.
Tools: Python (numpy, pandas, scikit-learn, matplotlib)
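For illustration, here is a compact sketch of the same idea: a binary linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss, wrapped in a one-vs-rest classifier with scikit-learn-style `fit`/`predict` methods. It is not the repository's code. The class names, the optimizer, and the hyperparameters (`lam`, `lr`, `epochs`) are my own choices, and the one-vs-one variant is not shown.

```python
import numpy as np

class LinearSVM:
    """Binary linear SVM minimizing lam/2 * ||w||^2 + mean(max(0, 1 - y (w.x + b)))."""

    def __init__(self, lam=0.01, lr=0.1, epochs=500):
        self.lam, self.lr, self.epochs = lam, lr, epochs

    def fit(self, X, y):  # y must be in {-1, +1}
        n, d = X.shape
        self.w, self.b = np.zeros(d), 0.0
        for _ in range(self.epochs):
            margins = y * (X @ self.w + self.b)
            active = margins < 1  # only points inside the margin contribute to the hinge subgradient
            grad_w = self.lam * self.w - (y[active][:, None] * X[active]).sum(axis=0) / n
            grad_b = -y[active].sum() / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def decision_function(self, X):
        return X @ self.w + self.b

class OneVsRestSVM:
    """One binary SVM per class; predict the class with the largest decision value."""

    def __init__(self, **svm_kwargs):
        self.svm_kwargs = svm_kwargs

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = [LinearSVM(**self.svm_kwargs).fit(X, np.where(y == c, 1.0, -1.0))
                        for c in self.classes_]
        return self

    def predict(self, X):
        scores = np.column_stack([m.decision_function(X) for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]

# Three well-separated synthetic clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 5.0], [5.0, -3.0], [-5.0, -3.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)
clf = OneVsRestSVM(lam=0.01, lr=0.1, epochs=500).fit(X, y)
print((clf.predict(X) == y).mean())
```

One-vs-rest trains k classifiers for k classes. One-vs-one trains k(k-1)/2 pairwise classifiers, each on a smaller subset of the data, and predicts by majority vote.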
News Articles Recommender and Analyzer
The news articles recommender and analyzer performs topic modeling with Latent Dirichlet Allocation (LDA) to recommend the best-matching national news articles to the user.
In addition to making recommendations, it tags each article with its topics and outputs word cloud visualizations.
A link to a working UI is in progress; for now, the picture links to the GitHub repo!
Data Source: Kaggle
Tools: Python (nltk, guidedlda)
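LDA represents every article as a probability distribution over topics, so recommending can be framed as finding the articles whose topic mixture is closest to the user's. Below is a minimal sketch assuming those distributions have already been produced by the topic model. Hellinger distance is my choice of comparison measure, not necessarily the project's. The topics, titles, and numbers are made up.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between probability vectors (0 = identical, 1 = disjoint)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1))

def recommend(user_topics, article_topics, titles, k=2):
    # Rank articles by how close their topic mixture is to the user's interests.
    dists = hellinger(article_topics, user_topics)
    order = np.argsort(dists)[:k]
    return [(titles[i], float(dists[i])) for i in order]

# Hypothetical topic mixtures over (politics, economy, sports); each row sums to 1.
titles = ["Senate budget vote", "Fed raises rates", "Playoff recap", "Trade policy debate"]
article_topics = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.50, 0.45, 0.05],
])
user_topics = np.array([0.60, 0.35, 0.05])  # e.g. inferred from articles the user liked
print(recommend(user_topics, article_topics, titles, k=2))
```

Hellinger distance is bounded and symmetric, which makes scores easy to compare across users, unlike KL divergence.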
UFO Sightings Visualization/Dashboard
Visualization of UFO sightings in the United States over the past 70 years. The Kaggle dataset contains information about ~80,000 UFO sightings around the world.
Data Source: Kaggle
Tools: Tableau, Python (folium)