William Chen
Master of Data Science student at UC Irvine (graduating Fall 2026), actively seeking data science internships.
I work across the stack, from statistical modeling, A/B testing, and machine learning to LLM applications. I build in Python and R, with a focus on turning analysis into decisions.
Experience
- Expanded underrepresented scene data using GAN-based augmentation to reduce false positives in scene classification, improving precision by 5%.
- Applied PCA and t-SNE to visualize scene distribution gaps, revealing that a single general model struggled across different scene types; grouped scenes by distribution and fine-tuned per group, improving accuracy by 15%.
- Designed an AI agent pipeline using Microsoft AutoGen to automate end-to-end data processing and model training, replacing manual intervention between steps with a single-trigger pipeline.
Cookie Cats A/B Testing
A/B test analysis of Cookie Cats, comparing frequentist and Bayesian approaches across retention and engagement metrics.
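As a rough illustration of what a frequentist versus Bayesian comparison on a retention rate can look like, here is a minimal sketch using only the Python standard library. The function names, the Beta(1, 1) priors, and the counts are assumptions made for this example. They are not taken from the project.

```python
import math
import random


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Frequentist view: two-sided z-test for a difference in proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, written with erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def prob_b_beats_a(success_a, n_a, success_b, n_b, draws=20000, seed=0):
    """Bayesian view: Monte Carlo estimate of P(rate_B > rate_A).

    Assumes independent Beta(1, 1) priors, so each posterior is a Beta.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + success_a, 1 + n_a - success_a)
        rate_b = rng.betavariate(1 + success_b, 1 + n_b - success_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical retained-user counts, for illustration only (not the project's data).
z, p_value = two_proportion_z_test(450, 1000, 400, 1000)
posterior = prob_b_beats_a(450, 1000, 400, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}, P(B > A) = {posterior:.3f}")
```

The two outputs answer different questions. The p-value measures how surprising the observed gap would be if both variants had the same rate, while the posterior probability states directly how likely it is that variant B retains better than variant A.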
Customer Churn Prediction
End-to-end churn prediction with distribution shift analysis and model comparison.
Two-Tower Retrieval for Recommendation
Two-Tower retrieval model for movie recommendations with systematic embedding tuning.
Bike Sharing Demand Forecasting (Poisson)
Demand forecasting using Poisson and negative binomial (NB2) regression, with overdispersion diagnostics and rolling-origin cross-validation.
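One common overdispersion diagnostic is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom. The sketch below is an illustration under stated assumptions, not the project's code. The helper name `pearson_dispersion`, the intercept-only model, and the counts are all hypothetical.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values near 1 are consistent with the Poisson assumption (variance = mean);
    values well above 1 suggest overdispersion.
    """
    if len(counts) != len(fitted_means):
        raise ValueError("counts and fitted_means must have the same length")
    dof = len(counts) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / dof


# Hypothetical hourly rental counts, for illustration only.
counts = [3, 41, 7, 95, 12, 60, 2, 130, 18, 77]
# Intercept-only Poisson model: the maximum-likelihood fitted mean is the sample mean.
mean_count = sum(counts) / len(counts)
dispersion = pearson_dispersion(counts, [mean_count] * len(counts), n_params=1)
print(f"Pearson dispersion: {dispersion:.1f}")
```

A ratio far above 1 is the usual motivation for moving from Poisson to NB2, whose variance is mu + alpha * mu^2 rather than mu.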
Bike Sharing Demand Forecasting (OLS)
Demand forecasting for bike rentals using OLS, Ridge, and Lasso regression with diagnostic analysis.
Multi-Class Skill Classification in StarCraft II
Multinomial classification of StarCraft II player skill levels with class imbalance strategies.
UCI Dataset Assistant (RAG)
RAG-powered chatbot that recommends UCI ML Repository datasets from natural language queries.
PR Description Generator
Turns Git commits into professional pull request descriptions.