Hi, I'm William Chen
I build statistical models and data systems that turn messy data into decisions, from Bayesian A/B tests to end-to-end cloud pipelines.

Get to Know Me
About Me
I'm William Chen, an Irvine-based data scientist focused on statistical modeling, machine learning, and cloud data engineering. I turn messy data into clear decisions through rigorous analysis and end-to-end systems.
What I Do
I specialize in statistical modeling, A/B testing, and causal inference, building rigorous analyses in R and Python. Recent work spans Bayesian inference, GLMs, and end-to-end ML pipelines deployed on AWS.
Credentials & Experience
Master of Data Science candidate at UC Irvine. Industry experience at SHINSOFT in applied computer vision, where I lifted classification accuracy by 15% on 200K+ images through embedding analysis and model fine-tuning.
Outside the Code
Outside of work, I stay active through weightlifting and give back to my community through volunteering.
What I'm Looking For
I'm actively seeking full-time roles in data science or data engineering where I can apply statistical rigor to real product decisions. I thrive in environments that value experimentation, clean pipelines, and cross-functional collaboration.
Trust the process. Learn from failure. Stay humble.
Arsenal
My Tech Stack
Languages & Databases
Statistical Methods
ML & Modeling
Cloud & Tools
Career
Work Experience

- Lifted classification accuracy by 15% on 200K+ camera-captured images. EfficientNet embedding analysis (PCA, t-SNE) exposed a distributional gap between indoor and outdoor scenes, revealing that a single general model failed across scene types; fine-tuned a dedicated model on the underperforming segment.
- Identified data scarcity, not model capacity, as the root cause of false positives; designed a GAN-based augmentation strategy to expand the minority-scene training set, improving precision by 5%.
Portfolio
Featured Projects
Bayesian Prior Sensitivity
Prior sensitivity in Bayesian logistic regression on birthwt (n=189), showing that rare predictors, not small n, determine when the prior stops mattering.
Citi Bike Data Pipeline (AWS)
AWS pipeline integrating NYC Citi Bike trips with Open-Meteo weather on S3 and Athena, structured as raw / processed / analytics layers.
Marketing A/B Testing
Bayesian and frequentist A/B testing on 588K users, exposing the gap between statistical significance and practical effect.
Cookie Cats A/B Testing
Mobile game retention experiment on 90K players, showing why 1-day and 7-day metrics tell different stories about gate placement.
Bike Sharing Demand Forecasting (Poisson)
Count regression diagnosing severe overdispersion (variance/mean = 833) and resolving it with a negative binomial GLM.
Bike Sharing Demand Forecasting (OLS)
Linear baseline with OLS, Ridge, and Lasso under rolling-origin CV, including full residual diagnostics.
Customer Churn Prediction
Telecom churn classifier on 500K+ records, where diagnosing a train/test distributional inconsistency lifted accuracy from 57% to 94%.
Multi-Class Skill Classification in StarCraft II
Reformulated a published pairwise task into 6-class multinomial classification on 3,340 players, outperforming the baseline in 3 of 4 league pairs.
Two-Tower Retrieval for Recommendation
Two-tower neural retrieval on 100K implicit-feedback interactions, indexed in ChromaDB and served via FastAPI.
Academic Background
Education
Master of Data Science
University of California, Irvine
Bachelor of Computer Science & Information Engineering
National Ilan University
Let's Connect
Let's Work Together
I'm passionate about turning data into actionable insights and building solutions that make an impact. Feel free to reach out by email or on your preferred platform.
Get In Touch