Data Scientist · Data Engineer

Hi, I'm William Chen

I build statistical models and data systems that turn messy data into decisions, from Bayesian A/B tests to end-to-end cloud pipelines.

MDS @ UC Irvine · Graduating Dec 2026 · Open to opportunities
Get to Know Me

About Me

I'm William Chen, an Irvine-based data scientist focused on statistical modeling, machine learning, and cloud data engineering. I turn messy data into clear decisions through rigorous analysis and end-to-end systems.

What I Do

I specialize in statistical modeling, A/B testing, and causal inference, building rigorous analyses in R and Python. Recent work spans Bayesian inference, GLMs, and end-to-end ML pipelines deployed on AWS.

Credentials & Experience

Master of Data Science candidate at UC Irvine. Industry experience at SHINSOFT in applied computer vision, where I lifted classification accuracy by 15% on 200K+ images through embedding analysis and model fine-tuning.

Outside the Code

Outside of work, I stay active through weightlifting and give back through community volunteering.

What I'm Looking For

I'm actively seeking full-time roles in data science or data engineering where I can apply statistical rigor to real product decisions. I thrive in environments that value experimentation, clean pipelines, and cross-functional collaboration.

Trust the process. Learn from failure. Stay humble.

Arsenal

My Tech Stack

Languages & Databases

Python · R · SQL

Statistical Methods

Causal Inference · A/B Testing · Bayesian Inference · GLM · Hypothesis Testing · Regularized Regression

ML & Modeling

Random Forest · Logistic Regression · scikit-learn · statsmodels · PyTorch · RAG · Two-Tower Models

Cloud & Tools

AWS S3 · AWS Athena · AWS EC2 · Docker · FastAPI · Streamlit · Git · pandas · NumPy · TypeScript

Career

Work Experience

SHINSOFT CO., LTD.
Taipei, Taiwan
Project Engineer — Data Science Focus
AUG 2024 – FEB 2025
  • Lifted classification accuracy by 15% on 200K+ camera-captured images. EfficientNet embedding analysis (PCA, t-SNE) exposed a distributional gap between indoor and outdoor scenes, showing that a single general model failed across scene types; I then fine-tuned a dedicated model on the underperforming segment.
  • Identified data scarcity, not model capacity, as the root cause of false positives; designed a GAN-based augmentation strategy to expand the minority-scene training set, improving precision by 5%.

Portfolio

Featured Projects

Bayesian Prior Sensitivity

Prior sensitivity in Bayesian logistic regression on birthwt (n=189), showing that rare predictors, not small n, determine when the prior stops mattering.

Statistics · AWS · R

Citi Bike Data Pipeline (AWS)

AWS pipeline integrating NYC Citi Bike trips with Open-Meteo weather on S3 and Athena, structured as raw / processed / analytics layers.

Data Engineering · AWS · Python

Marketing A/B Testing

Bayesian and Frequentist A/B testing on 588K users, exposing the gap between statistical significance and practical effect.

Statistics · A/B Testing · R
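
To illustrate the gap between "the effect is probably real" and "the effect is big enough to matter", here is a minimal Beta-Binomial sketch in Python. It is not the project's R code: the function name `ab_posterior_summary`, the uniform Beta(1, 1) priors, and the conversion counts in the usage lines are all assumptions made up for illustration.

```python
import random


def ab_posterior_summary(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo summary of two Beta-Binomial posteriors.

    Assumes independent uniform Beta(1, 1) priors on each conversion rate
    (an illustrative choice, not taken from the project).
    Returns (P(rate_B > rate_A), posterior mean of rate_B - rate_A).
    """
    rng = random.Random(seed)
    wins = 0
    lift_total = 0.0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + successes, 1 + failures).
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
        lift_total += p_b - p_a
    return wins / draws, lift_total / draws


# Hypothetical counts, invented for this example only.
prob, lift = ab_posterior_summary(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"P(B > A) = {prob:.3f}, mean absolute lift = {lift:.4f}")
```

Reading the two numbers side by side is the point: a high probability that B beats A can coexist with a lift too small to justify shipping.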

Cookie Cats A/B Testing

Mobile game retention experiment on 90K players, showing why 1-day and 7-day metrics tell different stories about gate placement.

Statistics · A/B Testing · R

Bike Sharing Demand Forecasting (Poisson)

Count regression diagnosing severe overdispersion (variance/mean = 833) and resolving it with a Negative Binomial GLM.

Statistics · R
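
For context on the variance/mean figure: a Poisson model implies variance equal to the mean, so a ratio far above 1 signals overdispersion. The Python sketch below computes that raw ratio. It is an illustration rather than the project's R code, the helper name `dispersion_index` is hypothetical, and the toy counts are invented. A raw ratio ignores covariates, so it is only a first-pass check before fitting a model.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    A value near 1 is consistent with a Poisson model;
    a value far above 1 suggests overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m


# Hypothetical hourly rental counts, invented for illustration only.
toy_counts = [2, 3, 150, 4, 300, 1, 220, 5]
print(f"variance/mean = {dispersion_index(toy_counts):.1f}")
```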

Bike Sharing Demand Forecasting (OLS)

Linear baseline with OLS, Ridge, and Lasso under rolling-origin CV, including full residual diagnostics.

Statistics · R
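
Rolling-origin CV respects time order: each fold trains on everything up to a cutoff and tests on the next block, then the cutoff moves forward. The Python sketch below generates such splits with an expanding training window. It is an assumption-laden illustration, not the project's R code; `rolling_origin_splits` and its parameters are hypothetical names, and the project may have used a different window scheme.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_indices, test_indices) pairs in time order.

    The training window expands from `initial_train` observations;
    each test window covers the next `horizon` observations.
    """
    if initial_train < 1 or horizon < 1 or step < 1:
        raise ValueError("initial_train, horizon, and step must be >= 1")
    origin = initial_train
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


# Example: 6 observations, start with 3 for training, forecast 2 ahead.
for train_idx, test_idx in rolling_origin_splits(6, initial_train=3, horizon=2):
    print("train:", train_idx, "test:", test_idx)
```

Because no fold ever trains on observations later than its test window, the error estimate is closer to what a forecast would face in production than a shuffled k-fold estimate.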

Customer Churn Prediction

Telecom churn classifier on 500K+ records, including a train/test distributional inconsistency diagnosis that lifted accuracy from 57% to 94%.

Statistics · Machine Learning · R

Multi-Class Skill Classification in StarCraft II

Reformulated a published pairwise task into 6-class multinomial classification on 3,340 players, outperforming the baseline in 3 of 4 league pairs.

Statistics · Machine Learning · R

Two-Tower Retrieval for Recommendation

Two-tower neural retrieval on 100K implicit feedback interactions, indexed in ChromaDB and served via FastAPI.

Machine Learning · Deep Learning · Python

Academic Background

Education

University of California, Irvine

Master of Data Science

Irvine, California
2025 – 2026

National Ilan University

Bachelor of Computer Science & Information Engineering

Ilan, Taiwan
2020 – 2024

Let's Connect

Let's Work Together

I'm passionate about turning data into actionable insights and building solutions that make an impact. Feel free to reach out by email or on your preferred platform.

Get In Touch

Location

Irvine, California, USA

Available for on-site, remote, and internship roles globally.