Hi, I’m

Christoffer Tan

I turn messy data into useful products.

Computer Science × Data Science @ UofT.

About

Hey, I’m Christoffer 👋

I’m a Computer Science × Data Science student at the University of Toronto, originally from Palembang, Indonesia. I enjoy turning messy data into clean, practical tools and building software that solves real-world problems.

Outside of tech, I’m often on a court or field playing soccer, basketball, or badminton, and recently I’ve been exploring newer sports like padel and pickleball. I love travelling and have visited 12 countries so far, with many more on my list. I like taking on challenges both in and out of tech.

Data Science | Software Engineering | Sports Enthusiast

Skills

Data Science

Exploratory analysis, statistical modeling, and deriving insights from large-scale datasets.

Python | R | SQL | Pandas | NumPy | Tidyverse

Machine Learning

Predictive modeling of bike-share usage, NLP-driven grouping of risk items, and topic modeling of tweets.

Scikit-learn | XGBoost | Random Forests | NLP | Topic Modeling

Software Engineering

Developing production-ready APIs, scalable features, and apps with clean architecture principles.

TypeScript | Java | Node.js | Express | REST APIs | Git

Web & Infra

Building full-stack apps, containerized environments, and cloud-ready infrastructure.

Next.js | React | Docker | MongoDB | PostgreSQL | AWS

Experience

Professional Experience

Data Science Intern
Royal Bank of Canada (RBC)
Toronto, ON | May 2025 – Aug. 2025
- Developed an NLP pipeline in Python to detect unauthorized IT assets by ranking the top 3 closest matches from 1,000+ records, reducing compliance review time by 80%.
- Built an executive dashboard using Python and SQL to track the top 10 high-risk hardware and software assets, enabling leaders to prioritize remediation and reduce potential downtime risks for 5,000+ employees.
- Created an operational risk metrics dashboard in Kibana to monitor key performance indicators and generate reports for leadership, improving visibility into risk trends across 10+ teams.
Python | SQL | NLP | Kibana | Data Visualization
Backend Software Engineer Intern
Bang Jamin
Jakarta, Indonesia | Jun. 2024 – Aug. 2024
- Engineered and optimized REST APIs for a high-traffic car dealer dashboard using TypeScript and Node.js, delivering faster load times and seamless data retrieval for 1,000+ active users.
- Streamlined insurance policy generation by integrating multiple external APIs with MongoDB queries, cutting manual processing steps in half and boosting efficiency by 60%.
- Established an API testing framework with Jest and produced comprehensive Swagger documentation, enabling new developers to contribute productively within their first week.
TypeScript | Node.js | MongoDB | Jest | Swagger

Academic Experience

Teaching Assistant
University of Toronto
Toronto, ON | Sept. 2024 – Present
- Supporting instruction for 1,000+ students per course across STA130: Introduction to Data Science, STA237: Probability & Statistics, MAT135: Calculus I, and MAT136: Calculus II.
- Teaching practical data skills using R and Python, focusing on visualization, wrangling, and statistical modeling.
- Evaluating weekly assignments and exams with consistent grading standards to ensure fairness and clarity of feedback.
R | Python | Communication | Probability | Mathematics
Research Assistant
Prof. Yang Xu, University of Toronto
Toronto, ON | Sept. 2025 – Present
- Collaborating with the Department of Kinesiology & Physical Education to examine how “talent” is represented in sports language using computational and linguistic analysis.
- Designing and implementing an NLP pipeline with large language models (Phi-3 Mini, Gemini AI) to extract and classify traits from 500+ NHL player reports across psychological, physical, technical, and tactical categories.
- Applying BERT-based embeddings and dimensionality-reduction techniques to visualize semantic clusters and identify position-specific patterns in evaluator language, aiming to enhance early-stage scouting analytics.
Python | NLP | Machine Learning | Sports Analytics

Projects

Predictive Modeling of Bike-Share Usage

Apr 2025

Compared five models (LM, GLM, GAM, RF, XGBoost) on 25,000+ Toronto trip records with weather data. Achieved R² = 0.85 and RMSE = 330 using Random Forest, with results shown on an interactive website.

R | Machine Learning | API | plotly

View

Predicting Food Preferences

Apr 2025

Built four models (Random Forest, Softmax Regression, Naive Bayes Gaussian Discriminant Analysis, and Neural Networks) to classify pizza, shawarma, or sushi preferences from 1,600+ survey responses. Achieved 85% test accuracy with an ensemble approach.

Python | NumPy | Scikit-learn | Neural Networks

Report

Modeling Fertility Patterns in Portugal

Feb 2025

Applied Poisson and Negative Binomial regression to study how literacy, marriage age, and region affect family size using a 1979 fertility survey. Accounted for confounding and overdispersion with interaction terms, controls, and offsets, and selected a Negative Binomial model as the final model.

R | GLM | Negative Binomial | Statistics

Report

Predicting NBA Salaries

Dec 2024

Built linear regression models in R to study how player performance metrics and achievements predict NBA salaries. Enhanced model validity with Box-Cox transformations, automated selection via AIC, VIF checks, and partial F-tests.

R | Linear Regression | Model Diagnostics | Statistics

GitHub