My Portfolio

My name is Caleb O'Neel. I am a graduate student at Duke University pursuing my Master's in Data Science, graduating this May.

Using Airbnb Data to Predict Prices Across Regions

A random forest model to predict European Airbnb prices using data from the United States with a focus on feature engineering.

Predicting End of Season Batting Averages

I experiment with various machine learning models to predict end-of-season batting averages using data from just the first two months of the season.

Identifying Talent in the NFL Draft

In this project, I evaluate NFL teams' ability to identify and draft talent relative to their draft position over the past decade, leveraging data provided by Pro Football Focus.

Opioid Legislation Difference-in-Difference Analysis

In this project, I use a difference-in-difference analysis to evaluate opioid legislation.
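The difference-in-difference idea named in this project's title can be sketched in a few lines. This is only a minimal illustration of the basic two-group, two-period estimate, not the project's actual code: the function name `diff_in_diff` and all of the numbers are made up for the example.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-difference estimate.

    Subtracts the control group's before/after change from the
    treated group's before/after change.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values, invented purely for illustration.
treated_pre = [10.0, 12.0, 11.0]
treated_post = [8.0, 9.0, 7.0]
control_pre = [10.0, 11.0, 12.0]
control_post = [10.0, 10.0, 10.0]

estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(estimate)  # -2.0
```

The estimate is only meaningful if the two groups would have followed parallel trends without the policy change, which is an assumption the sketch does not check.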

Neural Networks Exploration

In this project, I tune neural network parameters to improve performance.

Supervised Learning: Model Training and Evaluation

I manually recreate logistic regression and gradient descent algorithms. I also do some image classification and work with Bayes' rule.
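For readers unfamiliar with the idea, here is a minimal sketch of logistic regression fit by batch gradient descent in plain Python. It is an illustration under simple assumptions (a toy one-feature dataset, a fixed learning rate, no regularization), not the code from the project; the names `fit_logistic` and `predict` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            error = p - yi  # derivative of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= 0.5 else 0
        for xi in X
    ]


# Toy, made-up data: one centered feature, two separable classes.
X = [[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
preds = predict(X, weights, bias)
print(preds)
```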

Machine Learning Basics

This project goes over some of the basics of machine learning, including answering conceptual questions about the bias-variance tradeoff, creating a KNN model by hand, and going through some of the scoring metrics.
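As a rough illustration of what building KNN by hand involves, the sketch below classifies a point by majority vote among its k nearest training points and scores the result with accuracy. It is a minimal example on made-up data, not the project's code; `knn_predict` and `accuracy` are hypothetical names, and Euclidean distance is an assumed choice.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up data: two well-separated clusters.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["a", "a", "a", "b", "b", "b"]

test_X = [(1.5, 1.5), (6.5, 6.5)]
test_y = ["a", "b"]

test_pred = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(test_pred, accuracy(test_y, test_pred))
```

Smaller k tends to give lower bias and higher variance, larger k the reverse, which is one place the bias-variance tradeoff shows up in practice.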

Asthma Patients in California

This is a project I did for my Modeling and Representation of Data class. The data set is from a study comparing the quality of services provided by two physician groups for asthma patients in California. Specifically, for patient i, let Yi(w) be the quality of service as judged by the patient (1 = satisfactory, 0 = not satisfactory) if the patient is served by physician group w, for w = 1, 2. The patients who visit the two groups can differ, so a set of covariates is measured.

About Me


Growing up in Silicon Valley during the tech explosion, I’ve been baptized in the gospel of big data. Virtually every adult I knew growing up worked either in the tech industry or in a tech-focused role, so most dinner table conversations and evenings spent with family friends revolved around the tech world. I think because of this, that world has always excited me. However, I never really knew how I fit into it. Sure, I enjoyed some math like statistics and algebra (others like calculus less so), and the few programming classes I took were interesting enough, but I had always viewed myself as more of the creative type. I enjoyed reading, writing and history, but these disciplines felt incompatible with the world of tech. I didn't see myself as cut out for software engineering or any other type of engineering really, so I studied finance and business analytics in undergrad. It wasn’t until after graduation, while working at J.P. Morgan, that I really heard about data science, and I knew it was exactly what I had been looking for.

Data Science is the blend of math, coding, storytelling and critical thinking that sits at the intersection of all my academic interests. Once this light bulb went on in my brain, I immediately began researching Data Science graduate programs and preparing for the GRE.

What I love about Duke’s program (in addition to the incredibly accessible and fantastic professors) is that it emphasizes aspects of data science such as study design, causal inference and storytelling in addition to all the crucial technical skills such as coding, machine learning, statistics and linear algebra. The technical skills are crucial, and I truly love learning them - but they are simply the tools you use. If you cannot understand what questions to ask, how to set up an effective experiment, which models most accurately answer those questions, and how to communicate the results effectively, you will not be an effective data scientist no matter how consummate your abilities in math and coding.

Outside of schoolwork, I enjoy cooking, hiking, traveling, soccer, football and reading books on ancient European history. I’ll spare you a tenuous attempt to connect these hobbies to data science. Instead I’ll simply leave it at that and include some killer book recs and recipes below. I'm also addicted to power ranking things, so I’ll throw up a few of those as well - these rankings should be taken as definitive and not up for debate.

Coursework

Classes I have taken as part of my Master's in Data Science at Duke University, from August 2020 to December 2021.

Machine Learning

Automating prediction and decision-making using supervised, unsupervised, and reinforcement learning techniques. Focused on the theory and math behind the models, to understand how they work “under the hood” and to identify their best use cases.

Natural Language Processing

Using textual data produced by websites, social media platforms, digitization of administrative and historical records, and new monitoring technologies to gain insights and make decisions. Worked with TF-IDF and neural network models to extract insights from text sources.
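To make the TF-IDF idea concrete, here is a minimal sketch in plain Python. It uses one simple, unsmoothed variant (term frequency as count divided by document length, inverse document frequency as the log of total documents over documents containing the term); libraries often use smoothed or normalized variants, so the exact numbers will differ. The function name `tf_idf` and the sample sentences are made up for illustration and are not course material.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, using unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents each term appears in at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Made-up example documents.
docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)

# Words found in every document score zero; rarer words score higher.
for doc, doc_scores in zip(docs, scores):
    print(doc, doc_scores)
```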

Causal Inference

This course focused on how to answer questions effectively using quantitative data. We learned to recognize different types of questions (e.g. descriptive, causal, and predictive questions), to understand which methodological approaches are most appropriate for answering each type, to design and critically evaluate data analysis plans, and to tailor the presentation of results to different audiences.

Probability

In this class we learned the theory of probability and statistics. Topics included combinatorial probability, limit theorems, probability distributions, conditional probability and Markov chains, among others.

Matrices and Vector Spaces

Linear algebra course focusing on concepts, methods and applications. Covers topics such as Gaussian elimination, matrix factorization, eigenvalues, orthogonality and PCA.

Modeling and Representation of Data

Statistical modeling class focused on analyzing multivariate datasets. Focused on causal modeling and inferential questions. Covers topics such as linear and logistic regression, hierarchical modeling and imputation. Class conducted primarily in R.

Cloud and Data Engineering

Learned to create and navigate databases using SQL, and to leverage cloud resources for tasks like data warehousing, computing power, industrial-scale machine learning, and creating and deploying websites and Flask applications. Primarily taught through AWS, with some exposure to GCP and Azure.

Python for Data Science

Python programming class with a focus on data science applications. Heavy emphasis on pandas and data manipulation. Also covers the basics of data science workflows and the use of GitHub, pair programming, and code review.

Algorithms

The mathematical theory of algorithms and graphs and their practical implementations. Examines the foundational mathematical structures for the behavior and analysis of algorithms from a variety of domains, with a particular emphasis on graphs. We tie theory to practice by writing code to implement algorithms, and compare experimentally observed run-times to those predicted by the mathematical theory.

Data Visualization

Course focusing on data visualization and storytelling. Emphasis on communicating the message of what the data tells us through written and verbal communication as well as the use of graphics. Tableau is the primary software used for this class.

Data Ethics

Data science tools are not morally neutral. In this course we think explicitly about our social responsibility as data scientists and the impact on the world of what we are building and analyzing. Using contemporary case studies from recent news stories and legal cases, we learn about issues such as intellectual copyright, consent, data security, differences between privacy and confidentiality, difficulties of anonymization, and bias in artificial intelligence.