Tutorial 1: Data exploration and wrangling

Before building a machine learning model, it is important to understand and wrangle your data into an appropriate numeric format. In this tutorial, we’ll look at how I like to set up my projects and some tips for exploratory visualizations.

Part 1: Project configuration

Tutorials are not marked in this course, so it’s up to you to keep track of them separately. I recommend copying this directory to a new location rather than forking the entire w26 repo and working out of that. Otherwise, you’ll end up with merge conflicts and unrelated changes if you later want to submit a PR to the main repo. Alternatively, you can create a fork and work within a separate branch.

Tools:

Part 2: Exploratory visualizations

Visualizations for the purposes of exploring data (rather than communicating results) can be “quick and dirty”, but there are some guidelines to consider, as well as a few tricks that can help.

Follow along with the notebook and answer the various TODOs.
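
As a rough illustration of what “quick and dirty” can look like, here is a minimal sketch using pandas and matplotlib. It is not taken from the notebook: the DataFrame and its column names (`height_cm`, `income`, `group`) are synthetic and made up for the example. It prints summary statistics first, then draws histograms, including one on a log scale to show how a skewed feature can become easier to read.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data standing in for a real dataset (hypothetical column names)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height_cm": rng.normal(170, 10, 200),      # roughly symmetric
    "income": rng.lognormal(10, 1, 200),        # heavily right-skewed
    "group": rng.choice(["a", "b", "c"], 200),  # categorical
})

# Numbers first: ranges, means, and category counts
print(df.describe())
print(df["group"].value_counts())

# Then quick histograms, one panel per view of the data
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df["height_cm"].plot.hist(bins=30, ax=axes[0], title="height_cm")
df["income"].plot.hist(bins=30, ax=axes[1], title="income (raw)")
np.log10(df["income"]).plot.hist(bins=30, ax=axes[2], title="log10(income)")
fig.tight_layout()
```

In a notebook the figure renders inline; in a script you could call `fig.savefig(...)` with a path of your choice.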

Part 3: Reverse engineer a cleaned dataset

Create a new .ipynb file to explore this new dataset.
Read the raw data into a pandas DataFrame. You can either download the zip file, or install the ucimlrepo package and fetch the data directly.
Read the pre-processed version into a different pandas DataFrame.
Try to answer the following questions:
1. How were the categorical features handled?
2. Were any of the numerical features manipulated?
3. What additional transformations might be useful for this dataset?
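
One possible way to approach the first two questions is to compare the two DataFrames column by column. The sketch below uses small made-up tables rather than the actual dataset: `raw`, `processed`, and the helper names `compare_columns` and `changed_numeric_columns` are all hypothetical, and the "pre-processed" table is generated here with one-hot encoding and standardization purely so there is something to detect. It also assumes both tables have the same rows in the same order, which may not hold for the real files.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the raw table (hypothetical columns)
raw = pd.DataFrame({
    "age": [25, 40, 58],
    "income": [30000.0, 52000.0, 81000.0],
    "job": ["admin", "technician", "admin"],
})

# Made-up stand-in for the pre-processed table:
# one-hot encode "job" and standardize "income"
processed = pd.get_dummies(raw, columns=["job"], dtype=int)
processed["income"] = (raw["income"] - raw["income"].mean()) / raw["income"].std()


def compare_columns(raw, processed):
    """Report which columns were dropped, added, or kept."""
    raw_cols, proc_cols = set(raw.columns), set(processed.columns)
    return {
        "dropped": sorted(raw_cols - proc_cols),
        "added": sorted(proc_cols - raw_cols),
        "shared": sorted(raw_cols & proc_cols),
    }


def changed_numeric_columns(raw, processed, shared):
    """List shared numeric columns whose values differ (assumes aligned rows)."""
    changed = []
    for col in shared:
        both_numeric = (pd.api.types.is_numeric_dtype(raw[col])
                        and pd.api.types.is_numeric_dtype(processed[col]))
        if both_numeric and not np.allclose(raw[col], processed[col], equal_nan=True):
            changed.append(col)
    return changed


report = compare_columns(raw, processed)
changed = changed_numeric_columns(raw, processed, report["shared"])
print(report)
print("numeric columns with changed values:", changed)
```

A dropped categorical column that reappears as several prefixed 0/1 columns suggests one-hot encoding, while a shared numeric column with changed values is a hint to compare its mean, standard deviation, and range in both tables.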

COMP 4630 | Winter 2026
