Intro and data cleaning

Component	Weight
Assignments	3 x 10%
Midterm (theory) exam	20%
Journal club	10%
Final project	40%

4a. Handling missing data

The book lists three options for handling the NaN values:

housing.dropna(subset=["total_bedrooms"], inplace=True)  # option 1
housing.drop("total_bedrooms", axis=1)                   # option 2
median = housing["total_bedrooms"].median()              # option 3
housing["total_bedrooms"].fillna(median, inplace=True)

Discussion questions:

What is each option doing?
What are the pros and cons of each option?
Which one should we choose?
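
To compare the three options side by side, here is a minimal sketch on a toy DataFrame. The values and the names option1, option2, and option3 are made up for illustration; this is not the real housing data. Unlike the snippet from the book, each option here returns a new DataFrame rather than modifying housing in place, so all three start from the same data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; these values are invented for illustration.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
})

# Option 1: drop the rows where total_bedrooms is missing.
option1 = housing.dropna(subset=["total_bedrooms"])

# Option 2: drop the whole total_bedrooms column.
option2 = housing.drop("total_bedrooms", axis=1)

# Option 3: fill the missing values with the column median.
median = housing["total_bedrooms"].median()
option3 = housing.assign(total_bedrooms=housing["total_bedrooms"].fillna(median))

print("option 1 shape:", option1.shape)  # fewer rows, same columns
print("option 2 shape:", option2.shape)  # same rows, one column fewer
print("option 3 NaN count:", option3["total_bedrooms"].isna().sum())
```

Comparing the shapes shows what each option costs: option 1 loses rows, option 2 loses a feature, and option 3 keeps everything but replaces the missing entries with an estimate.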

Welcome to Machine Learning!

What is this course about?

How did I get involved with ML?

What do you want to learn about ML?

Grade Assessment

Textbooks and other readings

Generative AI policy

Machine Learning Project Checklist

1. Look at the big picture

Where we left off on Wednesday, January 7

2. Get the data

2a. Set aside a test set

Side tangent: Sampling bias

Side tangent: Sampling bias continued

3. Explore the data

3a. Look for correlations

Where we left off on Monday, January 12

4. Prepare the data

4a. Handling missing data

4b. Handling non-numeric data

4c. Scaling the data

4e. Standardization details

4f. Other transformations

Coming up next