Gradient Descent for Polynomial Regression

Note

Solution now available! You can view it rendered on GitHub here.

There’s some fake data in the file data.csv, with a single feature x and a true value y. Your task is to:

Load the data and look at it
Split it into training, validation, and test sets
Create your design matrix
Implement gradient descent to find the best fit polynomial
Evaluate your model’s performance and experiment with different hyperparameters

It’s up to you to decide what degree polynomial to fit the data, and you can also play around with stochastic gradient descent, mini-batch, hyperparameters, etc.

Important

Do this without using scikit-learn or any libraries other than numpy and matplotlib!

Step 0: Import libraries and seed your random number generator

It’s usually a good idea to start with a consistent random number seed to ensure reproducibility.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)  # replace 42 with any integer of your choice
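As a quick illustration of why seeding matters, the sketch below builds two generators from the same seed and checks that they produce identical draws. The seed value 4630 and the variable names are arbitrary choices for this example.

```python
import numpy as np

# Two generators built from the same integer seed (4630 is an arbitrary example)
rng_a = np.random.default_rng(seed=4630)
rng_b = np.random.default_rng(seed=4630)

# Each generator produces the same sequence of pseudo-random numbers
draws_a = rng_a.random(3)
draws_b = rng_b.random(3)

print(np.array_equal(draws_a, draws_b))
```

Anything downstream that draws from the generator (shuffling, random initial parameters) is then repeatable from run to run.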

Step 1: Load the data and look at it

x, y = np.loadtxt("data.csv", delimiter=",", skiprows=1, unpack=True)
#TODO: visualize

Step 2: Split the data

Weird numpy quirk: by default, a 1D array has a shape of (n,), but to behave as a proper column vector it needs a shape of (n, 1). An easy way to do this is to pass np.newaxis as the second index when indexing your y data, e.g.:

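n = len(y)
# Example only: the 80% training fraction is an assumption; pick your own split.
train_ids = rng.choice(n, size=int(0.8 * n), replace=False)
x_train, y_train = x[train_ids], y[train_ids, np.newaxis]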

Don’t worry about the x values for now, as we’ll be matrixifying them shortly anyway.
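One possible way to get disjoint training and test indices is to shuffle all indices once and slice the result. The sketch below is only an illustration: the helper name `train_test_indices`, the 80/20 split, and the synthetic stand-in data (used in place of data.csv) are all assumptions.

```python
import numpy as np

def train_test_indices(n, train_fraction=0.8, rng=None):
    """Return two disjoint index arrays covering 0..n-1 (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    shuffled = rng.permutation(n)            # random ordering of all indices
    n_train = int(round(train_fraction * n))
    return shuffled[:n_train], shuffled[n_train:]

# Synthetic stand-in for the x and y columns of data.csv
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 10)
y = x ** 2

train_ids, test_ids = train_test_indices(len(y), 0.8, rng)

# x stays 1D for now; y becomes a column vector of shape (n, 1)
x_train, y_train = x[train_ids], y[train_ids, np.newaxis]
x_test, y_test = x[test_ids], y[test_ids, np.newaxis]

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Slicing a single permutation guarantees that no sample ends up in both sets, which separate random draws would not.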

Step 3: Create your design matrix $X$

For the example given in class, the design matrix was simply a column of 1s concatenated with the feature vector, i.e.:

$X = \begin{bmatrix} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{m} \end{bmatrix}$

For this exercise, you probably want to fit a higher degree polynomial, so the design matrix will be something like:

$X = \begin{bmatrix} 1 & x_{1} & x_{1}^{2} & \dots & x_{1}^{d} \\ 1 & x_{2} & x_{2}^{2} & \dots & x_{2}^{d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m} & x_{m}^{2} & \dots & x_{m}^{d} \end{bmatrix}$

where $d$ is the degree of the polynomial you want to fit. Try multiple degrees and see what gives the best results.

A note on scaling: the range of x values in this example is fairly small, but if you choose a high degree polynomial you will still end up with fairly different scales for your “features”. Consider normalizing each column of the design matrix (other than the first column accounting for the bias term), remembering to calculate your scaling parameters on the training data and apply them to the validation/test data.
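A minimal sketch of the scaling idea, assuming standardization (subtract the mean, divide by the standard deviation) as the normalization method; other choices such as min-max scaling would work too. The helper names `fit_column_scaler` and `apply_column_scaler` and the small example matrices are hypothetical.

```python
import numpy as np

def fit_column_scaler(X_train):
    """Compute per-column mean and std on the training data, skipping the bias column."""
    mean = X_train[:, 1:].mean(axis=0)
    std = X_train[:, 1:].std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    return mean, std

def apply_column_scaler(X, mean, std):
    """Apply previously fitted scaling parameters, leaving the bias column untouched."""
    X_scaled = X.astype(float)               # astype returns a copy
    X_scaled[:, 1:] = (X_scaled[:, 1:] - mean) / std
    return X_scaled

# Small cubic design matrices with columns 1, x, x^2, x^3
X_train = np.vander(np.array([0.0, 1.0, 2.0, 3.0]), N=4, increasing=True)
X_test = np.vander(np.array([0.5, 2.5]), N=4, increasing=True)

mean, std = fit_column_scaler(X_train)       # parameters come from training data only
X_train_scaled = apply_column_scaler(X_train, mean, std)
X_test_scaled = apply_column_scaler(X_test, mean, std)
```

Fitting the parameters on the training data alone keeps information about the held-out samples from leaking into the model.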

Since you’ll be doing this twice (train/test), you might want to define a function to create the design matrix given a vector x and a degree d.
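Such a function might look like the sketch below, which stacks the powers $x^{0}, x^{1}, \dots, x^{d}$ as columns. The name `design_matrix` is an illustrative choice, not something prescribed by the exercise.

```python
import numpy as np

def design_matrix(x, degree):
    """Build an (m, degree + 1) matrix whose columns are x**0, x**1, ..., x**degree."""
    x = np.asarray(x, dtype=float).ravel()   # accept lists, (m,) or (m, 1) input
    return np.column_stack([x ** p for p in range(degree + 1)])

X = design_matrix([1.0, 2.0, 3.0], degree=2)
print(X)
```

The first column is $x^{0}$, which is the column of 1s for the bias term.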

Step 4: Implement gradient descent

This has a number of subcomponents. First you’ll need to define your gradient function. For mean squared error, the gradient can be calculated as:

$\nabla_{θ} MSE = \frac{2}{m} X^{T} (Xθ - y)$

where $X$ is your design matrix, $θ$ is the current parameter vector, $y$ is the vector of true target values, and $m$ is the number of samples.
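The gradient formula translates almost directly into numpy. The sketch below assumes $θ$ and $y$ are column vectors of shape (n, 1) and (m, 1); the function name `mse_gradient` and the tiny example data are illustrative.

```python
import numpy as np

def mse_gradient(X, theta, y):
    """Gradient of the mean squared error with respect to theta: (2/m) X^T (X theta - y)."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ theta - y)

# Tiny example where y = 1 + 2x exactly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])

grad_at_exact = mse_gradient(X, np.array([[1.0], [2.0]]), y)   # at the true parameters
grad_at_zero = mse_gradient(X, np.zeros((2, 1)), y)            # away from the optimum

print(grad_at_exact.ravel(), grad_at_zero.ravel())
```

A gradient of zero at the true parameters is a handy sanity check for your own implementation.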

It’ll also be useful to define the actual mean squared error to evaluate your model:

$MSE = \frac{1}{m} (X θ - y)^{T} (X θ - y)$
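A possible numpy version of the MSE is sketched below, again assuming column-vector shapes. It also shows why the (n, 1) shape from Step 2 matters: subtracting a 1D array from a column vector silently broadcasts to a square matrix. The function name `mse` and the example data are illustrative.

```python
import numpy as np

def mse(X, theta, y):
    """Mean squared error: (1/m) (X theta - y)^T (X theta - y)."""
    residual = X @ theta - y                 # shape (m, 1)
    return (residual.T @ residual).item() / X.shape[0]

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])          # y = 1 + 2x as a column vector

error_at_zero = mse(X, np.zeros((2, 1)), y)
error_at_exact = mse(X, np.array([[1.0], [2.0]]), y)

# Pitfall: a 1D y broadcasts against the (m, 1) prediction and gives an (m, m) array
bad_shape = (X @ np.zeros((2, 1)) - y.ravel()).shape

print(error_at_zero, error_at_exact, bad_shape)
```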

Now you can define your hyperparameters and run your gradient descent. For batch gradient descent, you’ll need to define:

learning rate $η$ (usually in the range of $10^{-5}$ to $10^{-2}$)
stopping criterion (can just be a fixed number of iterations)

The general algorithm for gradient descent is:

1. Start with a random $θ$
2. Calculate the gradient $\nabla_{θ} MSE$ for the current $θ$
3. Update $θ$ as $θ = θ - η \nabla_{θ} MSE$
4. Repeat steps 2 and 3 until some stopping criterion is met
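The loop above can be sketched as follows, using a fixed number of iterations as the stopping criterion. The function name `batch_gradient_descent`, the default hyperparameter values, and the noiseless straight-line test data are assumptions made for illustration.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.01, n_iterations=5000, rng=None):
    """Minimize the MSE with batch gradient descent and return the final theta."""
    if rng is None:
        rng = np.random.default_rng()
    m, n_params = X.shape
    theta = rng.standard_normal((n_params, 1))             # random starting point
    for _ in range(n_iterations):                          # fixed iteration count
        gradient = (2.0 / m) * X.T @ (X @ theta - y)       # gradient at current theta
        theta = theta - eta * gradient                     # step against the gradient
    return theta

# Noiseless line y = 1 + 2x, so the best fit is known in advance
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = (1.0 + 2.0 * x)[:, np.newaxis]

theta = batch_gradient_descent(X, y, eta=0.01, n_iterations=5000,
                               rng=np.random.default_rng(0))
print(theta.ravel())
```

Testing on data with a known answer first makes it easier to tell a bug from a badly chosen learning rate.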

You could also try mini-batch or stochastic gradient descent by adding an outer epoch loop if you want to get fancy.
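A mini-batch variant might look like the sketch below: each epoch shuffles the sample order, then updates $θ$ once per batch. Setting `batch_size=1` gives stochastic gradient descent. The function name, the default values, and the noiseless test data are illustrative assumptions.

```python
import numpy as np

def minibatch_gradient_descent(X, y, eta=0.01, n_epochs=1000, batch_size=8, rng=None):
    """Minimize the MSE with mini-batch gradient descent and return the final theta."""
    if rng is None:
        rng = np.random.default_rng()
    m, n_params = X.shape
    theta = rng.standard_normal((n_params, 1))
    for _ in range(n_epochs):                              # outer epoch loop
        order = rng.permutation(m)                         # reshuffle every epoch
        for start in range(0, m, batch_size):
            batch = order[start:start + batch_size]
            X_b, y_b = X[batch], y[batch]
            gradient = (2.0 / len(batch)) * X_b.T @ (X_b @ theta - y_b)
            theta = theta - eta * gradient
    return theta

# Noiseless line y = 1 + 2x
x = np.linspace(-1.0, 1.0, 40)
X = np.column_stack([np.ones_like(x), x])
y = (1.0 + 2.0 * x)[:, np.newaxis]

theta = minibatch_gradient_descent(X, y, rng=np.random.default_rng(0))
print(theta.ravel())
```

On noisy data the mini-batch estimate will keep jittering around the optimum, so a smaller or decaying learning rate is often worth trying.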

Step 5: Evaluate your model’s performance and experiment

Now that you’ve computed a final estimate of $θ$, apply it to your test set to see how well your model performs, perhaps by plotting the data as well as the best fit curve. If it doesn’t look good, try changing various hyperparameters, like $η$, number of iterations, and degree of polynomial. If you didn’t rescale your design matrix earlier, try it now!
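One way to experiment is to fit several degrees and compare the MSE on the held-out test set. The sketch below does this on synthetic noisy quadratic data standing in for data.csv; the helper names, the learning rate, the iteration count, and the data itself are all assumptions for illustration.

```python
import numpy as np

def design_matrix(x, degree):
    """Columns x**0 ... x**degree."""
    x = np.asarray(x, dtype=float).ravel()
    return np.column_stack([x ** p for p in range(degree + 1)])

def mse(X, theta, y):
    """Mean squared error of the predictions X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def fit(X, y, eta, n_iterations, rng):
    """Plain batch gradient descent from a random starting point."""
    theta = rng.standard_normal((X.shape[1], 1))
    for _ in range(n_iterations):
        theta = theta - eta * (2.0 / X.shape[0]) * X.T @ (X @ theta - y)
    return theta

# Synthetic data: a quadratic plus a little Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = (1.0 - 2.0 * x + 3.0 * x ** 2 + 0.1 * rng.standard_normal(100))[:, np.newaxis]

ids = rng.permutation(100)
train_ids, test_ids = ids[:80], ids[80:]

test_mse = {}
for degree in (1, 2, 3):
    X_train = design_matrix(x[train_ids], degree)
    X_test = design_matrix(x[test_ids], degree)
    theta = fit(X_train, y[train_ids], eta=0.1, n_iterations=5000, rng=rng)
    test_mse[degree] = mse(X_test, theta, y[test_ids])
    print(f"degree {degree}: test MSE = {test_mse[degree]:.4f}")
```

A straight line cannot capture the curvature here, so its test error should come out clearly worse than the quadratic fit.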

Technically we should have done a 3-way train/validate/test split, but I kept it as just train/test to keep things manageable.


COMP 4630 | Winter 2026