Outline of ridge regression with cross-validation optimization of the penalty parameter
load('ridgeData') %contains matrix of predictors (X) and vector of outcomes (Y)
X(:,end+1) = 1; %append a column of 1s for intercept term (constant predictor)
[n,m] = size(X); %number of cases, number of predictors
ntrain = 50; %training set size (placeholder value; you will vary this in the last step)
L = 1; %ridge penalty parameter (placeholder value; you will vary this below)
ranks = randperm(n); %random ordering of the n cases
Xtrain = X(ranks(1:ntrain),:); %take ntrain cases for the training set
Ytrain = Y(ranks(1:ntrain)); %the corresponding training outcomes
Xtest = X(ranks(ntrain+1:n),:); %use the remaining data as the test set
Ytest = Y(ranks(ntrain+1:n)); %the corresponding test outcomes
bhat = (Xtrain'*Xtrain+L*eye(m))\(Xtrain'*Ytrain); %estimated regression coefficients (backslash avoids forming an explicit inverse)
Yhat = Xtest*bhat; %predictions for test cases
rmse = sqrt(mean((Yhat-Ytest).^2)); %root mean squared error
Now put the last four steps (the random split, the fit, the predictions, and the RMSE) into a loop, to get CV performance over multiple train-test splits ("folds"). Calculate the mean RMSE and the standard error of the mean across folds. I found that 100 folds was enough to get adequately small standard errors.
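One way to structure that fold loop, continuing from the variables defined above (the names nfolds, rmse, meanRMSE, and semRMSE are illustrative, not required):

```matlab
nfolds = 100; %number of random train-test splits
rmse = zeros(nfolds,1); %one RMSE per fold
for f = 1:nfolds
    ranks = randperm(n); %draw a fresh random split each fold
    Xtrain = X(ranks(1:ntrain),:);
    Ytrain = Y(ranks(1:ntrain));
    Xtest = X(ranks(ntrain+1:n),:);
    Ytest = Y(ranks(ntrain+1:n));
    bhat = (Xtrain'*Xtrain+L*eye(m))\(Xtrain'*Ytrain);
    Yhat = Xtest*bhat;
    rmse(f) = sqrt(mean((Yhat-Ytest).^2));
end
meanRMSE = mean(rmse); %CV estimate of prediction error
semRMSE = std(rmse)/sqrt(nfolds); %standard error of the mean across folds
```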
Then make an outer loop to test different values of the penalty parameter (L). Plot the mean and standard error as a function of L (see Matlab's errorbar function), and visually find the value of L giving the best performance.
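A sketch of that outer loop over the penalty (the grid Lvals is illustrative; a logarithmic grid is a reasonable starting point since good penalties can span orders of magnitude):

```matlab
Lvals = logspace(-2,3,20); %candidate penalty values (illustrative range)
meanRMSE = zeros(size(Lvals));
semRMSE = zeros(size(Lvals));
for i = 1:numel(Lvals)
    L = Lvals(i);
    %...inner fold loop from the previous step, filling the rmse vector...
    meanRMSE(i) = mean(rmse);
    semRMSE(i) = std(rmse)/sqrt(nfolds);
end
errorbar(log10(Lvals), meanRMSE, semRMSE); %mean +/- SEM vs. log penalty
xlabel('log_{10} L'); ylabel('CV RMSE');
```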
Finally, make an outer loop to vary the size of the training set (ntrain). Try training set sizes both smaller and larger than the number of predictors. How does the profile of RMSE as a function of L compare across different training set sizes?
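The outermost loop over training set size might look like this (the grid ntrainVals is illustrative; note each size must be less than n so a test set remains, and sizes below m make the unpenalized problem underdetermined, which is where the penalty matters most):

```matlab
ntrainVals = round([m/2, m, 2*m, 4*m]); %sizes below and above the number of predictors (illustrative)
hold on
for j = 1:numel(ntrainVals)
    ntrain = ntrainVals(j);
    %...penalty sweep from the previous step, producing meanRMSE and semRMSE...
    errorbar(log10(Lvals), meanRMSE, semRMSE); %one curve per training set size
end
hold off
legend("ntrain = " + ntrainVals);
xlabel('log_{10} L'); ylabel('CV RMSE');
```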