Please see the homework instructions document for detailed instructions and some grading notes. Failure to follow instructions will result in point reductions.

Exercise 1

[15 points] This exercise will use data in hw02-train-data.csv and hw02-test-data.csv, which are the train and test datasets, respectively. Both datasets contain a single predictor x and a numeric response y.

Fit a total of 20 linear models. Each will be a polynomial model. Use degrees from 1 to 20. So, the smallest model you fit will be:

y ~ poly(x, degree = 1)

The largest model you fit will be:

y ~ poly(x, degree = 20)

For each model, calculate Train and Test RMSE. Summarize these results using a single plot which displays RMSE (both Train and Test) as a function of the degree of polynomial used. (Be sure to make the plot easy to read and well labeled.) Note which polynomial degree appears to perform the “best,” as well as which polynomial degrees appear to be underfitting and which appear to be overfitting.
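The workflow described above can be sketched as follows. This is only an illustrative outline, not a reference solution: it uses simulated stand-in data rather than the hw02 files, and the names sim_data, rmse, train_data, and test_data are hypothetical helpers introduced here for the example.

```r
# Simulated stand-in data (an assumption); the assignment itself uses
# hw02-train-data.csv and hw02-test-data.csv, e.g. loaded with read.csv().
set.seed(42)
sim_data = function(n) {
  x = runif(n, min = -1, max = 1)
  data.frame(x = x, y = sin(3 * x) + rnorm(n, sd = 0.3))
}
train_data = sim_data(200)
test_data  = sim_data(200)

# Root mean squared error between observed and predicted responses.
rmse = function(actual, predicted) {
  sqrt(mean((actual - predicted) ^ 2))
}

degrees    = 1:20
train_rmse = numeric(length(degrees))
test_rmse  = numeric(length(degrees))

# Fit one polynomial model per degree, always on the training data,
# then evaluate it on both the training and the test data.
for (d in degrees) {
  fit = lm(y ~ poly(x, degree = d), data = train_data)
  train_rmse[d] = rmse(train_data$y, predict(fit, newdata = train_data))
  test_rmse[d]  = rmse(test_data$y,  predict(fit, newdata = test_data))
}

# Single plot showing both error curves against polynomial degree.
plot(degrees, train_rmse, type = "b", col = "dodgerblue",
     ylim = range(c(train_rmse, test_rmse)),
     xlab = "Polynomial degree", ylab = "RMSE",
     main = "Train and Test RMSE vs. polynomial degree")
lines(degrees, test_rmse, type = "b", col = "darkorange")
legend("top", legend = c("Train", "Test"),
       col = c("dodgerblue", "darkorange"), lty = 1, pch = 1)
```

Because the models are nested, Train RMSE should not increase with degree, while Test RMSE typically falls and then rises. Degrees below the Test RMSE minimum are candidates for underfitting, and degrees above it are candidates for overfitting.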

Exercise 2

[15 points] This exercise will again use data in hw02-train-data.csv and hw02-test-data.csv, which are the train and test datasets, respectively. Both datasets contain a single predictor x and a numeric response y.

Fit a total of 10 nearest neighbors models. Each will use a different value of k, the tuning parameter for the number of neighbors. Use the values of k defined in the following R chunk.

k = seq(5, 50, by = 5)

For simplicity, do not worry about scaling the x variable.

For each value of the tuning parameter, calculate Train and Test RMSE. Summarize these results using a single well-formatted table which displays k, RMSE (both Train and Test), and whether that value of the tuning parameter appears to be overfitting, underfitting, or the “best.” Consider rounding your results to show only enough precision to choose the “best” model.
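One possible shape for this computation is sketched below. It is a hedged illustration under several assumptions: the data are simulated stand-ins for the hw02 files, knn_predict is a hypothetical base-R helper written for this example (in practice a package function such as knn.reg() from FNN would more likely be used), and the fit labels come from a simple heuristic based on the smallest Test RMSE.

```r
# Simulated stand-in data (an assumption); replace with the hw02 CSV files.
set.seed(42)
sim_data = function(n) {
  x = runif(n, min = -1, max = 1)
  data.frame(x = x, y = sin(3 * x) + rnorm(n, sd = 0.3))
}
train_data = sim_data(200)
test_data  = sim_data(200)

rmse = function(actual, predicted) {
  sqrt(mean((actual - predicted) ^ 2))
}

# Hypothetical helper: k-nearest neighbors regression for one predictor.
# Each prediction is the mean response of the k closest training points.
knn_predict = function(x_train, y_train, x_new, k) {
  sapply(x_new, function(x0) {
    nearest = order(abs(x_train - x0))[1:k]
    mean(y_train[nearest])
  })
}

k = seq(5, 50, by = 5)

train_rmse = sapply(k, function(kk) {
  rmse(train_data$y,
       knn_predict(train_data$x, train_data$y, train_data$x, kk))
})
test_rmse = sapply(k, function(kk) {
  rmse(test_data$y,
       knn_predict(train_data$x, train_data$y, test_data$x, kk))
})

# Heuristic labels: smaller k is more flexible, so values of k below the
# best one are flagged as overfitting and values above it as underfitting.
best_k = k[which.min(test_rmse)]
fit = ifelse(k == best_k, "best",
             ifelse(k < best_k, "overfitting", "underfitting"))

results = data.frame(k = k,
                     train_rmse = round(train_rmse, 3),
                     test_rmse  = round(test_rmse, 3),
                     fit = fit)
print(results)
```

The labels are computed from the unrounded RMSE values, so rounding affects only the display. Note that when predicting on the training data, each point counts as its own nearest neighbor, which is why Train RMSE tends to be optimistic for small k.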

Homework 02

STAT 430, Fall 2017

Due: Friday, September 22, 11:59 PM
