Please see the homework policy document for detailed instructions and some grading notes. Failure to follow instructions will result in point reductions.

“Nobody actually creates perfect code the first time around, except me. But there’s only one of me.”

For this homework, you may only use the following packages:

```
# general
library(MASS)
library(caret)
library(tidyverse)
library(knitr)
library(kableExtra)
library(mlbench)
# specific
library(randomForest)
library(gbm)
library(klaR)
library(ellipse)
```

If you feel additional general packages would be useful for future homework, please pass these along to the instructor.

You should use the `caret` package and training pipeline to complete this homework. **Any time you use the `train()` function, first run `set.seed(1337)`.**

**[6 points]** For this exercise we will train KNN regression models for the `Boston` data from the `MASS` package. Use `medv` as the response and all other variables as predictors. Use the test-train split given below. When tuning models and reporting cross-validated error, use 5-fold cross-validation.

```
data(Boston, package = "MASS")
set.seed(1)
bstn_idx = createDataPartition(Boston$medv, p = 0.75, list = FALSE)
bstn_trn = Boston[bstn_idx, ]
bstn_tst = Boston[-bstn_idx, ]
```

Consider \(k \in \{1, 5, 10, 15, 20, 25, 30, 35\}\) and two pre-processing setups:

- Do **not** scale the predictors.
- **Do** scale the predictors.

Provide plots of cross-validated error versus tuning parameters for both KNN pre-processing setups. Use the same \(y\)-axis range for both plots. (You can be lazy and let `caret` create these plots. Since it will use `lattice` plotting, putting them side-by-side or on the same plot would be difficult.)
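As a rough sketch of how the two setups and their plots might be produced (not the required solution), the code below repeats the given split so it runs on its own. The object names `cv_5`, `knn_grid`, `knn_unscaled`, `knn_scaled`, and `rmse_range` are illustrative assumptions, and passing `ylim` through `plot()` is one possible way to share the \(y\)-axis range.

```r
library(caret)

# repeat the split given above so this sketch is self-contained
data(Boston, package = "MASS")
set.seed(1)
bstn_idx = createDataPartition(Boston$medv, p = 0.75, list = FALSE)
bstn_trn = Boston[bstn_idx, ]

# 5-fold cross-validation and the grid of k values from the exercise
cv_5 = trainControl(method = "cv", number = 5)
knn_grid = expand.grid(k = c(1, 5, 10, 15, 20, 25, 30, 35))

# setup 1: predictors left on their original scales
set.seed(1337)
knn_unscaled = train(medv ~ ., data = bstn_trn, method = "knn",
                     trControl = cv_5, tuneGrid = knn_grid)

# setup 2: predictors centered and scaled within each resample
set.seed(1337)
knn_scaled = train(medv ~ ., data = bstn_trn, method = "knn",
                   preProcess = c("center", "scale"),
                   trControl = cv_5, tuneGrid = knn_grid)

# one common y-axis range (slightly padded) covering both sets of CV RMSE values
rmse_range = extendrange(c(knn_unscaled$results$RMSE, knn_scaled$results$RMSE))

# caret draws these with lattice; extra arguments such as ylim are passed along
plot(knn_unscaled, ylim = rmse_range, main = "KNN, unscaled predictors")
plot(knn_scaled, ylim = rmse_range, main = "KNN, scaled predictors")
```

Calling `set.seed(1337)` before each `train()` call means both setups are evaluated on the same folds, so the two curves are directly comparable.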

**[7 points]** For this exercise we will train more regression models for the `Boston` data from the `MASS` package. Use `medv` as the response and all other variables as predictors. Use the test-train split given previously. When tuning models and reporting cross-validated error, use 5-fold cross-validation.

Train a total of three new models:

- An additive linear regression
- A random forest
  - Use the default tuning parameters chosen by `caret`
- A boosted tree model (use `gbm`)
  - Use the provided tuning grid below

```
gbm_grid = expand.grid(interaction.depth = c(1, 2, 3),
                       n.trees = (1:20) * 100,
                       shrinkage = c(0.1, 0.3),
                       n.minobsinnode = 20)
```

Provide plots of error versus tuning parameters for the boosted tree model. Also provide a table that summarizes the cross-validated and test RMSE for each of the three (tuned) models as well as the two models tuned in the previous exercise.
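One possible way to assemble the summary table is sketched below, using only the linear regression so that it runs quickly. The helper names `get_cv_rmse` and `get_tst_rmse`, and the objects `lm_mod`, `mod_list`, and `results`, are hypothetical and not part of the assignment. Taking the minimum resampled RMSE assumes `caret`'s default behavior of selecting the tuning value with the smallest RMSE.

```r
library(caret)

# repeat the split given previously so this sketch is self-contained
data(Boston, package = "MASS")
set.seed(1)
bstn_idx = createDataPartition(Boston$medv, p = 0.75, list = FALSE)
bstn_trn = Boston[bstn_idx, ]
bstn_tst = Boston[-bstn_idx, ]

# hypothetical helper: cross-validated RMSE of the selected tuning value
get_cv_rmse = function(mod) {
  min(mod$results$RMSE)
}

# hypothetical helper: RMSE of a fitted model on held-out data
get_tst_rmse = function(mod, data) {
  sqrt(mean((data$medv - predict(mod, newdata = data)) ^ 2))
}

# additive linear regression, evaluated with 5-fold cross-validation
set.seed(1337)
lm_mod = train(medv ~ ., data = bstn_trn, method = "lm",
               trControl = trainControl(method = "cv", number = 5))

# the remaining fitted models would be added to this list
mod_list = list("Linear regression" = lm_mod)

results = data.frame(
  Model = names(mod_list),
  "CV RMSE" = sapply(mod_list, get_cv_rmse),
  "Test RMSE" = sapply(mod_list, get_tst_rmse, data = bstn_tst),
  check.names = FALSE,
  row.names = NULL
)

knitr::kable(results, digits = 3)
```

The random forest and boosted tree fits would follow the same `train()` pattern with `method = "rf"` and `method = "gbm"` (the latter with `tuneGrid = gbm_grid`). Calling `plot()` on the fitted `gbm` model object gives the error-versus-tuning-parameter plot.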

**[7 points]** For this exercise we will train a number of classifiers using the training data generated below. The categorical response variable is `classes` and the remaining variables should be used as predictors. When tuning models and reporting cross-validated error, use 10-fold cross-validation. We will not use a test set for this exercise.

```
set.seed(42)
# simulate data using mlbench
sim_trn = mlbench.2dnormals(n = 500, cl = 7, r = 10, sd = 3)
# create tidy data
sim_trn = data.frame(
  classes = sim_trn$classes,
  sim_trn$x
)
```

```
featurePlot(x = sim_trn[, -1],
            y = sim_trn$classes,
            plot = "pairs",
            auto.key = list(columns = 2),
            par.settings = list(superpose.symbol = list(pch = 1:9)))
```
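The 10-fold cross-validation setup for this exercise could look roughly like the sketch below. The classifiers to be trained are not listed in this excerpt, so LDA (`method = "lda"`) is used purely as a placeholder assumption, and the names `cv_10` and `lda_mod` are illustrative.

```r
library(caret)
library(mlbench)

# repeat the simulation given above so this sketch is self-contained
set.seed(42)
sim_trn = mlbench.2dnormals(n = 500, cl = 7, r = 10, sd = 3)
sim_trn = data.frame(
  classes = sim_trn$classes,
  sim_trn$x
)

# 10-fold cross-validation, as required for this exercise
cv_10 = trainControl(method = "cv", number = 10)

# placeholder classifier (an assumption): linear discriminant analysis
set.seed(1337)
lda_mod = train(classes ~ ., data = sim_trn, method = "lda",
                trControl = cv_10)

# cross-validated accuracy of the fitted classifier
lda_mod$results$Accuracy
```

Other classifiers would reuse `cv_10` and the same `set.seed(1337)` call, changing only the `method` argument and any tuning grid.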