Goal: After completing this lab, you should be able to…

In this lab we will use, but not focus on…

Some additional notes:


Exercise 0 - Cars

In class we looked at the (boring) cars dataset. Use ?cars to learn more about this dataset. (For example, the year that it was gathered.)

head(cars)
plot(dist ~ speed, data = cars, pch = 20)
grid()

Our purpose with this dataset was to fit a line that summarized the data. We did this with the lm() function in R.

cars_mod = lm(dist ~ speed, data = cars)

Using the summary() function on the result of the lm() function produced some useful output, including the slope and intercept of the line that we fit.

summary(cars_mod)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
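If we only want the slope and intercept, and not the full summary, one option is the coef() function. The chunk below is a minimal sketch; the names cars_coef, b0, and b1 are ours and are used only for illustration.

```r
# Refit the model so that this chunk runs on its own.
cars_mod = lm(dist ~ speed, data = cars)

# coef() returns a named numeric vector of the estimated coefficients.
cars_coef = coef(cars_mod)
cars_coef

# Extract the intercept and the slope by name.
b0 = cars_coef["(Intercept)"]
b1 = cars_coef["speed"]
```

These should agree with the Estimate column of the summary() output above.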

We could use the abline() function to add this line to a plot.

plot(dist ~ speed, data = cars, pch = 20)
grid()
abline(cars_mod, col = "red")

Next let’s look at the predict() function. We will use it to create three different estimates. We will explore the latter two in class this week, but all three are easy to compute in R.

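To understand the predict() function, we must first understand its first two arguments: object, the fitted model (here, cars_mod), and newdata, a data frame containing values of the predictor variable used to fit the model.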

names(cars)
## [1] "speed" "dist"

Here we see that the cars data frame used to fit the model cars_mod has two variables, speed (which we used as the predictor variable, \(x\)) and dist (which we used as the response variable, \(y\)).

The following chunk estimates the mean stopping distance of a car traveling at 30 miles per hour.

predict(object = cars_mod, 
        newdata = data.frame(speed = 30))
##        1 
## 100.3932

Note that we created a data frame, containing a variable speed with a single observation of 30, and immediately passed it to newdata. (This might seem like a lot of work. Why not just use newdata = 30? Well, for one, that doesn’t work. More importantly, you’ll see later that the predict() function is more powerful than we are showing in this lab.)

data.frame(speed = 30)
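One reason newdata is a data frame is that it can hold more than one row. The chunk below is a small sketch of this, using three speeds that we picked only for illustration. It also checks the results against the fitted line computed directly from the coefficients. The names new_speeds, pred, and by_hand are ours.

```r
# Refit the model so that this chunk runs on its own.
cars_mod = lm(dist ~ speed, data = cars)

# Hypothetical speeds (in miles per hour), chosen only for illustration.
new_speeds = data.frame(speed = c(10, 20, 30))

# predict() returns one estimate for each row of newdata.
pred = predict(object = cars_mod, newdata = new_speeds)
pred

# The same estimates, computed as intercept + slope * speed.
by_hand = coef(cars_mod)["(Intercept)"] + coef(cars_mod)["speed"] * new_speeds$speed

# Compare the two, ignoring names.
all.equal(unname(pred), unname(by_hand))
```

The last element of pred corresponds to a speed of 30 and should match the single estimate above.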

Now let’s add two more arguments, interval and level. By doing so below, we are creating a 95% confidence interval for the mean stopping distance of a car traveling 30 miles per hour.

predict(object = cars_mod, 
        newdata = data.frame(speed = 30),
        interval = "confidence",
        level = 0.95)
##        fit      lwr      upr
## 1 100.3932 87.43543 113.3509

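This returns three values: fit, the point estimate; lwr, the lower bound of the interval; and upr, the upper bound of the interval.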
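As a rough check, the fit, lwr, and upr values in this output can be reproduced by hand. The sketch below assumes the usual simple linear regression formula for a confidence interval for the mean response, \(\hat{y} \pm t_{\alpha/2, n - 2} \cdot s_e \sqrt{1/n + (x - \bar{x})^2 / S_{xx}}\), which we have not yet covered in class. All variable names are ours.

```r
# Refit the model so that this chunk runs on its own.
cars_mod = lm(dist ~ speed, data = cars)

x_new = 30                             # speed of interest, in miles per hour
n     = nrow(cars)                     # number of observations
s_e   = summary(cars_mod)$sigma        # residual standard error
x_bar = mean(cars$speed)               # sample mean of the predictor
s_xx  = sum((cars$speed - x_bar) ^ 2)  # sum of squared deviations of speed

# Point estimate from the fitted line: intercept + slope * x_new.
fit = sum(coef(cars_mod) * c(1, x_new))

# Standard error of the estimated mean response at x_new.
se_mean = s_e * sqrt(1 / n + (x_new - x_bar) ^ 2 / s_xx)

# Critical value from a t distribution with n - 2 degrees of freedom.
crit = qt(0.975, df = n - 2)

ci_by_hand = c(fit = fit, lwr = fit - crit * se_mean, upr = fit + crit * se_mean)
ci_by_hand
```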

So here we are 95% confident that the mean (average) stopping distance of a car traveling 30 miles per hour is between 87.44 and 113.35. But what if instead of the mean, we are interested in a new observation?

predict(object = cars_mod, 
        newdata = data.frame(speed = 30),
        interval = "prediction",
        level = 0.95)
##        fit      lwr     upr
## 1 100.3932 66.86529 133.921

This code creates a 95% prediction interval. That means that we are 95% confident that a car traveling 30 miles per hour will stop between 66.87 and 133.92. Notice that this interval is much wider than the interval for the mean! (We’ll discuss this in detail on Wednesday.)
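To see the difference in width directly, the sketch below computes both intervals and subtracts the lower bound from the upper bound. It then rebuilds the prediction interval by hand, assuming the usual simple linear regression formula, which is the same as for the mean except for an extra 1 under the square root: \(\hat{y} \pm t_{\alpha/2, n - 2} \cdot s_e \sqrt{1 + 1/n + (x - \bar{x})^2 / S_{xx}}\). All variable names are ours.

```r
# Refit the model so that this chunk runs on its own.
cars_mod = lm(dist ~ speed, data = cars)
new_car  = data.frame(speed = 30)

conf_int = predict(cars_mod, newdata = new_car,
                   interval = "confidence", level = 0.95)
pred_int = predict(cars_mod, newdata = new_car,
                   interval = "prediction", level = 0.95)

# Width of each interval: upper bound minus lower bound.
width_conf = unname(conf_int[1, "upr"] - conf_int[1, "lwr"])
width_pred = unname(pred_int[1, "upr"] - pred_int[1, "lwr"])
c(confidence = width_conf, prediction = width_pred)

# Rebuild the prediction interval by hand.
n     = nrow(cars)
s_e   = summary(cars_mod)$sigma
x_bar = mean(cars$speed)
s_xx  = sum((cars$speed - x_bar) ^ 2)
fit   = unname(predict(cars_mod, newdata = new_car))

# The extra 1 accounts for the variability of a single new observation.
se_pred = s_e * sqrt(1 + 1 / n + (30 - x_bar) ^ 2 / s_xx)
crit    = qt(0.975, df = n - 2)

pi_by_hand = c(lwr = fit - crit * se_pred, upr = fit + crit * se_pred)
pi_by_hand
```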

Exercise 1 - Cats

For this exercise we will use the cats dataset from the MASS package. You should use ?cats to learn about the background of this dataset.

library(MASS)
head(cats)
# your code here
# your code here
# your code here
# your code here

Exercise 2 - Goalie Penalty Minutes

For this exercise we will use the data stored in goalies.txt. It contains career data for 462 players in the National Hockey League who played goaltender at some point up to and including the 2014-2015 season. The variables in the dataset are:

The data is imported in the following chunk. We select only certain columns from the original data and remove rows with missing values.

goalies = read.csv("https://daviddalpiaz.github.io/stat3202-au18/data/goalies.txt")
goalies = na.omit(subset(goalies, 
                         select = c(Player, First, Last, GP, W, L, GA, 
                                    SA, SV, SV_PCT, GAA, SO, MIN, PIM)))
head(goalies)

Let’s take a look at a couple in particular. First, Crazy Eddie Belfour because, Go Blackhawks!

subset(goalies, Player == "Ed Belfour*")

Next, the current goaltender for your Columbus Blue Jackets, Sergei BOBROVSKY!

subset(goalies, Player == "Sergei Bobrovsky")
# your code here
# your code here
# your code here
# your code here

Exercise 3 - Goalie Saves

Return to the goalies dataset from the previous exercise.

# your code here
# your code here
# your code here
# your code here