## Exercise 1

Consider a random variable $$X$$ that has a normal distribution with a mean of 5 and a variance of 9. Calculate $$P[X > 4]$$.

### Solution

1 - pnorm(4, mean = 5, sd = 3)
## [1] 0.6305587
pnorm(4, mean = 5, sd = 3, lower.tail = FALSE)
## [1] 0.6305587
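
As a cross-check, the same probability can be obtained by standardizing first. This is a minimal sketch in base R; the names `z` and `p_std` are illustrative and not part of the original solution.

```r
# Standardize: Z = (X - mean) / sd, where sd = sqrt(9) = 3
z = (4 - 5) / sqrt(9)

# P[X > 4] equals P[Z > z] for a standard normal Z
p_std = pnorm(z, lower.tail = FALSE)
p_std
```

The result should agree with both `pnorm()` calls above, since `pnorm()` defaults to `mean = 0` and `sd = 1`.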

## Exercise 2

Consider the simple linear regression model

$Y = -3 + 2.5x + \epsilon$

where

$\epsilon \sim N(0, \sigma^2 = 4).$

What is the expected value of $$Y$$ given that $$x = 5$$? That is, what is $$\text{E}[Y \mid X = 5]$$?

### Solution

-3 + 2.5 * 5
## [1] 9.5
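
The calculation follows from linearity of expectation, since the error term has mean zero:

$\text{E}[Y \mid X = 5] = -3 + 2.5 \cdot 5 + \text{E}[\epsilon] = 9.5 + 0 = 9.5$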

## Exercise 3

Consider the simple linear regression model

$Y = -3 + 2.5x + \epsilon$

where

$\epsilon \sim N(0, \sigma^2 = 4).$

What is the standard deviation of $$Y$$ when $$x$$ is $$10$$? That is, what is $$\text{SD}[Y \mid X = 10]$$?

### Solution

sqrt(4)
## [1] 2
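
Given $$x$$, the quantity $$-3 + 2.5x$$ is a constant, so all of the variability in $$Y$$ comes from the error term, whatever the value of $$x$$:

$\text{Var}[Y \mid X = 10] = \text{Var}[\epsilon] = \sigma^2 = 4, \quad \text{SD}[Y \mid X = 10] = \sqrt{4} = 2$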

## Exercise 4

For this exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the slope of the fitted regression line?

### Solution

coef(lm(Girth ~ Height, data = trees))[2]
##    Height
## 0.2557471
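
The slope reported by `lm()` can also be reproduced from the usual least squares formula, the sum of cross-deviations divided by the sum of squared deviations of the predictor. The sketch below assumes the names `Sxy`, `Sxx`, and `beta_1_hat`, which are illustrative only.

```r
# Predictor and response from the built-in trees dataset
x = trees$Height
y = trees$Girth

# Sums of cross-deviations and of squared deviations of the predictor
Sxy = sum((x - mean(x)) * (y - mean(y)))
Sxx = sum((x - mean(x)) ^ 2)

# Least squares estimate of the slope
beta_1_hat = Sxy / Sxx
beta_1_hat
```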

## Exercise 5

For this exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the value of $$R^2$$ for this fitted SLR model?

### Solution

summary(lm(Girth ~ Height, data = trees))$r.squared
## [1] 0.2696518

## Exercise 6

Consider the simple linear regression model

$Y = 10 + 5x + \epsilon$

where

$\epsilon \sim N(0, \sigma^2 = 16).$

Calculate the probability that $$Y$$ is less than 6 given that $$x = 0$$.

### Solution

$Y \mid X = 0 \sim N(\mu = 10, \sigma^2 = 16)$

x = 0
mu = 10 + 5 * x
sigma = 4
pnorm(6, mean = mu, sd = sigma)
## [1] 0.1586553

## Exercise 7

Consider the simple linear regression model

$Y = 6 + 3x + \epsilon$

where

$\epsilon \sim N(0, \sigma^2 = 9).$

Calculate the probability that $$Y$$ is greater than 1.5 given that $$x = -1$$.

### Solution

$Y \mid X = -1 \sim N(\mu = 3, \sigma^2 = 9)$

x = -1
mu = 6 + 3 * x
sigma = 3
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
## [1] 0.6914625

## Exercise 8

Consider the simple linear regression model

$Y = 2 - 4x + \epsilon$

where

$\epsilon \sim N(0, \sigma^2 = 25).$

Calculate the probability that $$Y$$ is greater than 1.5 given that $$x = 3$$.

### Solution

$Y \mid X = 3 \sim N(\mu = -10, \sigma^2 = 25)$

x = 3
mu = 2 - 4 * x
sigma = 5
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
## [1] 0.01072411

## Exercise 9

For Exercises 9 - 15, use the faithful dataset, which is built into R.

Suppose we would like to predict the duration of an eruption of the Old Faithful geyser in Yellowstone National Park based on the waiting time before an eruption. Fit a simple linear model in R that accomplishes this task.

What is the estimate of the intercept parameter?

### Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[1]
## (Intercept)
##   -1.874016

## Exercise 10

What is the estimate of the slope parameter?

### Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[2]
##    waiting
## 0.07562795

## Exercise 11

Use the fitted model to estimate the mean duration of eruptions when the waiting time is 78 minutes.

### Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 78))
##        1
## 4.024964

## Exercise 12

Use the fitted model to estimate the mean duration of eruptions when the waiting time is 122 minutes.

### Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 122))
##        1
## 7.352594

## Exercise 13

Consider making predictions of eruption duration for waiting times of 80 and 120 minutes. Which is more reliable?

• 80
• 120
• Both are equally reliable

### Solution

range(faithful$waiting)
## [1] 43 96
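• 80
• 120
• Both are equally reliable

The observed waiting times range from 43 to 96 minutes. A waiting time of 80 minutes falls inside this range, while 120 minutes requires extrapolating beyond the data, so the prediction at 80 is more reliable.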
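
The comparison against the observed range can be sketched directly in R. The names `new_waiting`, `waiting_range`, and `in_range` are illustrative and not part of the original solution.

```r
# Waiting times at which predictions are requested
new_waiting = c(80, 120)

# Observed range of the predictor in the data used to fit the model
waiting_range = range(faithful$waiting)

# TRUE when a value lies inside the observed range (interpolation),
# FALSE when predicting there would be an extrapolation
in_range = new_waiting >= waiting_range[1] & new_waiting <= waiting_range[2]
in_range
```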

## Exercise 14

Calculate the RSS for the fitted model.

### Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
sum(resid(faithful_model) ^ 2)
## [1] 66.56178
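
For reference, the residuals returned by `resid()` are the observed responses minus the fitted values, so the RSS can be rebuilt from those pieces. This is a sketch; the names `e` and `rss` are illustrative.

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)

# Residual = observed response minus fitted value
e = faithful$eruptions - fitted(faithful_model)

# Residual sum of squares
rss = sum(e ^ 2)
rss
```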

## Exercise 15

What proportion of the variation in eruption duration is explained by the linear relationship with waiting time?

### Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$r.squared
## [1] 0.8114608
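
The same proportion can be computed from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses illustrative names `rss`, `sst`, and `r_squared`.

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)

# Variation left unexplained by the fitted line
rss = sum(resid(faithful_model) ^ 2)

# Total variation of the response about its mean
sst = sum((faithful$eruptions - mean(faithful$eruptions)) ^ 2)

# Proportion of variation explained
r_squared = 1 - rss / sst
r_squared
```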

## Exercise 16

For this exercise, use the built-in trees dataset in R.

Fit a simple linear regression model with Girth as the response and Height as the predictor. Use this fitted model to give an estimate for the mean Girth of trees that are 81 feet tall.

### Solution

tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, data.frame(Height = 81))
##        1
## 14.52712
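
The call to `predict()` simply evaluates the fitted line at the new height. A minimal sketch of the same calculation from the estimated coefficients, using the illustrative names `beta_hat` and `y_hat`:

```r
tree_model = lm(Girth ~ Height, data = trees)

# Estimated intercept and slope, with names dropped
beta_hat = unname(coef(tree_model))

# Estimated mean Girth at Height = 81
y_hat = beta_hat[1] + beta_hat[2] * 81
y_hat
```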

## Exercise 17

Suppose both Least Squares and Maximum Likelihood are used to fit a simple linear regression model to the same data. The estimates for the slope and the intercept will be:

• The same
• Different
• Possibly the same or different depending on the data

### Solution

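• The same
• Different
• Possibly the same or different depending on the data

Under the normal error assumption of the SLR model, the maximum likelihood estimates of the slope and the intercept coincide with the least squares estimates.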
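
To see why the two methods agree on the slope and the intercept, consider the log-likelihood of the SLR model with normal errors, written here for $$n$$ observations $$(x_i, y_i)$$:

$\log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i = 1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

For any fixed $$\sigma^2$$, the only term involving $$\beta_0$$ and $$\beta_1$$ is the sum of squares, and it enters with a negative sign. Maximizing the likelihood over $$\beta_0$$ and $$\beta_1$$ is therefore the same as minimizing the sum of squared errors, which is the least squares criterion.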

## Exercise 18

Consider the fitted regression model:

$\hat{y} = -1.5 + 2.3x$

Indicate all of the following that must be true:

• The difference between the $$y$$ values of observations at $$x = 10$$ and $$x = 9$$ is $$2.3$$.
• A good estimate for the mean of $$Y$$ when $$x = 0$$ is -1.5.
• There are observations in the dataset used to fit this regression with negative $$y$$ values.

### Solution

• The difference between the $$y$$ values of observations at $$x = 10$$ and $$x = 9$$ is $$2.3$$.
• A good estimate for the mean of $$Y$$ when $$x = 0$$ is -1.5.
• There are observations in the dataset used to fit this regression with negative $$y$$ values.

## Exercise 19

Indicate all of the following that are true:

• The SLR model assumes that errors are independent.
• The SLR model allows for larger variances for larger values of the predictor variable.
• The SLR model assumes that the response variable follows a normal distribution.
• The SLR model assumes that the relationship between the response and the predictor is linear.

### Solution

• The SLR model assumes that errors are independent.
• The SLR model allows for larger variances for larger values of the predictor variable.
• The SLR model assumes that the response variable follows a normal distribution.
• The SLR model assumes that the relationship between the response and the predictor is linear.

## Exercise 20

Suppose you fit a simple linear regression model and obtain $$\hat{\beta}_1 = 0$$. Does this mean that there is no relationship between the response and the predictor?

• Yes
• No
• Depends on the intercept

### Solution

• Yes
• No
• Depends on the intercept

No. A simple linear model only detects a linear relationship between two variables, so a slope estimate of zero does not rule out a nonlinear relationship.
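
A small constructed example illustrates the point. The data below are made up for illustration: the response is an exact quadratic function of the predictor, and the predictor values are symmetric about zero.

```r
# Predictor values symmetric about zero
x = seq(-3, 3, by = 0.5)

# The response is completely determined by x, but not linearly
y = x ^ 2

# Fitted slope of the simple linear regression of y on x
slope = unname(coef(lm(y ~ x))[2])
slope
```

The fitted slope is zero up to floating-point error, even though $$y$$ depends on $$x$$ exactly.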