Exercise 1

Consider a random variable \(X\) that has a normal distribution with a mean of 5 and a variance of 9. Calculate \(P[X > 4]\).

Solution

1 - pnorm(4, mean = 5, sd = 3)
## [1] 0.6305587
pnorm(4, mean = 5, sd = 3, lower.tail = FALSE)
## [1] 0.6305587
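
As a cross-check, the same probability can be obtained by standardizing first. This is a minimal sketch using only base R; the names `z` and `p_upper` are illustrative and not part of the original solution.

```r
# Standardize: Z = (X - mean) / sd, so P[X > 4] = P[Z > (4 - 5) / 3]
z = (4 - 5) / sqrt(9)                    # sd is the square root of the variance
p_upper = pnorm(z, lower.tail = FALSE)   # upper tail of the standard normal
p_upper
```

Note that `pnorm()` is parameterized by the standard deviation, so a variance of 9 enters as `sd = 3`.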

Exercise 2

Consider the simple linear regression model

\[ Y = -3 + 2.5x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 4). \]

What is the expected value of \(Y\) given that \(x = 5\)? That is, what is \(\text{E}[Y \mid X = 5]\)?

Solution

-3 + 2.5 * 5
## [1] 9.5
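
The error term drops out of this calculation because \(\text{E}[\epsilon] = 0\). Taking expectations of the model at \(x = 5\) gives

\[ \text{E}[Y \mid X = 5] = -3 + 2.5 \cdot 5 + \text{E}[\epsilon] = 9.5 + 0 = 9.5. \]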

Exercise 3

Return to the simple linear regression model

\[ Y = -3 + 2.5x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 4). \]

What is the standard deviation of \(Y\) when \(x\) is \(10\)? That is, what is \(\text{SD}[Y \mid X = 10]\)?

Solution

sqrt(4)
## [1] 2
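
The value of \(x\) shifts only the mean of \(Y\); the spread comes entirely from \(\epsilon\), so the standard deviation is \(\sqrt{4} = 2\) at every \(x\). A small simulation can illustrate this. It is a sketch only: the seed, the sample size, and the names `epsilon` and `y_sim` are arbitrary choices, not part of the original solution.

```r
# Simulate Y at x = 10 from Y = -3 + 2.5x + epsilon, epsilon ~ N(0, sigma^2 = 4)
set.seed(42)                                   # arbitrary seed, for reproducibility only
x = 10
epsilon = rnorm(100000, mean = 0, sd = sqrt(4))
y_sim = -3 + 2.5 * x + epsilon
mean(y_sim)   # should be close to -3 + 2.5 * 10 = 22
sd(y_sim)     # should be close to sqrt(4) = 2, whatever the value of x
```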

Exercise 4

For this Exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the slope of the fitted regression line?

Solution

coef(lm(Girth ~ Height, data = trees))[2]
##    Height 
## 0.2557471
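
The slope reported by `lm()` can also be reproduced from the least squares formulas \(\hat{\beta}_1 = S_{xy} / S_{xx}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The sketch below assumes those standard formulas; the variable names are illustrative.

```r
# Least squares estimates computed by hand for Girth ~ Height
x = trees$Height
y = trees$Girth
Sxy = sum((x - mean(x)) * (y - mean(y)))      # sum of cross-deviations
Sxx = sum((x - mean(x)) ^ 2)                  # sum of squared deviations of x
beta_1_hat = Sxy / Sxx                        # slope
beta_0_hat = mean(y) - beta_1_hat * mean(x)   # intercept
c(beta_0_hat, beta_1_hat)
```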

Exercise 5

For this Exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the value of \(R^2\) for this fitted SLR model?

Solution

summary(lm(Girth ~ Height, data = trees))$r.squared
## [1] 0.2696518
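
In simple linear regression, \(R^2\) equals the squared sample correlation between the response and the predictor, which gives a quick check. The name `r2_cor` is illustrative.

```r
# For SLR only: R^2 is the square of the sample correlation
r2_cor = cor(trees$Girth, trees$Height) ^ 2
r2_cor
```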

Exercise 6

Consider the simple linear regression model

\[ Y = 10 + 5x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 16). \]

Calculate the probability that \(Y\) is less than 6 given that \(x = 0\).

Solution

\[ Y \mid X = 0 \sim N(\mu = 10, \sigma^2 = 16) \]

x = 0
mu = 10 + 5 * x
sigma = 4
pnorm(6, mean = mu, sd = sigma)
## [1] 0.1586553

Exercise 7

Consider the simple linear regression model

\[ Y = 6 + 3x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 9). \]

Calculate the probability that \(Y\) is greater than 1.5 given that \(x = -1\).

Solution

\[ Y \mid X = -1 \sim N(\mu = 3, \sigma^2 = 9) \]

x = -1
mu = 6 + 3 * x
sigma = 3
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
## [1] 0.6914625

Exercise 8

Consider the simple linear regression model

\[ Y = 2 - 4x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 25). \]

Calculate the probability that \(Y\) is greater than 1.5 given that \(x = 3\).

Solution

\[ Y \mid X = 3 \sim N(\mu = -10, \sigma^2 = 25) \]

x = 3
mu = 2 - 4 * x
sigma = 5
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
## [1] 0.01072411
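
Exercises 6 through 8 follow the same pattern: compute the conditional mean, take the square root of the variance, and call `pnorm()`. One way to capture that pattern is a small helper. The function `slr_prob` and its argument names are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: probability for Y at a given x under
# Y = beta_0 + beta_1 * x + epsilon, with epsilon ~ N(0, sigma2)
slr_prob = function(y, x, beta_0, beta_1, sigma2, lower.tail = TRUE) {
  mu = beta_0 + beta_1 * x    # conditional mean of Y at x
  pnorm(y, mean = mu, sd = sqrt(sigma2), lower.tail = lower.tail)
}

# Exercise 6: P[Y < 6 | X = 0]
p6 = slr_prob(6, x = 0, beta_0 = 10, beta_1 = 5, sigma2 = 16)
# Exercise 7: P[Y > 1.5 | X = -1]
p7 = slr_prob(1.5, x = -1, beta_0 = 6, beta_1 = 3, sigma2 = 9, lower.tail = FALSE)
# Exercise 8: P[Y > 1.5 | X = 3]
p8 = slr_prob(1.5, x = 3, beta_0 = 2, beta_1 = -4, sigma2 = 25, lower.tail = FALSE)
c(p6, p7, p8)
```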

Exercise 9

For Exercises 9 - 15, use the faithful dataset, which is built into R.

Suppose we would like to predict the duration of an eruption of the Old Faithful geyser in Yellowstone National Park based on the waiting time before an eruption. Fit a simple linear model in R that accomplishes this task.

What is the estimate of the intercept parameter?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[1]
## (Intercept) 
##   -1.874016

Exercise 10

What is the estimate of the slope parameter?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[2]
##    waiting 
## 0.07562795

Exercise 11

Use the fitted model to estimate the mean duration of eruptions when the waiting time is 78 minutes.

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 78))
##        1 
## 4.024964
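
The value returned by `predict()` is the fitted line evaluated at the new waiting time, \(\hat{\beta}_0 + \hat{\beta}_1 \cdot 78\). A minimal sketch of that calculation, with the illustrative names `beta_hat` and `y_hat_78`:

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)
beta_hat = coef(faithful_model)                     # c(intercept, slope)
y_hat_78 = unname(beta_hat[1] + beta_hat[2] * 78)   # fitted line at waiting = 78
y_hat_78
```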

Exercise 12

Use the fitted model to estimate the mean duration of eruptions when the waiting time is 122 minutes.

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 122))
##        1 
## 7.352594

Exercise 13

Consider making predictions of eruption duration for waiting times of 80 and 120 minutes. Which prediction is more reliable?

Solution

range(faithful$waiting)
## [1] 43 96
  • 80
  • 120
  • Both are equally reliable
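
The `range()` output is the key: a waiting time of 80 lies inside the observed range of 43 to 96 minutes, while 120 lies outside it, so a prediction there is an extrapolation. A check like this could be sketched as follows; the names `new_waiting`, `waiting_range`, and `in_range` are illustrative.

```r
# Flag which new predictor values fall inside the observed range of the data
new_waiting = c(80, 120)
waiting_range = range(faithful$waiting)
in_range = new_waiting >= waiting_range[1] & new_waiting <= waiting_range[2]
data.frame(waiting = new_waiting, in_range = in_range)
```

Predictions at values flagged `FALSE` rely on the linear trend continuing beyond the data, which the data themselves cannot confirm.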

Exercise 14

Calculate the RSS for the fitted model.

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
sum(resid(faithful_model) ^ 2)
## [1] 66.56178
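
The RSS is the sum of squared differences between the observed and fitted values, so it can be computed without `resid()` as well. The sketch below also uses `deviance()`, which for a model fit with `lm()` returns the residual sum of squares. The names `rss_manual` and `rss_deviance` are illustrative.

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)
# Observed minus fitted, squared and summed
rss_manual = sum((faithful$eruptions - fitted(faithful_model)) ^ 2)
# For lm objects, deviance() is the residual sum of squares
rss_deviance = deviance(faithful_model)
c(rss_manual, rss_deviance)
```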

Exercise 15

What proportion of the variation in eruption duration is explained by the linear relationship with waiting time?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$r.squared
## [1] 0.8114608
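
This proportion is \(R^2 = 1 - \text{RSS} / \text{SST}\), where SST is the total sum of squares of the response about its mean. A minimal sketch of the calculation by hand, with the illustrative names `rss`, `sst`, and `r2_manual`:

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)
rss = sum(resid(faithful_model) ^ 2)                             # unexplained variation
sst = sum((faithful$eruptions - mean(faithful$eruptions)) ^ 2)   # total variation
r2_manual = 1 - rss / sst
r2_manual
```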

Exercise 16

For this Exercise, use the built-in trees dataset in R.

Fit a simple linear regression model with Girth as the response and Height as the predictor. Use this fitted model to give an estimate for the mean Girth of trees that are 81 feet tall.

Solution

tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, data.frame(Height = 81))
##        1 
## 14.52712

Exercise 17

Suppose both Least Squares and Maximum Likelihood are used to fit a simple linear regression model to the same data. The estimates for the slope and the intercept will be:

Solution

  • The same
  • Different
  • Possibly the same or different depending on the data
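
Under normal errors, the log-likelihood depends on the intercept and slope only through the sum of squared errors, so maximizing the likelihood over those two parameters is the same as minimizing the sum of squares. The sketch below illustrates this on the faithful data by evaluating the log-likelihood at the least squares estimates and at two nearby points. The function `log_lik` and the perturbation sizes are hypothetical choices made for illustration.

```r
# Normal log-likelihood of an SLR model for the faithful data
log_lik = function(beta_0, beta_1, sigma) {
  mu = beta_0 + beta_1 * faithful$waiting
  sum(dnorm(faithful$eruptions, mean = mu, sd = sigma, log = TRUE))
}

ls_fit = lm(eruptions ~ waiting, data = faithful)
b = unname(coef(ls_fit))
sigma_mle = sqrt(mean(resid(ls_fit) ^ 2))   # ML estimate of sigma divides RSS by n

ll_ls = log_lik(b[1], b[2], sigma_mle)                      # at the least squares estimates
ll_shift_intercept = log_lik(b[1] + 0.1, b[2], sigma_mle)   # intercept moved slightly
ll_shift_slope = log_lik(b[1], b[2] + 0.01, sigma_mle)      # slope moved slightly
c(ll_ls, ll_shift_intercept, ll_shift_slope)
```

The first value is the largest of the three. The two methods do differ in their estimate of \(\sigma^2\): maximum likelihood divides the RSS by \(n\), while the usual unbiased estimate divides by \(n - 2\).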

Exercise 18

Consider the fitted regression model:

\[ \hat{y} = -1.5 + 2.3x \]

Indicate all of the following that must be true:

Solution

  • The difference between the \(y\) values of observations at \(x = 10\) and \(x = 9\) is \(2.3\).
  • A good estimate for the mean of \(Y\) when \(x = 0\) is -1.5.
  • There are observations in the dataset used to fit this regression with negative \(y\) values.
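
A fitted line describes the estimated mean of \(Y\), not individual observations, and a negative intercept does not by itself imply negative \(y\) values in the data. The sketch below builds a small made-up dataset (not from the original exercise) whose fitted line is \(\hat{y} = -1.5 + 2.3x\) even though every observed \(y\) is positive.

```r
# Made-up data lying exactly on y = -1.5 + 2.3x, with x kept away from 0
x = 1:5
y = -1.5 + 2.3 * x
fit = lm(y ~ x)
coef(fit)     # intercept -1.5 and slope 2.3, up to floating point error
all(y > 0)    # every observed response is positive
```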

Exercise 19

Indicate all of the following that are true:

Solution

  • The SLR model assumes that errors are independent.
  • The SLR model allows for larger variances for larger values of the predictor variable.
  • The SLR model assumes that the response variable follows a normal distribution.
  • The SLR model assumes that the relationship between the response and the predictor is linear.

Exercise 20

Suppose you fit a simple linear regression model and obtain \(\hat{\beta}_1 = 0\). Does this mean that there is no relationship between the response and the predictor?

Solution

  • Yes
  • No
  • Depends on the intercept

No. A simple linear regression model can only detect a linear relationship between two variables, so a slope estimate of zero does not rule out a nonlinear relationship.
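
A made-up example (not from the original exercise) shows how this can happen: when \(y = x^2\) and the \(x\) values are symmetric about zero, \(y\) is completely determined by \(x\), yet the fitted slope is zero.

```r
# y depends on x exactly, but not linearly
x = -5:5
y = x ^ 2
slope = unname(coef(lm(y ~ x))[2])
slope        # zero, up to floating point error
cor(x, y)    # the sample correlation is also zero
```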