Exercise 1

Consider a random variable \(X\) that has a \(t\) distribution with \(7\) degrees of freedom. Calculate \(P[X > 1.3]\).

Solution

1 - pt(1.3, df = 7)
## [1] 0.1173839
pt(1.3, df = 7, lower.tail = FALSE)
## [1] 0.1173839

Exercise 2

Consider a random variable \(Y\) that has a \(t\) distribution with \(9\) degrees of freedom. Find \(c\) such that \(P[Y > c] = 0.025\).

Solution

qt(1 - 0.025, df = 9)
## [1] 2.262157
qt(0.025, df = 9, lower.tail = FALSE)
## [1] 2.262157

Exercise 3

For this Exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the p-value for testing \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)?

Solution

tree_model = lm(Girth ~ Height, data = trees)
summary(tree_model)$coefficients["Height", "Pr(>|t|)"]
## [1] 0.002757815

Exercise 4

Continue using the SLR model you fit in Exercise 3. What is the length of a 90% confidence interval for \(\beta_1\)?

Solution

tree_model = lm(Girth ~ Height, data = trees)
ci_beta_1 = confint(tree_model, parm = "Height", level = 0.90)
ci_beta_1[2] - ci_beta_1[1]
## [1] 0.2656018
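
The length can also be verified by hand as twice the critical value times \(\text{SE}[\hat{\beta}_1]\); a minimal check using the model fit above, which should reproduce the length reported by confint():

crit = qt(0.95, df = nrow(trees) - 2)                                # (1 + 0.90) / 2 = 0.95
se_beta_1 = summary(tree_model)$coefficients["Height", "Std. Error"]
2 * crit * se_beta_1                                                 # length = 2 * crit * SE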

Exercise 5

Continue using the SLR model you fit in Exercise 3. Calculate a 95% confidence interval for the mean girth of trees that are 79 feet tall. Report the upper bound of this interval.

Solution

tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, newdata = data.frame(Height = 79), interval = "confidence")[, "upr"]
## [1] 15.12646

Exercise 6

Consider a random variable \(X\) that has a \(t\) distribution with \(5\) degrees of freedom. Calculate \(P[|X| > 2.1]\).

Solution

pt(-2.1, df = 5) + pt(2.1, df = 5, lower.tail = FALSE)
## [1] 0.08975325
2 * pt(2.1, df = 5, lower.tail = FALSE)
## [1] 0.08975325

Exercise 7

Calculate the critical value used for a 90% confidence interval about the slope parameter of a simple linear regression model that is fit to 10 observations. (Your answer should be a positive value.)

Solution

conf_level = 0.90
sig_level = 1 - conf_level         # alpha = 0.10
n = 10                             # number of observations
abs(qt(sig_level / 2, df = n - 2)) # SLR uses n - 2 degrees of freedom
## [1] 1.859548

Exercise 8

Consider the true simple linear regression model

\[ Y_i = 5 + 4 x_i + \epsilon_i \qquad \epsilon_i \sim N(0, \sigma^2 = 4) \qquad i = 1, 2, \ldots 20 \]

Given \(S_{xx} = 1.5\), calculate the probability that, when data are generated according to this model and the SLR model is fit to them, the resulting estimate of the slope parameter is greater than 4.2. In other words, calculate

\[ P[\hat{\beta}_1 > 4.2] \]

Solution

Sxx = 1.5
beta_1 = 4                            # true slope
sigma = 2                             # true standard deviation, sqrt(4)
e_beta_1_hat = beta_1                 # E[beta_1_hat] = beta_1
sd_beta_1_hat = sqrt(sigma ^ 2 / Sxx) # SD[beta_1_hat] = sqrt(sigma^2 / Sxx)
pnorm(4.2, mean = e_beta_1_hat, sd = sd_beta_1_hat, lower.tail = FALSE)
## [1] 0.4512616

This calculation uses the sampling distribution of the slope estimator,

\[ \hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{xx}} \right) \]
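
Standardizing with this distribution reproduces the value computed above by hand:

\[ P[\hat{\beta}_1 > 4.2] = P\left[ Z > \frac{4.2 - 4}{\sqrt{4 / 1.5}} \right] \approx P[Z > 0.1225] \approx 0.4513 \]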


Exercise 9

For Exercises 9 - 13, use the faithful dataset, which is built into R.

Suppose we would like to predict the duration of an eruption of the Old Faithful geyser in Yellowstone National Park based on the waiting time before an eruption. Fit a simple linear model in R that accomplishes this task.

What is the value of \(\text{SE}[\hat{\beta}_1]\)?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "Std. Error"]
## [1] 0.002218541
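
As an optional check, the same standard error can be obtained from the SLR formula \(\text{SE}[\hat{\beta}_1] = s_e / \sqrt{S_{xx}}\); a minimal sketch using the model fit above:

s_e = summary(faithful_model)$sigma                        # residual standard error
Sxx = sum((faithful$waiting - mean(faithful$waiting)) ^ 2)
s_e / sqrt(Sxx)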

Exercise 10

What is the value of the test statistic for testing \(H_0: \beta_0 = 0\) vs \(H_1: \beta_0 \neq 0\)?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["(Intercept)", "t value"]
## [1] -11.70212

Exercise 11

What is the value of the test statistic for testing \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "t value"]
## [1] 34.08904
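
Equivalently, the test statistic is the estimate divided by its standard error; a quick check against the value above:

beta_1_hat = summary(faithful_model)$coefficients["waiting", "Estimate"]
se_beta_1  = summary(faithful_model)$coefficients["waiting", "Std. Error"]
beta_1_hat / se_beta_1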

Exercise 12

Test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\). What decision do you make?

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "Pr(>|t|)"]
## [1] 8.129959e-100
  • Fail to reject \(H_0\)
  • Reject \(H_0\)
  • Reject \(H_1\)
  • Not enough information

Exercise 13

Calculate a 90% confidence interval for \(\beta_0\). Report the upper bound of this interval.

Solution

faithful_model = lm(eruptions ~ waiting, data = faithful)
confint(faithful_model, parm = "(Intercept)", level = 0.90)[, 2]
## [1] -1.609697

Exercise 14

For this Exercise, use the Orange dataset, which is built into R.

Use a simple linear regression model to create a 90% confidence interval for the change in mean circumference of orange trees in millimeters when age is increased by 1 day. Report the lower bound of this interval.

Solution

orange_model = lm(circumference ~ age, data = Orange)
confint(orange_model, parm = "age", level = 0.90)[, 1]
## [1] 0.0927633

Exercise 15

For this Exercise, use the Orange dataset, which is built into R.

Use a simple linear regression model to create a 90% confidence interval for the mean circumference of orange trees in millimeters when the age is 250 days. Report the lower bound of this interval.

Solution

orange_model = lm(circumference ~ age, data = Orange)
predict(orange_model, interval = "confidence", newdata = data.frame(age = 250), level = 0.90)[, "lwr"]
## [1] 32.48418

Exercise 16

For this Exercise, use the cats dataset from the MASS package.

Use a simple linear regression model to create a 99% prediction interval for a cat’s heart weight in grams if its body weight is 2.5 kilograms. Report the upper bound of this interval.

Solution

library(MASS)
cat_model = lm(Hwt ~ Bwt, data = cats)
predict(cat_model, interval = "prediction", level = 0.99, 
        newdata = data.frame(Bwt = 2.5))[, "upr"]
## [1] 13.53644

Exercise 17

Consider a 90% confidence interval for the mean response and a 90% prediction interval, both at the same \(x\) value. Which interval is narrower?

Solution

  • Confidence interval
  • Prediction interval
  • Not enough information; it depends on the value of \(x\)
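
For a concrete (optional) illustration, the sketch below compares the two interval widths at the same \(x\) value using the faithful model from Exercises 9 - 13; the choice of model and the value waiting = 80 are arbitrary, and any fitted SLR model would show the same pattern:

faithful_model = lm(eruptions ~ waiting, data = faithful)
new_obs = data.frame(waiting = 80)
ci = predict(faithful_model, newdata = new_obs, interval = "confidence", level = 0.90)
pi = predict(faithful_model, newdata = new_obs, interval = "prediction", level = 0.90)
ci[, "upr"] - ci[, "lwr"]   # width of the confidence interval for the mean response
pi[, "upr"] - pi[, "lwr"]   # wider, since it also accounts for the noise in a single new observation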

Exercise 18

Suppose you obtain a 99% confidence interval for \(\beta_1\) that is \((-0.4, 5.2)\). Now test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\). What decision do you make?

Solution

  • Fail to reject \(H_0\)
  • Reject \(H_0\)
  • Reject \(H_1\)
  • Not enough information

Exercise 19

Suppose you test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\) and fail to reject \(H_0\). Indicate all of the following that must always be true:

Solution

  • There is no relationship between the response and the predictor.
  • The probability of observing the estimated value of \(\beta_1\) (or something more extreme) is greater than \(0.01\) if we assume that \(\beta_1 = 0\).
  • The value of \(\hat{\beta}_1\) is very small. For example, it could not be 1.2.
  • The probability that \(\beta_1 = 0\) is very high.
  • We would also fail to reject at \(\alpha = 0.05\).

Exercise 20

Consider a 95% confidence interval for the mean response calculated at \(x = 6\). If instead we calculate the interval at \(x = 7\), mark each value that would change:

Solution

  • Point Estimate
  • Critical Value
  • Standard Error
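
A small simulation sketch (the data, seed, and model here are arbitrary and not part of the exercise) illustrates which quantities move when the \(x\) value changes from 6 to 7: the point estimate and the standard error both change, while the critical value depends only on the confidence level and the degrees of freedom:

set.seed(42)
sim_data = data.frame(x = runif(30, 0, 10))
sim_data$y = 2 + 3 * sim_data$x + rnorm(30)
sim_model = lm(y ~ x, data = sim_data)

est = predict(sim_model, newdata = data.frame(x = c(6, 7)), se.fit = TRUE)
est$fit                                  # point estimates: differ at x = 6 and x = 7
est$se.fit                               # standard errors: differ (depend on distance from mean of x)
qt(0.975, df = df.residual(sim_model))   # critical value: identical for both x values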