Consider a random variable \(X\) that has a \(t\) distribution with \(7\) degrees of freedom. Calculate \(P[X > 1.3]\).
1 - pt(1.3, df = 7)
## [1] 0.1173839
pt(1.3, df = 7, lower.tail = FALSE)
## [1] 0.1173839
Consider a random variable \(Y\) that has a \(t\) distribution with \(9\) degrees of freedom. Find \(c\) such that \(P[Y > c] = 0.025\).
qt(1 - 0.025, df = 9)
## [1] 2.262157
qt(0.025, df = 9, lower.tail = FALSE)
## [1] 2.262157
For this Exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the p-value for testing \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)?
tree_model = lm(Girth ~ Height, data = trees)
summary(tree_model)$coefficients["Height", "Pr(>|t|)"]
## [1] 0.002757815
Continue using the SLR model you fit in Exercise 3. What is the length of a 90% confidence interval for \(\beta_1\)?
tree_model = lm(Girth ~ Height, data = trees)
ci_beta_1 = confint(tree_model, parm = "Height", level = 0.90)
ci_beta_1[2] - ci_beta_1[1]
## [1] 0.2656018
Continue using the SLR model you fit in Exercise 3. Calculate a 95% confidence interval for the mean tree girth of a tree that is 79 feet tall. Report the upper bound of this interval.
tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, newdata = data.frame(Height = 79), interval = "confidence")[, "upr"]
## [1] 15.12646
Consider a random variable \(X\) that has a \(t\) distribution with \(5\) degrees of freedom. Calculate \(P[|X| > 2.1]\).
pt(-2.1, df = 5) + pt(2.1, df = 5, lower.tail = FALSE)
## [1] 0.08975325
2 * pt(2.1, df = 5, lower.tail = FALSE)
## [1] 0.08975325
Calculate the critical value used for a 90% confidence interval about the slope parameter of a simple linear regression model that is fit to 10 observations. (Your answer should be a positive value.)
conf_level = 0.90
sig_level = 1 - conf_level
n = 10
abs(qt(sig_level / 2, df = n - 2))
## [1] 1.859548
Consider the true simple linear regression model
\[ Y_i = 5 + 4 x_i + \epsilon_i \qquad \epsilon_i \sim N(0, \sigma^2 = 4) \qquad i = 1, 2, \ldots, 20 \]
Given \(S_{xx} = 1.5\), calculate the probability of observing data according to this model, fitting the SLR model, and obtaining an estimate of the slope parameter greater than 4.2. In other words, calculate
\[ P[\hat{\beta}_1 > 4.2] \]
Sxx = 1.5
beta_1 = 4
sigma = 2
e_beta_1_hat = beta_1 # the estimator is unbiased, so E[beta_1_hat] = beta_1
sd_beta_1_hat = sqrt(sigma ^ 2 / Sxx)
pnorm(4.2, mean = e_beta_1_hat, sd = sd_beta_1_hat, lower.tail = FALSE)
## [1] 0.4512616
\[ \hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{xx}} \right) \]
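As a quick sanity check (not part of the original solution), this probability can also be approximated by simulating repeatedly from the true model. The particular x values below are an arbitrary choice, rescaled so that \(S_{xx} = 1.5\) as the exercise requires.

```r
# Simulate P[beta_1_hat > 4.2] under the true model from the exercise.
set.seed(42)
n   = 20
Sxx = 1.5
# any x vector rescaled to have the required Sxx works
x = 1:n
x = mean(x) + (x - mean(x)) * sqrt(Sxx / sum((x - mean(x)) ^ 2))
beta_1_hat = replicate(10000, {
  y = 5 + 4 * x + rnorm(n, mean = 0, sd = 2)
  coef(lm(y ~ x))[2]
})
mean(beta_1_hat > 4.2) # close to the exact value 0.4512616
```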
For Exercises 9 - 13, use the faithful dataset, which is built into R.
Suppose we would like to predict the duration of an eruption of the Old Faithful geyser in Yellowstone National Park based on the waiting time before an eruption. Fit a simple linear model in R that accomplishes this task.
What is the value of \(\text{SE}[\hat{\beta}_1]\)?
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "Std. Error"]
## [1] 0.002218541
What is the value of the test statistic for testing \(H_0: \beta_0 = 0\) vs \(H_1: \beta_0 \neq 0\)?
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["(Intercept)", "t value"]
## [1] -11.70212
What is the value of the test statistic for testing \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)?
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "t value"]
## [1] 34.08904
Test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\). What decision do you make?
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "Pr(>|t|)"]
## [1] 8.129959e-100
Because the p-value is far smaller than \(\alpha = 0.01\), we reject \(H_0: \beta_1 = 0\).
Calculate a 90% confidence interval for \(\beta_0\). Report the upper bound of this interval.
faithful_model = lm(eruptions ~ waiting, data = faithful)
confint(faithful_model, parm = "(Intercept)", level = 0.90)[, 2]
## [1] -1.609697
For this Exercise, use the Orange dataset, which is built into R.
Use a simple linear regression model to create a 90% confidence interval for the change in mean circumference of orange trees in millimeters when age is increased by 1 day. Report the lower bound of this interval.
orange_model = lm(circumference ~ age, data = Orange)
confint(orange_model, parm = "age", level = 0.90)[, 1]
## [1] 0.0927633
For this Exercise, use the Orange dataset, which is built into R.
Use a simple linear regression model to create a 90% confidence interval for the mean circumference of orange trees in millimeters when the age is 250 days. Report the lower bound of this interval.
orange_model = lm(circumference ~ age, data = Orange)
predict(orange_model, interval = "confidence", newdata = data.frame(age = 250), level = 0.90)[, "lwr"]
## [1] 32.48418
For this Exercise, use the cats dataset from the MASS package.
Use a simple linear regression model to create a 99% prediction interval for a cat’s heart weight in grams if their body weight is 2.5 kilograms. Report the upper bound of this interval.
library(MASS)
cat_model = lm(Hwt ~ Bwt, data = cats)
predict(cat_model, interval = "prediction", level = 0.99,
newdata = data.frame(Bwt = 2.5))[, "upr"]
## [1] 13.53644
Consider a 90% confidence interval for the mean response and a 90% prediction interval, both at the same \(x\) value. Which interval is narrower? The confidence interval for the mean response is narrower, since the prediction interval must additionally account for the variability of a single new observation about the mean.
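To see this concretely, here is an illustration (not part of the original exercise) using the faithful model fit earlier; the choice of waiting = 80 is arbitrary, and the comparison holds at any \(x\) value.

```r
# Compare interval widths at the same x value: the prediction interval
# is always wider than the confidence interval for the mean response.
faithful_model = lm(eruptions ~ waiting, data = faithful)
new_obs = data.frame(waiting = 80)
ci = predict(faithful_model, newdata = new_obs, interval = "confidence", level = 0.90)
pi = predict(faithful_model, newdata = new_obs, interval = "prediction", level = 0.90)
ci[, "upr"] - ci[, "lwr"] # width of the confidence interval
pi[, "upr"] - pi[, "lwr"] # width of the prediction interval
```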
Suppose you obtain a 99% confidence interval for \(\beta_1\) that is \((-0.4, 5.2)\). Now test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\). Because the hypothesized value \(0\) falls inside the 99% confidence interval, we fail to reject \(H_0\) at \(\alpha = 0.01\).
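This decision follows from the duality between a \(100(1-\alpha)\%\) confidence interval and the corresponding two-sided test: we reject \(H_0\) at level \(\alpha\) exactly when the hypothesized value falls outside the interval. A minimal sketch of the check:

```r
# Reject H0: beta_1 = 0 at alpha = 0.01 iff 0 is outside the 99% interval.
ci = c(-0.4, 5.2)
reject = (0 < ci[1]) | (0 > ci[2])
reject # FALSE: 0 is inside the interval, so fail to reject H0
```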
Suppose you test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\) and fail to reject \(H_0\). Indicate all of the following that must always be true:
Consider a 95% confidence interval for the mean response calculated at \(x = 6\). If instead we calculate the interval at \(x = 7\), mark each value that would change:
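A hypothetical illustration with simulated data (the model and \(x\) values below are made up for demonstration) shows which quantities move when the point of estimation changes: the estimate of the mean response and its standard error both depend on \(x\), while the estimate of \(\sigma\) does not.

```r
# Moving the point of estimation from x = 6 to x = 7 changes the fitted
# mean and its standard error, but not the residual standard error.
set.seed(1)
x = runif(30, min = 0, max = 10)
y = 2 + 3 * x + rnorm(30)
fit = lm(y ~ x)
p6 = predict(fit, newdata = data.frame(x = 6), se.fit = TRUE)
p7 = predict(fit, newdata = data.frame(x = 7), se.fit = TRUE)
c(p6$fit, p7$fit)                        # point estimates differ
c(p6$se.fit, p7$se.fit)                  # standard errors differ
c(p6$residual.scale, p7$residual.scale)  # sigma hat is unchanged
```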