Consider a random variable \(X\) that has a \(t\) distribution with \(7\) degrees of freedom. Calculate \(P[X > 1.3]\).
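This tail probability is the long-run proportion of \(t_7\) draws that exceed 1.3, so simulation can approximate it. A minimal sketch; the seed and the number of draws are arbitrary choices, not part of the exercise:

```r
# Monte Carlo approximation of P[X > 1.3] for X ~ t with 7 degrees of freedom
set.seed(42)                       # arbitrary seed, for reproducibility only
x_sim = rt(n = 1e6, df = 7)        # one million draws from a t_7 distribution
p_sim = mean(x_sim > 1.3)          # proportion of draws above 1.3
p_exact = pt(1.3, df = 7, lower.tail = FALSE)
c(simulated = p_sim, exact = p_exact)
```

With \(10^6\) draws, the simulated value should agree with `pt()` to roughly three decimal places.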
1 - pt(1.3, df = 7)
## [1] 0.1173839
pt(1.3, df = 7, lower.tail = FALSE)
## [1] 0.1173839
Consider a random variable \(Y\) that has a \(t\) distribution with \(9\) degrees of freedom. Find \(c\) such that \(P[Y > c] = 0.025\).
qt(1 - 0.025, df = 9)
## [1] 2.262157
qt(0.025, df = 9, lower.tail = FALSE)
## [1] 2.262157
For this Exercise, use the built-in trees dataset in R. Fit a simple linear regression model with Girth as the response and Height as the predictor. What is the p-value for testing \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)?
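The p-value reported by `summary()` is the two-sided tail area of a \(t\) distribution with \(n - 2\) degrees of freedom beyond the observed statistic \(t = \hat{\beta}_1 / \text{SE}[\hat{\beta}_1]\). A sketch that recomputes it by hand (the variable names are illustrative):

```r
# Two-sided p-value for H0: beta_1 = 0, recomputed from the t statistic
tree_model = lm(Girth ~ Height, data = trees)
coefs = summary(tree_model)$coefficients
t_stat = coefs["Height", "Estimate"] / coefs["Height", "Std. Error"]
p_value = 2 * pt(abs(t_stat), df = nrow(trees) - 2, lower.tail = FALSE)
p_value
```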
tree_model = lm(Girth ~ Height, data = trees)
summary(tree_model)$coefficients["Height", "Pr(>|t|)"]
## [1] 0.002757815
Continue using the SLR model you fit in Exercise 3. What is the length of a 90% confidence interval for \(\beta_1\)?
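A confidence interval for \(\beta_1\) has the form \(\hat{\beta}_1 \pm t_{\alpha/2, n-2} \cdot \text{SE}[\hat{\beta}_1]\), so its length is \(2 \cdot t_{\alpha/2, n-2} \cdot \text{SE}[\hat{\beta}_1]\) and does not depend on the estimate itself. A sketch of that calculation with \(\alpha = 0.10\):

```r
# Length of a 90% CI for beta_1: 2 * t_{0.05, n - 2} * SE[beta_1_hat]
tree_model = lm(Girth ~ Height, data = trees)
se_beta_1_hat = summary(tree_model)$coefficients["Height", "Std. Error"]
t_crit = qt(1 - 0.10 / 2, df = df.residual(tree_model))
ci_length = 2 * t_crit * se_beta_1_hat
ci_length
```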
tree_model = lm(Girth ~ Height, data = trees)
ci_beta_1 = confint(tree_model, parm = "Height", level = 0.90)
ci_beta_1[2] - ci_beta_1[1]
## [1] 0.2656018
Continue using the SLR model you fit in Exercise 3. Calculate a 95% confidence interval for the mean girth of trees that are 79 feet tall. Report the upper bound of this interval.
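The interval returned by `predict()` is \(\hat{y}(x_0) \pm t_{\alpha/2, n-2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\), where \(s_e\) is the residual standard error. A sketch that builds it from those pieces:

```r
# 95% CI for the mean girth at Height = 79, from the SLR formula
tree_model = lm(Girth ~ Height, data = trees)
x = trees$Height
n = length(x)
x0 = 79
s_e = summary(tree_model)$sigma                                  # residual standard error
Sxx = sum((x - mean(x)) ^ 2)
y_hat = unname(coef(tree_model)[1] + coef(tree_model)[2] * x0)   # estimated mean girth at x0
se_mean = s_e * sqrt(1 / n + (x0 - mean(x)) ^ 2 / Sxx)
t_crit = qt(1 - 0.05 / 2, df = n - 2)
ci_manual = c(lwr = y_hat - t_crit * se_mean, upr = y_hat + t_crit * se_mean)
ci_manual
```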
tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, newdata = data.frame(Height = 79), interval = "confidence")[, "upr"]
## [1] 15.12646
Consider a random variable \(X\) that has a \(t\) distribution with \(5\) degrees of freedom. Calculate \(P[|X| > 2.1]\).
pt(-2.1, df = 5) + pt(2.1, df = 5, lower.tail = FALSE)
## [1] 0.08975325
2 * pt(2.1, df = 5, lower.tail = FALSE)
## [1] 0.08975325
Calculate the critical value used for a 90% confidence interval for the slope parameter of a simple linear regression model fit to 10 observations. (Your answer should be a positive value.)
conf_level = 0.90
sig_level = 1 - conf_level
n = 10
abs(qt(sig_level / 2, df = n - 2))
## [1] 1.859548
Consider the true simple linear regression model
\[ Y_i = 5 + 4 x_i + \epsilon_i \qquad \epsilon_i \sim N(0, \sigma^2 = 4) \qquad i = 1, 2, \ldots, 20 \]
Given \(S_{xx} = 1.5\), calculate the probability of observing data according to this model, fitting the SLR model, and obtaining an estimate of the slope parameter greater than 4.2. In other words, calculate
\[ P[\hat{\beta}_1 > 4.2] \]
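Sxx = 1.5
beta_1 = 4
sigma = 2                               # since sigma^2 = 4
e_beta_1_hat = beta_1                   # beta_1_hat is unbiased for beta_1
sd_beta_1_hat = sqrt(sigma ^ 2 / Sxx)   # standard deviation of beta_1_hat
pnorm(4.2, mean = e_beta_1_hat, sd = sd_beta_1_hat, lower.tail = FALSE)
## [1] 0.4512616
The calculation uses the sampling distribution of the slope estimate:
\[ \hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{xx}} \right) \]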
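This probability can be checked by simulation: generate many data sets from the true model, fit the slope each time, and count how often it exceeds 4.2. The exercise only specifies \(S_{xx}\), so the \(x\) values below are a hypothetical choice (1 through 20, centered and rescaled so that \(S_{xx} = 1.5\)). Because the sampling distribution of \(\hat{\beta}_1\) depends on the \(x\) values only through \(S_{xx}\), any such choice should give the same answer up to simulation error.

```r
# Simulation check of P[beta_1_hat > 4.2] under Y_i = 5 + 4 x_i + e_i, e_i ~ N(0, 4)
set.seed(42)                           # arbitrary seed
n = 20
beta_0 = 5
beta_1 = 4
sigma = 2
# Hypothetical x values: 1, ..., 20 centered and rescaled so that Sxx = 1.5
x = 1:n
x = (x - mean(x)) * sqrt(1.5 / sum((x - mean(x)) ^ 2))
Sxx = sum((x - mean(x)) ^ 2)

num_sims = 100000
beta_1_hats = replicate(num_sims, {
  y = beta_0 + beta_1 * x + rnorm(n, mean = 0, sd = sigma)
  sum((x - mean(x)) * (y - mean(y))) / Sxx   # least squares slope, Sxy / Sxx
})

p_sim = mean(beta_1_hats > 4.2)
p_exact = pnorm(4.2, mean = beta_1, sd = sqrt(sigma ^ 2 / Sxx), lower.tail = FALSE)
c(simulated = p_sim, exact = p_exact)
```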
For Exercises 9-13, use the faithful dataset, which is built into R.
Suppose we would like to predict the duration of an eruption of the Old Faithful geyser in Yellowstone National Park based on the waiting time before an eruption. Fit a simple linear model in R that accomplishes this task.
What is the value of \(\text{SE}[\hat{\beta}_1]\)?
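For simple linear regression, \(\text{SE}[\hat{\beta}_1] = s_e / \sqrt{S_{xx}}\), where \(s_e\) is the residual standard error. A sketch computing it directly from the data:

```r
# SE[beta_1_hat] = s_e / sqrt(Sxx) for the eruptions ~ waiting model
faithful_model = lm(eruptions ~ waiting, data = faithful)
s_e = summary(faithful_model)$sigma                          # residual standard error
Sxx = sum((faithful$waiting - mean(faithful$waiting)) ^ 2)
se_beta_1_hat = s_e / sqrt(Sxx)
se_beta_1_hat
```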
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "Std. Error"]
## [1] 0.002218541
What is the value of the test statistic for testing \(H_0: \beta_0 = 0\) vs \(H_1: \beta_0 \neq 0\)?
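Under \(H_0: \beta_0 = 0\), the test statistic is \(t = \hat{\beta}_0 / \text{SE}[\hat{\beta}_0]\), with \(\text{SE}[\hat{\beta}_0] = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}\). A sketch of the by-hand calculation:

```r
# t statistic for H0: beta_0 = 0, using SE[beta_0_hat] = s_e * sqrt(1/n + xbar^2 / Sxx)
faithful_model = lm(eruptions ~ waiting, data = faithful)
x = faithful$waiting
n = length(x)
s_e = summary(faithful_model)$sigma
Sxx = sum((x - mean(x)) ^ 2)
se_beta_0_hat = s_e * sqrt(1 / n + mean(x) ^ 2 / Sxx)
t_beta_0 = unname(coef(faithful_model)["(Intercept)"]) / se_beta_0_hat
t_beta_0
```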
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["(Intercept)", "t value"]
## [1] -11.70212
What is the value of the test statistic for testing \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)?
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "t value"]
## [1] 34.08904
Test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\). What decision do you make?
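The decision can be made either by comparing the p-value with \(\alpha\) or, equivalently, by comparing \(|t|\) with the critical value \(t_{\alpha/2, n-2}\). A sketch of both rules (the object names are illustrative):

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)
alpha = 0.01
coefs = summary(faithful_model)$coefficients

# Rule 1: reject H0 when the p-value is below alpha
p_value = coefs["waiting", "Pr(>|t|)"]
reject_by_p = p_value < alpha

# Rule 2: reject H0 when |t| exceeds the critical value t_{alpha/2, n - 2}
t_stat = coefs["waiting", "t value"]
t_crit = qt(1 - alpha / 2, df = df.residual(faithful_model))
reject_by_t = abs(t_stat) > t_crit

c(reject_by_p = reject_by_p, reject_by_t = reject_by_t)
```

The two rules always agree for a two-sided test, because the p-value falls below \(\alpha\) exactly when \(|t|\) exceeds \(t_{\alpha/2, n-2}\).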
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$coefficients["waiting", "Pr(>|t|)"]
## [1] 8.129959e-100
Since this p-value is far below \(\alpha = 0.01\), reject \(H_0\).
Calculate a 90% confidence interval for \(\beta_0\). Report the upper bound of this interval.
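The interval from `confint()` is \(\hat{\beta}_0 \pm t_{\alpha/2, n-2} \cdot \text{SE}[\hat{\beta}_0]\) with \(\alpha = 0.10\). A sketch that assembles it from the coefficient table:

```r
# 90% CI for beta_0: estimate +/- t_{0.05, n - 2} * SE[beta_0_hat]
faithful_model = lm(eruptions ~ waiting, data = faithful)
coefs = summary(faithful_model)$coefficients
est = coefs["(Intercept)", "Estimate"]
se = coefs["(Intercept)", "Std. Error"]
t_crit = qt(1 - 0.10 / 2, df = df.residual(faithful_model))
ci_beta_0 = c(lower = est - t_crit * se, upper = est + t_crit * se)
ci_beta_0
```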
faithful_model = lm(eruptions ~ waiting, data = faithful)
confint(faithful_model, parm = "(Intercept)", level = 0.90)[, 2]
## [1] -1.609697
For this Exercise, use the Orange dataset, which is built into R.
Use a simple linear regression model to create a 90% confidence interval for the change in mean circumference of orange trees in millimeters when age is increased by 1 day. Report the lower bound of this interval.
orange_model = lm(circumference ~ age, data = Orange)
confint(orange_model, parm = "age", level = 0.90)[, 1]
## [1] 0.0927633
For this Exercise, use the Orange dataset, which is built into R.
Use a simple linear regression model to create a 90% confidence interval for the mean circumference of orange trees in millimeters when the age is 250 days. Report the lower bound of this interval.
orange_model = lm(circumference ~ age, data = Orange)
predict(orange_model, interval = "confidence", newdata = data.frame(age = 250), level = 0.90)[, "lwr"]
## [1] 32.48418
For this Exercise, use the cats dataset from the MASS package.
Use a simple linear regression model to create a 99% prediction interval for a cat’s heart weight in grams if its body weight is 2.5 kilograms. Report the upper bound of this interval.
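A prediction interval has the same center as the confidence interval for the mean response, but its standard error has an extra 1 under the square root: \(\hat{y}(x_0) \pm t_{\alpha/2, n-2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\). A sketch computing it for this exercise, assuming the MASS package is installed:

```r
# 99% prediction interval for heart weight at Bwt = 2.5, from the SLR formula
library(MASS)
cat_model = lm(Hwt ~ Bwt, data = cats)
x = cats$Bwt
n = length(x)
x0 = 2.5
s_e = summary(cat_model)$sigma
Sxx = sum((x - mean(x)) ^ 2)
y_hat = unname(coef(cat_model)[1] + coef(cat_model)[2] * x0)
se_pred = s_e * sqrt(1 + 1 / n + (x0 - mean(x)) ^ 2 / Sxx)   # the "1 +" covers the new observation
t_crit = qt(1 - 0.01 / 2, df = n - 2)
pi_manual = c(lwr = y_hat - t_crit * se_pred, upr = y_hat + t_crit * se_pred)
pi_manual
```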
library(MASS)
cat_model = lm(Hwt ~ Bwt, data = cats)
predict(cat_model, interval = "prediction", level = 0.99, 
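        newdata = data.frame(Bwt = 2.5))[, "upr"]
## [1] 13.53644
Consider a 90% confidence interval for the mean response and a 90% prediction interval, both at the same \(x\) value. Which interval is narrower?
The confidence interval for the mean response is narrower, since the prediction interval must also account for the variability of a single new observation around that mean.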
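A quick numerical comparison, reusing the trees model from the earlier exercises at a height of 79 feet (an arbitrary choice of \(x\)):

```r
# Compare the widths of a 90% confidence interval and a 90% prediction interval at one x
tree_model = lm(Girth ~ Height, data = trees)
new_tree = data.frame(Height = 79)
ci_90 = predict(tree_model, newdata = new_tree, interval = "confidence", level = 0.90)
pi_90 = predict(tree_model, newdata = new_tree, interval = "prediction", level = 0.90)
ci_width = unname(ci_90[1, "upr"] - ci_90[1, "lwr"])
pi_width = unname(pi_90[1, "upr"] - pi_90[1, "lwr"])
c(confidence = ci_width, prediction = pi_width)
```

Both intervals share the same center (`fit`) and critical value; only the standard error differs.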
Suppose you obtain a 99% confidence interval for \(\beta_1\) that is \((-0.4, 5.2)\). Now test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\). What decision do you make?
Because the 99% interval contains 0, fail to reject \(H_0\) at \(\alpha = 0.01\).
Suppose you test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) with \(\alpha = 0.01\) and fail to reject \(H_0\). Indicate all of the following that must always be true:
Consider a 95% confidence interval for the mean response calculated at \(x = 6\). If instead we calculate the interval at \(x = 7\), mark each value that would change:
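To see which pieces of the interval depend on \(x\), one can compare the output of `predict()` with `se.fit = TRUE` at two values. The exercise does not name a data set, so this sketch uses the built-in `cars` data (stopping distance versus speed) purely as a stand-in:

```r
# Stand-in model: the exercise's data are not given, so cars is used for illustration
car_model = lm(dist ~ speed, data = cars)
at_6_and_7 = predict(car_model, newdata = data.frame(speed = c(6, 7)), se.fit = TRUE)

at_6_and_7$fit                   # estimated mean response: changes with x
at_6_and_7$se.fit                # its standard error: changes with x
at_6_and_7$df                    # residual degrees of freedom: one value for both x
qt(0.975, df = at_6_and_7$df)    # so the 95% critical value is unchanged
```

In this sketch the estimate, its standard error, and therefore both bounds move with \(x\), while the degrees of freedom, the critical value, and the residual standard error (`residual.scale`) do not.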