---
title: 'STAT 3202: Practice 08'
author: "Autumn 2018, OSU"
date: ''
output:
  html_document:
    theme: simplex
  pdf_document: default
urlcolor: BrickRed
---
***
## Exercise 1
Consider a random variable $X$ that has a normal distribution with a mean of 5 and a variance of 9. Calculate $P[X > 4]$.
### Solution
```{r}
1 - pnorm(4, mean = 5, sd = 3)
pnorm(4, mean = 5, sd = 3, lower.tail = FALSE)
```
***
## Exercise 2
```{r}
# starter
```
Consider the simple linear regression model
$$
Y = -3 + 2.5x + \epsilon
$$
where
$$
\epsilon \sim N(0, \sigma^2 = 4).
$$
What is the expected value of $Y$ given that $x = 5$? That is, what is $\text{E}[Y \mid X = 5]$?
### Solution
```{r}
-3 + 2.5 * 5
```
***
## Exercise 3
Return to the simple linear regression model
$$
Y = -3 + 2.5x + \epsilon
$$
where
$$
\epsilon \sim N(0, \sigma^2 = 4).
$$
What is the standard deviation of $Y$ when $x$ is $10$? That is, what is $\text{SD}[Y \mid X = 10]$?
### Solution
```{r}
sqrt(4)
```
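As an informal check, a short simulation can confirm that the spread of $Y$ comes only from $\epsilon$ and does not depend on $x$. This is a sketch; the seed, the sample size, and the names `n_sim` and `y_sim` are arbitrary choices and not part of the exercise.

```{r}
set.seed(42)
n_sim = 100000
# simulate Y at x = 10 from Y = -3 + 2.5x + epsilon, with epsilon ~ N(0, 4)
y_sim = -3 + 2.5 * 10 + rnorm(n_sim, mean = 0, sd = sqrt(4))
# sample standard deviation, which should be close to 2
sd(y_sim)
```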
***
## Exercise 4
For this Exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. What is the slope of the fitted regression line?
### Solution
```{r}
coef(lm(Girth ~ Height, data = trees))[2]
```
***
## Exercise 5
For this Exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. What is the value of $R^2$ for this fitted SLR model?
### Solution
```{r}
summary(lm(Girth ~ Height, data = trees))$r.squared
```
***
## Exercise 6
Consider the simple linear regression model
$$
Y = 10 + 5x + \epsilon
$$
where
$$
\epsilon \sim N(0, \sigma^2 = 16).
$$
Calculate the probability that $Y$ is less than 6 given that $x = 0$.
### Solution
$$
Y \mid X = 0 \sim N(\mu = 10, \sigma^2 = 16)
$$
```{r}
x = 0
mu = 10 + 5 * x
sigma = 4
pnorm(6, mean = mu, sd = sigma)
```
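The same probability can be obtained by standardizing, which shows where the `pnorm()` call comes from. Here $Z$ denotes a standard normal random variable, a symbol introduced only for this step.

$$
P[Y < 6 \mid X = 0] = P\left[Z < \frac{6 - 10}{4}\right] = P[Z < -1] \approx 0.1587
$$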
***
## Exercise 7
Consider the simple linear regression model
$$
Y = 6 + 3x + \epsilon
$$
where
$$
\epsilon \sim N(0, \sigma^2 = 9).
$$
Calculate the probability that $Y$ is greater than 1.5 given that $x = -1$.
### Solution
$$
Y \mid X = -1 \sim N(\mu = 3, \sigma^2 = 9)
$$
```{r}
x = -1
mu = 6 + 3 * x
sigma = 3
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
```
***
## Exercise 8
Consider the simple linear regression model
$$
Y = 2 - 4x + \epsilon
$$
where
$$
\epsilon \sim N(0, \sigma^2 = 25).
$$
Calculate the probability that $Y$ is greater than 1.5 given that $x = 3$.
### Solution
$$
Y \mid X = 3 \sim N(\mu = -10, \sigma^2 = 25)
$$
```{r}
x = 3
mu = 2 - 4 * x
sigma = 5
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
```
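Exercises 6 through 8 follow the same pattern, which could be wrapped in a small helper. This is only a sketch: `slr_prob()` and its argument names are hypothetical and are not functions from `R` or any package.

```{r}
# P[Y < y] (or P[Y > y] when lower.tail = FALSE) given x, under the model
# Y = beta_0 + beta_1 * x + epsilon, with epsilon ~ N(0, sigma2)
slr_prob = function(y, x, beta_0, beta_1, sigma2, lower.tail = TRUE) {
  mu = beta_0 + beta_1 * x
  pnorm(y, mean = mu, sd = sqrt(sigma2), lower.tail = lower.tail)
}

# Exercise 6
slr_prob(6, x = 0, beta_0 = 10, beta_1 = 5, sigma2 = 16)
# Exercise 7
slr_prob(1.5, x = -1, beta_0 = 6, beta_1 = 3, sigma2 = 9, lower.tail = FALSE)
# Exercise 8
slr_prob(1.5, x = 3, beta_0 = 2, beta_1 = -4, sigma2 = 25, lower.tail = FALSE)
```

Passing the variance and taking the square root inside the function mirrors how the models are stated, since each exercise gives $\sigma^2$ rather than $\sigma$.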
***
## Exercise 9
For Exercises 9 - 15, use the `faithful` dataset, which is built into `R`.
Suppose we would like to predict the duration of an eruption of [the Old Faithful geyser](http://www.yellowstonepark.com/about-old-faithful/) in [Yellowstone National Park](https://en.wikipedia.org/wiki/Yellowstone_National_Park) based on the waiting time before an eruption. Fit a simple linear model in `R` that accomplishes this task.
What is the estimate of the intercept parameter?
### Solution
```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[1]
```
***
## Exercise 10
What is the estimate of the slope parameter?
### Solution
```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[2]
```
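The estimates reported by `coef()` in Exercises 9 and 10 can also be reproduced from the usual least squares formulas, $\hat{\beta}_1 = S_{xy} / S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A minimal sketch follows; the names `x`, `y`, `Sxy`, `Sxx`, `beta_1_hat`, and `beta_0_hat` are chosen here for illustration.

```{r}
x = faithful$waiting
y = faithful$eruptions
# sums of cross-products and squares about the means
Sxy = sum((x - mean(x)) * (y - mean(y)))
Sxx = sum((x - mean(x)) ^ 2)
# slope first, then the intercept from the sample means
beta_1_hat = Sxy / Sxx
beta_0_hat = mean(y) - beta_1_hat * mean(x)
c(beta_0_hat, beta_1_hat)
```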
***
## Exercise 11
Use the fitted model to estimate the mean duration of eruptions when the waiting time is **78** minutes.
### Solution
```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 78))
```
***
## Exercise 12
Use the fitted model to estimate the mean duration of eruptions when the waiting time is **122** minutes.
### Solution
```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 122))
```
***
## Exercise 13
Consider making predictions of eruption duration for waiting times of 80 and 120 minutes. Which is more reliable?
- 80
- 120
- Both are equally reliable
### Solution
```{r}
range(faithful$waiting)
```
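- **80**
- 120
- Both are equally reliable

A waiting time of 80 minutes falls inside the range of observed waiting times, while 120 minutes falls outside it, so the prediction at 120 is an extrapolation.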
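A quick check of whether each waiting time lies within the observed data makes the comparison explicit. This is a sketch; `new_waiting` and `in_range` are names introduced here.

```{r}
new_waiting = c(80, 120)
# TRUE when the value lies within the observed waiting times,
# FALSE when predicting there would be an extrapolation
in_range = new_waiting >= min(faithful$waiting) & new_waiting <= max(faithful$waiting)
in_range
```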
***
## Exercise 14
Calculate the RSS for the fitted model.
### Solution
```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
sum(resid(faithful_model) ^ 2)
```
***
## Exercise 15
What proportion of the variation in eruption duration is explained by the linear relationship with waiting time?
### Solution
```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$r.squared
```
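The same value can be computed from its definition, $R^2 = 1 - \text{RSS} / \text{SST}$, reusing the RSS from Exercise 14. A minimal sketch, where `RSS` and `SST` are names introduced here:

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
# residual sum of squares, as in Exercise 14
RSS = sum(resid(faithful_model) ^ 2)
# total sum of squares of the response about its mean
SST = sum((faithful$eruptions - mean(faithful$eruptions)) ^ 2)
1 - RSS / SST
```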
***
## Exercise 16
For this Exercise, use the built-in `trees` dataset in `R`.
Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. Use this fitted model to give an estimate for the mean `Girth` of trees that are 81 feet tall.
### Solution
```{r}
tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, data.frame(Height = 81))
```
***
## Exercise 17
Suppose both Least Squares and Maximum Likelihood are used to fit a simple linear regression model to the same data. The estimates for the slope and the intercept will be:
- The same
- Different
- Possibly the same or different depending on the data
### Solution
- **The same**
- Different
- Possibly the same or different depending on the data
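
To see why, consider the log-likelihood of the SLR model with normal errors. The notation below ($\beta_0$, $\beta_1$, $\sigma^2$, and observations $(x_i, y_i)$ for $i = 1, \ldots, n$) is the standard one and is assumed here rather than defined in this practice set.

$$
\log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2
$$

For any fixed $\sigma^2$, maximizing this over $\beta_0$ and $\beta_1$ is the same as minimizing the sum of squares, which is exactly the least squares criterion.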
***
## Exercise 18
Consider the fitted regression model:
$$
\hat{y} = -1.5 + 2.3x
$$
Indicate all of the following that **must** be true:
- The difference between the $y$ values of observations at $x = 10$ and $x = 9$ is $2.3$.
- A good estimate for the mean of $Y$ when $x = 0$ is -1.5.
- There are observations in the dataset used to fit this regression with negative $y$ values.
### Solution
- The difference between the $y$ values of observations at $x = 10$ and $x = 9$ is $2.3$.
- **A good estimate for the mean of $Y$ when $x = 0$ is -1.5.**
- There are observations in the dataset used to fit this regression with negative $y$ values.
***
## Exercise 19
Indicate all of the following that are true:
- The SLR model assumes that errors are independent.
- The SLR model allows for larger variances for larger values of the predictor variable.
- The SLR model assumes that the response variable follows a normal distribution.
- The SLR model assumes that the relationship between the response and the predictor is linear.
### Solution
- **The SLR model assumes that errors are independent.**
- The SLR model allows for larger variances for larger values of the predictor variable.
- The SLR model assumes that the response variable follows a normal distribution.
- **The SLR model assumes that the relationship between the response and the predictor is linear.**
***
## Exercise 20
Suppose you fit a simple linear regression model and obtain $\hat{\beta}_1 = 0$. Does this mean that there is **no relationship** between the response and the predictor?
- Yes
- No
- Depends on the intercept
### Solution
- Yes
- **No**
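- Depends on the intercept

A simple linear model will only detect a linear relationship between two variables. A slope estimate of zero rules out a linear trend, but another kind of relationship, such as a curved one, could still be present.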
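
A small constructed example illustrates this. The data below are made up for illustration: `y` is completely determined by `x`, yet the fitted slope is zero up to floating point error. The names `x`, `y`, and `quad_model` are introduced here.

```{r}
x = -5:5
y = x ^ 2
quad_model = lm(y ~ x)
# slope estimate, zero up to numerical precision
coef(quad_model)[2]
```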