Consider a random variable \(X\) that has a normal distribution with a mean of 5 and a variance of 9. Calculate \(P[X > 4]\).

`1 - pnorm(4, mean = 5, sd = 3)`

`## [1] 0.6305587`

`pnorm(4, mean = 5, sd = 3, lower.tail = FALSE)`

`## [1] 0.6305587`
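As a cross-check, the same probability can be found by standardizing first. This is a minimal sketch in base `R`; the names `z` and `p` are illustrative and not part of the original solution.

```r
# Standardize: Z = (X - mean) / sd, where sd = sqrt(variance) = sqrt(9) = 3
z = (4 - 5) / sqrt(9)
# P[X > 4] equals P[Z > z] under the standard normal distribution
p = pnorm(z, lower.tail = FALSE)
p
```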

Consider the simple linear regression model

\[ Y = -3 + 2.5x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 4). \]

What is the expected value of \(Y\) given that \(x = 5\)? That is, what is \(\text{E}[Y \mid X = 5]\)?

`-3 + 2.5 * 5`

`## [1] 9.5`
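The calculation relies only on the error term having mean zero, which can be written out as:

\[ \text{E}[Y \mid X = 5] = -3 + 2.5(5) + \text{E}[\epsilon] = 9.5 + 0 = 9.5 \]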

Return to the simple linear regression model

\[ Y = -3 + 2.5x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 4). \]

What is the standard deviation of \(Y\) when \(x\) is \(10\)? That is, what is \(\text{SD}[Y \mid X = 10]\)?

`sqrt(4)`

`## [1] 2`
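The answer does not depend on \(x\), since the only random part of \(Y\) is \(\epsilon\). A small simulation can illustrate this. The sketch below is not part of the original solution; the seed, the sample size `n`, and the variable names are arbitrary choices made for illustration.

```r
# Simulate Y at x = 10 under the stated model
set.seed(42)
n = 100000
epsilon = rnorm(n, mean = 0, sd = sqrt(4))
y = -3 + 2.5 * 10 + epsilon
# The sample standard deviation should be close to sigma = 2
sd_y = sd(y)
sd_y
```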

For this exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. What is the slope of the fitted regression line?

`coef(lm(Girth ~ Height, data = trees))[2]`

```
## Height
## 0.2557471
```
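The slope reported by `lm()` can also be reproduced from the least squares formula \(\hat{\beta}_1 = S_{xy} / S_{xx}\). The following is a sketch of that calculation; the names `Sxy`, `Sxx`, and `beta_1_hat` are illustrative.

```r
# Predictor and response from the built-in trees dataset
x = trees$Height
y = trees$Girth
# Sums of cross-deviations and squared deviations
Sxy = sum((x - mean(x)) * (y - mean(y)))
Sxx = sum((x - mean(x)) ^ 2)
# Least squares estimate of the slope
beta_1_hat = Sxy / Sxx
beta_1_hat
```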

For this exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. What is the value of \(R^2\) for this fitted SLR model?

`summary(lm(Girth ~ Height, data = trees))$r.squared`

`## [1] 0.2696518`
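In simple linear regression, \(R^2\) equals the squared sample correlation between the response and the predictor, which gives a quick way to check the value. A minimal sketch, with `r_squared` as an illustrative name:

```r
# Squared sample correlation between Girth and Height
r_squared = cor(trees$Girth, trees$Height) ^ 2
r_squared
```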

Consider the simple linear regression model

\[ Y = 10 + 5x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 16). \]

Calculate the probability that \(Y\) is less than 6 given that \(x = 0\).

\[ Y \mid X = 0 \sim N(\mu = 10, \sigma^2 = 16) \]

```
x = 0
mu = 10 + 5 * x
sigma = 4
pnorm(6, mean = mu, sd = sigma)
```

`## [1] 0.1586553`

Consider the simple linear regression model

\[ Y = 6 + 3x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 9). \]

Calculate the probability that \(Y\) is greater than 1.5 given that \(x = -1\).

\[ Y \mid X = -1 \sim N(\mu = 3, \sigma^2 = 9) \]

```
x = -1
mu = 6 + 3 * x
sigma = 3
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
```

`## [1] 0.6914625`

Consider the simple linear regression model

\[ Y = 2 - 4x + \epsilon \]

where

\[ \epsilon \sim N(0, \sigma^2 = 25). \]

Calculate the probability that \(Y\) is greater than 1.5 given that \(x = 3\).

\[ Y \mid X = 3 \sim N(\mu = -10, \sigma^2 = 25) \]

```
x = 3
mu = 2 - 4 * x
sigma = 5
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
```

`## [1] 0.01072411`
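The last three exercises follow one pattern: find the conditional mean \(\beta_0 + \beta_1 x\), take \(\sigma = \sqrt{\sigma^2}\), and call `pnorm()`. The sketch below wraps that pattern in a hypothetical helper, `slr_upper_prob()`, which is not part of base `R` or the original solutions.

```r
# Hypothetical helper: P[Y > y | X = x] when Y = beta_0 + beta_1 * x + epsilon
# and epsilon ~ N(0, sigma2)
slr_upper_prob = function(y, x, beta_0, beta_1, sigma2) {
  mu = beta_0 + beta_1 * x
  pnorm(y, mean = mu, sd = sqrt(sigma2), lower.tail = FALSE)
}

# P[Y < 6 | X = 0] for Y = 10 + 5x + epsilon, via the complement
p_a = 1 - slr_upper_prob(6, x = 0, beta_0 = 10, beta_1 = 5, sigma2 = 16)
# P[Y > 1.5 | X = -1] for Y = 6 + 3x + epsilon
p_b = slr_upper_prob(1.5, x = -1, beta_0 = 6, beta_1 = 3, sigma2 = 9)
# P[Y > 1.5 | X = 3] for Y = 2 - 4x + epsilon
p_c = slr_upper_prob(1.5, x = 3, beta_0 = 2, beta_1 = -4, sigma2 = 25)
c(p_a, p_b, p_c)
```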

For Exercises 9 - 15, use the `faithful` dataset, which is built into `R`.

Suppose we would like to predict the duration of an eruption of the Old Faithful geyser in Yellowstone National Park based on the waiting time before an eruption. Fit a simple linear model in `R` that accomplishes this task.

What is the estimate of the intercept parameter?

```
faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[1]
```

```
## (Intercept)
## -1.874016
```

What is the estimate of the slope parameter?

```
faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[2]
```

```
## waiting
## 0.07562795
```

Use the fitted model to estimate the mean duration of eruptions when the waiting time is **78** minutes.

```
faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 78))
```

```
## 1
## 4.024964
```

Use the fitted model to estimate the mean duration of eruptions when the waiting time is **122** minutes.

```
faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 122))
```

```
## 1
## 7.352594
```

Consider making predictions of eruption duration for waiting times of 80 and 120 minutes. Which is more reliable?

- 80
- 120
- Both are equally reliable

`range(faithful$waiting)`

`## [1] 43 96`

- **80**
- 120
- Both are equally reliable
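The reasoning is that 120 minutes lies outside the observed range of waiting times, so a prediction there is an extrapolation. This check can be sketched as follows; the names `waiting_range`, `new_waiting`, and `inside` are illustrative.

```r
# Observed range of the predictor
waiting_range = range(faithful$waiting)
# Waiting times at which predictions are considered
new_waiting = c(80, 120)
# TRUE when a value falls within the observed range (interpolation),
# FALSE when it falls outside (extrapolation)
inside = new_waiting >= waiting_range[1] & new_waiting <= waiting_range[2]
inside
```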

Calculate the RSS for the fitted model.

```
faithful_model = lm(eruptions ~ waiting, data = faithful)
sum(resid(faithful_model) ^ 2)
```

`## [1] 66.56178`

What proportion of the variation in eruption duration is explained by the linear relationship with waiting time?

```
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$r.squared
```

`## [1] 0.8114608`
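The value reported by `summary()` can be reproduced from the sums of squares, since \(R^2 = 1 - \text{RSS} / \text{SST}\). A sketch of that calculation, with `rss`, `sst`, and `r_squared` as illustrative names:

```r
faithful_model = lm(eruptions ~ waiting, data = faithful)
# Residual sum of squares: variation left unexplained by the model
rss = sum(resid(faithful_model) ^ 2)
# Total sum of squares: variation of the response around its mean
sst = sum((faithful$eruptions - mean(faithful$eruptions)) ^ 2)
# Proportion of variation explained
r_squared = 1 - rss / sst
r_squared
```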

For this exercise, use the built-in `trees` dataset in `R`.

Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. Use this fitted model to give an estimate for the mean `Girth` of trees that are 81 feet tall.

```
tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, data.frame(Height = 81))
```

```
## 1
## 14.52712
```

Suppose both Least Squares and Maximum Likelihood are used to fit a simple linear regression model to the same data. The estimates for the slope and the intercept will be:

- The same
- Different
- Possibly the same or different depending on the data

- **The same**
- Different
- Possibly the same or different depending on the data
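To see why, assume the usual SLR model with normal errors. The log-likelihood for \(n\) observations \((x_i, y_i)\) is then:

\[ \log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 \]

For any fixed \(\sigma^2\), maximizing this over \(\beta_0\) and \(\beta_1\) is the same as minimizing \(\sum_{i = 1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\), which is exactly the least squares criterion.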

Consider the fitted regression model:

\[ \hat{y} = -1.5 + 2.3x \]

Indicate all of the following that **must** be true:

- The difference between the \(y\) values of observations at \(x = 10\) and \(x = 9\) is \(2.3\).
- A good estimate for the mean of \(Y\) when \(x = 0\) is -1.5.
- There are observations in the dataset used to fit this regression with negative \(y\) values.

- The difference between the \(y\) values of observations at \(x = 10\) and \(x = 9\) is \(2.3\).
- **A good estimate for the mean of \(Y\) when \(x = 0\) is -1.5.**
- There are observations in the dataset used to fit this regression with negative \(y\) values.

Indicate all of the following that are true:

- The SLR model assumes that errors are independent.
- The SLR model allows for larger variances for larger values of the predictor variable.
- The SLR model assumes that the response variable follows a normal distribution.
- The SLR model assumes that the relationship between the response and the predictor is linear.

- **The SLR model assumes that errors are independent.**
- The SLR model allows for larger variances for larger values of the predictor variable.
- The SLR model assumes that the response variable follows a normal distribution.
- **The SLR model assumes that the relationship between the response and the predictor is linear.**

Suppose you fit a simple linear regression model and obtain \(\hat{\beta}_1 = 0\). Does this mean that there is **no relationship** between the response and the predictor?

- Yes
- No
- Depends on the intercept

- Yes
- **No**
- Depends on the intercept

A simple linear model will only detect a linear relationship between two variables, so a slope estimate of zero does not rule out a nonlinear relationship.
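A small constructed example makes this concrete. In the sketch below, which uses made-up data rather than anything from the exercises, `y` is completely determined by `x`, yet the fitted slope is zero because the relationship is symmetric about \(x = 0\).

```r
# Made-up data: y depends on x exactly, but not linearly
x = -5:5
y = x ^ 2
# Fit the simple linear regression and extract the slope
slope = unname(coef(lm(y ~ x))[2])
# The slope is zero up to floating point error
abs(slope) < 1e-8
```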