---
title: 'STAT 3202: Practice 08'
author: "Autumn 2018, OSU"
date: ''
output:
  html_document:
    theme: simplex
  pdf_document: default
urlcolor: BrickRed
---

***

## Exercise 1

Consider a random variable $X$ that has a normal distribution with a mean of 5 and a variance of 9. Calculate $P[X > 4]$.

### Solution

```{r}
# the standard deviation is sqrt(9) = 3
# both lines compute the same upper-tail probability
1 - pnorm(4, mean = 5, sd = 3)
pnorm(4, mean = 5, sd = 3, lower.tail = FALSE)
```

***

## Exercise 2

Consider the simple linear regression model

$$
Y = -3 + 2.5x + \epsilon
$$

where

$$
\epsilon \sim N(0, \sigma^2 = 4).
$$

What is the expected value of $Y$ given that $x = 5$? That is, what is $\text{E}[Y \mid X = 5]$?

### Solution

```{r}
# the error has mean 0, so the conditional mean is the line evaluated at x = 5
-3 + 2.5 * 5
```

***

## Exercise 3

Return to the simple linear regression model

$$
Y = -3 + 2.5x + \epsilon
$$

where

$$
\epsilon \sim N(0, \sigma^2 = 4).
$$

What is the standard deviation of $Y$ when $x$ is $10$? That is, what is $\text{SD}[Y \mid X = 10]$?

### Solution

```{r}
# the variance of Y given x does not depend on x
sqrt(4)
```

***

## Exercise 4

For this Exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. What is the slope of the fitted regression line?

### Solution

```{r}
coef(lm(Girth ~ Height, data = trees))[2]
```

***

## Exercise 5

For this Exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. What is the value of $R^2$ for this fitted SLR model?

### Solution

```{r}
summary(lm(Girth ~ Height, data = trees))$r.squared
```

***

## Exercise 6

Consider the simple linear regression model

$$
Y = 10 + 5x + \epsilon
$$

where

$$
\epsilon \sim N(0, \sigma^2 = 16).
$$

Calculate the probability that $Y$ is less than 6 given that $x = 0$.

### Solution

$$
Y \mid X = 0 \sim N(\mu = 10, \sigma^2 = 16)
$$

```{r}
x = 0
mu = 10 + 5 * x  # conditional mean of Y at x
sigma = 4        # sqrt(16)
pnorm(6, mean = mu, sd = sigma)
```

***

## Exercise 7

Consider the simple linear regression model

$$
Y = 6 + 3x + \epsilon
$$

where

$$
\epsilon \sim N(0, \sigma^2 = 9).
$$

Calculate the probability that $Y$ is greater than 1.5 given that $x = -1$.

### Solution

$$
Y \mid X = -1 \sim N(\mu = 3, \sigma^2 = 9)
$$

```{r}
x = -1
mu = 6 + 3 * x  # conditional mean of Y at x
sigma = 3       # sqrt(9)
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
```

***

## Exercise 8

Consider the simple linear regression model

$$
Y = 2 - 4x + \epsilon
$$

where

$$
\epsilon \sim N(0, \sigma^2 = 25).
$$

Calculate the probability that $Y$ is greater than 1.5 given that $x = 3$.

### Solution

$$
Y \mid X = 3 \sim N(\mu = -10, \sigma^2 = 25)
$$

```{r}
x = 3
mu = 2 - 4 * x  # conditional mean of Y at x
sigma = 5       # sqrt(25)
pnorm(1.5, mean = mu, sd = sigma, lower.tail = FALSE)
```

***

## Exercise 9

For Exercises 9 through 15, use the `faithful` dataset, which is built into `R`.

Suppose we would like to predict the duration of an eruption of [the Old Faithful geyser](http://www.yellowstonepark.com/about-old-faithful/) in [Yellowstone National Park](https://en.wikipedia.org/wiki/Yellowstone_National_Park) based on the waiting time before an eruption. Fit a simple linear model in `R` that accomplishes this task.

What is the estimate of the intercept parameter?

### Solution

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[1]
```

***

## Exercise 10

What is the estimate of the slope parameter?

### Solution

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
coef(faithful_model)[2]
```

***

## Exercise 11

Use the fitted model to estimate the mean duration of eruptions when the waiting time is **78** minutes.

### Solution

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 78))
```

***

## Exercise 12

Use the fitted model to estimate the mean duration of eruptions when the waiting time is **122** minutes.

### Solution

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
predict(faithful_model, data.frame(waiting = 122))
```

***

## Exercise 13

Consider making predictions of eruption duration for waiting times of 80 and 120 minutes. Which prediction is more reliable?

- 80
- 120
- Both are equally reliable

### Solution

```{r}
# observed range of the predictor
range(faithful$waiting)
```

- **80**
- 120
- Both are equally reliable

A waiting time of 80 minutes falls inside the observed range of the data, while 120 minutes falls outside it, so the prediction at 120 is an extrapolation.

***

## Exercise 14

Calculate the RSS for the fitted model.

### Solution

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
sum(resid(faithful_model) ^ 2)
```

***

## Exercise 15

What proportion of the variation in eruption duration is explained by the linear relationship with waiting time?

### Solution

```{r}
faithful_model = lm(eruptions ~ waiting, data = faithful)
summary(faithful_model)$r.squared
```

***

## Exercise 16

For this Exercise, use the built-in `trees` dataset in `R`. Fit a simple linear regression model with `Girth` as the response and `Height` as the predictor. Use this fitted model to give an estimate for the mean `Girth` of trees that are 81 feet tall.

### Solution

```{r}
tree_model = lm(Girth ~ Height, data = trees)
predict(tree_model, data.frame(Height = 81))
```

***

## Exercise 17

Suppose both Least Squares and Maximum Likelihood are used to fit a simple linear regression model to the same data.
The estimates for the slope and the intercept will be:

- The same
- Different
- Possibly the same or different depending on the data

### Solution

- **The same**
- Different
- Possibly the same or different depending on the data

***

## Exercise 18

Consider the fitted regression model:

$$
\hat{y} = -1.5 + 2.3x
$$

Indicate all of the following that **must** be true:

- The difference between the $y$ values of observations at $x = 10$ and $x = 9$ is $2.3$.
- A good estimate for the mean of $Y$ when $x = 0$ is -1.5.
- There are observations in the dataset used to fit this regression with negative $y$ values.

### Solution

- The difference between the $y$ values of observations at $x = 10$ and $x = 9$ is $2.3$.
- **A good estimate for the mean of $Y$ when $x = 0$ is -1.5.**
- There are observations in the dataset used to fit this regression with negative $y$ values.

***

## Exercise 19

Indicate all of the following that are true:

- The SLR model assumes that errors are independent.
- The SLR model allows for larger variances for larger values of the predictor variable.
- The SLR model assumes that the response variable follows a normal distribution.
- The SLR model assumes that the relationship between the response and the predictor is linear.

### Solution

- **The SLR model assumes that errors are independent.**
- The SLR model allows for larger variances for larger values of the predictor variable.
- The SLR model assumes that the response variable follows a normal distribution.
- **The SLR model assumes that the relationship between the response and the predictor is linear.**

***

## Exercise 20

Suppose you fit a simple linear regression model and obtain $\hat{\beta}_1 = 0$. Does this mean that there is **no relationship** between the response and the predictor?

- Yes
- No
- Depends on the intercept

### Solution

- Yes
- **No**
- Depends on the intercept

A simple linear regression model will only detect a linear relationship between two variables. A nonlinear relationship can exist even when the estimated slope is 0.
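
As a small illustration of the answer to Exercise 20, the sketch below uses made-up data that are not part of the original exercises. The names `x`, `y`, `quad_model`, and `slope` are hypothetical. Here `y` is completely determined by `x`, yet because the relationship is quadratic and `x` is symmetric about zero, the fitted slope is effectively zero.

```{r}
# hypothetical data: y is an exact quadratic function of x,
# and the x values are symmetric about zero
x = seq(-3, 3, by = 0.5)
y = x ^ 2

# fit a simple linear regression of y on x
quad_model = lm(y ~ x)

# the estimated slope is zero up to floating point error,
# even though y depends entirely on x
slope = coef(quad_model)[2]
slope

# the sample correlation, which measures linear association, is also about zero
cor(x, y)
```

A scatterplot of `y` against `x` would make the curved pattern obvious, which is one reason to plot the data before interpreting a fitted slope.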