---
title: 'STAT 3202: Lab 05, Power'
author: "Spring 2019, OSU"
date: 'Due: Friday, March 8'
output:
  html_document:
    theme: spacelab
    toc: yes
  pdf_document: default
urlcolor: BrickRed
---
***
```{r setup, include = FALSE}
knitr::opts_chunk$set(fig.align = "center")
```
**Goal:** After completing this lab, you should be able to...
- *Use* simulation to verify significance levels.
- *Use* simulation to estimate power.
In this lab we will use, but not focus on...
- `R` Markdown. This document will serve as a template. It is pre-formatted and already contains chunks that you need to complete.
Some additional notes:
- Please see [**Carmen**](https://carmen.osu.edu/) for information about submission and grading.
- You may use [this document](lab-05-assign.Rmd) as a template. You do not need to remove directions. Chunks that require your input have a comment indicating to do so.
***
# Exercise 0 - Verifying $\alpha$, Power
The following function will be used throughout this lab:
```{r}
sim_p_value = function(mu = 0, sigma = 1, n = 25, mu_0 = 0) {
  # sample from population
  sample_data = rnorm(n = n, mean = mu, sd = sigma)
  # calculate statistics
  x_bar = mean(sample_data)
  s = sd(sample_data)
  t = (x_bar - mu_0) / (s / sqrt(n))
  # calculate p-value
  p_value = 2 * pt(-abs(t), df = n - 1)
  # return p-value
  p_value
}
```
This function simulates p-values for a two-sided, one-sample $t$-test. That is, we assume that the data are sampled from a normal distribution, $N(\mu, \sigma^2)$, and we are performing the test:
$$
H_0: \mu = \mu_0 \text{ vs } H_1: \mu \neq \mu_0
$$
The function takes as input four arguments:
- `mu`: The true value of $\mu$ in the population
- `sigma`: The true value of $\sigma$ in the population
- `n`: The sample size
- `mu_0`: The hypothesized value of $\mu$ which we call $\mu_0$
Let's run the function and discuss what is happening.
```{r}
set.seed(1)
sim_p_value(mu = 10, sigma = 3, n = 25, mu_0 = 13)
```
First, internally the function samples 25 observations from a normal distribution with a mean of 10 and a standard deviation of 3. Then, the function calculates and returns the p-value for testing
$$
H_0: \mu = 13 \text{ vs } H_1: \mu \neq 13
$$
Note that this is a "small" p-value, so with a significance level of say, $\alpha = 0.05$ we would reject $H_0$. (Which makes sense, the true value of $\mu$ is not 13!)
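As a sanity check, we can reproduce this p-value with `R`'s built-in `t.test()`. Because `sim_p_value()`'s only random step is a single call to `rnorm()`, using the same seed draws the same sample:

```{r}
# same seed, so rnorm() produces the same sample as inside sim_p_value()
set.seed(1)
sample_data = rnorm(n = 25, mean = 10, sd = 3)
t.test(sample_data, mu = 13)$p.value
```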
Recall that $\alpha$, the significance level, is the probability of rejecting the null hypothesis when it is true.
$$
\alpha = P(\text{Reject } H_0 \mid H_0 \text{ True})
$$
Let's verify the significance level of the test
$$
H_0: \mu = 0 \text{ vs } H_1: \mu \neq 0
$$
The true distribution will be normal with a mean of 0 and a standard deviation of 1. We will use a sample size of 25. We will simulate this test 25000 times and use the simulated p-values to estimate the true $\alpha$ of the test.
```{r}
set.seed(42)
p_values = replicate(25000, sim_p_value(mu = 0, sigma = 1, n = 25, mu_0 = 0))
head(p_values)
```
Here we use the `replicate()` function to repeatedly run the `sim_p_value()` function. This avoids needing to write a `for` loop.
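If it helps to see what `replicate()` is doing, here is an equivalent (but more verbose) `for` loop sketch, assuming `sim_p_value()` is defined as above:

```{r}
# same simulation written as an explicit loop
p_values_loop = rep(0, 25000)
for (i in seq_along(p_values_loop)) {
  p_values_loop[i] = sim_p_value(mu = 0, sigma = 1, n = 25, mu_0 = 0)
}
head(p_values_loop)
```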
Notice that these p-values appear roughly uniform.
```{r}
hist(p_values, probability = TRUE, col = "darkgrey",
main = "Histogram of P-Values, Null True", xlab = "p-values")
box()
grid()
```
Suppose we were to use $\alpha = 0.05$. Well, then we should expect that roughly 5% of these p-values are below 0.05.
```{r}
mean(p_values < 0.05)
```
Pretty close! If we used a different $\alpha$, the proportion of p-values below it would still be close to $\alpha$. (The big take-away here is that **p-values are uniformly distributed _when the null is true_.**)
```{r}
mean(p_values < 0.10)
mean(p_values < 0.05)
mean(p_values < 0.01)
```
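The three checks above can be collapsed into a single line with `sapply()`, which makes it easy to try many cutoffs at once. (This assumes the `p_values` vector from the simulation above.)

```{r}
# empirical rejection rate at several alpha levels at once
alphas = c(0.01, 0.05, 0.10)
sapply(alphas, function(a) mean(p_values < a))
```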
What if the null hypothesis is **not** true and instead some alternative is true? What happens to the distribution of p-values?
Let's calculate the [**power**](https://www.youtube.com/watch?v=ygBP7MtT3Ac) of the test
$$
H_0: \mu = 0 \text{ vs } H_1: \mu \neq 0
$$
```{r}
set.seed(42)
p_values_power = replicate(25000, sim_p_value(mu = 0.5, sigma = 1, n = 25, mu_0 = 0))
```
Here we have kept everything the same as before, except now the null is **not** true: the true mean is 0.5.
```{r}
hist(p_values_power, probability = TRUE, col = "darkgrey",
main = "Histogram of P-Values, Alternative True", xlab = "p-values")
box()
grid()
```
This is no longer uniform! Now to calculate the power when $\alpha = 0.05$. Recall that power is given by,
$$
\text{Power} = P(\text{Reject } H_0 \mid H_0 \text{ False})
$$
Then in `R` we simply note how often we reject.
```{r}
mean(p_values_power < 0.05)
```
This means that even though the null is not true (the true mean is 0.5, not 0), with a sample of size 25 and $\alpha = 0.05$ we will reject the null only about 67% of the time. (We'd like this to be 100%, which isn't possible, so we'd like it to be as high as possible.)
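As a sanity check, `R`'s built-in `power.t.test()` computes this power analytically for the one-sample $t$-test; its answer should roughly agree with our simulated estimate.

```{r}
# analytic power for a two-sided, one-sample t-test:
# n = 25, true mean 0.5, hypothesized mean 0, sd = 1, alpha = 0.05
power.t.test(n = 25, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")
```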
What causes a test to have low or high power? You will investigate this in the rest of the lab. Things we will consider:
- Sample size
- Variance of population
- Effect size. (How big a difference in mean are we looking for? In other words, which alternative is true? There are many ways for the null hypothesis to be false.)
- Significance level
Some things we won't look at here, but you should think about:
- One-sided versus two-sided tests. (One-sided tests are more powerful.)
- Balance in design. (In a two-sample test, we like the sample sizes from the two populations to be as close to equal as possible.)
***
# Exercise 1 - Power, Sample Size
Consider samples of size
- $n = 20$
- $n = 40$
- $n = 60$
from a normal distribution with
- $\mu = 3$
- $\sigma = 2.5$
For each $n$, use 25000 simulations to estimate the power of the test
$$
H_0: \mu = 4 \text{ vs } H_1: \mu \neq 4
$$
when $\alpha = 0.05$.
```{r}
set.seed(42)
# perform simulations here
# store results of using replicate() with sim_p_value() three times
# (once for each n)
```
```{r}
# calculate power for each n here
# hint: use mean() three times
```
Based on these results, how does sample size affect power?
- YOUR ANSWER HERE.
***
# Exercise 2 - Power, Population Variance
Consider a sample of size
- $n = 25$
from a normal distribution with mean
- $\mu = 3$
and standard deviations
- $\sigma = 2$
- $\sigma = 3$
- $\sigma = 4$
For each $\sigma$, use 25000 simulations to estimate the power of the test
$$
H_0: \mu = 4 \text{ vs } H_1: \mu \neq 4
$$
when $\alpha = 0.05$.
```{r}
set.seed(42)
# perform simulations here
# store results of using replicate() with sim_p_value() three times
# (once for each sigma)
```
```{r}
# calculate power for each sigma here
# hint: use mean() three times
```
Based on these results, how does population variance affect power?
- YOUR ANSWER HERE.
***
# Exercise 3 - Power, Effect Size
Consider a sample of size
- $n = 20$
from normal distributions with means
- $\mu = 5$
- $\mu = 6$
- $\mu = 7$
and standard deviation
- $\sigma = 3.5$
For each $\mu$, use 25000 simulations to estimate the power of the test
$$
H_0: \mu = 4 \text{ vs } H_1: \mu \neq 4
$$
when $\alpha = 0.05$.
```{r}
set.seed(42)
# perform simulations here
# store results of using replicate() with sim_p_value() three times
# (once for each mu)
```
```{r}
# calculate power for each mu here
# hint: use mean() three times
```
Based on these results, how does effect size affect power?
- YOUR ANSWER HERE.
***
# Exercise 4 - Power, Significance Level
Consider a sample of size
- $n = 15$
from a normal distribution with mean
- $\mu = 0$
and standard deviation
- $\sigma = 1$
Use 25000 simulations to estimate the power of the test
$$
H_0: \mu = 0.5 \text{ vs } H_1: \mu \neq 0.5
$$
when
- $\alpha = 0.01$
- $\alpha = 0.05$
- $\alpha = 0.10$
```{r}
set.seed(42)
# perform simulations here
# store results of using replicate() with sim_p_value() ONCE
# (use the same simulations for each alpha)
```
```{r}
# calculate power for each alpha here
# hint: use mean() three times
```
Based on these results, how does the significance level affect power?
- YOUR ANSWER HERE.
***