---
title: "A Brief Introduction to the Bootstrap"
author: "David Dalpiaz"
date: "STAT 3202 @ OSU"
output:
html_document:
toc: yes
df_print: paged
theme: spacelab
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(fig.align = "center")
```
**Note:** This document is currently incomplete, but provides worked examples that may be useful for completing the relevant lab.
***
# Examples
***
## Simulated Exponential Data
```{r}
set.seed(42)
some_data = rexp(n = 75, rate = 0.25)
head(some_data)
```
```{r}
hist(some_data, col = "darkgrey",
main = "Histogram of Some (Simulated) Data",
xlab = "x")
grid()
box()
```
The sample median of this data, $\hat{m}$ is
```{r}
median(some_data)
```
- Create a 90% bootstrap confidence interval for $m$, the true median. Use 20000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{m}$.
```{r, solution = TRUE}
set.seed(1)
boot_med = rep(0, 20000)
for (i in seq_along(boot_med)) {
boot_samp = sample(some_data, replace = TRUE)
boot_med[i] = median(boot_samp)
}
```
```{r, solution = TRUE}
hist(boot_med, col = "darkgrey",
main = "Histogram of Boostrap Replicates of the Median",
xlab = "Sample Medians")
grid()
box()
```
```{r, solution = TRUE}
quantile(boot_med, probs = c(0.05, 0.95))
```
***
## Old Faithful Geyser Data
***
### Eruption Length
```{r, eval = FALSE}
?faithful
```
```{r}
hist(faithful$eruptions, col = "darkgrey", breaks = 20,
main = "Histogram of Eruption Lengths",
xlab = "Eruption Time (Minutes)")
grid()
box()
```
What is the probability of an eruption less than three minutes? That is, if $X$ is the eruption length in minutes, what is $P[X < 3]$?
With this data, we could estimate. We calculate $\hat{P}[X < 3]$ using
```{r}
mean(faithful$eruptions < 3)
```
- Create a 95% bootstrap confidence interval for $P[X < 3]$, the probability of an eruption lasting less than three minutes. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{P}[X < 3]$.
```{r, solution = TRUE}
set.seed(1)
boot_3min_prob = rep(0, 10000)
for (i in seq_along(boot_3min_prob)) {
boot_samp = sample(faithful$eruptions, replace = TRUE)
boot_3min_prob[i] = mean(boot_samp < 3)
}
```
```{r, solution = TRUE}
hist(boot_3min_prob, col = "darkgrey",
main = "Histogram of Boostrap Replicates of Eruption Length Probability",
xlab = "Estimate of P(X < 3)")
grid()
box()
```
```{r, solution = TRUE}
quantile(boot_3min_prob, probs = c(0.025, 0.975))
```
***
### Waiting Times
```{r}
hist(faithful$waiting, col = "darkgrey", breaks = 20,
main = "Histogram of Waiting Times",
xlab = "Waiting Time (Minutes)")
grid()
box()
```
What is the 75th [percentile](https://en.wikipedia.org/wiki/Percentile), $\hat{p}_{0.75}$ of this data? That is, what is the waiting time such that 75% of waiting times are shorter.
```{r}
quantile(faithful$waiting, probs = 0.75)
```
- Create a 99% bootstrap confidence interval for $p_{0.75}$, the true 75th percentile of waiting times. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{p}_{0.75}$.
```{r, solution = TRUE}
set.seed(1)
boot_75th = rep(0, 10000)
for (i in seq_along(boot_75th)) {
boot_samp = sample(faithful$waiting, replace = TRUE)
boot_75th[i] = quantile(boot_samp, prob = 0.75)
}
```
```{r, solution = TRUE}
hist(boot_75th, col = "darkgrey",
main = "Histogram of Boostrap Replicates of 75th Percnetile",
xlab = "Estimate of 75th Percentile", breaks = 20)
grid()
box()
```
```{r, solution = TRUE}
quantile(boot_75th, probs = c(0.005, 0.995))
```
***