---
title: "A Brief Introduction to the Bootstrap"
author: "David Dalpiaz"
date: "STAT 3202, Autumn 2018, OSU"
output: 
  html_document:
    toc: yes
    df_print: paged
    theme: spacelab
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(fig.align = "center")
```

**Note:** This document is currently incomplete.

***

# Examples

***

## Simulated Exponential Data

```{r}
set.seed(42)
some_data = rexp(n = 75, rate = 0.25)
head(some_data)
```

```{r}
hist(some_data, col = "darkgrey",
     main = "Histogram of Some Data",
     xlab = "x")
box()
```

The sample median of this data, $\hat{m}$, is

```{r}
median(some_data)
```


- Create a 90% bootstrap confidence interval for $m$, the true median. Use 20000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{m}$.

```{r, solution = TRUE}
set.seed(1)
boot_med = rep(0, 20000)
for (i in seq_along(boot_med)) {
  boot_samp = sample(some_data, replace = TRUE)
  boot_med[i] = median(boot_samp)
}
```

```{r, solution = TRUE}
hist(boot_med, col = "darkgrey",
     main = "Histogram of Bootstrap Replicates of the Median",
     xlab = "Sample Medians")
box()
```

```{r, solution = TRUE}
quantile(boot_med, probs = c(0.05, 0.95))
```
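
Because this data was simulated, the true median is actually known here, which would not be the case in practice. As a rough sanity check, the chunk below computes the median of an exponential distribution with rate $0.25$ in two ways: with `qexp()` and with the closed form $\log(2) / \lambda$, about $2.77$. The name `true_median` is introduced only for this illustration. The value can be compared with the interval above.

```{r}
# Median of an exponential distribution with rate 0.25,
# the distribution used to simulate some_data
true_median = qexp(0.5, rate = 0.25)
true_median

# Closed form for the exponential median: log(2) / rate
log(2) / 0.25
```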

***

## Old Faithful Geyser Data

***

### Eruption Length

```{r, eval = FALSE}
?faithful
```

```{r}
hist(faithful$eruptions, col = "darkgrey", breaks = 20,
     main = "Histogram of Eruption Lengths",
     xlab = "Eruption Time (Minutes)")
box()
```

What is the probability of an eruption less than three minutes? That is, if $X$ is the eruption length in minutes, what is $P[X < 3]$?

We can estimate this probability from the data with the sample proportion of eruptions shorter than three minutes. We calculate $\hat{P}[X < 3]$ using

```{r}
mean(faithful$eruptions < 3)
```
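
The estimate above works because the comparison `faithful$eruptions < 3` returns a logical vector, and the mean of a logical vector is the proportion of `TRUE` values. A minimal sketch that spells this out, where `is_short` is a name used only for this illustration:

```{r}
# Logical vector: TRUE when an eruption lasted less than three minutes
is_short = faithful$eruptions < 3

# Proportion of TRUE values, computed by hand
sum(is_short) / length(is_short)

# The same proportion, using mean() directly on the logical vector
mean(is_short)
```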

- Create a 95% bootstrap confidence interval for $P[X < 3]$, the probability of an eruption lasting less than three minutes. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{P}[X < 3]$.

```{r, solution = TRUE}
set.seed(1)
boot_3min_prob = rep(0, 10000)
for (i in seq_along(boot_3min_prob)) {
  boot_samp = sample(faithful$eruptions, replace = TRUE)
  boot_3min_prob[i] = mean(boot_samp < 3)
}
```

```{r, solution = TRUE}
hist(boot_3min_prob, col = "darkgrey",
     main = "Histogram of Bootstrap Replicates of Eruption Length Probability",
     xlab = "Estimate of P(X < 3)")
box()
```

```{r, solution = TRUE}
quantile(boot_3min_prob, probs = c(0.025, 0.975))
```
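
The examples so far repeat the same three steps: resample with replacement, recompute the statistic, then take quantiles of the replicates. One possible way to package those steps is sketched below. The function `boot_percentile_ci()` and its arguments `stat`, `level`, and `reps` are hypothetical names for this illustration and are not part of base R. It uses `replicate()` in place of an explicit `for` loop.

```{r}
# Hypothetical helper (not part of base R): percentile bootstrap interval
boot_percentile_ci = function(x, stat, level = 0.95, reps = 10000) {
  # Each replicate: resample x with replacement, then apply the statistic
  boot_reps = replicate(reps, stat(sample(x, replace = TRUE)))
  # Cut off alpha / 2 of the replicates in each tail
  alpha = 1 - level
  quantile(boot_reps, probs = c(alpha / 2, 1 - alpha / 2))
}

set.seed(1)
prob_ci = boot_percentile_ci(faithful$eruptions,
                             stat = function(x) mean(x < 3),
                             level = 0.95, reps = 10000)
prob_ci
```

Passing the statistic as a function means the same helper could be reused for a median or a percentile by changing only the `stat` argument.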

***

### Waiting Times

```{r}
hist(faithful$waiting, col = "darkgrey", breaks = 20,
     main = "Histogram of Waiting Times",
     xlab = "Waiting Time (Minutes)")
box()
```

What is the 75th [percentile](https://en.wikipedia.org/wiki/Percentile), $\hat{p}_{0.75}$, of this data? That is, what is the waiting time such that 75% of waiting times are shorter?

```{r}
quantile(faithful$waiting, probs = 0.75)
```

- Create a 99% bootstrap confidence interval for $p_{0.75}$, the true 75th percentile of waiting times. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{p}_{0.75}$.

```{r, solution = TRUE}
set.seed(1)
boot_75th = rep(0, 10000)
for (i in seq_along(boot_75th)) {
  boot_samp = sample(faithful$waiting, replace = TRUE)
  boot_75th[i] = quantile(boot_samp, probs = 0.75)
}
```

```{r, solution = TRUE}
hist(boot_75th, col = "darkgrey",
     main = "Histogram of Bootstrap Replicates of 75th Percentile",
     xlab = "Estimate of 75th Percentile")
box()
```

```{r, solution = TRUE}
quantile(boot_75th, probs = c(0.005, 0.995))
```
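
One thing to keep in mind when reading the histogram and interval for a percentile: the waiting times in `faithful` are recorded in whole minutes, so the bootstrap replicates of a sample percentile can only take a limited set of values. The sketch below draws a smaller, separate set of replicates and tabulates them. The name `boot_75th_check` and the choice of 2000 replicates are assumptions made only for this illustration.

```{r}
set.seed(1)
# A smaller, separate set of bootstrap replicates of the 75th percentile
boot_75th_check = replicate(
  2000,
  quantile(sample(faithful$waiting, replace = TRUE), probs = 0.75)
)

# Number of distinct values among the replicates
length(unique(boot_75th_check))

# How often each distinct value occurs
table(boot_75th_check)
```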

***