--- title: "A Brief Introduction to the Bootstrap" author: "David Dalpiaz" date: "STAT 3202, Autumn 2018, OSU" output: html_document: toc: yes df_print: paged theme: spacelab --- ```{r setup, include = FALSE} knitr::opts_chunk$set(fig.align = "center") ``` **Note:** This document is currently incomplete. *** # Examples *** ## Simulated Exponential Data ```{r} set.seed(42) some_data = rexp(n = 75, rate = 0.25) head(some_data) ``` ```{r} hist(some_data, col = "darkgrey", main = "Histogram of Some Data", xlab = "x") box() ``` The sample median of this data, $\hat{m}$ is ```{r} median(some_data) ``` - Create a 90% bootstrap confidence interval for $m$, the true median. Use 20000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{m}$. ```{r, solution = TRUE} set.seed(1) boot_med = rep(0, 20000) for (i in seq_along(boot_med)) { boot_samp = sample(some_data, replace = TRUE) boot_med[i] = median(boot_samp) } ``` ```{r, solution = TRUE} hist(boot_med, col = "darkgrey", main = "Histogram of Boostrap Replicates of the Median", xlab = "Sample Medians") box() ``` ```{r, solution = TRUE} quantile(boot_med, probs = c(0.05, 0.95)) ``` *** ## Old Faithful Geyser Data *** ### Eruption Length ```{r, eval = FALSE} ?faithful ``` ```{r} hist(faithful$eruptions, col = "darkgrey", breaks = 20, main = "Histogram of Eruption Lengths", xlab = "Eruption Time (Minutes)") box() ``` What is the probability of an eruption less than three minutes? That is, if $X$ is the eruption length in minutes, what is $P[X < 3]$? With this data, we could estimate. We calculate $\hat{P}[X < 3]$ using ```{r} mean(faithful$eruptions < 3) ``` - Create a 95% bootstrap confidence interval for $P[X < 5]$, the probability of an eruption lasting less than three minutes. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{P}[X < 5]$. ```{r, solution = TRUE} set.seed(1) boot_3min_prob = rep(0, 10000) for (i in seq_along(boot_3min_prob)) { boot_samp = sample(faithful$eruptions, replace = TRUE) boot_3min_prob[i] = mean(boot_samp < 3) } ``` ```{r, solution = TRUE} hist(boot_3min_prob, col = "darkgrey", main = "Histogram of Boostrap Replicates of Eruption Length Probability", xlab = "Estimate of P(X < 3)") box() ``` ```{r, solution = TRUE} quantile(boot_3min_prob, probs = c(0.025, 0.975)) ``` *** ### Waiting Times ```{r} hist(faithful$waiting, col = "darkgrey", breaks = 20, main = "Histogram of Waiting Times", xlab = "Waiting Time (Minutes)") box() ``` What is the 75th [percentile](https://en.wikipedia.org/wiki/Percentile), $\hat{p}_{0.75}$ of this data? That is, what is the waiting time such that 75% of waiting times are shorter. ```{r} quantile(faithful$waiting, probs = 0.75) ``` - Create a 99% bootstrap confidence interval for $p_{0.75}$, the true 75th percentile of waiting times. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of $\hat{p}_{0.75}$. ```{r, solution = TRUE} set.seed(1) boot_75th = rep(0, 10000) for (i in seq_along(boot_75th)) { boot_samp = sample(faithful$waiting, replace = TRUE) boot_75th[i] = quantile(boot_samp, prob = 0.75) } ``` ```{r, solution = TRUE} hist(boot_75th, col = "darkgrey", main = "Histogram of Boostrap Replicates of 75th Percnetile", xlab = "Estimate of 75th Percentile") box() ``` ```{r, solution = TRUE} quantile(boot_75th, probs = c(0.005, 0.995)) ``` ***