Examples

Simulated Exponential Data

set.seed(42)
some_data = rexp(n = 75, rate = 0.25)
head(some_data)

## [1] 0.7933472 2.6435810 1.1339642 0.1527676 1.8927065 5.8545086

hist(some_data, col = "darkgrey",
     main = "Histogram of Some Data",
     xlab = "x")
box()

The sample median of this data, \(\hat{m}\) is

median(some_data)

## [1] 2.277542

Create a 90% bootstrap confidence interval for \(m\), the true median. Use 20000 bootstrap samples. Also plot a histogram of the bootstrap replicates of \(\hat{m}\).

set.seed(1)
boot_med = rep(0, 20000)
for (i in seq_along(boot_med)) {
  boot_samp = sample(some_data, replace = TRUE)
  boot_med[i] = median(boot_samp)
}

hist(boot_med, col = "darkgrey",
     main = "Histogram of Boostrap Replicates of the Median",
     xlab = "Sample Medians")
box()

quantile(boot_med, probs = c(0.05, 0.95))

##       5%      95% 
## 1.892707 2.876370

Old Faithful Geyser Data

Eruption Length

?faithful

hist(faithful$eruptions, col = "darkgrey", breaks = 20,
     main = "Histogram of Eruption Lengths",
     xlab = "Eruption Time (Minutes)")
box()

What is the probability of an eruption less than three minutes? That is, if \(X\) is the eruption length in minutes, what is \(P[X < 3]\)?

With this data, we could estimate. We calculate \(\hat{P}[X < 3]\) using

mean(faithful$eruptions < 3)

## [1] 0.3566176

Create a 95% bootstrap confidence interval for \(P[X < 5]\), the probability of an eruption lasting less than three minutes. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of \(\hat{P}[X < 5]\).

set.seed(1)
boot_3min_prob = rep(0, 10000)
for (i in seq_along(boot_3min_prob)) {
  boot_samp = sample(faithful$eruptions, replace = TRUE)
  boot_3min_prob[i] = mean(boot_samp < 3)
}

hist(boot_3min_prob, col = "darkgrey",
     main = "Histogram of Boostrap Replicates of Eruption Length Probability",
     xlab = "Estimate of P(X < 3)")
box()

quantile(boot_3min_prob, probs = c(0.025, 0.975))

##      2.5%     97.5% 
## 0.3014706 0.4117647

Waiting Times

hist(faithful$waiting, col = "darkgrey", breaks = 20,
     main = "Histogram of Waiting Times",
     xlab = "Waiting Time (Minutes)")
box()

What is the 75th percentile, \(\hat{p}_{0.75}\) of this data? That is, what is the waiting time such that 75% of waiting times are shorter.

quantile(faithful$waiting, probs = 0.75)

## 75% 
##  82

Create a 99% bootstrap confidence interval for \(p_{0.75}\), the true 75th percentile of waiting times. Use 10000 bootstrap samples. Also plot a histogram of the bootstrap replicates of \(\hat{p}_{0.75}\).

set.seed(1)
boot_75th = rep(0, 10000)
for (i in seq_along(boot_75th)) {
  boot_samp = sample(faithful$waiting, replace = TRUE)
  boot_75th[i] = quantile(boot_samp, prob = 0.75)
}

hist(boot_75th, col = "darkgrey",
     main = "Histogram of Boostrap Replicates of 75th Percnetile",
     xlab = "Estimate of 75th Percentile")
box()

quantile(boot_75th, probs = c(0.005, 0.995))

##  0.5% 99.5% 
##    80    83

A Brief Introduction to the Bootstrap

David Dalpiaz

STAT 3202, Autumn 2018, OSU

Examples

Simulated Exponential Data

Old Faithful Geyser Data

Eruption Length

Waiting Times