**Goal:** After completing this lab, you should be able to…

- *Use* permutation tests.

In this lab we will use, but not focus on…

- `R` Markdown. This document will serve as a template. It is pre-formatted and already contains chunks that you need to complete.

Some additional notes:

- Please see **Carmen** for information about submission and grading.
- You may use this document as a template. You do not need to remove directions. Chunks that require your input have a comment indicating to do so.
- Some code from this set of practice problems may be of use. In particular, the code seen in the solutions.

`library(tidyverse)`

For this lab we will use some elements of the `tidyverse` as a preview for a lab to come which will focus on using the `tidyverse`. (If you do not have the `tidyverse` package installed, you will need to do so. Note that the `tidyverse` package is actually a collection of other packages.)

```
# load data
osu_bb_2019_games = read_csv("https://daviddalpiaz.github.io/stat3202-sp19/data/osu-bb-2019-games.csv")
osu_bb_2019_games
```

For this exercise we will use data on the OSU Men’s Basketball games from the 2018 - 2019 season, excluding any games in the soon to be played 2019 NCAA Tournament where OSU is an 11 seed. While an 11 seed isn’t great, have a look at this video by Jon Bois which explains some of the weirdness around certain seeds in the tournament.

In particular we’ll investigate the personal fouls given to OSU compared to their opponents. Specifically we will look at the difference between the number of personal fouls obtained by OSU compared to their opponent *in each game*. That is, we have “paired” data. (So we will investigate data on the differences.)

```
# create difference data as a separate vector
osu_bb_2019_games %>% mutate(pf_diff = PF - OPPPF) %>%
select(pf_diff) %>% unlist() %>% unname() -> pf_diff
head(pf_diff)
```

`## [1] 3 -3 -1 -12 -11 5`

For example, in the fifth game of the season, OSU had 11 fewer personal fouls than their opponent, Samford.

Suppose we are interested in testing:

- \(H_0\): There is no difference between the distribution of fouls obtained by OSU and their opponents.
- \(H_A\): OSU is given fewer fouls than their opponents. Specifically, the distribution of fouls for OSU is shifted lower (“to the left”) compared to their opponents, which makes this a one-sided “less-than” alternative. (Which might lead us to believe the referees are favoring OSU. *But this analysis is far too simple to draw that conclusion.*)

There are a number of ways we could go about testing this, although each uses different or more specific null and alternative hypotheses.

We could consider a t-test:

`t.test(pf_diff, alternative = "less")`

```
##
## One Sample t-test
##
## data: pf_diff
## t = -0.70983, df = 32, p-value = 0.2415
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
## -Inf 1.008247
## sample estimates:
## mean of x
## -0.7272727
```
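The statistic reported by `t.test()` is the sample mean divided by its estimated standard error, compared against a \(t\) distribution with \(n - 1\) degrees of freedom. As a rough sketch, the calculation can be checked by hand on a toy vector. Here `pf_diff_head` is a name introduced only for illustration; it holds just the six values printed by `head(pf_diff)`, so its results will not match the full-season output.

```r
# toy data: only the six differences shown by head(pf_diff), not the full season
pf_diff_head = c(3, -3, -1, -12, -11, 5)
n = length(pf_diff_head)

# t statistic: sample mean over its estimated standard error
t_stat = mean(pf_diff_head) / (sd(pf_diff_head) / sqrt(n))

# one-sided "less" p-value: lower tail of a t distribution with n - 1 df
p_val = pt(t_stat, df = n - 1)

c(t_stat = t_stat, p_val = p_val)

# should agree with the built-in function on the same toy data
t.test(pf_diff_head, alternative = "less")
```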

Or, we could consider a Wilcoxon signed rank test:

`wilcox.test(pf_diff, alternative = "less")`

```
##
## Wilcoxon signed rank test with continuity correction
##
## data: pf_diff
## V = 244.5, p-value = 0.3608
## alternative hypothesis: true location is less than 0
```
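The `V` statistic reported here is the sum of the ranks of the absolute differences, taken over the positive differences only. A minimal sketch of that calculation follows, again on the six values printed by `head(pf_diff)`. The name `pf_diff_head` is hypothetical, and this toy subset contains no zero differences, which `wilcox.test()` would otherwise drop before ranking.

```r
# toy data: only the six differences shown by head(pf_diff), not the full season
pf_diff_head = c(3, -3, -1, -12, -11, 5)

# rank the absolute differences; tied values receive their average rank
abs_ranks = rank(abs(pf_diff_head))

# V: sum of the ranks that belong to positive differences
v_stat = sum(abs_ranks[pf_diff_head > 0])
v_stat
```

Because the toy data contain ties, `wilcox.test(pf_diff_head, alternative = "less")` warns that it cannot compute an exact p-value, but it should report the same `V`.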

We could also consider a sign test:

`binom.test(x = sum(pf_diff > 0), n = length(pf_diff), p = 0.5, alternative = "less")`

```
##
## Exact binomial test
##
## data: sum(pf_diff > 0) and length(pf_diff)
## number of successes = 14, number of trials = 33, p-value = 0.2434
## alternative hypothesis: true probability of success is less than 0.5
## 95 percent confidence interval:
## 0.0000000 0.5814382
## sample estimates:
## probability of success
## 0.4242424
```
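For the “less” alternative, the p-value from `binom.test()` is simply a lower-tail binomial probability, so it can be reproduced with `pbinom()`. The sketch below plugs in the counts from the output above; the variable names `n_pos` and `n_games` are introduced only for illustration.

```r
# counts taken from the binom.test() output above
n_pos = 14    # games with a positive difference
n_games = 33  # total games

# P(X <= 14) when X ~ Binomial(33, 0.5)
p_val = pbinom(n_pos, size = n_games, prob = 0.5)
p_val
```

Note that the call above uses `length(pf_diff)` as the number of trials, so any game with a difference of exactly zero would count as a trial that is not a success. A common variant of the sign test drops such ties first.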

But maybe none of these seem right to us.

- Perhaps we don’t believe the normality assumption required to perform the t-test. (Although, we could probably use a large-sample \(z\) procedure here, but again, that’s an assumption we’d have to make.)
- Perhaps we don’t understand the sort of weird assumptions of the Wilcoxon test.
- Perhaps we understand that the sign test generally has low power.

`qplot(pf_diff, binwidth = 3)`
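One way to sidestep the concerns listed above is a permutation test on the paired differences. The sketch below is a minimal illustration, not necessarily the procedure this lab intends. It assumes that, under \(H_0\), the sign of each game’s difference is equally likely to be positive or negative, so every assignment of signs to the absolute differences is equally likely. It uses only the six values printed by `head(pf_diff)`, and all object names are hypothetical.

```r
# toy data: only the six differences shown by head(pf_diff), not the full season
pf_diff_head = c(3, -3, -1, -12, -11, 5)
n = length(pf_diff_head)

# every one of the 2^n ways to attach a sign to each absolute difference
sign_grid = as.matrix(expand.grid(rep(list(c(-1, 1)), n)))

# test statistic (sum of differences) under each sign assignment
perm_sums = as.vector(sign_grid %*% abs(pf_diff_head))

# observed statistic
obs_sum = sum(pf_diff_head)

# one-sided "less" p-value: share of sign assignments at least as extreme
p_val = mean(perm_sums <= obs_sum)
p_val
```

With all 33 games, listing every sign assignment is impractical (there are \(2^{33}\) of them), so one would instead draw random sign vectors, for example with `sample(c(-1, 1), n, replace = TRUE)` inside `replicate()`, and use the resulting proportion as an approximate p-value.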