Goal: After completing this lab, you should be able to…

In this lab we will use, but not focus on…

Some additional notes:


Exercise 1 - 2019 Ohio State Basketball

library(tidyverse)

For this lab we will use some elements of the tidyverse as a preview of an upcoming lab that will focus on it. (If you do not have the tidyverse package installed, you will need to install it first. Note that the tidyverse package is actually a collection of other packages.)

# load data
osu_bb_2019_games = read_csv("https://daviddalpiaz.github.io/stat3202-sp19/data/osu-bb-2019-games.csv")
osu_bb_2019_games

For this exercise we will use data on the OSU Men’s Basketball games from the 2018-2019 season, excluding any games in the soon-to-be-played 2019 NCAA Tournament, where OSU is an 11 seed. While an 11 seed isn’t great, have a look at this video by Jon Bois, which explains some of the weirdness around certain seeds in the tournament.

In particular, we’ll investigate the personal fouls called on OSU compared to their opponents. Specifically, for each game we will look at the number of personal fouls committed by OSU minus the number committed by their opponent. That is, we have “paired” data, so we will work with the per-game differences.

# create difference data as a separate vector
osu_bb_2019_games %>% mutate(pf_diff = PF - OPPPF) %>% 
  select(pf_diff) %>% unlist() %>% unname() -> pf_diff
head(pf_diff)
## [1]   3  -3  -1 -12 -11   5

For example, in the fifth game of the season, OSU had 11 fewer personal fouls than their opponent, Samford.
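
As a small illustration of what “paired” means here, the sketch below builds the same kind of difference vector in base R. The data frame `toy_games` and its values are invented for illustration; only the column names `PF` and `OPPPF` come from the lab data.

```r
# hypothetical three-game data set: values are invented,
# column names match those used in the lab data
toy_games = data.frame(
  PF    = c(18, 15, 20),
  OPPPF = c(15, 18, 21)
)

# one difference per game: OSU fouls minus opponent fouls
toy_pf_diff = toy_games$PF - toy_games$OPPPF
toy_pf_diff
```

A negative value means OSU committed fewer fouls than their opponent in that game.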

Suppose we are interested in testing:

There are a number of ways we could go about testing this, although each uses different or more specific null and alternative hypotheses.

We could consider a t-test:

t.test(pf_diff, alternative = "less")
## 
##  One Sample t-test
## 
## data:  pf_diff
## t = -0.70983, df = 32, p-value = 0.2415
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
##      -Inf 1.008247
## sample estimates:
##  mean of x 
## -0.7272727
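
To see what `t.test()` is computing, the statistic and one-sided p-value can be reproduced by hand. This is only a sketch: it uses the first six differences printed earlier, stored under the made-up name `d6`, so its numbers will not match the full-season output above.

```r
# first six differences from head(pf_diff); NOT the full season
d6 = c(3, -3, -1, -12, -11, 5)

n      = length(d6)
t_stat = mean(d6) / (sd(d6) / sqrt(n))  # one-sample t statistic for H0: mean = 0
p_less = pt(t_stat, df = n - 1)         # lower-tail p-value, as with alternative = "less"

# compare with the built-in test
fit = t.test(d6, alternative = "less")
all.equal(unname(fit$statistic), t_stat)
all.equal(fit$p.value, p_less)
```

Because the data are paired, calling `t.test(x, y, paired = TRUE)` on the two original foul columns gives the same result as this one-sample test on their differences.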

Or, we could consider a Wilcoxon signed rank test:

wilcox.test(pf_diff, alternative = "less")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  pf_diff
## V = 244.5, p-value = 0.3608
## alternative hypothesis: true location is less than 0
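
The statistic `V` is the sum of the ranks of the positive differences, where ranks are taken over the absolute values. A minimal sketch, again using only the first six differences (under the made-up name `d6`) rather than the full season:

```r
# first six differences from head(pf_diff); NOT the full season
d6 = c(3, -3, -1, -12, -11, 5)

nonzero = d6[d6 != 0]          # differences of exactly zero are dropped
ranks   = rank(abs(nonzero))   # tied absolute values receive average ranks
V       = sum(ranks[nonzero > 0])
V

# compare with the built-in test (exact = FALSE avoids the ties warning)
fit = wilcox.test(d6, alternative = "less", exact = FALSE)
all.equal(unname(fit$statistic), V)
```

A fractional statistic, such as the `V = 244.5` reported above, arises because tied absolute differences share an average rank.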

We could also consider a sign test:

binom.test(x = sum(pf_diff > 0), n = length(pf_diff), p = 0.5, alternative = "less")
## 
##  Exact binomial test
## 
## data:  sum(pf_diff > 0) and length(pf_diff)
## number of successes = 14, number of trials = 33, p-value = 0.2434
## alternative hypothesis: true probability of success is less than 0.5
## 95 percent confidence interval:
##  0.0000000 0.5814382
## sample estimates:
## probability of success 
##              0.4242424
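
The p-value here is just a lower-tail binomial probability, so it can be checked directly with `pbinom()`. The sketch below takes the counts from the output above (14 positive differences in 33 games); the names `x`, `n`, and `p_less` are chosen only for illustration.

```r
x = 14  # games with a positive difference, from the output above
n = 33  # total number of games

# P(X <= 14) when X ~ Binomial(33, 0.5)
p_less = pbinom(x, size = n, prob = 0.5)
p_less

# compare with the built-in test
fit = binom.test(x, n, p = 0.5, alternative = "less")
all.equal(fit$p.value, p_less)
```

Note that using `n = length(pf_diff)` counts any game with a difference of exactly zero as a trial that is not a success. A common variant of the sign test discards such ties before counting.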

But maybe none of these seem right to us.

qplot(pf_diff, binwidth = 3)