Chapter 6 Probability in R
6.1 Distributions
When working with different statistical distributions, we often want to make probabilistic statements based on the distribution.
We typically want to know one of four things:
- The density (pdf) at a particular value.
- The distribution (cdf) at a particular value.
- The quantile value corresponding to a particular probability.
- A random draw of values from a particular distribution.
This used to be done with statistical tables printed in the back of textbooks. Now, R has functions for obtaining density, distribution, quantile and random values.
The general naming structure of the relevant R functions is:
- dname calculates density (pdf) at input x.
- pname calculates distribution (cdf) at input x.
- qname calculates the quantile at an input probability.
- rname generates a random draw from a particular distribution.
Note that name represents the name of the given distribution.
For example, consider a random variable \(X\) which is \(N(\mu = 2, \sigma^2 = 25)\). (Note that we are parameterizing using the variance \(\sigma^2\). R, however, uses the standard deviation, so the calls below pass sd = 5.)
To calculate the value of the pdf at x = 3, that is, the height of the curve at x = 3, use:
dnorm(x = 3, mean = 2, sd = 5)
## [1] 0.07820854
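As a quick check on what dnorm() returns, the same height can be computed directly from the normal pdf, \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}\). This is only a sketch; the variable names (mu, sigma, x, manual_density) are illustrative and not part of the original text.

```r
# Parameters of the example distribution: N(mu = 2, sigma^2 = 25)
mu <- 2
sigma <- sqrt(25)  # R's sd argument is the standard deviation, not the variance
x <- 3

# Normal pdf evaluated by hand
manual_density <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

# Compare with the built-in density function
all.equal(manual_density, dnorm(x = x, mean = mu, sd = sigma))
```

If the two values agree, all.equal() returns TRUE.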
To calculate the value of the cdf at x = 3, that is, \(P(X \leq 3)\), the probability that \(X\) is less than or equal to 3, use:
pnorm(q = 3, mean = 2, sd = 5)
## [1] 0.5792597
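Since pnorm() gives \(P(X \leq x)\), other probabilities follow from it. The sketch below shows an upper-tail probability, using the lower.tail argument of pnorm(), and an interval probability obtained as a difference of two cdf values. The variable names and the interval endpoints (-3 and 7, one standard deviation either side of the mean) are chosen here for illustration.

```r
# P(X > 3): either the complement of the cdf, or lower.tail = FALSE
p_upper_complement <- 1 - pnorm(q = 3, mean = 2, sd = 5)
p_upper_direct <- pnorm(q = 3, mean = 2, sd = 5, lower.tail = FALSE)

# P(-3 < X <= 7): difference of two cdf values
p_interval <- pnorm(q = 7, mean = 2, sd = 5) - pnorm(q = -3, mean = 2, sd = 5)

p_upper_direct  # about 0.42
p_interval      # about 0.68
```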
Or, to calculate the quantile for probability 0.975, use:
qnorm(p = 0.975, mean = 2, sd = 5)
## [1] 11.79982
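The quantile function is the inverse of the cdf, so feeding the result of qnorm() back into pnorm() should recover the original probability. The sketch below checks this, and also checks that the quantile equals \(\mu + \sigma z\), where \(z\) is the corresponding standard normal quantile. The variable names are illustrative.

```r
# Quantile of N(mu = 2, sigma = 5) for probability 0.975
q_975 <- qnorm(p = 0.975, mean = 2, sd = 5)

# Applying the cdf to the quantile recovers the probability
p_back <- pnorm(q = q_975, mean = 2, sd = 5)
all.equal(p_back, 0.975)

# Same quantile via the standard normal: mu + sigma * z
q_from_standard <- 2 + 5 * qnorm(p = 0.975)
all.equal(q_975, q_from_standard)
```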
Lastly, to generate a random sample of size n = 10, use:
rnorm(n = 10, mean = 2, sd = 5)
## [1] -2.23359397 4.51241220 -1.66480609 4.21525677 -4.53867175
## [6] 3.35671337 -0.05086709 -0.49121308 8.93156987 0.68203843
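Because these values are random, running the command again will give different numbers from those shown above. One common way to make a draw reproducible is to call set.seed() first. The sketch below uses an arbitrary seed (42, chosen here purely for illustration) and also checks that a large sample has a mean and standard deviation close to the parameters.

```r
# Setting the same seed before each call reproduces the same draw
set.seed(42)
sample_a <- rnorm(n = 10, mean = 2, sd = 5)
set.seed(42)
sample_b <- rnorm(n = 10, mean = 2, sd = 5)
identical(sample_a, sample_b)

# With many draws, the sample mean and sd should be near 2 and 5
set.seed(42)
big_sample <- rnorm(n = 100000, mean = 2, sd = 5)
mean(big_sample)
sd(big_sample)
```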
These functions exist for many other distributions, including but not limited to:
Command | Distribution |
---|---|
*binom | Binomial |
*t | t |
*pois | Poisson |
*f | F |
*chisq | Chi-Squared |
Where * can be d, p, q, and r. Each distribution will have its own set of parameters which need to be passed to the functions as arguments. For example, dbinom() would not have arguments for mean and sd, since those are not parameters of the distribution. Instead, a binomial distribution is usually parameterized by \(n\) and \(p\); however, R chooses to call them something else. To find the names that R uses we would use ?dbinom and see that R instead calls the arguments size and prob. For example:
dbinom(x = 6, size = 10, prob = 0.75)
## [1] 0.145998
Also note that, when using the dname functions with discrete distributions, they give the pmf of the distribution. For example, the above command calculates \(P(Y = 6)\) if \(Y \sim b(n = 10, p = 0.75)\). (The probability of flipping an unfair coin 10 times and seeing 6 heads, if the probability of heads is 0.75.)
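To make the binomial example concrete, the sketch below lists the argument names of dbinom() programmatically, as an alternative to reading ?dbinom. It then checks the pmf value against the formula \(\binom{n}{y} p^y (1 - p)^{n - y}\), and checks that pbinom() agrees with summing the pmf from 0 up to 6. The variable names are illustrative.

```r
# Argument names R expects for the binomial pmf
names(formals(dbinom))

n <- 10    # number of trials (R's size)
p <- 0.75  # probability of success (R's prob)
y <- 6     # number of successes

# pmf computed by hand from the binomial formula
manual_pmf <- choose(n, y) * p^y * (1 - p)^(n - y)
all.equal(manual_pmf, dbinom(x = y, size = n, prob = p))

# cdf P(Y <= 6) as a sum of pmf values
cdf_by_sum <- sum(dbinom(x = 0:y, size = n, prob = p))
all.equal(cdf_by_sum, pbinom(q = y, size = n, prob = p))
```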