Chapter 6 Probability in R

6.1 Distributions

When working with statistical distributions, we often want to make probabilistic statements based on them.

We typically want to know one of four things:

  • The density (pdf) at a particular value.
  • The distribution (cdf) at a particular value.
  • The quantile value corresponding to a particular probability.
  • A random draw of values from a particular distribution.

These values used to be looked up in statistical tables printed in the back of textbooks. Now, R has functions for obtaining density, distribution, quantile, and random values.

The general naming structure of the relevant R functions is:

  • dname calculates density (pdf) at input x.
  • pname calculates distribution (cdf) at input x.
  • qname calculates the quantile at an input probability.
  • rname generates a random draw from a particular distribution.

Note that name is a placeholder for R's abbreviation of the given distribution, for example norm for the normal distribution.

For example, consider a random variable \(X\) which is \(N(\mu = 2, \sigma^2 = 25)\). (Note that we are parameterizing using the variance \(\sigma^2\). R, however, uses the standard deviation, so here we pass sd = 5.)

To calculate the value of the pdf at x = 3, that is, the height of the curve at x = 3, use:

dnorm(x = 3, mean = 2, sd = 5)
## [1] 0.07820854
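As a check on what dnorm() computes, the normal density \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}\) can be evaluated by hand. The following is a minimal sketch; the variable names mu, sigma, x0 and density_by_hand are chosen here for illustration only.

```r
# Parameters of X ~ N(mu = 2, sigma^2 = 25); names are illustrative
mu    <- 2
sigma <- 5   # standard deviation, the square root of the variance 25
x0    <- 3

# Normal pdf evaluated directly from its formula
density_by_hand <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x0 - mu)^2 / (2 * sigma^2))

# Should agree with dnorm() up to floating-point tolerance
all.equal(density_by_hand, dnorm(x = x0, mean = mu, sd = sigma))
```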

To calculate the value of the cdf at x = 3, that is, \(P(X \leq 3)\), the probability that \(X\) is less than or equal to 3, use:

pnorm(q = 3, mean = 2, sd = 5)
## [1] 0.5792597
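The cdf also gives other probabilities. As a brief sketch, the upper tail \(P(X > 3)\) can be requested with the lower.tail argument of pnorm(), and an interval probability such as \(P(1 < X \leq 3)\) is a difference of two cdf values. The endpoint 1 and the names upper_tail and interval_prob are chosen here only for illustration.

```r
# Upper-tail probability P(X > 3) for X ~ N(mean = 2, sd = 5)
upper_tail <- pnorm(q = 3, mean = 2, sd = 5, lower.tail = FALSE)

# Interval probability P(1 < X <= 3) as a difference of cdf values
interval_prob <- pnorm(q = 3, mean = 2, sd = 5) - pnorm(q = 1, mean = 2, sd = 5)

# The upper tail is the complement of the cdf
all.equal(upper_tail, 1 - pnorm(q = 3, mean = 2, sd = 5))
```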

Or, to calculate the quantile for probability 0.975, that is, the value \(q\) such that \(P(X \leq q) = 0.975\), use:

qnorm(p = 0.975, mean = 2, sd = 5)
## [1] 11.79982
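Since the quantile function is the inverse of the cdf, passing the result of qnorm() back through pnorm() should recover the original probability. The sketch below checks this, and also shows that the quantile equals the mean plus the standard deviation times the standard normal quantile. The names q975 and p_back are illustrative.

```r
# Quantile for probability 0.975 under N(mean = 2, sd = 5)
q975 <- qnorm(p = 0.975, mean = 2, sd = 5)

# Feeding the quantile back into the cdf recovers the probability
p_back <- pnorm(q = q975, mean = 2, sd = 5)
all.equal(p_back, 0.975)

# Equivalent calculation using the standard normal quantile
all.equal(q975, 2 + 5 * qnorm(p = 0.975))
```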

Lastly, to generate a random sample of size n = 10, use:

rnorm(n = 10, mean = 2, sd = 5)
##  [1] -2.23359397  4.51241220 -1.66480609  4.21525677 -4.53867175
##  [6]  3.35671337 -0.05086709 -0.49121308  8.93156987  0.68203843
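Because the values are random, rerunning the command gives different numbers each time. If reproducible draws are needed, one common approach is to set the seed of the random number generator with set.seed() before sampling. In this sketch the seed value 42 and the names first_draw and second_draw are arbitrary choices for illustration.

```r
# Setting the same seed before each call yields the same sample
set.seed(42)  # 42 is an arbitrary seed
first_draw <- rnorm(n = 10, mean = 2, sd = 5)

set.seed(42)  # reset to the same seed
second_draw <- rnorm(n = 10, mean = 2, sd = 5)

identical(first_draw, second_draw)
```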

These functions exist for many other distributions, including but not limited to:

Command   Distribution
*binom    Binomial
*t        t
*pois     Poisson
*f        F
*chisq    Chi-Squared

Here * can be d, p, q, or r. Each distribution has its own set of parameters, which need to be passed to the functions as arguments. For example, dbinom() does not have arguments for mean and sd, since those are not parameters of the binomial distribution. Instead, a binomial distribution is usually parameterized by \(n\) and \(p\); R, however, uses different names. To find the names that R uses, we would use ?dbinom and see that R calls the arguments size and prob. For example:

dbinom(x = 6, size = 10, prob = 0.75)
## [1] 0.145998

Also note that, when used with discrete distributions, the dname functions return the pmf of the distribution rather than a density. For example, the above command is \(P(Y = 6)\) if \(Y \sim b(n = 10, p = 0.75)\). (This is the probability of flipping an unfair coin 10 times and seeing 6 heads, if the probability of heads is 0.75.)
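To make the pmf interpretation concrete, the same probability can be computed from the binomial formula \(\binom{n}{y} p^y (1 - p)^{n - y}\), and the cdf from pbinom() can be recovered by summing pmf values. This is a minimal sketch; the names n_trials, p_heads, pmf_by_hand and cdf_by_sum are chosen here for illustration.

```r
n_trials <- 10    # number of coin flips (the size argument)
p_heads  <- 0.75  # probability of heads (the prob argument)

# P(Y = 6) from the binomial pmf formula
pmf_by_hand <- choose(n_trials, 6) * p_heads^6 * (1 - p_heads)^(n_trials - 6)
all.equal(pmf_by_hand, dbinom(x = 6, size = n_trials, prob = p_heads))

# P(Y <= 6) as a sum of pmf values, compared with the cdf
cdf_by_sum <- sum(dbinom(x = 0:6, size = n_trials, prob = p_heads))
all.equal(cdf_by_sum, pbinom(q = 6, size = n_trials, prob = p_heads))
```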