R, R Markdown, LaTeX

Summarizing random variables

Probability Mass Functions

  • Summarizes the probability of each outcome \(x\) occurring.
  • A function that gives the probability that a discrete random variable is exactly equal to some value. The PMF maps each possible outcome of a random variable to the probability of that outcome occurring.
  • How much “stuff” is there associated with a given event?
  • All probabilities add up to 1.
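
In symbols, the points above can be sketched as follows, writing \(f\) for the PMF of a discrete random variable \(X\) and summing over every possible outcome \(x\):

\[ f(x) = \Pr[X = x], \qquad 0 \leq f(x) \leq 1, \qquad \sum_{x} f(x) = 1 \]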

Example formalization

Below is the probability mass function of an unfair die, where we observe a 1, 2, or 3 with probability \(\frac{1}{12}\), and a 4, 5, or 6 with probability \(\frac{1}{4}\). The PMF can therefore be defined as:

\[ f(x) = \begin{cases} \frac{1}{12} & : x = 1 \\ \frac{1}{12} & : x = 2 \\ \frac{1}{12} & : x = 3 \\ \frac{1}{4} & : x = 4 \\ \frac{1}{4} & : x = 5 \\ \frac{1}{4} & : x = 6 \\ 0 & : \text{otherwise} \end{cases} \]

What is \(\Pr[X \geq 2]\)?

\[ \begin{aligned} \Pr[X \geq 2] &= \sum_{x = 2}^{6} f(x) \\ &= \frac{1}{12} + \frac{1}{12} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} \\ &= \frac{11}{12} \end{aligned} \]
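
The calculation above can be checked with a few lines of code. The following is a minimal sketch in Python (chosen here only for illustration, using the standard library's exact fractions). The names `pmf`, `f`, `total`, `pr_at_least_2`, and `pr_complement` are made up for this example.

```python
from fractions import Fraction

# PMF of the unfair die from the text: faces 1-3 have probability 1/12,
# faces 4-6 have probability 1/4. Any other value has probability 0.
pmf = {
    1: Fraction(1, 12),
    2: Fraction(1, 12),
    3: Fraction(1, 12),
    4: Fraction(1, 4),
    5: Fraction(1, 4),
    6: Fraction(1, 4),
}


def f(x):
    """Return Pr[X = x]; outcomes outside 1..6 get probability 0."""
    return pmf.get(x, Fraction(0))


# A valid PMF sums to 1 over all outcomes.
total = sum(pmf.values())

# Pr[X >= 2] adds up f(x) for x = 2, ..., 6.
pr_at_least_2 = sum(f(x) for x in range(2, 7))

# Complement rule as a cross-check: Pr[X >= 2] = 1 - Pr[X = 1].
pr_complement = 1 - f(1)

print(total, pr_at_least_2, pr_complement)  # prints: 1 11/12 11/12
```

Working with exact fractions rather than floating-point numbers keeps the result directly comparable to the hand calculation.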

Example visualization

Cumulative Distribution Functions

  • Defined for any random variable, discrete or continuous, whereas the PMF applies only to discrete random variables.
  • Describes the probability that a random variable \(X\) will take a value less than or equal to \(x\).
  • How much “stuff” is there to the left of a point?
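
In symbols, writing \(F\) for the CDF (the letter \(F\) is a notational assumption, not taken from these notes) and reusing the unfair die with PMF \(f\) as a worked case:

\[ F(x) = \Pr[X \leq x], \qquad F(3) = f(1) + f(2) + f(3) = \frac{3}{12} = \frac{1}{4} \]

One consequence is that the probability of landing in an interval is a difference of two CDF values:

\[ \Pr[a < X \leq b] = F(b) - F(a) \]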

Example visualizations

  • Discrete CDF will have “jumps” or “steps.”
  • Unfair die again:

  • Continuous CDF is smooth. Example: standard normal distribution.

\[ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \, du \]
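
Both kinds of CDF can be sketched in code. The Python example below (standard library only, for illustration) builds the step-shaped CDF of the unfair die as a running total of its PMF and evaluates \(\Phi\) through the identity \(\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x / \sqrt{2})\right)\). The function and variable names are made up for this example.

```python
import math
from fractions import Fraction
from itertools import accumulate

# Discrete case: the unfair die. The CDF is the running total of the PMF.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 12)] * 3 + [Fraction(1, 4)] * 3
cdf_at_faces = dict(zip(faces, accumulate(probs)))


def die_cdf(x):
    """Pr[X <= x] for the unfair die: a step function, flat between faces."""
    return sum((p for face, p in zip(faces, probs) if face <= x), Fraction(0))


def std_normal_cdf(x):
    """Phi(x), written with the error function: 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# The CDF jumps at each face and reaches 1 at the largest outcome.
for face in faces:
    print(face, cdf_at_faces[face])

# Between faces the discrete CDF stays flat: F(3.5) equals F(3).
print(die_cdf(3.5), die_cdf(3))

# The continuous CDF changes smoothly; by symmetry Phi(0) is one half.
print(std_normal_cdf(0.0))
```

The "steps" show up as values that only change at the integers 1 through 6, while `std_normal_cdf` returns a slightly different value for every slightly different input.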

Probability Density Functions

  • The PDF specifies the probability of a continuous random variable falling within a particular range of values (sand), as opposed to taking on any one value (sticks). The PDF therefore allows us to answer the question: how much of the distribution of a random variable is found in the filled area under the curve? In other words, how much probability mass lies within a given range of values?
  • The probability of a random variable falling within a particular range of values is therefore given by the integral of this variable’s PDF over that range.
  • What is the absolute likelihood a continuous random variable takes on any particular value? Why?
  • The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one.
  • The PDF is the derivative of the CDF. This is why the slope of the CDF is greatest where the PDF reaches its highest point.
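
The relationships in this list can be checked numerically. The Python sketch below (standard library only, for illustration) uses the standard normal distribution as an assumed example. It approximates the area under the PDF with the trapezoid rule, compares it with a difference of CDF values, and compares the slope of the CDF with the PDF. All function names are made up for this example.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate_pdf(a, b, n=10_000):
    """Trapezoid-rule approximation of the integral of the PDF over [a, b]."""
    h = (b - a) / n
    inner = sum(std_normal_pdf(a + i * h) for i in range(1, n))
    return h * (0.5 * std_normal_pdf(a) + inner + 0.5 * std_normal_pdf(b))


def cdf_slope(x, eps=1e-5):
    """Central-difference approximation of the derivative of the CDF at x."""
    return (std_normal_cdf(x + eps) - std_normal_cdf(x - eps)) / (2.0 * eps)


# Probability of a range: area under the PDF, which matches Phi(b) - Phi(a).
area = integrate_pdf(-1.0, 1.0)
cdf_difference = std_normal_cdf(1.0) - std_normal_cdf(-1.0)
print(area, cdf_difference)

# A single point is a range of width zero, so the area over it is zero.
point_probability = integrate_pdf(0.5, 0.5)
print(point_probability)

# The slope of the CDF tracks the PDF and is steepest at the PDF's peak (x = 0).
print(cdf_slope(0.0), std_normal_pdf(0.0))
print(cdf_slope(0.0) > cdf_slope(1.0) > cdf_slope(2.0))
```

The zero-width case illustrates why a density value is not itself a probability: only areas under the curve are.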

Example visualization