## R, R Markdown, LaTeX

• Questions, problems, or concerns?
• Please try to have R and R Markdown up and running by next week’s section (you still do not need to use R or R Markdown for this week’s problem set).
• Feel free to stop by office hours, set up an individual appointment, or drop by the Stat Lab if you are having trouble.

## Summarizing random variables

• Definition of a random variable: a variable that takes on a real value that is determined by a random generative process.
• A random variable is neither random nor a variable.
• A placeholder for a quantity that has yet to be determined by a random generative process (a function).
• A random variable is a function.
• A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, the results of a random process such as rolling a die, or the “subjective” randomness that results from incomplete knowledge of a quantity.
• How can we summarize random variables?
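To make "a random variable is a function" concrete, here is a minimal Python sketch. The two-coin-flip experiment and the names `num_heads` and `pmf` are illustrative choices, not part of these notes. The randomness lives in which outcome occurs; the random variable itself is an ordinary deterministic function from outcomes to real numbers. The last step tallies the probability of each value, which is what the probability mass function below summarizes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: ("H", "H"), ("H", "T"), ("T", "H"), ("T", "T").
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """The random variable X: a plain function from an outcome to a real number."""
    return sum(1 for flip in outcome if flip == "H")

for outcome in sample_space:
    print(outcome, "->", num_heads(outcome))

# Outcomes are equally likely, so Pr[X = k] = (# outcomes with k heads) / 4.
counts = Counter(num_heads(outcome) for outcome in sample_space)
pmf = {k: Fraction(c, len(sample_space)) for k, c in counts.items()}
print(pmf)
```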

### Probability Mass Functions

• Summarizes the probability of each outcome $$x$$ occurring.
• A function that gives the probability that a discrete random variable is exactly equal to some value. The PMF maps each possible outcome of a random variable to the probability of that outcome occurring.
• How much “stuff” is there associated with a given event?
• The probabilities of all possible outcomes sum to 1.

Example formalization

Below is the probability mass function of an unfair die, where we observe a 1, 2, or 3 each with probability $$\frac{1}{12}$$, and a 4, 5, or 6 each with probability $$\frac{1}{4}$$. The PMF can therefore be defined as:

$$f(x) = \begin{cases} \frac{1}{12} & : x = 1 \\ \frac{1}{12} & : x = 2 \\ \frac{1}{12} & : x = 3 \\ \frac{1}{4} & : x = 4 \\ \frac{1}{4} & : x = 5 \\ \frac{1}{4} & : x = 6 \\ 0 & : \text{otherwise} \end{cases}$$

What is $$Pr[X \geq 2]$$?

$$Pr[X \geq 2] = \sum_{x = 2}^{6} f(x) = \frac{1}{12} + \frac{1}{12} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{11}{12}$$
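The calculation above can be checked with exact rational arithmetic. A minimal Python sketch, assuming the PMF is stored as a dictionary from faces to probabilities (`pmf` and `f` are illustrative names):

```python
from fractions import Fraction

# PMF of the unfair die: faces 1-3 have probability 1/12, faces 4-6 have 1/4.
pmf = {x: Fraction(1, 12) for x in (1, 2, 3)}
pmf.update({x: Fraction(1, 4) for x in (4, 5, 6)})

def f(x):
    """PMF: probability that X equals x exactly (0 for any value not listed)."""
    return pmf.get(x, Fraction(0))

# A valid PMF sums to 1 over all possible outcomes.
assert sum(pmf.values()) == 1

# Pr[X >= 2] = f(2) + f(3) + ... + f(6).
prob_at_least_2 = sum(f(x) for x in range(2, 7))
print(prob_at_least_2)  # 11/12
```

Using `Fraction` instead of floats keeps the result exact, so it matches the hand calculation of 11/12 rather than a rounded decimal. The same number also follows from the complement, 1 - f(1).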

Example visualization

### Cumulative Distribution Functions

• Describes the distribution of any random variable, discrete or continuous (unlike the PMF, which applies only to discrete random variables).
• Gives the probability that a random variable $$X$$ takes a value less than or equal to $$x$$: $$F(x) = Pr[X \leq x]$$.
• How much “stuff” is there to the left of a point?
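For a discrete random variable, "stuff to the left of a point" is just a running sum of the PMF. A minimal Python sketch for the unfair die from the PMF example (the name `cdf` is an illustrative choice):

```python
from fractions import Fraction

# Same unfair die as in the PMF example.
pmf = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 12),
       4: Fraction(1, 4), 5: Fraction(1, 4), 6: Fraction(1, 4)}

def cdf(x):
    """F(x) = Pr[X <= x]: total PMF mass at or to the left of x."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))

for x in (0, 1, 2, 2.5, 3, 4, 6, 10):
    print(x, cdf(x))
```

Note that `cdf(2.5)` equals `cdf(2)`: the CDF is flat between faces and jumps by f(x) at each face, and it reaches 1 at the largest face and stays there.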

Example visualizations

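• A discrete CDF is a step function: it is flat between possible values and “jumps” at each one by that value’s probability.
• Unfair die again:

• A continuous CDF has no jumps. Example: the standard normal distribution, whose CDF is smooth: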

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \, du$$
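The integral defining $$\Phi(x)$$ has no elementary closed form, but it can be evaluated two ways. Below is a minimal Python sketch comparing the standard identity $$\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$$ (via `math.erf`) with a direct trapezoidal approximation of the integral. Truncating the lower limit at -10 and using 10,000 steps are illustrative assumptions, not part of the definition.

```python
import math

def std_normal_pdf(u):
    """Standard normal density: exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def phi_closed_form(x):
    """Standard normal CDF via the error function: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_numeric(x, lower=-10.0, n=10_000):
    """Approximate the integral from -infinity to x with the trapezoidal rule.

    Assumption: the lower limit -infinity is truncated at -10, where the
    density is smaller than 1e-22, so the neglected tail is negligible.
    """
    h = (x - lower) / n
    total = 0.5 * (std_normal_pdf(lower) + std_normal_pdf(x))
    total += sum(std_normal_pdf(lower + i * h) for i in range(1, n))
    return total * h

for x in (-1.96, 0.0, 1.0):
    print(x, round(phi_closed_form(x), 4), round(phi_numeric(x), 4))
```

The two columns agree to four decimal places; $$\Phi(0) = 0.5$$ exactly because the density is symmetric about 0.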

### Probability Density Functions

• The PDF describes how probability is spread continuously over a range of values (sand), as opposed to being concentrated on individual values (sticks). It therefore lets us answer the question: how much of a random variable’s distribution lies in the filled area under the curve? In other words, how much probability mass lies within a given range of values?
• The probability of a random variable falling within a particular range of values is given by the integral of its PDF over that range: $$Pr[a \leq X \leq b] = \int_a^b f(x)\,dx$$.
• What is the probability that a continuous random variable takes on any one particular value? Why?
• The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one.
• The PDF is the derivative of the CDF: $$f(x) = F'(x)$$. This is why the CDF is steepest where the PDF is highest.
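Two facts from the list above, that probabilities come from integrating the PDF and that the PDF is the derivative of the CDF, can be checked numerically. A minimal Python sketch using the standard normal; the midpoint rule, the step size `h`, and the function names are illustrative choices:

```python
import math

def pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(x):
    """Standard normal CDF F(x), via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def prob_between(a, b, n=10_000):
    """Pr[a <= X <= b] as the integral of the PDF over [a, b] (midpoint rule)."""
    width = (b - a) / n
    return width * sum(pdf(a + (i + 0.5) * width) for i in range(n))

# 1) Probability of a range = integral of the PDF = F(b) - F(a).
print(prob_between(-1, 1), cdf(1) - cdf(-1))  # both about 0.6827

# 2) The PDF is the derivative of the CDF (central finite difference).
h = 1e-5
for x in (-2.0, 0.0, 2.0):
    slope = (cdf(x + h) - cdf(x - h)) / (2 * h)
    print(x, round(slope, 6), round(pdf(x), 6))
```

In the second loop the slope of the CDF matches the PDF at each point and is largest at x = 0, where the density peaks.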

Example visualization