Summary of Probability

Axioms, conditional probability, independence, discrete and continuous random variables, distributions...

Posted by Hao Xu on September 21, 2018

1. Axioms of Probability

Definition

The set of all possible outcomes of an experiment is known as the sample space of the experiment, denoted by $S$.

Definition

Any subset $E$ of the sample space is known as an event.

Definition

Let $E_1, E_2, …$ be events.

The union of these events, denoted by $\bigcup_{n=1}^\infty E_n$, is defined to be the event consisting of all outcomes that are in $E_n$ for at least one value of $n=1, 2, …$.

The intersection of the events $E_n$, denoted by $\bigcap _{n=1}^\infty E_n$, is defined to be the event consisting of those outcomes which are in all of the events $E_n, n=1, 2, …$.

Definition

The complement of $E$, denoted by $E^c$, consists of all outcomes in the sample space $S$ that are not in $E$.

  • $E^c$ occurs iff $E$ does not occur.
  • $E \cup E^c = S$
  • $S^c = \emptyset$

Theorem De Morgan’s Laws

\[(\bigcup_{i=1}^n E_i)^c = \bigcap_{i=1}^n E_i^c\] \[(\bigcap_{i=1}^n E_i)^c = \bigcup^n_{i=1} E_i^c\]

Definition

The probability of the event $E$ is defined as \(P(E) = \lim_{n \rightarrow \infty} \cfrac{n(E)}{n}\), where $n(E)$ is the number of times that $E$ occurs in the first $n$ repetitions of the experiment. For each event $E$ of the sample space $S$, we assume that a number $P(E)$ is defined and satisfies the following three axioms:

Axiom 1

\[0 \leq P(E) \leq 1\]

Axiom 2

\[P(S) = 1\]

Axiom 3

for any sequence of mutually exclusive events $E_1, E_2, …$

\[P(\bigcup_{i=1}^\infty E_i) = \sum_{i=1}^\infty P(E_i)\]
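As a quick sanity check on the limiting-frequency definition above, here is a minimal Python sketch (the fair-die event is my own example, not from the text): the estimate $n(E)/n$ for the event $E$ = {the roll is even} drifts toward $1/2$ as $n$ grows.

```python
import random

random.seed(0)

def estimate_p_even(n):
    """Estimate P(E) as n(E)/n for E = {a fair die roll is even}."""
    hits = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
    return hits / n

for n in (100, 10_000, 1_000_000):
    print(n, estimate_p_even(n))   # approaches 0.5 as n grows
```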

Proposition

\[P(E^c) = 1 - P(E)\]

Proposition

If $E \subset F$, then $P(E) \leq P(F)$.

Proposition

\[P(E \cup F) = P(E) + P(F) - P(EF)\]

Proposition The inclusion-exclusion identity

\[\begin{eqnarray*} P(E_1 \cup E_2 \cup ... \cup E_n) &=& \sum_{i=1}^n P(E_i) - \sum_{i_1<i_2} P(E_{i_1}E_{i_2}) + ... \\ &+& (-1)^{r+1} \sum_{i_1<i_2<...<i_r}P(E_{i_1}E_{i_2}...E_{i_r}) \\ &+& ... + (-1)^{n+1}P(E_1E_2...E_n) \end{eqnarray*}\]
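For a concrete check of the identity with $n = 3$, the sketch below (a toy example of my own) uses twelve equally likely outcomes and three divisibility events, and verifies that the alternating sum equals $P(E_1 \cup E_2 \cup E_3)$ computed directly.

```python
from fractions import Fraction
from itertools import combinations

S = set(range(1, 13))                          # 12 equally likely outcomes
E = [{s for s in S if s % d == 0} for d in (2, 3, 4)]

def P(event):
    return Fraction(len(event), len(S))

direct = P(set().union(*E))                    # P(E1 ∪ E2 ∪ E3) directly

# Alternating inclusion-exclusion sum over all nonempty index subsets.
incl_excl = Fraction(0)
for r in range(1, len(E) + 1):
    for idx in combinations(range(len(E)), r):
        inter = set.intersection(*(E[i] for i in idx))
        incl_excl += (-1) ** (r + 1) * P(inter)

print(direct, incl_excl, direct == incl_excl)  # 2/3 2/3 True
```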

2. Conditional Probability

Definition

The conditional probability that $E$ occurs given that $F$ has occurred is denoted by $P(E|F)$. If $P(F)>0$, then

\[P(E|F)= \cfrac{P(EF)}{P(F)}\]

Theorem The multiplication rule

\[P(E_1E_2E_3...E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1E_2)...P(E_n|E_1...E_{n-1})\]
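The definition (and hence the multiplication rule, which is just the definition applied repeatedly) can be verified by brute-force counting on a small sample space; the two-dice events below are my own example.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # two fair dice: 36 outcomes

def P(pred):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if pred(s)), len(S))

E = lambda s: s[0] + s[1] == 8               # the sum is 8
F = lambda s: s[0] == 3                      # the first die shows 3

cond = P(lambda s: E(s) and F(s)) / P(F)     # P(E|F) = P(EF)/P(F)
print(cond)                                  # 1/6

# Same answer by restricting the sample space to F:
in_F = [s for s in S if F(s)]
print(Fraction(sum(1 for s in in_F if E(s)), len(in_F)))   # 1/6
```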

Proposition

Let $E$ and $F$ be events. We can express $E$ as

\[E = EF \ \cup \ EF^c\]

Since $EF$ and $EF^c$ are mutually exclusive, Axiom 3 gives $P(E) = P(EF) + P(EF^c)$, and hence

\[P(E) = P(E|F)P(F)\ +\ P(E|F^c)[1\ -\ P(F)]\]

Definition

The odds of an event $A$ are defined by

\[\cfrac{P(A)}{P(A^c)} = \cfrac{P(A)}{1\ -\ P(A)}\]

This tells how much more likely it is that the event $A$ occurs than it is that it does not occur. If the odds are equal to $\alpha$, then it is common to say that the odds are “$\alpha$ to 1” in favor of the hypothesis.

The new odds of a hypothesis $H$ after the evidence $E$ has been observed are

\[\cfrac{P(H|E)}{P(H^c|E)} = \cfrac{P(H)}{P(H^c)} \cfrac{P(E|H)}{P(E|H^c)}\]

Theorem Bayes’s formula

\[P(F_j|E) = \cfrac{P(EF_j)}{P(E)}= \cfrac{P(E|F_j)P(F_j)}{\sum_{i=1}^n P(E|F_i)P(F_i)}\]

where $F_1, F_2, …, F_n$ are mutually exclusive events whose union is the sample space. Bayes’s formula shows us how to use new evidence to modify existing opinions; a worked numeric example follows the list below.

  • $P(F_j|E)$: the probability that $F_j$ occurs given that $E$ has occurred (the posterior).
  • $P(E|F_j)$: the probability that $E$ occurs given that $F_j$ has occurred (the likelihood).
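A minimal numeric sketch of Bayes’s formula in Python (the diagnostic-test numbers are my own assumed values, not from the text): with a prior of $0.01$ for a condition $F_1$ and a test that is positive with probability $0.95$ under $F_1$ and $0.05$ under $F_2 = F_1^c$, the posterior after a positive result $E$ is still modest.

```python
# Hypothetical numbers, chosen only to exercise Bayes's formula.
priors      = [0.01, 0.99]   # P(F1), P(F2): mutually exclusive, exhaustive
likelihoods = [0.95, 0.05]   # P(E|F1), P(E|F2)

# Denominator: P(E) = sum_i P(E|Fi) P(Fi)
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior: P(Fj|E) = P(E|Fj) P(Fj) / P(E)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

print(posteriors[0])   # P(F1|E) ~ 0.161: positive evidence, modest posterior
```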

3. Independence

Definition

Two events $E$ and $F$ are said to be independent if the following equation holds.

\[P(EF) = P(E)P(F)\]

Two events $E$ and $F$ that are not independent are said to be dependent.

Proposition

If $E$ and $F$ are independent, then so are $E$ and $F^c$.

Definition

The events $E_1, E_2, …, E_n$ are said to be independent if, for every subset $E_{1'}, E_{2'}, …, E_{r'}$, $r \leq n$, of these events,

\[P(E_{1'}E_{2'}...E_{r'}) = P(E_{1'})P(E_{2'})...P(E_{r'})\]
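Both the factorization condition and the earlier proposition (independence of $E$ and $F$ carries over to $E$ and $F^c$) can be verified by exact counting; the dice events below are my own example.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))    # two fair dice

def P(event):
    return Fraction(len(event), len(S))

E  = {s for s in S if s[0] % 2 == 0}        # first die even
F  = {s for s in S if s[1] == 6}            # second die shows 6
Fc = set(S) - F

assert P(E & F)  == P(E) * P(F)             # E and F are independent
assert P(E & Fc) == P(E) * P(Fc)            # so are E and F^c
print("both factorizations hold")
```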

Proposition

Conditional probabilities satisfy all of the properties of ordinary probabilities.

(a) $$0 \leq P(E|F) \leq 1$$
(b) $$P(S|F) = 1$$

(c) If $E_i,\ i=1, 2, …$, are mutually exclusive events, then

\[P(\bigcup_1^\infty E_i|F) = \sum_1^\infty P(E_i|F)\]

4. Discrete Random Variables

Definition

A random variable $X$ is a function from the sample space $S$ to the set of real numbers $\mathbb{R}$:

\[X: S \rightarrow \mathbb{R}\]

Definition

For a discrete random variable $X$, we define the probability mass function $p(a)$ of $X$ by

\[p(a) = P(X=a)\]

$X$ must take on one of the values $x_i$ for $i=1, 2, …$, and we have

\[\begin{eqnarray*} p(x_i) &\geq& 0 \qquad \quad for \ i=1, 2, ...\\ p(x) &=& 0 \qquad \quad for \ all \ other \ values \ of \ x \\ \sum_{i=1}^\infty p(x_i) &=& 1 \end{eqnarray*}\]

Definition

If $X$ is a discrete random variable having a probability mass function $p(x)$, then the expectation, or the expected value, of $X$, denoted by $E[X]$, is defined by

\[E[X] = \sum_{x:p(x)>0}xp(x)\]

$E[X]$ is also referred to as the mean or the first moment of $X$. The quantity $E[X^n]$, $n \geq 1$, is called the $n$th moment of $X$.
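Computing moments straight from the pmf is a one-liner; the fair-die pmf below is my own example, with exact rational arithmetic.

```python
from fractions import Fraction

# pmf of a fair six-sided die: p(i) = 1/6 for i = 1..6
pmf = {i: Fraction(1, 6) for i in range(1, 7)}

def moment(pmf, n=1):
    """E[X^n] = sum of x^n * p(x) over the support."""
    return sum(x**n * p for x, p in pmf.items())

print(moment(pmf, 1))   # E[X]   = 7/2
print(moment(pmf, 2))   # E[X^2] = 91/6
```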

Proposition

Suppose the sample space $S$ is either finite or countably infinite. For a random variable $X$, let $X(s)$ denote the value of $X$ when $s \in S$ is the outcome of the experiment, and let $p(s) = P(\{s\})$. Then

\[E[X] = \sum_{s \in S} X(s) p(s)\]

A useful consequence is that the expected value of a sum of random variables is equal to the sum of their expectations.

Definition

We say that $I$ is an indicator variable for the event $A$ if

\[I=\begin {cases} 1, & if\ A\ occurs \\ 0, & if\ A^c\ occurs \end {cases}\]

and we have $E[I] = P(A)$.

Proposition

If $X$ is a discrete random variable that takes on one of the values $x_i, i \geq 1$, with respective probabilities $p(x_i)$, then, for any real-valued function $g$,

\[E[g(X)]=\sum_i g(x_i)p(x_i)\]

Corollary

If $a$ and $b$ are constants, then

\[E[aX + b] = aE[X] + b\]

Definition

If $X$ is a random variable with mean $\mu$, then the variance of $X$, denoted by $Var(X)$, is defined by

\[Var(X) = E[(X − \mu)^2]\]

An alternative formula for $Var(X)$ is derived as follows:

\[\begin{eqnarray*} Var(X) &=& E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] \\ &=& E[X^2] - 2\mu E[X] + \mu^2 \\ &=& E[X^2] - (E[X])^2 \end{eqnarray*}\]

Corollary

For any constants $a$ and $b$

\[Var(aX + b) = a^2Var(X)\]
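Both the alternative variance formula and the corollary can be checked exactly; the fair-die pmf and the constants $a = 3$, $b = 5$ are my own choices.

```python
from fractions import Fraction

pmf = {i: Fraction(1, 6) for i in range(1, 7)}        # fair die

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    return sum(x**2 * p for x, p in pmf.items()) - mean(pmf) ** 2

print(var(pmf))                                       # 35/12

a, b = 3, 5
pmf_ab = {a * x + b: p for x, p in pmf.items()}       # pmf of aX + b
assert var(pmf_ab) == a**2 * var(pmf)                 # Var(aX+b) = a^2 Var(X)
```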

Definition

The square root of the $Var(X)$ is called the standard deviation of $X$, and we denote it by $SD(X)$. That is,

\[SD(X) = \sqrt{Var(X)}\]

Remarks

Analogous to the mean being the center of gravity of a distribution of mass, the variance represents, in the terminology of mechanics, the moment of inertia.

5. The Bernoulli and Binomial Random Variables

Definition

Suppose now that $n$ independent trials, each of which results in a success with probability $p$ and in a failure with probability $1 − p$, are to be performed. If $X$ represents the number of successes that occur in the $n$ trials, then $X$ is said to be a binomial random variable with parameters $(n, p)$, and its probability mass function is given by

\[p(i) = \left(\begin{matrix} n\\i \end{matrix} \right) p^i(1\ -\ p)^{n-i} \qquad i=0, 1, ..., n\]

Definition

A random variable $X$ is said to be a Bernoulli random variable if its probability mass function is given by following equations for some $p \in (0, 1)$

\[\begin{eqnarray*} p(0) &=& P\{X = 0\} = 1 - p \\ p(1) &=& P\{X = 1\} = p \end{eqnarray*}\]

A Bernoulli random variable is just a binomial random variable with parameters $(1, p)$.

Properties

The expected value and variance of a binomial random variable with parameters $n$ and $p$:

\[E[X^k] = \sum_{i=0}^n i^k \left(\begin{matrix} n\\i \end{matrix}\right) p^i (1-p)^{n-i} = npE[(Y + 1)^{k-1}]\]

where $Y$ is a binomial random variable with parameters $n-1$ and $p$. Setting $k=1$ and then $k=2$ gives

\[E[X] = np\] \[Var(X)= np(1 - p)\]

Properties

If $X$ is a binomial random variable with parameters $(n, p)$, where $0 < p < 1$, then as $k$ goes from $0$ to $n$, $P\{X = k\}$ first increases monotonically and then decreases monotonically, reaching its largest value when $k$ is the largest integer less than or equal to $(n + 1)p$. This follows from the recursion

\[P\{X = k + 1\} = \cfrac{p}{1 - p}\ \cfrac{n - k}{k + 1}\ P\{X = k\}\]
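The recursion gives a convenient way to tabulate the whole pmf without computing factorials; the parameters $n = 10$, $p = 0.4$ below are my own choices.

```python
def binomial_pmf(n, p):
    """Tabulate P{X = k}, k = 0..n, via
    P{X=k+1} = p/(1-p) * (n-k)/(k+1) * P{X=k}."""
    probs = [(1 - p) ** n]                      # P{X = 0}
    for k in range(n):
        probs.append(probs[-1] * p / (1 - p) * (n - k) / (k + 1))
    return probs

n, p = 10, 0.4
probs = binomial_pmf(n, p)
mode = max(range(n + 1), key=lambda k: probs[k])
print(mode, int((n + 1) * p))                   # 4 4: mode = floor((n+1)p)
print(sum(k * q for k, q in enumerate(probs)))  # ~ np = 4.0
```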

Definition

The binomial distribution function is

\[P\{X\leq i\} = \sum_{k=0}^i \left(\begin{matrix} n\\k \end{matrix}\right) p^k (1 − p)^{n−k} \qquad i = 0, 1,... , n\]

6. Continuous Random Variables

Definition

We say that $X$ is a continuous random variable if there exists a nonnegative function $f$, defined for all real $x \in (-\infty, \infty)$, having the property that, for any set $B$ of real numbers,

\[P\{X \in B\} = \int_B f(x)\ dx\]

Since $X$ must assume some value, $f$, called the probability density function of the random variable $X$, must satisfy

\[1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^\infty f(x)\ dx\]
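To make the definition concrete, the sketch below numerically integrates an assumed density $f(x) = e^{-x}$ for $x > 0$ (the exponential density, my own example), checking that the total mass is 1 and computing $P\{1 < X < 2\}$.

```python
import math

f = lambda x: math.exp(-x)              # density of Exp(1) on x > 0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(integrate(f, 0, 50))              # ~ 1.0, total probability
print(integrate(f, 1, 2))               # P{1 < X < 2} = e^-1 - e^-2 ~ 0.2325
```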

7. Distribution Function

Definition

If $X$ is a random variable, its distribution function is a function $F_X: \mathbb{R} \rightarrow [0, 1]$ such that \(F_X(x) = P(X \leq x) \qquad \forall x \in \mathbb{R}\) where $P(X \leq x)$ is the probability that $X$ is less than or equal to $x$.

Properties

Every distribution function enjoys the following four properties:

  • Increasing
\[F_X(x_1) \leq F_X(x_2) \qquad if\ x_1 < x_2\]
  • Right-continuous
\[\lim_{t \rightarrow x^+} F_X(t) = F_X(x)\]
  • Limit at minus infinity
\[\lim_{x \rightarrow -\infty} F(x) = 0\]
  • Limit at plus infinity
\[\lim_{x \rightarrow \infty} F(x) = 1\]

Property

If $X$ is continuous, then its distribution function $F$ is differentiable wherever its density $f$ is continuous, and

\[\cfrac{d}{dx} F(x) = f(x)\]

8. Expectation and Variance of Continuous Random Variables

Definition

The expected value of $X$ is defined by

\[E[X] = \int_{−\infty}^\infty xf(x)\ dx\]

Proposition

For any real-valued function $g$

\[E[g(X)] = \int_{−\infty}^\infty g(x)f(x)\ dx\]

Lemma

For a nonnegative random variable $Y$,

\[E[Y] = \int_0^\infty P(Y>y)\ dy\]
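For $Y$ exponential with rate 1 (my own example), $P(Y > y) = e^{-y}$, so the lemma gives $E[Y] = \int_0^\infty e^{-y}\, dy = 1$; the sketch compares this with the direct definition $\int y f(y)\, dy$.

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

survival = lambda y: math.exp(-y)                    # P(Y > y), Y ~ Exp(1)
density  = lambda y: math.exp(-y)                    # f(y)

print(integrate(survival, 0, 50))                    # ~ 1.0 via the lemma
print(integrate(lambda y: y * density(y), 0, 50))    # ~ 1.0 directly
```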

Lemma

If $a$ and $b$ are constants, then

\[E[aX + b] = aE[X] + b\]

Definition

The variance of a random variable $X$ with expected value $\mu$ is defined by

\[Var(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2\]

9. The Uniform Random Variable

Definition

A random variable is said to be uniformly distributed over the interval $(0, 1)$ if its probability density function is given by

\[f(x)=\begin {cases} 1, & 0 < x < 1 \\ 0, & otherwise \end {cases}\]

For any $0 < a < b < 1$,

\[P\{a \leq X \leq b\} = \int_a^b f(x)\ dx = b - a\]

Definition

We say that $X$ is a uniform random variable on the interval $(\alpha, \beta)$ if the probability density function of $X$ is given by

\[f(x)=\begin {cases} \cfrac{1}{\beta - \alpha}, & \alpha < x < \beta \\ 0, & otherwise \end {cases}\]

Definition

The (cumulative) distribution function of a uniform random variable on the interval $(\alpha, \beta)$ is given by

\[F(x)=\begin {cases} 0, & x \leq \alpha \\ \cfrac{x - \alpha}{\beta - \alpha}, & \alpha < x < \beta \\ 1, & x \geq \beta \end {cases}\]

Proposition

For a uniform random variable on the interval $(\alpha, \beta)$,

\[\begin {eqnarray*} E[X] &=& \cfrac{\alpha+\beta}{2} \\ Var(X) &=& \cfrac{(\beta-\alpha)^2}{12} \end {eqnarray*}\]
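A quick simulation check of both formulas, with the interval $(2, 5)$ chosen arbitrarily:

```python
import random

random.seed(0)
alpha, beta = 2.0, 5.0
xs = [random.uniform(alpha, beta) for _ in range(200_000)]

mean = sum(xs) / len(xs)
variance = sum((x - mean) ** 2 for x in xs) / len(xs)

print(mean, (alpha + beta) / 2)            # both ~ 3.5
print(variance, (beta - alpha) ** 2 / 12)  # both ~ 0.75
```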

10. The Normal Random Variable

Definition

We say that $X$ is a normal random variable, or simply that $X$ is normally distributed, with parameters $\mu$ and $\sigma^2$ if the density of $X$ is given by

\[f(x) = \cfrac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2 / 2\sigma^2} \qquad -\infty<x<\infty\]

Proposition

If $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then $Y = aX + b$ is normally distributed with parameters $a\mu+b$ and $a^2\sigma^2$.

Definition

In particular, if $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then $Z = (X - \mu)/\sigma$ is normally distributed with parameters $0$ and $1$. Such a random variable is said to be a standard (unit) normal random variable.

Definition

It is customary to denote the cumulative distribution function of a standard normal random variable by $\Phi(x)$. That is,

\[\Phi(x) = \cfrac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} \ dy\]
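$\Phi$ has no elementary closed form, but it can be evaluated through the error function using the identity $\Phi(x) = \frac{1}{2}\left[1 + \text{erf}(x/\sqrt{2})\right]$; a minimal sketch:

```python
import math

def Phi(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(Phi(0.0))               # 0.5, by symmetry
print(Phi(1.96))              # ~ 0.9750
print(Phi(1) - Phi(-1))       # ~ 0.6827: mass within one sigma of the mean
```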

Proposition

\[\begin{eqnarray*} E(X) &=& \mu \\ Var(X) &=& \sigma^2 \end{eqnarray*}\]

Theorem The De Moivre-Laplace limit theorem

If $S_n$ denotes the number of successes that occur when $n$ independent trials, each resulting in a success with probability $p$, are performed, then, for any $a<b$,

\[P \left\{ a \leq \cfrac{S_n - np}{\sqrt{np(1-p)}} \leq b \right\} \rightarrow \ \Phi(b) - \Phi(a)\]

as $n \rightarrow \infty$.

In other words, for large $n$ the probability distribution function of a binomial random variable with parameters $n$ and $p$ can be approximated by that of a normal random variable having mean $np$ and variance $np(1 - p)$.
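The quality of the approximation is easy to see numerically; the parameters $n = 100$, $p = 0.3$ and the interval below are my own choices, and the usual $\pm 0.5$ continuity correction is applied.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# Exact P{25 <= S_n <= 35} from the binomial pmf.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(25, 36))

# Normal approximation with continuity correction.
approx = Phi((35.5 - mu) / sigma) - Phi((24.5 - mu) / sigma)

print(exact, approx)   # both ~ 0.77
```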

11. The distribution of A Function of A Random Variable

Theorem

Let $X$ be a continuous random variable having probability density function $f_X$. Suppose that $g(x)$ is a strictly monotonic (increasing or decreasing), differentiable (and thus continuous) function of $x$. Then the random variable $Y$ defined by $Y = g(X)$ has a probability density function given by

\[f_Y(y) = \begin{cases} f_X[g^{-1}(y)] \left| \cfrac{d}{dy} g^{-1}(y) \right|, & y = g(x)\ for\ some\ x \\ 0, & y \neq g(x)\ for\ all\ x \end{cases}\]

where $g^{-1}(y)$ is defined to equal that value of $x$ such that $g(x)=y$.
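As a concrete instance of the theorem (my own example): take $X$ uniform on $(0, 1)$ and $g(x) = -\ln x$, which is strictly decreasing. Then $g^{-1}(y) = e^{-y}$ and $\left| \frac{d}{dy} g^{-1}(y) \right| = e^{-y}$, so the formula gives $f_Y(y) = 1 \cdot e^{-y}$, the exponential density. The sketch checks this against a simulation.

```python
import math
import random

random.seed(0)

# X ~ Uniform(0,1); Y = g(X) = -ln(X).
# Use 1 - random() so the argument of log stays in (0, 1].
ys = [-math.log(1.0 - random.random()) for _ in range(200_000)]

# Compare empirical P{a < Y < b} with the integral of e^-y over (a, b).
a, b = 0.5, 1.5
empirical = sum(1 for y in ys if a < y < b) / len(ys)
from_formula = math.exp(-a) - math.exp(-b)
print(empirical, from_formula)            # both ~ 0.3834
```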

12. Joint Distribution Function


Definition

For any two random variables $X$ and $Y$, the joint cumulative probability distribution function of $X$ and $Y$ is defined by

\[F(a, b) = P\{X \leq a, Y \leq b \} \qquad -\infty<a, b< \infty\]

Definition

The marginal distributions of $X$ and $Y$ are defined by

\[\begin{eqnarray*} &F_X(x)& = P\{X \leq x \} = \lim_{y \rightarrow \infty} F(x, y) \\ &F_Y(y)& = P\{Y \leq y \} = \lim_{x \rightarrow \infty} F(x, y) \end{eqnarray*}\]

Property

\[P\{ a_1 < X \leq a_2, b_1 < Y \leq b_2 \} = F(a_2, b_2) + F(a_1, b_1) - F(a_1, b_2) - F(a_2, b_1)\]

Definition

In the case when $X$ and $Y$ are both discrete random variables, it is convenient to define the joint probability mass function of $X$ and $Y$ by

\[p(x,y) = P\{X=x, Y=y \}\]

Definition

The probability mass functions of $X$ and of $Y$ (the marginal probability mass functions) can be obtained from $p(x, y)$ by

\[\begin{eqnarray*} p_X(x) &=& P\{X=x \} = \sum_{y:p(x,y)>0} p(x, y) \\ p_Y(y) &=& P\{Y=y \} = \sum_{x:p(x,y)>0} p(x, y) \end{eqnarray*}\]
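A small sketch (two fair dice, my own example) that builds a joint pmf and recovers the marginals by summing:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = first die, Y = second die: p(x, y) = 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def marginal_X(x):
    return sum(p for (x0, _), p in joint.items() if x0 == x)

def marginal_Y(y):
    return sum(p for (_, y0), p in joint.items() if y0 == y)

print(marginal_X(3), marginal_Y(6))   # 1/6 1/6
```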

13. Independent Random Variables


Definition

The random variables $X$ and $Y$ are said to be independent if, for any two sets of real numbers $A$ and $B$,

\[P\{X \in A, Y \in B\} = P\{X \in A\}P\{Y \in B\}\]

When $X$ and $Y$ are discrete random variables, the condition of independence is equivalent to

\[p(x, y) = p_X(x)p_Y(y) \qquad for\ all\ x, y\]

For continuous random variables $X$ and $Y$, the condition of independence is equivalent to

\[F(a, b) = F_X(a)F_Y(b) \qquad for\ all\ a, b \\ f(x, y) = f_X(x)f_Y(y) \qquad for\ all\ x, y\]

Random variables that are not independent are said to be dependent.

Proposition

The continuous (discrete) random variables $X$ and $Y$ are independent if and only if their joint probability density (mass) function can be expressed as

\[f_{X,Y}(x, y) = h(x)g(y) \qquad -\infty < x, y < \infty\]

Remark

Independence is a symmetric relation. To say that $X$ is independent of $Y$ is equivalent to saying that $Y$ is independent of $X$, or just that $X$ and $Y$ are independent.

14. Sums of Independent Random Variables

Definition

Suppose that $X$ and $Y$ are independent, continuous random variables having probability density functions $f_X$ and $f_Y$. The cumulative distribution function of $X+Y$ is obtained as follows:

\[F_{X+Y}(a) = P\{X + Y \leq a \} = \int_{-\infty}^\infty F_X(a-y)f_Y(y)dy\]

$F_{X+Y}$ is called the convolution of the distributions $F_X$ and $F_Y$.

The probability density function $f_{X+Y}$ of $X + Y$ is given by

\[f_{X+Y}(a) = \cfrac{d}{da} F_{X+Y}(a) = \int_{-\infty}^\infty f_X(a-y)f_Y(y)dy\]

Identically Distributed Uniform Random Variables

Suppose that $X$ and $Y$ are independent random variables, both uniformly distributed on $(0, 1)$. The probability density function of $X + Y$ is

\[f_{X+Y}(a) = \begin{cases} a & 0 \leq a \leq 1 \\ 2-a & 1 < a < 2 \\ 0 & otherwise \end{cases}\]
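The triangular density can be recovered numerically from the convolution formula above; the sketch evaluates $\int f_X(a-y) f_Y(y)\, dy$ with $f_X = f_Y$ the uniform density on $(0, 1)$ by the midpoint rule.

```python
def f_uniform(x):
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def convolve_at(a, f, g, lo=-1.0, hi=3.0, n=20_000):
    """f_{X+Y}(a) = integral of f(a - y) g(y) dy, midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(a - (lo + (i + 0.5) * h)) * g(lo + (i + 0.5) * h)
                   for i in range(n))

for a in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(a, round(convolve_at(a, f_uniform, f_uniform), 3))
# prints 0.25 0.5 1.0 0.5 0.25: a on (0, 1], 2 - a on (1, 2)
```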