1. Axioms of Probability
Definition
The set of all possible outcomes of an experiment is known as the sample space of the experiment, denoted by $S$.
Definition
Any subset $E$ of the sample space is known as an event.
Definition
Let $E_1, E_2, …$ be events.
The union of these events, denoted by $\bigcup_{n=1}^\infty E_n$, is defined to be that event which consists of all outcomes that are in $E_n$ for at least one value of $n=1, 2, …$.
The intersection of the events $E_n$, denoted by $\bigcap _{n=1}^\infty E_n$, is defined to be the event consisting of those outcomes which are in all of the events $E_n, n=1, 2, …$.
Definition
The complement of $E$, denoted by $E^c$, consists of all outcomes in the sample space $S$ that are not in $E$.
- $E^c$ occurs iff $E$ does not occur.
- $E \cup E^c = S$
- $S^c = \emptyset$
Theorem (DeMorgan's Laws)
\[\left(\bigcup_{i=1}^n E_i\right)^c = \bigcap_{i=1}^n E_i^c\]
\[\left(\bigcap_{i=1}^n E_i\right)^c = \bigcup_{i=1}^n E_i^c\]
Definition
The probability of the event $E$ is defined as
\[P(E) = \lim_{n \rightarrow \infty} \cfrac{n(E)}{n}\]
where $n(E)$ is the number of times that $E$ occurs in the first $n$ repetitions of the experiment. For each event $E$ of the sample space $S$, we assume that a number $P(E)$ is defined and satisfies the following three axioms:
Axiom 1
\[0 \leq P(E) \leq 1\]
Axiom 2
\[P(S) = 1\]
Axiom 3
For any sequence of mutually exclusive events $E_1, E_2, …$,
\[P(\bigcup_{i=1}^\infty E_i) = \sum_{i=1}^\infty P(E_i)\]
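To connect the axioms with the relative-frequency definition above, here is a minimal Python sketch; the fair-die event is my own example, not from the text.

```python
import random

# Minimal sketch: estimate P(E) by the relative frequency n(E)/n for
# E = "a fair die shows an even number", so P(E) = 1/2 (hypothetical example).
random.seed(0)
for n in (100, 10_000, 1_000_000):
    n_E = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
    print(f"n = {n:>9,}: n(E)/n = {n_E / n:.4f}")   # tends to 0.5
```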
Proposition
\[P(E^c) = 1 - P(E)\]
Proposition
If $E \subset F$, then $P(E) \leq P(F)$.
Proposition
\[P(E \cup F) = P(E) + P(F) - P(EF)\]
Proposition
\[\begin{eqnarray*} P(E_1 \cup E_2 \cup \cdots \cup E_n) &=& \sum_{i=1}^n P(E_i) - \sum_{i_1<i_2} P(E_{i_1}E_{i_2}) + \cdots \\ &+& (-1)^{r+1} \sum_{i_1<i_2<\cdots<i_r}P(E_{i_1}E_{i_2}\cdots E_{i_r}) \\ &+& \cdots + (-1)^{n+1}P(E_1E_2\cdots E_n) \end{eqnarray*}\]
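The two-event case of this identity can be checked by brute-force enumeration. A small sketch, with dice events of my own choosing:

```python
from itertools import product

# Sketch: verify P(E ∪ F) = P(E) + P(F) - P(EF) over the sample space
# of two fair dice (hypothetical example).
S = list(product(range(1, 7), repeat=2))          # 36 equally likely outcomes
E = {s for s in S if s[0] == 6}                   # first die shows 6
F = {s for s in S if s[0] + s[1] >= 10}           # the sum is at least 10

P = lambda A: len(A) / len(S)
assert abs(P(E | F) - (P(E) + P(F) - P(E & F))) < 1e-12   # E | F is set union
print(P(E | F))   # 0.25
```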
2. Conditional Probability
Definition
The conditional probability that $E$ occurs given that $F$ has occurred is denoted by $P(E|F)$. If $P(F)>0$, then
\[P(E|F)= \cfrac{P(EF)}{P(F)}\]
Theorem (The Multiplication Rule)
\[P(E_1E_2E_3...E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1E_2)...P(E_n|E_1...E_{n-1})\]
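A quick sketch of the multiplication rule on a hypothetical card example (drawing the four aces in the first four cards), checked against a direct combinatorial count:

```python
from fractions import Fraction
from math import comb

# P(E1 E2 E3 E4) = P(E1) P(E2|E1) P(E3|E1E2) P(E4|E1E2E3), where
# Ei = "the i-th card dealt is an ace" (my example, not from the text).
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)
direct = Fraction(1, comb(52, 4))   # one favorable 4-card hand out of C(52,4)
assert chain == direct
print(chain)   # 1/270725
```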
Proposition
Let $E$ and $F$ be events. We can express $E$ as
\[E = EF \cup EF^c\]
Since $EF$ and $EF^c$ are mutually exclusive, Axiom 3 gives $P(E) = P(EF) + P(EF^c)$, and hence
\[P(E) = P(E|F)P(F)\ +\ P(E|F^c)[1\ -\ P(F)]\]
Definition
The odds of an event $A$ are defined by
\[\cfrac{P(A)}{P(A^c)} = \cfrac{P(A)}{1\ -\ P(A)}\]
This tells how much more likely it is that the event $A$ occurs than it is that it does not occur. If the odds are equal to $\alpha$, then it is common to say that the odds are “$\alpha$ to 1” in favor of the event.
Consider a hypothesis $H$ and new evidence $E$. The new odds of $H$ after the evidence $E$ are
\[\cfrac{P(H|E)}{P(H^c|E)} = \cfrac{P(H)}{P(H^c)} \cfrac{P(E|H)}{P(E|H^c)}\]
Theorem (Bayes's Formula)
Let $F_1, F_2, …, F_n$ be mutually exclusive events whose union is the sample space $S$. Then
\[P(F_j|E) = \cfrac{P(EF_j)}{P(E)}= \cfrac{P(E|F_j)P(F_j)}{\sum_{i=1}^n P(E|F_i)P(F_i)}\]
Bayes's formula shows us how to use new evidence to modify existing opinions.
- $P(F_j|E)$: the likelihood of event $F_j$ occurring given that $E$ is true.
- $P(E|F_j)$: the likelihood of observing the evidence $E$ given that $F_j$ is true (see the numeric sketch below).
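Here is the promised numeric sketch of Bayes's formula and the odds-form update; the prior, sensitivity, and false-positive rate are made-up numbers, not from the text.

```python
# F1 = "has the condition", F2 = F1^c, E = "test is positive" (toy numbers).
p_F1 = 0.01                      # prior P(F1)
p_E_given_F1 = 0.95              # P(E|F1)
p_E_given_F2 = 0.02              # P(E|F1^c)

# denominator: P(E) via the total-probability identity above
p_E = p_E_given_F1 * p_F1 + p_E_given_F2 * (1 - p_F1)
posterior = p_E_given_F1 * p_F1 / p_E
print(f"P(F1|E) = {posterior:.3f}")                           # ~0.324

# odds form: posterior odds = prior odds * likelihood ratio
prior_odds = p_F1 / (1 - p_F1)
posterior_odds = prior_odds * p_E_given_F1 / p_E_given_F2
print(f"check: {posterior_odds / (1 + posterior_odds):.3f}")  # same ~0.324
```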
3. Independence
Definition
Two events $E$ and $F$ are said to be independent if the following equation holds.
\[P(EF) = P(E)P(F)\]
Two events $E$ and $F$ that are not independent are said to be dependent.
Proposition
If $E$ and $F$ are independent, then so are $E$ and $F^c$.
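A small sketch checking both the definition and this proposition by exact enumeration; the two dice events are my own example.

```python
from fractions import Fraction
from itertools import product

# Exact check of independence over the sample space of two fair dice.
S = list(product(range(1, 7), repeat=2))
P = lambda A: Fraction(len(A), len(S))

E = {s for s in S if s[0] == 1}              # first die shows 1
F = {s for s in S if s[0] + s[1] == 7}       # the sum is 7
Fc = set(S) - F

print(P(E & F) == P(E) * P(F))               # True: E and F are independent
print(P(E & Fc) == P(E) * P(Fc))             # True: so are E and F^c
```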
Definition
The events $E_1, E_2, …, E_n$ are said to be independent if, for every subset $E_{1'}, E_{2'}, …, E_{r'}$, $r \leq n$, of these events,
\[P(E_{1'}E_{2'} \cdots E_{r'}) = P(E_{1'})P(E_{2'}) \cdots P(E_{r'})\]
Proposition
Conditional probabilities satisfy all of the properties of ordinary probabilities.
(a) $0 \leq P(E|F) \leq 1$
(b) $P(S|F) = 1$
(c) If $E_i,\ i=1, 2, …$, are mutually exclusive events, then
\[P(\bigcup_1^\infty E_i|F) = \sum_1^\infty P(E_i|F)\]
4. Discrete Random Variables
Definition
A random variable $X$ is a function from the sample space $S$ to the set of real numbers $\mathbb{R}$:
\[X: S \rightarrow \mathbb{R}\]
Definition
For a discrete random variable $X$, we define the probability mass function $p(a)$ of $X$ by
\[p(a) = P(X=a)\]
$X$ must take on one of the values $x_i$ for $i=1, 2, …$, and we have
\[\begin{eqnarray*} p(x_i) &\geq& 0 \qquad for \ i=1, 2, ...\\ p(x) &=& 0 \qquad for \ all \ other \ values \ of \ x \\ \sum_{i=1}^\infty p(x_i) &=& 1 \end{eqnarray*}\]
Definition
If $X$ is a discrete random variable having a probability mass function $p(x)$, then the expectation, or the expected value, of $X$, denoted by $E[X]$, is defined by
\[E[X] = \sum_{x:p(x)>0}xp(x)\]
$E[X]$ is also referred to as the mean or the first moment of $X$. The quantity $E[X^n]$, $n \geq 1$, is called the $n$th moment of $X$.
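A minimal sketch computing the first two moments of a fair die from its probability mass function (the die is my own example):

```python
from fractions import Fraction

# pmf of a fair six-sided die: p(x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

E_X = sum(x * p for x, p in pmf.items())        # first moment E[X]
E_X2 = sum(x**2 * p for x, p in pmf.items())    # second moment E[X^2]
print(E_X, E_X2)                                # 7/2 and 91/6
```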
Proposition
Suppose the sample space $S$ is either finite or countably infinite, and let $p(s) = P(\{s\})$ for $s \in S$. For a random variable $X$, let $X(s)$ denote the value of $X$ when $s \in S$ is the outcome of the experiment. Then
\[E[X] = \sum_{s \in S} X(s) p(s)\]
One consequence of this representation is that the expected value of a sum of random variables is equal to the sum of their expectations.
Definition
We say that $I$ is an indicator variable for the event $A$ if
\[I=\begin{cases} 1, & if\ A\ occurs \\ 0, & if\ A^c\ occurs \end{cases}\]
and we have $E[I] = P(A)$.
Proposition
If $X$ is a discrete random variable that takes on one of the values $x_i, i \geq 1$, with respective probabilities $p(x_i)$, then, for any real-valued function $g$,
\[E[g(X)]=\sum_i g(x_i)p(x_i)\]
Corollary
If $a$ and $b$ are constants, then
\[E[aX + b] = aE[X] + b\]
Definition
If $X$ is a random variable with mean $\mu$, then the variance of $X$, denoted by $Var(X)$, is defined by
\[Var(X) = E[(X − \mu)^2]\]
An alternative formula for $Var(X)$ is derived as follows:
\[Var(X) = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] − (E[X])^2\]
Corollary
For any constants $a$ and $b$
\[Var(aX + b) = a^2Var(X)\]
Definition
The square root of $Var(X)$ is called the standard deviation of $X$, and we denote it by $SD(X)$. That is,
\[SD(X) = \sqrt{Var(X)}\]
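Continuing the die example, a sketch that computes $Var(X)$ both from the definition and from the alternative formula, then takes the square root for $SD(X)$:

```python
import math
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())                      # E[X] = 7/2

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())     # E[(X - mu)^2]
var_alt = sum(x**2 * p for x, p in pmf.items()) - mu**2      # E[X^2] - (E[X])^2
assert var_def == var_alt == Fraction(35, 12)
print(math.sqrt(var_alt))                                    # SD(X) ≈ 1.708
```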
Remarks
Analogous to the mean being the center of gravity of a distribution of mass, the variance represents, in the terminology of mechanics, the moment of inertia.
5. The Bernoulli and Binomial Random Variables
Definition
Suppose now that $n$ independent trials, each of which results in a success with probability $p$ and in a failure with probability $1 − p$, are to be performed. If $X$ represents the number of successes that occur in the $n$ trials, then $X$ is said to be a binomial random variable with parameters $(n, p)$, and its probability mass function is given by
\[p(i) = \left(\begin{matrix} n\\i \end{matrix} \right) p^i(1\ -\ p)^{n-i} \qquad i=0, 1, ..., n\]
Definition
A random variable $X$ is said to be a Bernoulli random variable if its probability mass function is given by following equations for some $p \in (0, 1)$
\[\begin{eqnarray*} p(0) &=& P\{X = 0\} = 1 − p \\ p(1) &=& P\{X = 1\} = p \end{eqnarray*}\]
A Bernoulli random variable is just a binomial random variable with parameters $(1, p)$.
Properties
The expected value and variance of a binomial random variable with parameters $n$ and $p$ follow from the identity
\[E[X^k] = \sum_{i=0}^n i^k \left(\begin{matrix} n\\i \end{matrix}\right) p^i (1 \ - p)^{n-i} = npE[(Y\ + 1)^{k-1}]\]
where $Y$ is a binomial random variable with parameters $n-1$ and $p$. Setting $k=1$ and $k=2$ yields
\[E[X] = np\] \[Var(X)= np(1 − p)\]
Properties
If $X$ is a binomial random variable with parameters $(n, p)$, where $0 < p < 1$, then as $k$ goes from $0$ to $n$, $P\{X = k\}$ first increases monotonically and then decreases monotonically, reaching its largest value when $k$ is the largest integer less than or equal to $(n + 1)p$.
\[P\{X = k + 1\} = \cfrac{p}{1 − p} \cfrac{n − k}{k + 1} P\{X = k\}\]
Definition
The binomial distribution function is
\[P\{X\leq i\} = \sum_{k=0}^i \left(\begin{matrix} n\\k \end{matrix}\right) p^k (1 − p)^{n−k} \qquad i = 0, 1,... , n\]
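A sketch that builds the binomial pmf directly from the formula above and checks $E[X] = np$, $Var(X) = np(1-p)$, and the location of the mode; the values $n = 10$, $p = 0.3$ are arbitrary choices.

```python
from math import comb

n, p = 10, 0.3
pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var = sum(i**2 * q for i, q in enumerate(pmf)) - mean**2
print(abs(mean - n * p) < 1e-9, abs(var - n * p * (1 - p)) < 1e-9)  # True True
print(max(range(n + 1), key=lambda i: pmf[i]) == int((n + 1) * p))  # mode = 3
```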
6. Continuous Random Variable
Definition
We say that $X$ is a continuous random variable if there exists a nonnegative function $f$, defined for all real $x \in (−\infty, \infty)$, having the property that, for any set $B$ of real numbers,
\[P\{X \in B\} = \int_B f(x)\ dx\]
Since $X$ must assume some value, $f$, called the probability density function of the random variable $X$, must satisfy
\[1 = P\{X \in (−\infty, \infty)\} = \int_{−\infty}^\infty f(x)\ dx\]
7. Distribution Function
Definition
If $X$ is a random variable, its distribution function is a function $F_X: \mathbb{R} \rightarrow [0, 1]$ such that \(F_X(x) = P(X \leq x) \qquad \forall x \in \mathbb{R}\) where $P(X \leq x)$ is the probability that $X$ is less than or equal to $x$.
Properties
Every distribution function enjoys the following four properties:
- Increasing: if $x_1 \leq x_2$, then $F(x_1) \leq F(x_2)$
- Right-continuous: $\lim_{t \rightarrow x^+} F(t) = F(x)$ for every $x$
- Limit at minus infinity: \(\lim_{x \rightarrow -\infty} F(x) = 0\)
- Limit at plus infinity: \(\lim_{x \rightarrow \infty} F(x) = 1\)
Properties
If $X$ is continuous, then its distribution function $F$ will be differentiable and
\[\cfrac{d}{dx} F(x) = f(x)\]
Definition
The expected value of $X$ is defined by
\[E[X] = \int_{−\infty}^\infty xf(x)\ dx\]
Proposition
For any real-valued function $g$
\[E[g(X)] = \int_{−\infty}^\infty g(x)f(x)\ dx\]
Lemma
For a nonnegative random variable $Y$,
\[E[Y] = \int_0^\infty P(Y>y)\ dy\]
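A numeric sketch of this lemma for an exponentially distributed $Y$ (my own example, with $P(Y > y) = e^{-\lambda y}$ and $E[Y] = 1/\lambda$), approximating the integral by a midpoint Riemann sum:

```python
import math

lam = 2.0
dy = 1e-4
# integrate P(Y > y) = exp(-lam * y) over (0, 20); the tail beyond 20 is negligible
tail_integral = sum(math.exp(-lam * (k + 0.5) * dy) * dy for k in range(200_000))
print(tail_integral, 1 / lam)   # both ≈ 0.5
```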
Lemma
If $a$ and $b$ are constants, then
\[E[aX + b] = aE[X] + b\]
Definition
The variance of a random variable $X$ with expected value $\mu$ is defined by
\[Var(X) = E[(X − \mu)^2] = E[X^2] − (E[X])^2\]
9. The Uniform Random Variable
Definition
A random variable is said to be uniformly distributed over the interval (0, 1) if its probability density function is given by
\[f(x)=\begin{cases} 1, & 0 < x < 1 \\ 0, & otherwise \end{cases}\]
For any $0 < a < b < 1$,
\[P\{a \leq X \leq b\} = \int_a^b f(x)\ dx = b − a\]
Definition
We say that $X$ is a uniform random variable on the interval $(\alpha, \beta)$ if the probability density function of $X$ is given by
\[f(x)=\begin{cases} \cfrac{1}{\beta - \alpha}, & \alpha < x < \beta \\ 0, & otherwise \end{cases}\]
Definition
The (cumulative) distribution function of a uniform random variable on the interval $(\alpha, \beta)$ is given by
\[F(x)=\begin{cases} 0, & x \leq \alpha \\ \cfrac{x - \alpha}{\beta - \alpha}, & \alpha < x < \beta \\ 1, & x \geq \beta \end{cases}\]
Proposition
\[\begin{eqnarray*} E[X] &=& \cfrac{\alpha+\beta}{2} \\ Var(X) &=& \cfrac{(\beta-\alpha)^2}{12} \end{eqnarray*}\]
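A Monte Carlo sketch of these two formulas; the endpoints $\alpha = 2$, $\beta = 5$ and the sample size are arbitrary choices.

```python
import random

random.seed(1)
alpha, beta, n = 2.0, 5.0, 1_000_000
xs = [random.uniform(alpha, beta) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, (alpha + beta) / 2)          # both ≈ 3.5
print(var, (beta - alpha) ** 2 / 12)     # both ≈ 0.75
```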
10. Normal Random Variable
Definition
We say that $X$ is a normal random variable, or simply that $X$ is normally distributed, with parameters $\mu$ and $\sigma^2$ if the density of $X$ is given by
\[f(x) = \cfrac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2 / 2\sigma^2} \qquad -\infty<x<\infty\]
Proposition
If $X$ is normally distributed with parameters $\mu$ and $\sigma^2$ and $a \neq 0$, then $Y = aX + b$ is normally distributed with parameters $a\mu+b$ and $a^2\sigma^2$.
Definition
An important implication is that if $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then $Z = (X − \mu)/\sigma$ is normally distributed with parameters $0$ and $1$. Such a random variable is said to be a standard (unit) normal random variable.
Definition
It is customary to denote the cumulative distribution function of a standard normal random variable by $\Phi(x)$. That is,
\[\Phi(x) = \cfrac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} \ dy\]
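$\Phi$ has no elementary closed form, but it can be evaluated through the error function via the standard identity $\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}(x/\sqrt{2})\right]$. A minimal sketch:

```python
import math

def phi(x: float) -> float:
    # standard normal cdf via the error function identity
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(phi(0.0))    # 0.5
print(phi(1.96))   # ≈ 0.975
```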
Proposition
\[\begin{eqnarray*} E(X) &=& \mu \\ Var(X) &=& \sigma^2 \end{eqnarray*}\]
The DeMoivre-Laplace Limit Theorem
If $S_n$ denotes the number of successes that occur when $n$ independent trials, each resulting in a success with probability $p$, are performed, then, for any $a<b$,
\[P \left\{ a \leq \cfrac{S_n - np}{\sqrt{np(1-p)}} \leq b \right\} \rightarrow \ \Phi(b) - \Phi(a)\]as $n \rightarrow \infty$.
In other words, the probability distribution function of a binomial random variable with parameters $n$ and $p$ can be approximated by that of a normal random variable having mean $np$ and variance $np(1 − p)$.
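A sketch of the approximation in action, comparing an exact binomial probability with the normal estimate; the parameters and the continuity correction at the half-integers are my own choices.

```python
import math
from math import comb

def phi(x):
    # standard normal cdf, as in the sketch above
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# exact P(45 <= S_n <= 55) versus the normal approximation
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(45, 56))
approx = phi((55.5 - mu) / sd) - phi((44.5 - mu) / sd)   # continuity correction
print(exact, approx)            # ≈ 0.729 vs ≈ 0.729
```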
11. The Distribution of a Function of a Random Variable
Theorem
Let $X$ be a continuous random variable having probability density function $f_X$. Suppose that $g(x)$ is a strictly monotonic (increasing or decreasing), differentiable (and thus continuous) function of $x$. Then the random variable $Y$ defined by $Y = g(X)$ has a probability density function given by
\(f_Y(y) = \begin{cases} f_X[g^{-1}(y)] \left| \cfrac{d}{dy} g^{-1}(y) \right|, & y = g(x)\ for\ some\ x \\ 0, & y \neq g(x)\ for\ all\ x \end{cases}\) where $g^{-1}(y)$ is defined to equal that value of $x$ such that $g(x)=y$.
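A sketch checking the theorem for $X$ uniform on $(0,1)$ and $Y = g(X) = X^2$ (my own example), where the formula gives $f_Y(y) = 1/(2\sqrt{y})$:

```python
import random

random.seed(2)
n = 1_000_000
ys = [random.random() ** 2 for _ in range(n)]   # Y = X^2, X ~ uniform(0, 1)

a, b = 0.25, 0.30                       # small interval inside (0, 1)
empirical = sum(a < y < b for y in ys) / (n * (b - a))
theoretical = 1 / (2 * 0.275 ** 0.5)    # f_Y at the interval midpoint
print(empirical, theoretical)           # both ≈ 0.95
```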
12. Joint Distribution Function
Definition
For any two random variables $X$ and $Y$, the joint cumulative probability distribution function of $X$ and $Y$ is defined by
\[F(a, b) = P\{X \leq a, Y \leq b \} \qquad -\infty<a, b< \infty\]
Definition
The marginal distributions of $X$ and $Y$ are defined by
\[\begin{eqnarray*} F_X(x) &=& P\{X \leq x \} = \lim_{y \rightarrow \infty} F(x, y) \\ F_Y(y) &=& P\{Y \leq y \} = \lim_{x \rightarrow \infty} F(x, y) \end{eqnarray*}\]
Property
\[P\{ a_1 < X \leq a_2, b_1 < Y \leq b_2 \} = F(a_2, b_2) + F(a_1, b_1) - F(a_1, b_2) - F(a_2, b_1)\]
Definition
In the case when $X$ and $Y$ are both discrete random variables, it is convenient to define the joint probability mass function of $X$ and $Y$ by
\[p(x,y) = P\{X=x, Y=y \}\]
Definition
The marginal probability mass functions of $X$ and $Y$ can be obtained from $p(x, y)$ by
\[\begin{eqnarray*} p_X(x) &=& P\{X=x \} = \sum_{y:p(x,y)>0} p(x, y) \\ p_Y(y) &=& P\{Y=y \} = \sum_{x:p(x,y)>0} p(x, y) \end{eqnarray*}\]
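A small sketch that recovers the marginal pmfs from a joint pmf stored as a dictionary; the joint table is a toy example of my own construction.

```python
from collections import defaultdict
from fractions import Fraction

# toy joint pmf p(x, y); any nonnegative table summing to 1 works
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3)}

p_X, p_Y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    p_X[x] += p          # sum over y for the marginal of X
    p_Y[y] += p          # sum over x for the marginal of Y
print(dict(p_X))   # {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(dict(p_Y))   # {0: Fraction(5, 12), 1: Fraction(7, 12)}
```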
13. Independent Random Variables
Definition
The random variables $X$ and $Y$ are said to be independent if, for any two sets of real numbers $A$ and $B$,
\[P\{X \in A, Y \in B\} = P\{X \in A\}P\{Y \in B\}\]
When $X$ and $Y$ are discrete random variables, the condition of independence is equivalent to
\[p(x, y) = p_X(x)p_Y(y) \qquad for\ all\ x, y\]
For continuous random variables $X$ and $Y$, the condition of independence is equivalent to
\[F(a, b) = F_X(a)F_Y(b) \qquad for\ all\ a, b \\ f(x, y) = f_X(x)f_Y(y) \qquad for\ all\ x, y\]
Random variables that are not independent are said to be dependent.
Proposition
The continuous (discrete) random variables $X$ and $Y$ are independent if and only if their joint probability density (mass) function can be expressed as
\[f_{X,Y}(x, y) = h(x)g(y) \qquad -\infty < x, y < \infty\]
Remark
Independence is a symmetric relation. To say that $X$ is independent of $Y$ is equivalent to saying that $Y$ is independent of X, or just that $X$ and $Y$ are independent.
14. Sums of Independent Random Variables
Definition
Suppose that $X$ and $Y$ are independent, continuous random variables having probability density function $f_X$ and $f_Y$. The cumulative distribution function of $X+Y$ is obtained as follows:
\[F_{X+Y}(a) = P\{X + Y \leq a \} = \int_{-\infty}^\infty F_X(a-y)f_Y(y)dy\]
$F_{X+Y}$ is called the convolution of the distributions $F_X$ and $F_Y$.
The probability density function $f_{X+Y}$ of $X+Y$ is given by
\[f_{X+Y}(a) = \cfrac{d}{da} F_{X+Y}(a) = \int_{-\infty}^\infty f_X(a-y)f_Y(y)dy\]
Identically Distributed Uniform Random Variables
Suppose $X$ and $Y$ are independent random variables, both uniformly distributed on $(0, 1)$. The probability density of $X+Y$ is
\[f_{X+Y}(a) = \begin{cases} a & 0 \leq a \leq 1 \\ 2-a & 1 < a < 2 \\ 0 & otherwise \end{cases}\]
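A Monte Carlo sketch consistent with this triangular density, estimating $f_{X+Y}$ on one small interval; the interval and sample size are arbitrary choices.

```python
import random

random.seed(3)
n = 1_000_000
sums = [random.random() + random.random() for _ in range(n)]   # X + Y

a, b = 0.5, 0.6                                   # interval inside (0, 1)
empirical = sum(a < s < b for s in sums) / (n * (b - a))
print(empirical)     # ≈ 0.55, matching f_{X+Y}(0.55) = 0.55 on this interval
```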