1. Axioms of Probability

Definition

The set of all possible outcomes of an experiment is known as the sample space of the experiment, denoted by $S$.

Definition

Any subset $E$ of the sample space is known as an event.

Definition

Let $E_1, E_2, …$ be events.

The union of these events, denoted by $\bigcup_{n=1}^\infty E_n$, is defined to be that event which consists of all outcomes that are in $E_n$ for at least one value of $n=1, 2, …$.

The intersection of the events $E_n$, denoted by $\bigcap _{n=1}^\infty E_n$, is defined to be the event consisting of those outcomes which are in all of the events $E_n, n=1, 2, …$.

Definition

The complement of $E$, denoted by $E^c$, consists of all outcomes in the sample space $S$ that are not in $E$.

  • $E^c$ occurs iff $E$ does not occur.
  • $E \cup E^c = S$
  • $S^c = \emptyset$

Theorem DeMorgan's Laws

\[\left(\bigcup_{i=1}^n E_i\right)^c = \bigcap_{i=1}^n E_i^c\] \[\left(\bigcap_{i=1}^n E_i\right)^c = \bigcup_{i=1}^n E_i^c\]

Definition

The probability of the event $E$ is defined as \(P(E) = \lim_{n \rightarrow \infty} \cfrac{n(E)}{n}\), where $n(E)$ is the number of times $E$ occurs in the first $n$ repetitions of the experiment. For each event $E$ of the sample space $S$, we assume that a number $P(E)$ is defined and satisfies the following three axioms:

Axiom 1

\[0 \leq P(E) \leq 1\]

Axiom 2

\[P(S) = 1\]

Axiom 3

For any sequence of mutually exclusive events $E_1, E_2, …$ (that is, events for which $E_iE_j = \emptyset$ when $i \neq j$),

\[P(\bigcup_{i=1}^\infty E_i) = \sum_{i=1}^\infty P(E_i)\]
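A simulation sketch of the relative-frequency interpretation behind these axioms, assuming a fair six-sided die (the event and trial count are arbitrary choices):

```python
import random

random.seed(0)

# Estimate P(E) for the event E = "a fair die shows a six" via the
# relative frequency n(E)/n over many repetitions of the experiment.
def relative_frequency(n_trials: int) -> float:
    hits = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
    return hits / n_trials

estimate = relative_frequency(100_000)
print(estimate)  # close to 1/6
```

The estimate also respects Axiom 1 automatically: a count divided by the number of trials always lies in $[0, 1]$.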

Proposition

\[P(E^c) = 1 - P(E)\]

Proposition

If $E \subset F$, then $P(E) \leq P(F)$.

Proposition

\[P(E \cup F) = P(E) + P(F) - P(EF)\]

Proposition

\[\begin{eqnarray*} P(E_1 \cup E_2 \cup \cdots \cup E_n) &=& \sum_{i=1}^n P(E_i) - \sum_{i_1<i_2} P(E_{i_1}E_{i_2}) + \cdots \\ &+& (-1)^{r+1} \sum_{i_1<i_2<\cdots<i_r}P(E_{i_1}E_{i_2}\cdots E_{i_r}) \\ &+& \cdots + (-1)^{n+1}P(E_1E_2\cdots E_n) \end{eqnarray*}\]
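A brute-force sketch verifying inclusion–exclusion on a small uniform sample space (the three events are arbitrary choices, not from the text):

```python
from itertools import combinations

# Outcomes 0..9 are equally likely; E1, E2, E3 are arbitrary events.
S = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}]

def P(E):  # probability of an event under the uniform measure on S
    return len(E) / len(S)

# Left side: probability of the union, computed directly.
lhs = P(set().union(*events))

# Right side: the inclusion-exclusion alternating sum.
n = len(events)
rhs = 0.0
for r in range(1, n + 1):
    sign = (-1) ** (r + 1)
    for combo in combinations(events, r):
        rhs += sign * P(set.intersection(*combo))

print(lhs, rhs)  # both equal 0.8 here
```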

2. Conditional Probability

Definition

The conditional probability that $E$ occurs given that $F$ has occurred is denoted by $P(E|F)$. If $P(F)>0$, then

\[P(E|F)= \cfrac{P(EF)}{P(F)}\]

Theorem The multiplication rule

\[P(E_1E_2E_3...E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1E_2)...P(E_n|E_1...E_{n-1})\]
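As an illustration of the multiplication rule (a standard card-deck example, not from the text), the chance of drawing three aces in three draws without replacement:

```python
from math import comb

# Chain of conditional probabilities P(E1)P(E2|E1)P(E3|E1E2):
# 4 aces among 52 cards, then 3 among 51, then 2 among 50.
chain = (4 / 52) * (3 / 51) * (2 / 50)

# Cross-check against a direct combinatorial count.
direct = comb(4, 3) / comb(52, 3)

print(chain, direct)  # the two agree
```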

Proposition

Let $E$ and $F$ be events. We can express $E$ as

\[E = EF \ \cup \ EF^c\]

By Axiom 3, $P(E) = P(EF) + P(EF^c)$, and by the definition of conditional probability,

\[P(E) = P(E|F)P(F)\ +\ P(E|F^c)[1\ -\ P(F)]\]

Definition

The odds of an event $A$ are defined by

\[\cfrac{P(A)}{P(A^c)} = \cfrac{P(A)}{1\ -\ P(A)}\]

This tells how much more likely it is that the event $A$ occurs than it is that it does not occur. If the odds are equal to $\alpha$, then it is common to say that the odds are "$\alpha$ to 1" in favor of the hypothesis.

The new odds after the evidence $E$ are

\[\cfrac{P(H|E)}{P(H^c|E)} = \cfrac{P(H)}{P(H^c)} \cfrac{P(E|H)}{P(E|H^c)}\]

Theorem Bayes’s formula

\[P(F_j|E) = \cfrac{P(EF_j)}{P(E)}= \cfrac{P(E|F_j)P(F_j)}{\sum_{i=1}^n P(E|F_i)P(F_i)}\]

Bayes’s formula shows us how to use new evidence to modify existing opinions.

  • $P(F_j|E)$: the probability that $F_j$ occurs given that $E$ has occurred (the posterior).
  • $P(E|F_j)$: the probability that $E$ occurs given that $F_j$ has occurred (the likelihood).
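A numeric illustration of Bayes's formula with a two-event partition (the prior and likelihood figures below are made up for the sketch, not from the text):

```python
# F1 = "hypothesis holds", F2 = "it does not"; E = "evidence observed".
priors = [0.01, 0.99]          # P(F1), P(F2)
likelihoods = [0.95, 0.05]     # P(E|F1), P(E|F2)

# Denominator of Bayes's formula: P(E) = sum_i P(E|F_i)P(F_i).
evidence = sum(p * l for p, l in zip(priors, likelihoods))

# Posterior P(F1|E) = P(E|F1)P(F1) / P(E).
posterior_F1 = likelihoods[0] * priors[0] / evidence

print(posterior_F1)  # roughly 0.16: strong evidence, but a rare hypothesis
```

Note how the small prior keeps the posterior modest even though $P(E|F_1)$ is large.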

3. Independence

Definition

Two events $E$ and $F$ are said to be independent if the following equation holds.

\[P(EF) = P(E)P(F)\]

Two events $E$ and $F$ that are not independent are said to be dependent.

Proposition

If $E$ and $F$ are independent, then so are $E$ and $F^c$.

Definition

The events $E_1, E_2, …, E_n$ are said to be independent if, for every subset $E_{1'}, E_{2'}, …, E_{r'}$, $r \leq n$, of these events,

\[\begin{eqnarray*} P(E_{1'}E_{2'}\cdots E_{r'}) &=& P(E_{1'})P(E_{2'})\cdots P(E_{r'}) \\ \end{eqnarray*}\]

Proposition

Conditional probabilities satisfy all of the properties of ordinary probabilities.

(a) $$0 \leq P(E|F) \leq 1$$
(b) $$P(S|F) = 1$$

(c) If $E_i,\ i=1, 2, …$, are mutually exclusive events, then

\[P(\bigcup_1^\infty E_i|F) = \sum_1^\infty P(E_i|F)\]

4. Discrete Random Variables

Definition

A random variable $X$ is a function from the sample space $S$ to the set of real numbers $\mathbb{R}$:

\[X: S \rightarrow \mathbb{R}\]

Definition

For a discrete random variable $X$, we define the probability mass function $p(a)$ of $X$ by

\[p(a) = P(X=a)\]

$X$ must take on one of the values $x_i$ for $i=1, 2, …$, and we have

\[\begin{eqnarray*} p(x_i) &\geq& 0 \qquad for \ i=1, 2, ...\\ p(x) &=& 0 \qquad for \ all \ other \ values \ of \ x \\ \sum_{i=1}^\infty p(x_i) &=& 1 \end{eqnarray*}\]

Definition

If $X$ is a discrete random variable having a probability mass function $p(x)$, then the expectation, or the expected value, of $X$, denoted by $E[X]$, is defined by

\[E[X] = \sum_{x:p(x)>0}xp(x)\]

$E[X]$ is also referred to as the mean or the first moment of $X$. The quantity $E[X^n]$, $n \geq 1$, is called the $n$th moment of $X$.
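A minimal sketch of these definitions for a fair six-sided die (an assumed example):

```python
# pmf of a fair die: p(x) = 1/6 for x = 1..6.
pmf = {x: 1 / 6 for x in range(1, 7)}

# First moment (the mean): E[X] = sum x p(x).
mean = sum(x * p for x, p in pmf.items())

# Second moment: E[X^2] = sum x^2 p(x).
second_moment = sum(x**2 * p for x, p in pmf.items())

print(mean, second_moment)  # 3.5 and 91/6
```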

Proposition

Suppose the sample space $S$ is either finite or countably infinite. For a random variable $X$, let $X(s)$ denote the value of $X$ when $s \in S$ is the outcome of the experiment. Then

\[E[X] = \sum_{s \in S} X(s) p(s)\]

One consequence of this representation is that the expected value of a sum of random variables is equal to the sum of their expectations.

Definition

We say that $I$ is an indicator variable for the event $A$ if

\[I=\begin {cases} 1, & if\ A\ occurs \\ 0, & if\ A^c\ occurs \end {cases}\]

and we have $E[I] = P(A)$.

Proposition

If $X$ is a discrete random variable that takes on one of the values $x_i, i \geq 1$, with respective probabilities $p(x_i)$, then, for any real-valued function $g$,

\[E[g(X)]=\sum_i g(x_i)p(x_i)\]

Corollary

If $a$ and $b$ are constants, then

\[E[aX + b] = aE[X] + b\]

Definition

If $X$ is a random variable with mean $\mu$, then the variance of $X$, denoted by $Var(X)$, is defined by

\[Var(X) = E[(X - \mu)^2]\]

An alternative formula for $Var(X)$ is derived as follows:

\[Var(X) = E[X^2] - (E[X])^2\]
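The two variance formulas can be checked against each other on a fair-die pmf (assumed example):

```python
pmf = {x: 1 / 6 for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())  # E[X] = 3.5

# Definition: Var(X) = E[(X - mu)^2].
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Alternative formula: Var(X) = E[X^2] - (E[X])^2.
var_alt = sum(x**2 * p for x, p in pmf.items()) - mu**2

print(var_def, var_alt)  # both equal 35/12
```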

Corollary

For any constants $a$ and $b$

\[Var(aX + b) = a^2Var(X)\]

Definition

The square root of $Var(X)$ is called the standard deviation of $X$, and we denote it by $SD(X)$. That is,

\[SD(X) = \sqrt{Var(X)}\]

Remarks

Analogous to the mean being the center of gravity of a distribution of mass, the variance represents, in the terminology of mechanics, the moment of inertia.

5. The Bernoulli and Binomial Random Variables

Definition

Suppose now that $n$ independent trials, each of which results in a success with probability $p$ and in a failure with probability $1 - p$, are to be performed. If $X$ represents the number of successes that occur in the $n$ trials, then $X$ is said to be a binomial random variable with parameters $(n, p)$, and its probability mass function is given by

\[p(i) = \left(\begin{matrix} n\\i \end{matrix} \right) p^i(1\ -\ p)^{n-i} \qquad i=0, 1, ..., n\]

Definition

A random variable $X$ is said to be a Bernoulli random variable if its probability mass function is given by the following equations for some $p \in (0, 1)$:

\[\begin{eqnarray*} p(0) &=& P\{X = 0\} = 1 - p \\ p(1) &=& P\{X = 1\} = p \end{eqnarray*}\]

A Bernoulli random variable is just a binomial random variable with parameters $(1, p)$.

Properties

The expected value and variance of a binomial random variable with parameters $n$ and $p$:

\[E[X^k] = \sum_{i=0}^n i^k \left(\begin{matrix} n\\i \end{matrix}\right) p^i (1-p)^{n-i} = npE[(Y + 1)^{k-1}]\]

where $Y$ is a binomial random variable with parameters $n-1$ and $p$.

\[E[X] = np\] \[Var(X)= np(1 - p)\]

Properties

If $X$ is a binomial random variable with parameters $(n, p)$, where $0 < p < 1$, then as $k$ goes from $0$ to $n$, $P\{X = k\}$ first increases monotonically and then decreases monotonically, reaching its largest value when $k$ is the largest integer less than or equal to $(n + 1)p$.

\[P\{X = k + 1\} = \cfrac{p}{1 - p}\ \cfrac{n - k}{k + 1}\ P\{X = k\}\]
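The recursion can be used to build the whole pmf; the sketch below (with arbitrary choices $n = 10$, $p = 0.3$) checks it against the direct formula and locates the mode:

```python
from math import comb

n, p = 10, 0.3

def pmf_direct(k: int) -> float:
    # Direct binomial pmf: C(n, k) p^k (1-p)^(n-k).
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Build the pmf with the ratio recursion, starting from P{X = 0}.
probs = [(1 - p) ** n]
for k in range(n):
    probs.append(probs[k] * (p / (1 - p)) * (n - k) / (k + 1))

mode = max(range(n + 1), key=lambda k: probs[k])
print(mode)  # largest integer <= (n + 1)p = 3.3, i.e. 3
```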

Definition

The binomial distribution function is

\[P\{X\leq i\} = \sum_{k=0}^i \left(\begin{matrix} n\\k \end{matrix}\right) p^k (1 - p)^{n-k} \qquad i = 0, 1,... , n\]

6. Continuous Random Variable

Definition

We say that $X$ is a continuous random variable if there exists a nonnegative function $f$, defined for all real $x \in (-\infty, \infty)$, having the property that, for any set $B$ of real numbers,

\[P\{X \in B\} = \int_B f(x)\ dx\]

Since $X$ must assume some value, $f$, called the probability density function of the random variable $X$, must satisfy

\[1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^\infty f(x)\ dx\]

7. Distribution Function

Definition

If $X$ is a random variable, its distribution function is a function $F_X: \mathbb{R} \rightarrow [0, 1]$ such that \(F_X(x) = P(X \leq x) \qquad \forall x \in \mathbb{R}\) where $P(X \leq x)$ is the probability that $X$ is less than or equal to $x$.

Properties

Every distribution function enjoys the following four properties:

  • Increasing
\[F_X(x_1) \leq F_X(x_2) \qquad if\ x_1 < x_2\]
  • Right-continuous
\[\lim_{t \rightarrow x^+} F_X(t) = F_X(x)\]
  • Limit at minus infinity
\[\lim_{x \rightarrow -\infty} F(x) = 0\]
  • Limit at plus infinity \(\lim_{x \rightarrow \infty} F(x) = 1\)

Properties

If $X$ is continuous, then its distribution function $F$ is differentiable (at every point where the density $f$ is continuous) and

\[\cfrac{d}{dx} F(x) = f(x)\]

Definition

The expected value of $X$ is defined by

\[E[X] = \int_{-\infty}^\infty xf(x)\ dx\]

Proposition

For any real-valued function $g$

\[E[g(X)] = \int_{-\infty}^\infty g(x)f(x)\ dx\]

Lemma

For a nonnegative random variable $Y$,

\[E[Y] = \int_0^\infty P(Y>y)\ dy\]

Lemma

If $a$ and $b$ are constants, then

\[E[aX + b] = aE[X] + b\]

Definition

The variance of a random variable $X$ with expected value $\mu$ is defined by

\[Var(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2\]

9. The Uniform Random Variable

Definition

A random variable is said to be uniformly distributed over the interval (0, 1) if its probability density function is given by

\[f(x)=\begin {cases} 1, & 0 < x < 1 \\ 0, & otherwise \end {cases}\]

For any $0 < a < b < 1$,

\[P\{a \leq X \leq b\} = \int_a^b f(x)\ dx = b - a\]
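A quick Monte Carlo sanity check of this identity (the endpoints $a = 0.2$, $b = 0.7$ are assumed for the sketch):

```python
import random

random.seed(1)

# Estimate P{a <= X <= b} for X ~ Uniform(0, 1); it should be b - a.
a, b = 0.2, 0.7
n = 200_000
hits = sum(1 for _ in range(n) if a <= random.random() <= b)
estimate = hits / n

print(estimate)  # close to b - a = 0.5
```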

Definition

We say that $X$ is a uniform random variable on the interval $(\alpha, \beta)$ if the probability density function of $X$ is given by

\[f(x)=\begin {cases} \cfrac{1}{\beta - \alpha}, & \alpha < x < \beta \\ 0, & otherwise \end {cases}\]

Definition

The (cumulative) distribution function of a uniform random variable on the interval $(\alpha, \beta)$ is given by

\[F(x)=\begin {cases} 0, & x \leq \alpha \\ \cfrac{x - \alpha}{\beta - \alpha}, & \alpha < x < \beta \\ 1, & x \geq \beta \end {cases}\]

Proposition

\[\begin {eqnarray*} E[X] &=& \cfrac{\alpha+\beta}{2} \\ Var(X) &=& \cfrac{(\beta-\alpha)^2}{12} \end {eqnarray*}\]

10. Normal Random Variable

Definition

We say that $X$ is a normal random variable, or simply that $X$ is normally distributed, with parameters $\mu$ and $\sigma^2$ if the density of $X$ is given by

\[f(x) = \cfrac{1}{\sqrt{2\pi}\ \sigma}\ e^{-(x-\mu)^2 / 2\sigma^2} \qquad -\infty<x<\infty\]

Proposition

If $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then $Y = aX + b$ is normally distributed with parameters $a\mu+b$ and $a^2\sigma^2$.

Definition

If $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then $Z = (X - \mu)/\sigma$ is normally distributed with parameters $0$ and $1$. Such a random variable is said to be a standard (unit) normal random variable.

Definition

It is customary to denote the cumulative distribution function of a standard normal random variable by $\Phi(x)$. That is,

\[\Phi(x) = \cfrac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} \ dy\]

Proposition

\[\begin{eqnarray*} E(X) &=& \mu \\ Var(X) &=& \sigma^2 \end{eqnarray*}\]

The DeMoivre-Laplace Limit Theorem

If $S_n$ denotes the number of successes that occur when $n$ independent trials, each resulting in a success with probability $p$, are performed, then, for any $a<b$,

\[P \left\{ a \leq \cfrac{S_n - np}{\sqrt{np(1-p)}} \leq b \right\} \rightarrow \ \Phi(b) - \Phi(a)\]

as $n \rightarrow \infty$.

In other words, the probability distribution function of a binomial random variable with parameters $n$ and $p$ can be approximated by that of a normal random variable having mean $np$ and variance $np(1 - p)$.
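A sketch of the approximation at work (the choices $n = 100$, $p = 0.5$, $a = -1$, $b = 1$ are assumed), comparing the exact binomial probability with $\Phi(b) - \Phi(a)$:

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

def Phi(x: float) -> float:
    # Standard normal cdf expressed via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

a, b = -1.0, 1.0

# Exact P{a <= (S_n - np)/sqrt(np(1-p)) <= b}: sum the binomial pmf
# over the k whose standardized value lies in [a, b].
exact = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
    if a <= (k - mu) / sigma <= b
)
approx = Phi(b) - Phi(a)

print(exact, approx)  # close to each other
```

The gap between the two numbers shrinks as $n$ grows, and narrows further with a continuity correction.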

11. The Distribution of a Function of a Random Variable

Theorem

Let $X$ be a continuous random variable having probability density function $f_X$. Suppose that $g(x)$ is a strictly monotonic (increasing or decreasing), differentiable (and thus continuous) function of $x$. Then the random variable $Y$ defined by $Y = g(X)$ has a probability density function given by

\[f_Y(y) = \begin{cases} f_X[g^{-1}(y)] \left| \cfrac{d}{dy} g^{-1}(y) \right|, & y = g(x)\ for\ some\ x \\ 0, & y \neq g(x)\ for\ all\ x \end{cases}\]

where $g^{-1}(y)$ is defined to equal that value of $x$ such that $g(x)=y$.

12. Joint Distribution Function


Definition

For any two random variables $X$ and $Y$, the joint cumulative probability distribution function of $X$ and $Y$ is defined by

\[F(a, b) = P\{X \leq a, Y \leq b \} \qquad -\infty<a, b< \infty\]

Definition

The marginal distributions of $X$ and $Y$ are defined by

\[\begin{eqnarray*} &F_X(x)& = P\{X \leq x \} = \lim_{y \rightarrow \infty} F(x, y) \\ &F_Y(y)& = P\{Y \leq y \} = \lim_{x \rightarrow \infty} F(x, y) \end{eqnarray*}\]

Property

\[P\{ a_1 < X \leq a_2, b_1 < Y \leq b_2 \} = F(a_2, b_2) + F(a_1, b_1) - F(a_1, b_2) - F(a_2, b_1)\]

Definition

In the case when $X$ and $Y$ are both discrete random variables, it is convenient to define the joint probability mass function of $X$ and $Y$ by

\[p(x,y) = P\{X=x, Y=y \}\]

Definition

The marginal probability mass functions of $X$ and $Y$ can be obtained from $p(x, y)$ by

\[\begin{eqnarray*} &p_X&(x) = P\{X=x \} = \sum_{y:p(x,y)>0} p(x, y) \\ &p_Y&(y) = P\{Y=y \} = \sum_{x:p(x,y)>0} p(x, y) \end{eqnarray*}\]
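A small sketch (with a made-up joint table) of recovering the marginals by summing the joint pmf:

```python
from collections import defaultdict

# Joint pmf p(x, y) for two binary random variables (arbitrary numbers
# that sum to 1).
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# p_X(x) = sum over y of p(x, y); p_Y(y) = sum over x of p(x, y).
p_X, p_Y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_X[x] += p
    p_Y[y] += p

print(dict(p_X), dict(p_Y))
```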

13. Independent Random Variables


Definition

The random variables $X$ and $Y$ are said to be independent if, for any two sets of real numbers $A$ and $B$,

\[P\{X \in A, Y \in B\} = P\{X \in A\}P\{Y \in B\}\]

When $X$ and $Y$ are discrete random variables, the condition of independence is equivalent to

\[p(x, y) = p_X(x)p_Y(y) \qquad for\ all\ x, y\]

For continuous random variables $X$ and $Y$, the condition of independence is equivalent to

\[F(a, b) = F_X(a)F_Y(b) \qquad for\ all\ a, b \\ f(x, y) = f_X(x)f_Y(y) \qquad for\ all\ x, y\]

Random variables that are not independent are said to be dependent.

Proposition

Continuous (discrete) random variables $X$ and $Y$ are independent if and only if their joint probability density (mass) function can be expressed as

\[f_{X,Y}(x, y) = h(x)g(y) \qquad -\infty < x, y < \infty\]

Remark

Independence is a symmetric relation. To say that $X$ is independent of $Y$ is equivalent to saying that $Y$ is independent of $X$, or just that $X$ and $Y$ are independent.

14. Sums of Independent Random Variables

Definition

Suppose that $X$ and $Y$ are independent, continuous random variables having probability density functions $f_X$ and $f_Y$. The cumulative distribution function of $X+Y$ is obtained as follows:

\[F_{X+Y}(a) = P\{X + Y \leq a \} = \int_{-\infty}^\infty F_X(a-y)f_Y(y)dy\]

$F_{X+Y}$ is called the convolution of the distributions $F_X$ and $F_Y$.

The probability density function $f_{X+Y}$ of $X+Y$ is given by

\[f_{X+Y}(a) = \cfrac{d}{da} F_{X+Y}(a) = \int_{-\infty}^\infty f_X(a-y)f_Y(y)dy\]

Identically Distributed Uniform Random Variables

Suppose $X$ and $Y$ are independent uniform random variables on $(0, 1)$. The probability density function of $X+Y$ is

\[f_{X+Y}(a) = \begin{cases} a & 0 \leq a \leq 1 \\ 2-a & 1 < a < 2 \\ 0 & otherwise \end{cases}\]
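A simulation sketch of this triangular density, checking two values of its cdf $F(a) = a^2/2$ for $0 \leq a \leq 1$ (obtained by integrating the density above):

```python
import random

random.seed(2)

# Draw many independent sums X + Y with X, Y ~ Uniform(0, 1) and
# compare empirical cdf values against the triangular cdf a^2/2.
n = 200_000
sums = [random.random() + random.random() for _ in range(n)]

p_half = sum(1 for s in sums if s <= 0.5) / n  # should be near 0.125
p_one = sum(1 for s in sums if s <= 1.0) / n   # should be near 0.5

print(p_half, p_one)
```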