Knowledge Base

Probability Theory

Experiments, Elementary Outcomes, and Events

Before we dive too deep into probability theory, let's define what experiments, elementary outcomes, and events are.

  • An experiment is a repeatable test in which one of multiple outcomes occurs.
  • Elementary outcomes are the individual results of an experiment. It's important to note that in a fair experiment with n elementary outcomes, the probability of each outcome is 1/n.
  • An event is a subset of the elementary outcomes. We also have impossible events, which are events that can't happen, and certain events, which are events that will definitely happen. One useful thing to remember: as long as all elementary outcomes have equal probability, an event's probability is the number of elementary outcomes in the event divided by the total number of outcomes. Equivalently, an event's probability is the sum of the probabilities of its constituent elementary outcomes.

Now that we have those definitions, let's look at the law of large numbers.
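The equally-likely rule above can be sketched in Python (assuming, for illustration, a fair six-sided die and the event "the roll is even"):

```python
# Sample space of a fair six-sided die: six equally likely elementary outcomes
outcomes = [1, 2, 3, 4, 5, 6]

# Event: the roll is even (a subset of the elementary outcomes)
event = [x for x in outcomes if x % 2 == 0]

# With equally likely outcomes, P(event) = |event| / |sample space|
probability = len(event) / len(outcomes)
print(probability)  # 0.5
```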

The Law of Large Numbers

The law of large numbers states that the more times you repeat an experiment, the closer the frequency of a given event will be to its probability.

We can also use this rule in reverse. If we don’t know the probability of an event, but we can repeat the experiment many times, we can estimate its probability from the frequency of the outcomes.
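As a sketch of this reverse use (assuming a fair coin, so the true probability of heads is 0.5), we can simulate flips with Python's random module and watch the frequency settle near the probability:

```python
import random

random.seed(42)  # fix the seed so the simulation is repeatable

def estimate_heads(n_flips):
    """Estimate the probability of heads from the frequency of outcomes."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

print(estimate_heads(100))      # a rough estimate
print(estimate_heads(100_000))  # much closer to 0.5
```

The more flips we make, the closer the frequency tends to be to 0.5, just as the law of large numbers predicts.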

Now that we can start to see how probable events are, the next step is how we represent these events.

Mutually Exclusive and Independent Events; Multiplying Probabilities

If you want to illustrate the intersection between events, you can use a Venn diagram:

[Venn diagram: events A and B intersect]

A Venn diagram will show us where events A and B intersect. If we have a Venn diagram with no intersection, then the events cannot occur simultaneously and are mutually exclusive, meaning that the probability of both mutually exclusive events occurring together is zero. (Note that this does not make them independent; mutually exclusive events with nonzero probabilities are never independent.)

[Venn diagram: no intersection]

If mutually exclusive events together occupy the whole sample space, then one of them is certain to occur, so the sum of their probabilities will equal 1.

You can tell whether events are mutually exclusive from a Venn diagram, but it's less easy to illustrate independence. To check it, you'll need to see whether the product of the events' probabilities equals the probability of their intersection. That's where random numbers, probability distributions, and value intervals come in.
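Here is a sketch of that check in Python (assuming, for illustration, two fair dice): event A is "the first die is even" and event B is "the second die shows more than 4."

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
space = list(product(range(1, 7), repeat=2))

# A: the first die is even; B: the second die shows more than 4
p_a = sum(1 for a, b in space if a % 2 == 0) / len(space)
p_b = sum(1 for a, b in space if b > 4) / len(space)
p_ab = sum(1 for a, b in space if a % 2 == 0 and b > 4) / len(space)

# Independent events satisfy P(A) * P(B) = P(A and B)
print(abs(p_a * p_b - p_ab) < 1e-9)  # True: A and B are independent
```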


Random Numbers, Probability Distributions, and Value Intervals

A random variable is a value, determined by an experiment, that cannot be predicted before the experiment is carried out. This matters because experiments have outcomes that can be described qualitatively as well as quantitatively. A random variable is defined numerically based on outcomes, which means we can project an experiment's outcomes, no matter how they were defined, onto the number line. The probability distribution of a random variable can then be shown in a table that contains all of the variable's possible values and the probabilities of their occurrence.

You can use the array data type from the NumPy library to store numerical tables:

import numpy as np

# table of the sums of two dice
table = np.array([[2, 3, 4, 5, 6, 7],
                  [3, 4, 5, 6, 7, 8],
                  [4, 5, 6, 7, 8, 9],
                  [5, 6, 7, 8, 9, 10],
                  [6, 7, 8, 9, 10, 11],
                  [7, 8, 9, 10, 11, 12]])

You can use the keys() method to get all of a dictionary's keys, and the values() method to get all of its values.

# for example, part of a distribution table mapping values to probabilities
dictionary = {2: 1/36, 3: 2/36, 4: 3/36}
print(dictionary.keys())
print(dictionary.values())

Next it’s important to know how expected value and variance are used.

Expected Value and Variance

When you define a random variable for an experiment, its expected value is the numerical value the variable tends toward over multiple repetitions of the experiment. The expected value of a random variable (X) is the sum of all values of the random variable (represented using a lowercase x) multiplied by their probabilities:

E(X) = \sum p(x_i) x_i

This is kind of like a measure of location, just for random variables rather than datasets. It tells you which value the random variable is distributed around and which value it will tend toward when the experiment is repeated.

Because we can describe where the random variable is centered, we can also measure how widely it spreads around that center, which is its variance. To find it, you first need the expected value of the square of the random variable.

Since we know the expected value of the random variable itself and its square, the variance can be found using this formula:

\text{Var}(X) = E(X^2) - (E(X))^2
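As a sketch, both formulas can be applied to the distribution of a fair die (an illustrative assumption):

```python
# Distribution table of a fair die: values and their probabilities
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# E(X): sum of each value multiplied by its probability
expected = sum(p * x for p, x in zip(probs, values))

# Var(X) = E(X^2) - (E(X))^2
expected_sq = sum(p * x ** 2 for p, x in zip(probs, values))
variance = expected_sq - expected ** 2

print(expected)  # ≈ 3.5
print(variance)  # ≈ 2.92
```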

Now that we know what expected value and variance are, our next step is to figure out the probability of success in a binomial experiment.

Probability of Success in Binomial Experiments

Experiments with two possible outcomes are known as binomial experiments. Typically one of the outcomes is called a "success" and the other a "failure." So, if the probability of success is p, then the probability of failure is 1 - p, as the sum of the probabilities of the outcomes must equal 1.

The Binomial Distribution

The number of ways to get k successes from n repetitions of an experiment is given by this formula, where ! (read as "factorial") equals the product of the natural numbers from 1 to the given number: n! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot \dots \cdot (n-1) \cdot n.

C_{n}^{k} = \frac{n!}{k!(n-k)!}

You can calculate the factorial using the math library and its factorial function:

from math import factorial
x = factorial(5)  # 5! = 120

If the probability of success is p and that of failure is 1 - p, and the experiment is repeated n times, then the probability of getting k successes given n attempts is:

C_{n}^{k} p^k (1 - p)^{n-k}
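Putting the pieces together, here's a minimal sketch of this formula as a Python function (the 10-flip fair-coin numbers below are an illustrative assumption):

```python
from math import factorial

def binom_prob(n, k, p):
    """Probability of exactly k successes in n independent attempts."""
    combinations = factorial(n) // (factorial(k) * factorial(n - k))
    return combinations * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 flips of a fair coin
print(binom_prob(10, 5, 0.5))  # 0.24609375
```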

Here are the conditions that allow us to confirm that the random variable is binomially distributed:

  • A finite, fixed number of attempts (n) is conducted
  • Every attempt is a simple binomial experiment with exactly two outcomes
  • The attempts are independent of each other
  • The probability of success (p) is the same for all n attempts


Now that we know how to confirm whether a random variable is binomially distributed, our next step is the normal distribution.

The Normal Distribution

The central limit theorem is a key theorem in statistics. It states that "many independent random variables, added together, give a normal distribution."
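A quick simulation sketch (the uniform variables and sample sizes here are illustrative assumptions): summing many independent uniform random variables produces sums that pile up in a bell shape around their expected value, just as the theorem predicts.

```python
import random

random.seed(0)  # make the simulation repeatable

# Each observation is the sum of 50 independent uniform(0, 1) variables;
# by the central limit theorem these sums are approximately normal,
# centered on 50 * 0.5 = 25
sums = [sum(random.random() for _ in range(50)) for _ in range(10_000)]

mean = sum(sums) / len(sums)
print(mean)  # close to 25
```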

The normal distribution describes real continuous values. It has two parameters, mean and variance:

X \sim \mathcal{N}(\mu, \sigma^{2})

This notation can be read as: the variable X has a normal distribution with a mean of mu (μ) and a variance of sigma squared (σ²), corresponding to a standard deviation of sigma (σ).


To find the probability of any given interval occurring from known distribution parameters, we call two methods from the scipy.stats package: norm.ppf and norm.cdf.

  • ppf: percent point function.
  • cdf: cumulative distribution function.

Both work with the normal distribution, given a particular mean (expected value) and standard deviation.

  • norm.ppf gives the value of a variable when the probability of the interval to the left of that value is known.
  • norm.cdf, on the other hand, gives the probability of the interval to the left of the value when the value is known.

You define the normal distribution using the norm() method from the scipy.stats package with two arguments: expected value and standard deviation. Let's find the probability of getting a value less than or equal to a particular value x:

from scipy import stats as st

# set normal distribution
distr = st.norm(1000, 100)

x = 1000

result = distr.cdf(x)  # probability of getting a value less than or equal to x

Using the norm.cdf function, we can calculate the probability of getting a value in the interval between x1 and x2:

from scipy import stats as st

# set normal distribution
distr = st.norm(1000, 100)

x1 = 900
x2 = 1100

result = distr.cdf(x2) - distr.cdf(x1)
# probability of getting a value between x1 and x2

To find a value given a certain probability, we use the norm.ppf method:

from scipy import stats as st

# set normal distribution
distr = st.norm(1000, 100)

p1 = 0.841344746

result = distr.ppf(p1)  # the value with probability p1 to its left (≈ 1100)

Normal Approximation to the Binomial Distribution

When a binomial experiment is repeated a large number of times, the binomial distribution approaches the normal distribution.

For a discrete binomial distribution, given a number of attempts n and a probability of success p, the expected value equals np, and the variance is np(1-p).

If n is greater than 50, these binomial distribution parameters can be taken as the mean and variance of a normal distribution that is fairly close to the binomial. That is, the closest normal approximation uses np for the mean and np(1-p) for the variance.
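As a sketch of this approximation (the n = 100, p = 0.5 values are illustrative assumptions), we can compare the exact binomial probability of an interval with its normal approximation using scipy.stats:

```python
from math import sqrt
from scipy import stats as st

n = 100  # number of attempts
p = 0.5  # probability of success

# Binomial parameters reused as the normal mean and standard deviation
mu = n * p
sigma = sqrt(n * p * (1 - p))

# P(at most 55 successes): exact binomial vs. normal approximation
exact = st.binom.cdf(55, n, p)
approx = st.norm(mu, sigma).cdf(55)

print(exact, approx)  # the two probabilities are close
```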

Now it’s time to put our understanding to the test with some practice.
