Probability distributions - High school Statistics & Probabilities Lessons

Generalities on random variables

Discrete or continuous random variables

Random variable

A random variable X is a variable whose possible values depend on the outcomes of a random phenomenon.

The outcome of flipping a coin has two possibilities: heads or tails.

Suppose we define X to take the value 1 when the coin lands on heads and the value 2 when it lands on tails. Since the value of X depends on the outcome of the coin flip, X is a random variable.

Discrete random variable

A random variable that can only take a finite (or countable) number of distinct values is called a discrete random variable.

Flipping a coin has only two possible outcomes. Therefore, if X is the random variable in flipping a coin defined in the previous example, then X is a discrete random variable.

Let X be the random variable which gives the number chosen in picking an integer between 1 and 10. X takes a value of 1 whenever 1 is selected, a value of 2 if 2 is selected, and so forth. The random variable X is a discrete random variable.

Continuous random variable

A continuous random variable is a random variable that can take any value in an interval.

Let X be the random variable which is the speed of a car along a highway. The speed of a car can be any nonnegative real number. Therefore, the random variable X which measures the speed of a car is a continuous random variable.

Expected value of random variables

Expected value

The expected value of a random variable X is the average of the values X can take, each value being weighted by its probability. It is denoted by E\left(X\right).

If X is a discrete random variable whose possible values are N_1,N_2,\ldots, N_\ell and whose respective probabilities are p_1,p_2,\ldots p_\ell, then:

E\left(X\right)=p_1N_1+p_2N_2+\cdots +p_\ell N_\ell

A fair coin has a 50\% chance of landing on heads and a 50\% chance of landing on tails. Let X be the random variable which assigns a value of 1 for heads and a value of 2 for tails. The expected value of X is:

E\left(X\right)=1\left(\mbox{probability of heads}\right)+2\left(\mbox{probability of tails}\right)

E\left(X\right)=1\left(.5\right)+2\left(.5\right)

E\left(X\right)=.5+1

E\left(X\right)=1.5

Suppose there are six balls in a hat. Three are red, two are blue, and one is yellow. We select a ball at random from the hat. Consider the random variable X that assigns a value of 1 to a red ball, a value of 2 to a blue ball, and a value of 3 to a yellow ball.

The expected value of this particular random variable is:

E\left(X\right)=1\left(\mbox{probability of selecting red}\right)+2\left(\mbox{probability of selecting blue}\right)+3\left(\mbox{probability of selecting yellow}\right)

E\left(X\right)=1\left(\dfrac{3}{6}\right)+2\left(\dfrac{2}{6}\right)+3\left(\dfrac{1}{6}\right)

E\left(X\right)=\dfrac{10}{6}=\dfrac{5}{3}
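The weighted sum used in both examples can be checked numerically. The following is a minimal Python sketch and not part of the original lesson; the helper name expected_value is an assumption made for illustration, and exact fractions are used so that the result for the ball example appears as a fraction.

```python
from fractions import Fraction


def expected_value(values, probabilities):
    """Return E(X) = p_1*N_1 + ... + p_l*N_l for a discrete random variable."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * n for n, p in zip(values, probabilities))


# Coin example: 1 for heads, 2 for tails, each with probability 1/2.
coin = expected_value([1, 2], [Fraction(1, 2), Fraction(1, 2)])

# Ball example: 1 = red (3/6), 2 = blue (2/6), 3 = yellow (1/6).
balls = expected_value(
    [1, 2, 3], [Fraction(3, 6), Fraction(2, 6), Fraction(1, 6)]
)

print(coin, balls)
```

The same helper also accepts ordinary decimal probabilities such as 0.5.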

Standard deviation and variance of random variables

If X is a discrete random variable whose possible values are N_1,N_2,\ldots, N_\ell and whose respective probabilities are p_1,p_2,\ldots p_\ell, then:

E\left(X^2\right)=p_1N_1^2+p_2N_2^2+\cdots +p_\ell N_\ell^2

Standard deviation

The standard deviation of a discrete random variable X is:

\sigma_X=\sqrt{E\left(X^2\right)-\left(E\left(X\right)\right)^2}

Flipping a fair coin has exactly a 50\% chance of landing on either heads or tails. Let X be the random variable which assigns a value of 1 for heads and a value of 2 for tails.

We already know that:

E\left(X\right)=1.5

The expected value of X^2 is:

E\left(X^2\right)=1^2\left(.5\right)+2^2\left(.5\right)=.5+2=2.5

Therefore, the standard deviation of the discrete random variable X is:

\sigma_X=\sqrt{E\left(X^2\right)-\left(E\left(X\right)\right)^2}=\sqrt{2.5-1.5^2}=\sqrt{0.25}=0.5

Variance

The variance of a random variable X is the square of the standard deviation of X :

Var\left(X\right)=\sigma_X^2

In the example of flipping a coin, we know that:

\sigma_X=0.5

Therefore, the variance of X is:

Var\left(X\right)=0.5^2=0.25

The standard deviation of a random variable measures how far the values of the random variable typically lie from the expected value.

Let X_1 be the random variable which assigns a value of 1 to heads and a value of 2 to tails in flipping a fair coin. We know that the expected value and standard deviation of X_1 are:

E\left(X_1\right)=1.5
\sigma_{X_1}=0.5

Let X_2 be the random variable which assigns a value of 1 to heads and a value of \dfrac{5}{3} to tails in flipping an unfair coin which has a 0.25 probability of landing on heads and a 0.75 probability of landing on tails. Then:

E\left(X_2\right)=1\left(0.25\right)+\dfrac{5}{3}\left(0.75\right)=1.5
E\left(X^2_2\right)=1^2\left(0.25\right)+\left(\dfrac{5}{3}\right)^2\left(0.75\right)=\dfrac{7}{3}
\sigma_{X_2}=\sqrt{E\left(X^2_2\right)-\left(E\left(X_2\right)\right)^2}=\sqrt{\dfrac{1}{12}}\approx 0.289

Observe that the random variables X_1 and X_2 have the same expected values but the standard deviation of X_2 is less than the standard deviation of X_1. Therefore, the values of the random variable X_2 are closer to the expected value than those of the random variable X_1.
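The comparison between X_1 and X_2 can be reproduced numerically. The Python sketch below is an illustration only; the helper names moments, variance and std_dev are assumptions, not part of the lesson. It applies the formula \sigma_X=\sqrt{E\left(X^2\right)-\left(E\left(X\right)\right)^2} to both random variables, using exact fractions for the variance.

```python
from fractions import Fraction
from math import sqrt


def moments(values, probabilities):
    """Return (E(X), E(X^2)) for a discrete random variable."""
    mean = sum(p * n for n, p in zip(values, probabilities))
    mean_sq = sum(p * n**2 for n, p in zip(values, probabilities))
    return mean, mean_sq


def variance(values, probabilities):
    """Var(X) = E(X^2) - (E(X))^2."""
    mean, mean_sq = moments(values, probabilities)
    return mean_sq - mean**2


def std_dev(values, probabilities):
    """sigma_X is the square root of the variance."""
    return sqrt(variance(values, probabilities))


# X_1: fair coin, values 1 and 2.
x1 = ([1, 2], [Fraction(1, 2), Fraction(1, 2)])
# X_2: unfair coin, values 1 and 5/3 with probabilities 1/4 and 3/4.
x2 = ([1, Fraction(5, 3)], [Fraction(1, 4), Fraction(3, 4)])

for name, (vals, probs) in (("X_1", x1), ("X_2", x2)):
    print(name, variance(vals, probs), round(std_dev(vals, probs), 3))
```

Both variables have the same expected value, so the smaller standard deviation printed for X_2 reflects only how tightly its values sit around 1.5.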

The binomial distribution

Model

Binomial experiment

Let n be a natural number. A binomial experiment is an experiment consisting of n identical and independent trials. Each trial has only two possible outcomes: a success or a failure. We denote by p the probability of a success in one trial; this probability is the same for every trial.

Flipping a fair coin n times and counting landing on heads as a success is an example of a binomial experiment. The probability of a success in one trial is p=.5.

Binomial random variable

Let X be the discrete random variable which counts the number of successes in a binomial experiment. X is called a binomial random variable.

Binomial probability distribution

Suppose X is a binomial random variable and let p be the probability of a success. The probability that there will be exactly k successes (k being an integer with 0\leq k \leq n) in n trials is given by the following formula:

P\left(k\right)={n\choose k}p^k\left(1-p\right)^{n-k}

Consider the binomial random variable X which counts the number of heads when flipping a fair coin n times, landing on heads being a success. The probability of a success in one trial is p=.5.

The probability of having 3 successes in 5 trials is:

P\left(3\right)={5\choose 3}\left(.5\right)^{3}\left(1-.5\right)^{5-3}=\dfrac{10}{32}=\dfrac{5}{16}

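Consider the binomial random variable X which counts the red balls obtained when repeatedly selecting a ball at random from a hat which contains 17 balls, 3 of which are red and 14 of which are blue. The ball is returned to the hat after each selection, so that every trial is identical. Let selecting a red ball count as a success.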

The probability of a success in a single trial is p=\dfrac{3}{17}.

The probability of having 2 successes in 7 trials is:

P\left(2\right)={7\choose 2}\left(\dfrac{3}{17}\right)^{2}\left(1-\dfrac{3}{17}\right)^{7-2}\approx 0.25

Recall that if n,k are integers with 0\leq k \leq n, then:

{n\choose k}=\dfrac{n!}{k!\left(n-k\right)!}

If there are n trials in a binomial experiment, then there are exactly {n\choose k} ways of arranging k successes and n-k failures among the trials.

For one given arrangement, the value p^k\left(1-p\right)^{n-k} is the probability of obtaining that particular sequence of k successes and n-k failures.

Therefore, the probability of having exactly k successes in n trials, in any order, is:

P\left(k\right)={n\choose k}p^k\left(1-p\right)^{n-k}
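The formula translates directly into a few lines of code. The following Python sketch is an illustration under the assumption that the trials are independent; the function name binomial_pmf is chosen here and is not part of the lesson. It relies on math.comb from the standard library (Python 3.8 or later) for {n\choose k}.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 3 heads in 5 flips of a fair coin.
print(binomial_pmf(3, 5, 0.5))

# 2 red balls in 7 selections when p = 3/17.
print(round(binomial_pmf(2, 7, 3 / 17), 4))
```

Summing binomial_pmf(k, n, p) over k = 0, ..., n gives 1 up to rounding, which is a quick sanity check on any implementation.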

Properties

The domain of the binomial distribution function of a binomial random variable with n trials is the set of natural numbers \{0{,}1{,}2{,}3,\ldots, n\}. This is because when conducting n trials the number of successes is a whole number between 0 and n.

Mean of a binomial random variable

The mean of a binomial random variable X with n trials and a probability of p for a single success is given by the following formula:

\mu_X=np

Consider an experiment in which an integer between 1 and 10 is selected at random, each integer being equally likely. Call selecting a number greater than or equal to 7 a success. The probability of a success is \dfrac{4}{10}=0.4. The mean of the binomial random variable X which counts the successes in 8 trials is:

\mu_X=8\cdot 0.4=3.2

Standard deviation of a binomial random variable

The standard deviation of a binomial random variable X with n trials and a probability of p for a single success is given by the following formula:

\sigma_X=\sqrt{np\left(1-p\right)}

Consider again the binomial random variable X when selecting a random integer between 1 and 10. Selecting a number greater than or equal to 7 is called a success. The standard deviation of the binomial random variable X in 8 trials is:

\sigma_X=\sqrt{8\left(0.4\right)\left(1-0.4\right)}\approx 1.386

Variance of a binomial random variable

The variance of a binomial random variable X with n trials and a probability of p for a single success is given by the following formula:

Var\left(X\right)=np\left(1-p\right)

Consider again the binomial random variable X when selecting a random integer between 1 and 10. Selecting a number greater than or equal to 7 is considered a success. The variance of the binomial random variable X in 8 trials is:

Var\left(X\right)=8\left(0.4\right)\left(1-0.4\right)=1.92
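The three formulas \mu_X=np, Var\left(X\right)=np\left(1-p\right) and \sigma_X=\sqrt{np\left(1-p\right)} can be evaluated together. The Python sketch below is only an illustration; the function name binomial_summary is an assumption. It is applied to the example with n=8 and p=0.4.

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial variable."""
    mean = n * p
    var = n * p * (1 - p)
    return mean, var, sqrt(var)


mean, var, sd = binomial_summary(8, 0.4)
print(round(mean, 3), round(var, 3), round(sd, 3))
```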

Find probabilities

By formula

The formula providing the probability of exactly k successes in a sequence of n trials is easily extended to compute the probability of a range of numbers of successes (for example, at least k successes): one adds the probabilities of each number of successes in the range.

Consider the binomial random variable X which counts landing on heads as a success in flipping a fair coin n times. The probability of a success in one trial is p=.5.

We wish to compute the probability of having at least 3 successes in 5 trials.

Note that this probability equals the probability of having 3 successes, 4 successes, or 5 successes. Therefore:

P\left(X=3\right)+P\left(X=4\right)+P\left(X=5\right)={5\choose 3}\left(.5\right)^{3}\left(1-.5\right)^{5-3}+{5\choose 4}\left(.5\right)^{4}\left(1-.5\right)^{5-4}+{5\choose 5}\left(.5\right)^{5}\left(1-.5\right)^{5-5}=\dfrac{1}{2}
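The sum of individual probabilities can be written as a short loop. The Python sketch below is an illustration; the names binomial_pmf and prob_at_least are assumptions introduced here. It adds P\left(X=k\right) for every k from the minimum number of successes up to n.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min, n, p):
    """P(X >= k_min): add the probabilities of k_min, k_min + 1, ..., n."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


# At least 3 heads in 5 flips of a fair coin.
print(prob_at_least(3, 5, 0.5))
```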

Using tables

One can find probabilities of a binomial distribution by using a binomial distribution table.

To compute the probability that a binomial random variable has no more than x=2 successes in n=6 trials when the probability of a success in one trial is p=0.3, one can read the value from a cumulative binomial distribution table as follows:

Go to the row where n=6 and x=2.
Read off the value in this row which is also in the column where p=0.3.
The number to find is 0.7443; it is highlighted in yellow in the table.

Therefore, the probability of no more than 2 successes in 6 trials of a binomial random variable which has a success rate of 0.3 is 0.7443.
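An entry of a cumulative binomial table can be recomputed by adding the probabilities of 0, 1, ..., x successes. The Python sketch below is an illustration under that assumption about how the table is built; the function name binomial_cdf and the list of p values in the loop are choices made here, not taken from the lesson.

```python
from math import comb


def binomial_cdf(x, n, p):
    """P(X <= x): probability of no more than x successes in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# The entry for n = 6, x = 2, p = 0.3.
print(round(binomial_cdf(2, 6, 0.3), 4))

# One row of such a table (n = 6, x = 2) for a few illustrative values of p.
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, round(binomial_cdf(2, 6, p), 4))
```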


The normal distribution

Model

Normal Distribution

A normal distribution is a continuous distribution whose graph is a symmetric, bell-shaped curve: values close to the mean are the most likely, and values become less likely the farther they are from the mean.

The heights of people in a city are an example of an approximately normally distributed quantity.

Observe how the majority of people's heights are centered around the value of 180 centimeters.

Normal distribution notation

If X is a normally distributed random variable with mean \mu and standard deviation \sigma, then we write:

X\sim N\left(\mu,\sigma\right)

Let X be the normally distributed random variable which measures people's height in a particular city. Suppose that the mean height of a person is 180 centimeters and the standard deviation of X is 9 centimeters. Then X\sim N\left(180{,}9\right).

Normally distributed random variables are used to study behaviors and phenomena such as people's height, their weight, or the average age of a tree in a forest.

Properties

The domain of a continuous normal distribution N\left(\mu,\sigma\right) is the set of all real numbers.

Suppose X is a random variable. There are two measurements of X which are related to the mean of X :

The median of X is the value which splits the distribution in half: X falls below it with probability 0.5.
The mode of X is the most likely value of X, where the graph of the distribution reaches its peak.

If X\sim N\left(\mu,\sigma\right) is normally distributed, then the mean, median, and mode of X are all the same value.

The area underneath the bell curve of a continuous normal distribution N\left(\mu,\sigma\right) is 1.

The standard normal distribution

Standard normal distribution

A standard normal distribution is a normal distribution which has a mean of 0 and a standard deviation of 1.

Let k be a real number and let X be a random variable which follows the standard normal distribution. The area under the standard normal bell curve to the left of k, that is, the probability that X is less than k, is denoted by P\left(X \lt k\right).

Let k be a real number. P\left(X \lt k\right) can be read from normal distribution tables.

To find P\left(X \lt 0.83\right), one reads it from the table by finding the intersection of the row labeled 0.8 and the column labeled 0.03.

We find a value of 0.7967 :

P\left(X\lt0.83\right)=0.7967

This indicates that 79.67\% of the values in a standard normal distribution lie below the point located 0.83 standard deviations above the mean.

The value of 0.7967 is the area of the shaded region under the graph of the standard normal distribution curve.

For k \gt 0, the area under the standard normal bell curve between -k and k is denoted by P\left(-k \lt X \lt k\right).

Because the standard normal distribution is symmetric about 0, P\left(X \lt -k\right)=1-P\left(X \lt k\right). Therefore:

P\left(-k \lt X \lt k\right)=P\left(X \lt k\right)-P\left(X \lt -k\right)=2\left(P\left(X \lt k\right)-0.5\right)

Suppose X follows a standard normal distribution and we want to compute P\left(-0.83 \lt X \lt 0.83\right).

We get:

P\left(-0.83 \lt X \lt 0.83\right)=2\left(P\left(X \lt 0.83\right)-0.5\right)=2\left(0.7967-0.5\right)=0.5934
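Table values such as P\left(X \lt 0.83\right) can also be computed. The Python sketch below is an illustration that uses the standard identity expressing the standard normal cumulative probability through the error function, P\left(X \lt k\right)=\dfrac{1}{2}\left(1+\operatorname{erf}\left(k/\sqrt{2}\right)\right); this identity is not stated in the lesson, and the function names are assumptions. Results may differ from table-based figures in the last digit because the table is rounded.

```python
from math import erf, sqrt


def standard_normal_cdf(k):
    """P(X < k) for a standard normal random variable X."""
    return 0.5 * (1.0 + erf(k / sqrt(2.0)))


def central_probability(k):
    """P(-k < X < k) = 2 * (P(X < k) - 0.5) for k > 0."""
    return 2.0 * (standard_normal_cdf(k) - 0.5)


print(round(standard_normal_cdf(0.83), 4))
print(central_probability(0.83))
```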

Normally distributed random variables

z-score

Suppose X\sim N\left(\mu,\sigma\right) is a normally distributed random variable and x is a particular value of the random variable X. Then the z-score of x is the number of standard deviations x is away from the mean \mu. Specifically, the z-score of x is:

\dfrac{x-\mu}{\sigma}

Suppose X\sim N\left(3{,}1.2\right), then the z-score of 5 is:

\dfrac{5-3}{1.2}\approx 1.667

Suppose X\sim N\left(\mu,\sigma\right) is a normally distributed random variable and x is a particular value of X.

The z-score of x is positive if and only if x is larger than the mean \mu.
The z-score of x is negative if and only if x is smaller than the mean \mu.

Suppose X\sim N\left(72{,}4\right) is the normally distributed random variable of scores on an exam.

An exam score of 65 has a negative z-score.
An exam score of 80 has a positive z-score.
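The z-score formula and the sign rule can be checked on the examples above. The Python sketch below is an illustration only; the function name z_score is an assumption.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# X ~ N(3, 1.2): z-score of 5.
print(round(z_score(5, 3, 1.2), 3))

# X ~ N(72, 4): a score below the mean and a score above the mean.
print(z_score(65, 72, 4), z_score(80, 72, 4))
```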

Let X be a random variable that follows the normal distribution \mathcal{N}\left(\mu,\sigma\right) and Z be a random variable that follows a normal distribution \mathcal{N}\left(0{,}1\right).

We can use the standard normal distribution table of \mathcal{N}\left(0{,}1\right) to find the probability that X is below a given value x and, reading the table in reverse, to find the value x below which a given proportion of the distribution lies.

More precisely, for any value x\in\mathbb{R} :

P\left(X\leq x\right)=P\left(X-\mu\leq x-\mu\right)

P\left(X\leq x\right)=P\left(\dfrac{X-\mu}{\sigma}\leq \dfrac{x-\mu}{\sigma} \right)

P\left(X\leq x\right)=P\left(Z\leq \dfrac{x-\mu}{\sigma} \right)

At this point, we can find the value of P\left(Z\leq \dfrac{x-\mu}{\sigma} \right) directly in the standard normal distribution table.
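The standardization step can be sketched in code: convert x to a z-score, then evaluate the standard normal cumulative probability. The Python example below is an illustration; the function name normal_cdf and the value x=189 are hypothetical choices made here, applied to the height distribution N\left(180{,}9\right) from earlier in the lesson. The result is cross-checked against statistics.NormalDist from the Python standard library (Python 3.8 or later).

```python
from math import erf, sqrt
from statistics import NormalDist


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), computed through the z-score of x."""
    z = (x - mu) / sigma                      # standardize
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # P(Z <= z) for Z ~ N(0, 1)


# Hypothetical value: probability that a height is at most 189 cm.
p = normal_cdf(189, 180, 9)
check = NormalDist(mu=180, sigma=9).cdf(189)
print(round(p, 4), round(check, 4))
```

Here 189 has a z-score of 1, so the value printed is the table entry for 1.00.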

Suppose X is the normally distributed random variable of the scores on an exam taken by a group of 500 students. The average score on the exam is \mu=73 and the standard deviation is \sigma=9. Therefore:

X\sim N\left(73{,}9\right)

We want to know the score below which 95\% of the students fell. Look at the normal distribution table and find 0.95, or the number closest to it.

The table has no entry equal to 0.95; the closest entries are 0.9495, which corresponds to a z-score of 1.64, and 0.9505, which corresponds to a z-score of 1.65. We take the z-score 1.65. To find the score below which 95\% of the students fell, solve the following equation for x :

\dfrac{x-\mu}{\sigma}=1.65

\dfrac{x-73}{9}=1.65

x-73=14.85

x=87.85\approx 88

Therefore, about 95\% of the exam scores fell below a score of 88.
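Reading the table in reverse amounts to inverting the cumulative probability. The Python sketch below is an illustration that uses statistics.NormalDist().inv_cdf from the standard library (Python 3.8 or later); the function name score_below_which is an assumption. Because it does not round to the nearest table entry, its result can differ slightly from a table-based figure.

```python
from statistics import NormalDist


def score_below_which(fraction, mu, sigma):
    """Value x such that P(X <= x) = fraction for X ~ N(mu, sigma)."""
    z = NormalDist().inv_cdf(fraction)  # z-score with P(Z <= z) = fraction
    return mu + z * sigma               # solve (x - mu) / sigma = z for x


x95 = score_below_which(0.95, 73, 9)
print(round(x95, 2))
```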

Let X be a random variable that follows the normal distribution \mathcal{N}\left(\mu,\sigma\right) and Z a random variable that follows a normal distribution \mathcal{N}\left(0{,}1\right).

We can use the standard normal distribution table of \mathcal{N}\left(0{,}1\right) to find the probability that X lies between the values \mu-x and \mu+x and, reading the table in reverse, to find such values for a given probability.

More precisely, for any value x\in\mathbb{R}^+ :

P\left(\mu-x\leq X\leq \mu+x\right)=P\left(-x\leq X-\mu\leq x\right)

P\left(\mu-x\leq X\leq \mu+x\right)=P\left(\dfrac{-x}{\sigma}\leq \dfrac{X-\mu}{\sigma}\leq \dfrac{x}{\sigma} \right)

P\left(\mu-x\leq X\leq \mu+x\right)=P\left(\dfrac{-x}{\sigma}\leq Z\leq \dfrac{x}{\sigma} \right)

P\left(\mu-x\leq X\leq \mu+x\right)=2\left(P\left(Z\leq \dfrac{x}{\sigma}\right)-0.5\right)

At this point, we can find the value of P\left(Z\leq \dfrac{x}{\sigma}\right) directly in the standard normal distribution table and get the value of 2\left(P\left(Z\leq \dfrac{x}{\sigma}\right)-0.5 \right).

Suppose X is the normally distributed random variable of the previous example. Suppose now that we want to find a low score L and a high score H, placed symmetrically around the mean score \mu=73, between which 80\% of the exam scores fell. Because a normal distribution is symmetric about its mean, the remaining 20\% is split equally between the two tails: L is the score below which only 10\% of the scores fell, and H is the score below which 90\% of the scores fell. Removing the bottom 10\% from the bottom 90\% leaves the central 80\% of the scores.

We first find the score below which 90\% of the students fell, using the normal distribution table in reverse.

The closest value in the table to 0.90 is 0.8997, which corresponds to a z-score of 1.28. We solve the following equation for x :

\dfrac{x-73}{9}=1.28

x=84.52\approx 85

Therefore, 90\% of the scores fell below an 85. To find the score for which only 10\% of the scores fell below, we would like to find 0.10 on the table and solve the equation as before. However, the numbers on the normal distribution table only go as low as 0.50 because normally distributed random variables are symmetric about the mean.

Thus, to find the value below which 10\% of the scores fell, we use the value below which 90\% of the scores fell, which is 85. Find the distance between the mean and 85 :

85-73=12

Lastly, subtract 12 from the mean:

73-12=61

Therefore, we can conclude that only 10\% of the scores fell below 61, and that 80\% of the exam scores fell within the range of 61 to 85.
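The symmetric interval can also be computed without a table. The Python sketch below is an illustration; the function name central_interval is an assumption, and statistics.NormalDist comes from the Python standard library (Python 3.8 or later). It finds the z-score that leaves half of the excluded probability in the upper tail and steps the same distance on both sides of the mean. Since it does not round intermediate values, its bounds can differ slightly from the table-based ones.

```python
from statistics import NormalDist


def central_interval(coverage, mu, sigma):
    """Return (L, H), symmetric about mu, with P(L <= X <= H) = coverage."""
    tail = (1 - coverage) / 2            # probability left in each tail
    z = NormalDist().inv_cdf(1 - tail)   # z-score of the upper bound
    return mu - z * sigma, mu + z * sigma


low, high = central_interval(0.80, 73, 9)
print(round(low, 2), round(high, 2))
```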
