# Expectation and Variance

We often want to distill a random variable's distribution down to a single number. For example, consider the height of an individual selected uniformly at random from a given population. This is a random variable, and communicating its distribution would involve communicating the heights of every person in the population. However, we can summarize the distribution by reporting an *average* height: we add up the heights of the people in the population and divide by the number of people.

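If the random individual is selected according to some non-uniform probability distribution on the population, then it makes sense to calculate a *weighted* average rather than a straight average. The probability-weighted average of a random variable is called its **expectation**.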

**Definition**

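The **expectation** (or **mean**) $\mathbb{E}[X]$ of a random variable $X$ is the *probability-weighted average* of $X$:

$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\{\omega\}).$$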

For example, the expected number of heads in two fair coin flips is

$$\frac{1}{4}(2) + \frac{1}{4}(1) + \frac{1}{4}(1) + \frac{1}{4}(0) = 1.$$
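As a sanity check on the definition, here is a minimal Python sketch that enumerates the four equally likely outcomes of two fair coin flips and forms the probability-weighted average directly. The names `omega`, `prob` and `X` are illustrative choices, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: ('H','H'), ('H','T'), ('T','H'), ('T','T')
omega = list(product("HT", repeat=2))
prob = {w: Fraction(1, 4) for w in omega}  # uniform probability measure on omega

def X(w):
    """Random variable: the number of heads in outcome w."""
    return w.count("H")

# Probability-weighted average of X over the sample space
expectation = sum(X(w) * prob[w] for w in omega)
print(expectation)  # 1
```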

There are two common ways of interpreting expected value.

- The expectation may be thought of as the value of a random game with payout $X$. According to this interpretation, you should be willing to pay anything less than \$1 to play the game where you get a dollar for each head in two fair coin flips. For more than \$1 you should be unwilling to play the game, and at \$1 you should be indifferent.
- The second way of thinking about expected value is as a *long-run average*. If you play the dollar-per-head two-coin-flip game a very large number of times, then your average payout per play is very likely to be close to \$1.

We can test this second interpretation out:

**Exercise**

Use the expression below (in Python or Julia) to play the dollar-per-head two-coin-flip game a million times and calculate the average payout in those million runs.

```python
from numpy.random import randint
sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6
```

```julia
using Statistics: mean
mean(rand(0:1) + rand(0:1) for k=1:10^6)
```

How close to 1 is the result typically?

*Solution.* Running the code several times, we see that the error is seldom as large as 0.01 or as small as 0.0000001; it is typically between 0.0001 and 0.001.

We will see that this second interpretation is actually a *theorem* in probability, called the **law of large numbers**. In the meantime, however, this interpretation gives us a useful tool for investigation: if a random variable is easy to simulate, then we can sample from it many times and calculate the average of the resulting samples. This will not give us the expected value exactly, but with high probability we can get as close as desired by using sufficiently many samples. This is called the **Monte Carlo** method of approximating the expectation of a random variable.
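A minimal sketch of the Monte Carlo method in Python, assuming NumPy's `default_rng` generator interface. The helper name `monte_carlo_mean` and its `sampler` argument are hypothetical, not part of any library: the sampler takes a generator and a sample count and returns an array of draws, and the helper averages them. Here it is applied to the dollar-per-head game, whose exact mean is 1.

```python
import numpy as np

def monte_carlo_mean(sampler, n, seed=None):
    """Estimate E[X] by averaging n independent draws produced by sampler(rng, n)."""
    rng = np.random.default_rng(seed)
    samples = sampler(rng, n)
    return samples.mean()

def heads_in_two_flips(rng, n):
    # n rows of two independent 0/1 coin flips; each row sum is the number of heads
    return rng.integers(0, 2, size=(n, 2)).sum(axis=1)

for n in [100, 10_000, 1_000_000]:
    est = monte_carlo_mean(heads_in_two_flips, n, seed=0)
    print(n, est, abs(est - 1))  # sample size, estimate, absolute error
```

Drawing all samples at once as a NumPy array, rather than summing a Python generator as in the exercise above, is only a choice for speed; the estimate is the same kind of sample average.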

**Exercise**

Use a Monte Carlo simulation to estimate the expectation of $X/Y$, where $X$ and $Y$ are independent die rolls.


*Solution.* In Python (with `randint` imported from `numpy.random` as above), `sum(randint(1,7)/randint(1,7) for i in range(10_000_000))/10_000_000` returns approximately 1.43; in Julia, `mean(rand(1:6)/rand(1:6) for i=1:10^8)` also returns approximately 1.43. The actual mean is `sum(x/y for x in range(1,7) for y in range(1,7))/36` (in Julia, `mean(x/y for x=1:6, y=1:6)`), which is

$$\frac{1}{36}\sum_{x=1}^{6}\sum_{y=1}^{6}\frac{x}{y} = \frac{1}{36}\cdot 21 \cdot \frac{49}{20} = \frac{343}{240} \approx 1.429.$$

So we can say that the Monte Carlo result is quite close to the correct value.

The following exercise confirms an intuitive fact about expectation: a random variable which is always larger than another has a larger mean. We will state this idea with "larger" replaced by its weak version "at least as large as".

**Exercise**

Explain why $\mathbb{E}[X] \leq \mathbb{E}[Y]$ if $X(\omega) \leq Y(\omega)$ for all $\omega \in \Omega$.

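*Solution.* If $X(\omega) \leq Y(\omega)$ for all $\omega \in \Omega$, then $X(\omega)\,\mathbb{P}(\{\omega\}) \leq Y(\omega)\,\mathbb{P}(\{\omega\})$ for all $\omega$, since probabilities are nonnegative. Summing this inequality over $\omega \in \Omega$ gives $\mathbb{E}[X] \leq \mathbb{E}[Y]$.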

## Expectation and distribution

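Although the definition $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\{\omega\})$ involves the probability space $\Omega$, we can also write a formula for the expectation in terms of the *distribution* of $X$: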

**Theorem**

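The expectation of a discrete random variable $X$ is equal to

$$\mathbb{E}[X] = \sum_{x} x\,\mathbb{P}(X = x),$$

where the sum runs over the values $x$ that $X$ takes with positive probability.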

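The idea is that the given formula is just a rearrangement of the terms in the definition of expectation. Let's begin by considering an example. Suppose, for illustration, that $\Omega = \{\omega_1, \omega_2, \omega_3\}$, with $X(\omega_1) = X(\omega_2) = 5$ and $X(\omega_3) = 7$. Then the definition gives

$$\mathbb{E}[X] = 5\,\mathbb{P}(\{\omega_1\}) + 5\,\mathbb{P}(\{\omega_2\}) + 7\,\mathbb{P}(\{\omega_3\}).$$

We can group the first two terms together to get

$$\mathbb{E}[X] = 5\left(\mathbb{P}(\{\omega_1\}) + \mathbb{P}(\{\omega_2\})\right) + 7\,\mathbb{P}(\{\omega_3\}) = 5\,\mathbb{P}(X = 5) + 7\,\mathbb{P}(X = 7).$$

This expression is the one we would get if we wrote out $\sum_{x} x\,\mathbb{P}(X = x)$ term by term. Therefore, we can see that the two sides are the same.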

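Let's write this idea down in general form. We group terms on the right-hand side in the formula $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\{\omega\})$ according to the value of $X(\omega)$:

$$\mathbb{E}[X] = \sum_{x} \sum_{\omega : X(\omega) = x} X(\omega)\,\mathbb{P}(\{\omega\}).$$

Then we can replace $X(\omega)$ with $x$ in the inner sum, since $X(\omega) = x$ for every $\omega$ in that sum:

$$\mathbb{E}[X] = \sum_{x} x \sum_{\omega : X(\omega) = x} \mathbb{P}(\{\omega\}).$$

Since $\sum_{\omega : X(\omega) = x} \mathbb{P}(\{\omega\}) = \mathbb{P}(X = x)$, this gives

$$\mathbb{E}[X] = \sum_{x} x\,\mathbb{P}(X = x),$$

as desired.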
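To see the rearrangement concretely, the following Python sketch computes the expectation of the sum of two fair dice in both ways: once as a sum over the 36 outcomes in $\Omega$, and once as a sum over values $x$ weighted by $\mathbb{P}(X = x)$, after grouping outcomes with `collections.Counter`. The dice example is our own illustration, not one from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes
p = Fraction(1, 36)                           # probability of each outcome

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

# Definition: sum over outcomes of X(ω) P({ω})
by_outcome = sum(X(w) * p for w in omega)

# Group outcomes by the value of X to obtain the distribution P(X = x)
counts = Counter(X(w) for w in omega)
dist = {x: c * p for x, c in counts.items()}

# Theorem: sum over values of x P(X = x)
by_value = sum(x * px for x, px in dist.items())

print(by_outcome, by_value)  # 7 7
```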

**Exercise**

The expectation of a random variable need not be finite or even well-defined. Show that the expectation of the random variable which assigns a probability mass of

Consider a random variable

*Solution.* We multiply the probability mass at each point

For the second distribution, the positive and negative parts of the sum are both infinite for the same reason. Therefore, the sum does not make sense, and the mean is not well-defined.

We can also work out the expectation of a function of two random variables using their joint distribution:

**Theorem**

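If $X$ and $Y$ are discrete random variables defined on the same probability space and $f : \mathbb{R}^2 \to \mathbb{R}$, then

$$\mathbb{E}[f(X, Y)] = \sum_{(x, y)} f(x, y)\,\mathbb{P}(X = x \text{ and } Y = y).$$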

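*Proof.* We use the same idea we used in the proof of the expectation formula: group terms in the definition of expectation according to the value of the pair $(X(\omega), Y(\omega))$, and replace $f(X(\omega), Y(\omega))$ by $f(x, y)$ within the group where $(X(\omega), Y(\omega)) = (x, y)$.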
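Here is a minimal Python sketch of this theorem on an illustrative example of our own: $X$ is the first of two fair dice, $Y$ is the larger of the two, and $f(x, y) = (x - y)^2$. Since $X$ and $Y$ are dependent, the joint distribution is built by grouping outcomes by the pair $(X(\omega), Y(\omega))$, exactly as in the proof.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def X(w):
    return w[0]        # first die

def Y(w):
    return max(w)      # larger of the two dice

def f(x, y):
    return (x - y) ** 2

# Joint distribution P(X = x and Y = y), built by grouping outcomes by (X(ω), Y(ω))
joint = defaultdict(Fraction)  # Fraction() is 0
for w in omega:
    joint[(X(w), Y(w))] += p

lhs = sum(f(X(w), Y(w)) * p for w in omega)                # definition of E[f(X, Y)]
rhs = sum(f(x, y) * pxy for (x, y), pxy in joint.items())  # formula from the theorem
print(lhs == rhs, lhs)  # True 35/12
```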

We can use this theorem to show that expectation distributes across multiplication for independent random variables:

**Exercise** (independence product formula)

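Show that $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ if $X$ and $Y$ are independent discrete random variables.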

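*Solution.* Applying the previous theorem with $f(x, y) = xy$ and then using the definition of independence, we have

$$\begin{aligned}
\mathbb{E}[XY] &= \sum_{(x, y)} xy\,\mathbb{P}(X = x \text{ and } Y = y) \\
&= \sum_{x}\sum_{y} xy\,\mathbb{P}(X = x)\,\mathbb{P}(Y = y) \\
&= \left(\sum_{x} x\,\mathbb{P}(X = x)\right)\left(\sum_{y} y\,\mathbb{P}(Y = y)\right) = \mathbb{E}[X]\,\mathbb{E}[Y],
\end{aligned}$$

as desired.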
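A quick exact check of the product formula in Python, assuming two independent fair dice as the illustrative example: the joint mass of each pair is taken to be the product of the marginal masses, so the double sum factors.

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # pmf of one fair die

def mean(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * px for x, px in pmf.items())

# Independence: P(X = x and Y = y) = P(X = x) P(Y = y)
e_xy = sum(x * y * px * py for x, px in die.items() for y, py in die.items())

print(e_xy, mean(die) * mean(die))  # 49/4 49/4
```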

## Variance

The expectation of a random variable gives us some coarse information about where on the number line the random variable's probability mass is located. The **variance** gives us some information about how widely the probability mass is spread around its mean. A random variable whose distribution is highly concentrated about its mean will have a small variance, and a random variable which is likely to be very far from its mean will have a large variance. We define the variance of a random variable $X$ to be the expected squared distance from $X$ to its mean:

**Definition** (Variance)

The variance of a random variable $X$ is

$$\operatorname{Var} X = \mathbb{E}[(X - \mathbb{E}[X])^2].$$

The standard deviation $\sigma(X)$ of $X$ is the square root of the variance: $\sigma(X) = \sqrt{\operatorname{Var} X}$.
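For a random variable with finitely many values, the definition combined with the distribution formula for expectation translates directly into code. This Python sketch, with hypothetical helpers `mean_from_pmf` and `variance_from_pmf`, applies it to a single fair die and takes a square root for the standard deviation.

```python
import math
from fractions import Fraction

def mean_from_pmf(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * px for x, px in pmf.items())

def variance_from_pmf(pmf):
    """Var X = E[(X - E[X])^2], computed straight from the definition."""
    mu = mean_from_pmf(pmf)
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}
var = variance_from_pmf(die)
sd = math.sqrt(var)       # standard deviation sigma(X) = sqrt(Var X)
print(var, round(sd, 4))  # 35/12 1.7078
```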

**Exercise**

Consider a random variable which is obtained by making a selection from the list $[0.245, 0.874, 0.998, 0.567, 0.482]$ uniformly at random. Make a rough estimate of the mean and variance of this random variable just from looking at the number line. Then use Python to calculate the mean and variance exactly to see how close your estimates were.


*Solution.* My estimate of the mean and variance are

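Calculating the mean exactly using `m = np.mean(A)` in Python (after `import numpy as np`) or `m = mean(A)` in Julia, where `A = [0.245, 0.874, 0.998, 0.567, 0.482]`, we get a value of 0.6332. Calculating the variance exactly using `np.mean([(a-m)**2 for a in A])` in Python or `mean([(a-m)^2 for a in A])` in Julia, we get a value of approximately 0.0738.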

**Exercise**

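Consider the following game. We begin by picking a number in $\{0, \tfrac{1}{1000}, \tfrac{2}{1000}, \ldots, \tfrac{999}{1000}, 1\}$ uniformly at random. We keep picking numbers independently from the same distribution and adding them to a running total, and we stop as soon as the total is at least 1. Let $X$ be the number of picks. Use a Monte Carlo simulation to estimate the mean and variance of $X$.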

Tips: `np.random.randint(0, 1001)/1000` in Python (or `rand(0:1000)/1000` in Julia) returns a sample from the desired distribution. Also, it's a good idea to wrap a single run of the game into a zero-argument function.


*Solution.* We define a function `runs_till_over` which plays the game once, and we record the result of the game over a million runs. We estimate the mean as the mean of the resulting list, and we estimate the variance as the mean of the squared deviations from that estimated mean.

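```python
import numpy as np

def runs_till_over():
    """Play the game once; return how many picks it takes for the total to reach 1."""
    s = 0.0
    ctr = 0
    while s < 1.0:
        s += np.random.randint(0, 1001)/1000  # uniform on {0, 0.001, ..., 1}
        ctr += 1
    return ctr

A = [runs_till_over() for _ in range(1_000_000)]
μ = np.mean(A)
var = np.mean([(a - μ)**2 for a in A])  # mean squared deviation from the estimated mean
μ, var
```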

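```julia
using Statistics: mean

function runs_till_over()
    s = 0.0
    ctr = 0
    while s < 1.0
        s += rand(0:1000)/1000  # uniform on {0, 0.001, ..., 1}
        ctr += 1
    end
    ctr
end
A = [runs_till_over() for _ in 1:1_000_000]
μ = mean(A)
var = mean((a-μ)^2 for a in A)  # mean squared deviation from the estimated mean
μ, var
```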

We get a mean of about

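We can use linearity of expectation to rewrite the formula for variance in a simpler form:

$$\begin{aligned}
\operatorname{Var} X &= \mathbb{E}[X^2 - 2X\,\mathbb{E}[X] + \mathbb{E}[X]^2] \\
&= \mathbb{E}[X^2] - 2\,\mathbb{E}[X]^2 + \mathbb{E}[X]^2 \\
&= \mathbb{E}[X^2] - \mathbb{E}[X]^2.
\end{aligned}$$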
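The two expressions for variance can be compared exactly in Python. This sketch, again using a fair die as an illustrative example, computes $\mathbb{E}[(X - \mathbb{E}[X])^2]$ and $\mathbb{E}[X^2] - \mathbb{E}[X]^2$ with `Fraction` arithmetic, so any disagreement would show up exactly rather than as rounding error. The helper `expect` is a hypothetical name for the distribution formula $\sum_x g(x)\,\mathbb{P}(X = x)$.

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * px for x, px in pmf.items())

mu = expect(lambda x: x, die)
by_definition = expect(lambda x: (x - mu) ** 2, die)  # E[(X - E[X])^2]
by_shortcut = expect(lambda x: x ** 2, die) - mu ** 2  # E[X^2] - E[X]^2
print(by_definition, by_shortcut)  # 35/12 35/12
```

With floating-point data, the shortcut form can lose accuracy through cancellation when $\mathbb{E}[X]^2$ is large compared with $\operatorname{Var} X$, so the definitional form is often preferred numerically.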

We can use the formula $\operatorname{Var} X = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ to show how variance interacts with linear operations:

**Exercise**

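Show that variance satisfies the properties

$$\begin{aligned}
\operatorname{Var}(aX + b) &= a^2 \operatorname{Var} X \\
\operatorname{Var}(X + Y) &= \operatorname{Var} X + \operatorname{Var} Y
\end{aligned}$$

for all constants $a$ and $b$, where the second property holds if $X$ and $Y$ are **independent** random variables.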

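*Solution.* The first part of the statement follows easily from linearity of expectation: since $\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b$, we have

$$\operatorname{Var}(aX + b) = \mathbb{E}[(aX + b - a\,\mathbb{E}[X] - b)^2] = \mathbb{E}[a^2 (X - \mathbb{E}[X])^2] = a^2 \operatorname{Var} X.$$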

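For the second part, we apply the formula $\operatorname{Var} X = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ to $X + Y$:

$$\operatorname{Var}(X + Y) = \mathbb{E}[(X + Y)^2] - \mathbb{E}[X + Y]^2.$$

Rearranging and using linearity of expectation, we get

$$\begin{aligned}
\operatorname{Var}(X + Y) &= \mathbb{E}[X^2] + 2\,\mathbb{E}[XY] + \mathbb{E}[Y^2] - \left(\mathbb{E}[X]^2 + 2\,\mathbb{E}[X]\,\mathbb{E}[Y] + \mathbb{E}[Y]^2\right) \\
&= \operatorname{Var} X + \operatorname{Var} Y + 2\left(\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]\right).
\end{aligned}$$

The desired result follows because if $X$ and $Y$ are independent, then $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ by the independence product formula, so the last term is zero.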
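Both properties can be checked exactly on a small example. The Python sketch below assumes $X$ and $Y$ are independent fair dice and picks $a = 3$, $b = -2$ arbitrarily; it builds the distribution of $aX + b$ and of $X + Y$ from the die's mass function and compares variances.

```python
from collections import defaultdict
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}

def variance(pmf):
    """Var X = E[(X - E[X])^2] for a finite pmf {value: probability}."""
    mu = sum(x * px for x, px in pmf.items())
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

a, b = 3, -2
affine = {a * x + b: px for x, px in die.items()}  # pmf of aX + b (a != 0 keeps values distinct)

# pmf of X + Y for independent X and Y: add P(X = x) P(Y = y) into the bin x + y
total = defaultdict(Fraction)
for x, px in die.items():
    for y, py in die.items():
        total[x + y] += px * py

print(variance(affine) == a**2 * variance(die))          # True
print(variance(total) == variance(die) + variance(die))  # True
```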

**Exercise**

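Consider the distribution which assigns a probability mass of $\frac{c}{n^3}$ to each positive integer $n$, where $c$ is the constant that makes the total mass equal to 1.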

Show that this distribution has a finite mean but not a finite variance.

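*Solution.* Let $X$ be a random variable with this distribution. Then

$$\mathbb{E}[X] = \sum_{n=1}^{\infty} n \cdot \frac{c}{n^3} = c \sum_{n=1}^{\infty} \frac{1}{n^2}.$$

Since the sum on the right converges (it is a $p$-series with $p = 2$), $X$ has a finite mean. On the other hand,

$$\mathbb{E}[X^2] = \sum_{n=1}^{\infty} n^2 \cdot \frac{c}{n^3} = c \sum_{n=1}^{\infty} \frac{1}{n}$$

does not converge because of the harmonic series term. Therefore $\operatorname{Var} X = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ is infinite.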
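Partial sums make the contrast visible. The Python sketch below assumes the distribution places mass $c/n^3$ on each positive integer $n$, with $c$ approximated by truncating $\sum_n n^{-3}$ at a large cutoff; the cutoffs are arbitrary choices. The partial sums for $\mathbb{E}[X]$ level off, while those for $\mathbb{E}[X^2]$ keep growing at the rate of $c \ln N$.

```python
import math

# Assumed distribution: P(X = n) = c / n^3 for n = 1, 2, 3, ...
# c normalizes the total mass to 1; sum(1/n^3) is approximated by a long truncation.
c = 1 / sum(1 / n**3 for n in range(1, 1_000_000))

for N in [10, 1_000, 100_000]:
    mean_partial = sum(n * c / n**3 for n in range(1, N + 1))             # c * sum of 1/n^2
    second_moment_partial = sum(n**2 * c / n**3 for n in range(1, N + 1))  # c * sum of 1/n
    # the last column, c ln N, tracks the growth of the second-moment partial sums
    print(N, round(mean_partial, 4), round(second_moment_partial, 4), round(c * math.log(N), 4))
```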