Module 09. Uncertainty in Models

Lecture Date 1: October 27, 2021 - Wednesday
Lecture Date 2: November 1, 2021 - Monday Slides-1
Lecture Date 3: November 3, 2021 - Wednesday Slides-2
Lecture Date 4: November 8, 2021 - Monday Slides-3
Lecture Date 5: November 10, 2021 - Wednesday
Lecture Date 6: November 15, 2021 - Monday Slides-4

Uncertainty is an inherent feature of many real world systems ranging from the weather to population behavior. In these series of lectures, we will try to understand why uncertainty exists and how to account for it. We will develop Python code to represent some simple uncertainties. We will also talk about more advanced uncertainty cases where equal chances is no longer the case. Finally, we will investigate the concept of Monte Carlo Simulation. Throughout these lectures, we will heavily use NumPy and Matplotlib.

Table of contents:

Random Numbers and the Concept of Equal Chances
The Concept of Unequal Chances
Monte Carlo Simulation

1. Random Numbers and the Concept of Equal Chances

Download code

import random
import numpy as np
import matplotlib.pyplot as plt

1.1. Default random number generation [0,1)

random.random() # built-in python random number generator

0.06307777120650893

np.random.random() # numpy random number generator (preffered)

0.3906705873305315

np.random.random(5) # 5 values to be generated

array([0.65348309, 0.27415441, 0.01372371, 0.71346053, 0.54695058])

np.random.random((2,3))

array([[0.61078621, 0.07792854, 0.11373331],
       [0.04665458, 0.92737203, 0.3861151 ]])

1.2. Random Integer Generation

np.random.randint(5) # generates a number between 0 (inc) and 5 (exl)

np.random.randint(10,20) # generate a number between 10 and 20

np.random.randint(5,size=10)

array([3, 4, 0, 0, 3, 1, 4, 3, 1, 1])

np.random.randint(10,20,size=10)

array([10, 15, 16, 10, 10, 13, 15, 12, 19, 15])

np.random.randint(0,5,size=(3,4))

array([[4, 4, 0, 2],
       [1, 3, 4, 4],
       [0, 0, 4, 4]])

1.3. The concept of equal chances

np.random.uniform(0.,1.) # equal to np.random.random()

0.4921918549158054

np.random.uniform(-500.0, 500.0) # a random float between -500 and +500

389.21373647640064

np.random.uniform(0.0, 10.0, size=100) # 100 random floats between 0 and 10

array([7.73123491, 7.72278679, 7.84984113, 3.64597633, 3.09166542,
       2.1372517 , 4.56584416, 4.55322728, 1.49055408, 7.80140388,
       8.32915531, 6.80373627, 2.62093904, 3.28699214, 3.16933746,
       9.47962056, 4.59755548, 9.93780824, 8.50222742, 0.67033916,
       0.7680997 , 3.89928015, 5.10750796, 0.23213316, 7.22972843,
       5.45403522, 0.14627873, 2.31024876, 3.64709778, 4.94125794,
       8.9890525 , 4.96857027, 7.82692512, 3.82440752, 6.08652705,
       3.15458908, 6.77813816, 2.34337811, 2.47766461, 4.47326407,
       7.16964651, 5.03041095, 2.27432346, 2.58464947, 8.31960573,
       6.1079084 , 4.06609776, 9.62319325, 5.44977887, 9.72312348,
       3.35883305, 1.91267416, 1.78270747, 4.72766223, 4.33097185,
       8.65538898, 2.61953746, 4.05091517, 9.78499707, 1.64899334,
       2.74467793, 4.71359807, 8.08675696, 0.89846275, 4.35739942,
       3.7082654 , 4.31793983, 5.00649306, 8.50431795, 8.02075301,
       4.72356718, 2.33185499, 7.92272907, 8.60834256, 7.22456524,
       5.34957216, 2.64242313, 7.09629589, 7.40626942, 0.07142494,
       5.50727335, 4.06497208, 3.82767186, 0.47236227, 1.8954008 ,
       9.9604546 , 4.94970934, 6.63181853, 1.11178444, 9.53742493,
       3.4794558 , 8.80279326, 5.51649455, 6.99048747, 6.50757072,
       8.6984726 , 8.81226183, 8.86915319, 6.22003787, 3.14325275])

np.random.uniform(0.0, 10.0, size=(3,3)) # 3x3 matrix populated w/ random floats between 0 and 10

array([[1.97134968, 9.8719721 , 3.25614689],
       [3.63576096, 7.91214218, 3.48837422],
       [9.66394753, 8.91367194, 3.30569732]])

1.4. Shuffling and permutation

cards_diamonds = [ str(i)+"_diamonds" for i in range(1,11)]
cards_spades = [str(i)+"_spades" for i in range(1,11)]
cards_hearts = [ str(i)+"_hearts" for i in range(1,11)]
cards_clubs = [ str(i)+"_clubs" for i in range(1,11)]

all_cards = np.concatenate([cards_diamonds, cards_spades, cards_hearts, cards_clubs]) 

print(all_cards)

['1_diamonds' '2_diamonds' '3_diamonds' '4_diamonds' '5_diamonds'
 '6_diamonds' '7_diamonds' '8_diamonds' '9_diamonds' '10_diamonds'
 '1_spades' '2_spades' '3_spades' '4_spades' '5_spades' '6_spades'
 '7_spades' '8_spades' '9_spades' '10_spades' '1_hearts' '2_hearts'
 '3_hearts' '4_hearts' '5_hearts' '6_hearts' '7_hearts' '8_hearts'
 '9_hearts' '10_hearts' '1_clubs' '2_clubs' '3_clubs' '4_clubs' '5_clubs'
 '6_clubs' '7_clubs' '8_clubs' '9_clubs' '10_clubs']

shuffled_cards = np.random.permutation(all_cards) # does not change the original array

shuffled_cards

array(['6_hearts', '10_hearts', '4_diamonds', '8_spades', '10_clubs',
       '9_clubs', '2_hearts', '2_spades', '7_hearts', '10_diamonds',
       '10_spades', '3_spades', '1_clubs', '8_clubs', '5_hearts',
       '4_spades', '1_hearts', '7_spades', '9_spades', '1_spades',
       '5_clubs', '3_clubs', '3_hearts', '2_clubs', '1_diamonds',
       '5_diamonds', '6_spades', '9_diamonds', '6_diamonds', '4_clubs',
       '8_diamonds', '2_diamonds', '7_diamonds', '9_hearts', '4_hearts',
       '3_diamonds', '5_spades', '7_clubs', '6_clubs', '8_hearts'],
      dtype='<U11')

np.random.shuffle(all_cards) # does change the original array

all_cards

array(['9_hearts', '5_clubs', '3_diamonds', '1_diamonds', '6_hearts',
       '8_hearts', '3_clubs', '1_clubs', '9_diamonds', '5_spades',
       '6_clubs', '2_diamonds', '10_diamonds', '2_hearts', '3_hearts',
       '1_spades', '10_clubs', '2_spades', '8_clubs', '4_hearts',
       '1_hearts', '7_clubs', '10_hearts', '6_diamonds', '10_spades',
       '8_spades', '9_clubs', '4_spades', '7_diamonds', '3_spades',
       '4_diamonds', '8_diamonds', '6_spades', '5_diamonds', '7_spades',
       '9_spades', '2_clubs', '4_clubs', '7_hearts', '5_hearts'],
      dtype='<U11')

2. The Concept of Unequal Chances

Download code

import numpy as np
import matplotlib.pyplot as plt

2.1. The traffic example

# easy but unpractical solution
chance = np.random.uniform(0.0, 100.0)

if chance <= 15:
    print("Turn left")
elif 15 < chance <= 95:
    print("Go straigt")
else:
    print("Turn right")

Turn left

def direction():
    chance = np.random.uniform(0.0, 100.0)

    if chance <= 15:
        print("Turn left")
    elif 15 < chance <= 95:
        print("Go straigt")
    else:
        print("Turn right")

for i in range(10):
    direction()

Turn left
Go straigt
Go straigt
Go straigt
Go straigt
Turn left
Go straigt
Go straigt
Turn right
Go straigt

# more practical solution
direction = [1,2,3]
probabilities = [0.15, 0.8, 0.05]
selected_direction = np.random.choice(direction, p=probabilities)

if selected_direction == 1:
    print("Turn left")
elif selected_direction == 2:
    print("Go straight")
else:
    print("Turn right")

Go straight

def direction2():
    direction = [1,2,3]
    probabilities = [0.15, 0.8, 0.05]
    selected_direction = np.random.choice(direction, p=probabilities)

    if selected_direction == 1:
        print("Turn left")
    elif selected_direction == 2:
        print("Go straight")
    else:
        print("Turn right")

for i in range(10):
    direction2()

Go straight
Turn left
Go straight
Go straight
Go straight
Go straight
Go straight
Go straight
Go straight
Go straight

2.2. Pick a number[1,10] exercise (from Fall 2019 semester)

# numbers picked in the classroom
picked_numbers = [9, 8, 4, 7, 9, 8, 7, 6, 8, 8, 7, 4, 7, 4, 2, 7, 4, 1, 2, 3, 9]

plt.hist(picked_numbers)
plt.show()

png

# numbers picked in by random number generator
randomly_picked = np.random.randint(1,11, size=len(picked_numbers))

plt.hist(randomly_picked)
plt.show()

png

len(randomly_picked)

# what if random number generator picks more
randomly_picked = np.random.randint(1,11, size=100)

plt.hist(randomly_picked)
plt.show()

png

2.3. Normal distribution

# generate 10 thousand numbers sampled from a normal distribution with mean=0, std deviation=0.5
norm1 = np.random.normal(0, 0.5, size=10000)

plt.hist(norm1, bins=50)
plt.axvline(x=0, color='k')
plt.show()
# as you can see in the histogram below, values are mostly dispersed 3 times greater than the standard deviation

png

# generate 10 thousand numbers sampled from a normal distribution with mean=5, std deviation=10
norm2 = np.random.normal(5, 10, size=10000)

plt.hist(norm2, bins=50)
plt.axvline(x=5, color='k')
plt.show()

png

2.4. Poisson distribution

# generate 16 numbers sampled from a poisson distribution with lamda (intensity)=100
poiss1 = np.random.poisson(100, size=16)

poiss1

array([102,  93,  90, 107, 100, 119, 107, 120, 113,  87, 114, 104,  98,
        86, 100,  97])

# the mean is close to the intensity value we gave in the beginning
poiss1.mean()

102.3125

plt.hist(poiss1)
plt.axvline(x=100, color='k')
plt.show()

png

# if we created more values with the same parameters, we would see a smoother distribution
poiss2 = np.random.poisson(100, size=1000)

plt.hist(poiss2)
plt.axvline(x=100, color='k')
plt.show()

png

3. Monte Carlo Simulation

Download code

3.1. Coin flipping example

import numpy as np
import matplotlib.pyplot as plt

# two possible values
# 0: Heads
# 1: Tails
np.random.choice([0,1])

# five coin flips
np.random.choice([0,1],size=5)

array([0, 0, 0, 0, 1])

# 10 runs
observations = np.random.choice([0,1],size=10)

observations

array([1, 0, 0, 0, 0, 1, 1, 0, 1, 0])

odds_tails = 100 * observations.sum() / len(observations)  # number of times we observed 1's / total tosses
odds_heads = 100 - odds_tails
print(odds_tails)
print(odds_heads)

40.0
60.0

runs = [1, 10, 100, 1000, 10000]
for run in runs:
    observations = np.random.choice([0,1],size=run)
    odds_tails = 100 * observations.sum() / len(observations)  # number of times we observed 1's / total tosses
    odds_heads = 100 - odds_tails
    print("After ",run, " runs - odds of seeing heads is ", odds_heads)

After  1  runs - odds of seeing heads is  100.0
After  10  runs - odds of seeing heads is  70.0
After  100  runs - odds of seeing heads is  45.0
After  1000  runs - odds of seeing heads is  54.0
After  10000  runs - odds of seeing heads is  50.42

3.1.1. Coin flipping example with sequential combinations

one_sequence = np.random.choice([0,1],size=3)
print(one_sequence)

[0 1 0]

3.1.2. Slow running code

counter = 0
num_of_runs = 10
for i in range(num_of_runs):
    one_sequence = np.random.choice([0,1],size=3)
    if one_sequence.sum() == 0:
        counter = counter + 1
p=counter/num_of_runs
print("Number of runs:", num_of_runs, " - P(Heads,Heads,Heads)", p)
    

Number of runs: 10  - P(Heads,Heads,Heads) 0.1

# copy/pasted from above - slow version
counter = 0
num_of_runs = 1000000
for i in range(num_of_runs):
    one_sequence = np.random.choice([0,1],size=3)
    if one_sequence.sum() == 0:
        counter = counter + 1
p=counter/num_of_runs
print("Number of runs:", num_of_runs, " - P(Heads,Heads,Heads)", p)

Number of runs: 1000000  - P(Heads,Heads,Heads) 0.124826

all_runs = np.random.randint(0,2,size=(10,3))

print(all_runs)

[[0 0 0]
 [0 1 1]
 [1 0 0]
 [0 1 1]
 [1 1 1]
 [0 0 1]
 [0 0 0]
 [0 0 1]
 [1 0 0]
 [0 1 1]]

counts = all_runs.sum(1)

zero_counts = counts==0

zero_counts.sum()

p = zero_counts.sum() / 10
print(p)

0.2

3.1.3. Faster running code

# faster version
num_of_runs = 100000000

entire_runs = np.random.choice([0,1],size=(num_of_runs,3))
entire_runs_count = entire_runs.sum(1)
count_of_HHH = (entire_runs_count == 0).sum()
p=count_of_HHH/num_of_runs
print("Number of runs:", num_of_runs, " - P(Heads,Heads,Heads)", p)
    

Number of runs: 100000000  - P(Heads,Heads,Heads) 0.12500933

3.2. The Pi number example

r = 1
# first location randomly set
x = np.random.uniform(-r,r)
y = np.random.uniform(-r,r)
print(x,y)

-0.4664022628726703 -0.9065451138461

import math
# if the distance to the center is lower than radius, it means the dart is within the circle
length = math.sqrt(x ** 2 + y ** 2)

print(length)

1.0194876724369872

# many runs....
runs = [10, 100, 1000, 10000, 100000]

for run in runs:
    count = 0
    
    for i in range(run):
        x = np.random.uniform(-r,r)
        y = np.random.uniform(-r,r)
        length = math.sqrt(x ** 2 + y ** 2)
        if length < r:
            count = count + 1
    
    p =  count / run
    # divided it by 4 to estimate the pi number
    pi =  p * 4
    #print("Ratio of darts inside circle:", p)
    print("Estimated pi value for ",run, " runs: ", pi)

Estimated pi value for  10  runs:  2.8
Estimated pi value for  100  runs:  2.8
Estimated pi value for  1000  runs:  3.088
Estimated pi value for  10000  runs:  3.1192
Estimated pi value for  100000  runs:  3.14476

3.3. Yahtzee example

def yahtzee():
    number_of_dice_kept = 0
    value_we_keep = 0

    for i in range(3):
        dice = np.random.randint(1,7,size=5-number_of_dice_kept)

        if value_we_keep > 0:
            for die in dice:
                if die == value_we_keep:
                    number_of_dice_kept = number_of_dice_kept + 1
        else:
            counts = np.zeros(6)
            for die in dice:
                counts[die-1] += 1

            number_of_dice_kept = int(counts.max())
            if number_of_dice_kept == 1:
                value_we_keep = 0
                number_of_dice_kept = 0
            else:
                value_we_keep = counts.argmax() + 1
    # option 1
    return number_of_dice_kept == 5 
    
    # Option 2
    # if number_of_dice_kept == 5:
    #     return True
    # else:
    #     return False
        

# monte carlo simulation
N=100000
number_of_yahtzee_scores = 0

for i in range(N):
    outcome = yahtzee()
    if outcome == True:
        number_of_yahtzee_scores = number_of_yahtzee_scores + 1

print("Ratio:", number_of_yahtzee_scores/N)
print("Percentage:", 100 * number_of_yahtzee_scores/N)

Ratio: 0.04506
Percentage: 4.506

3.4. Determining number of runs (N)

N=100

# 11/18/2019 lecture slide #7
N=100 # step 1
all_results_100 = []

for j in range(100): # step 5
    np.random.seed(j) # step 2

    number_of_yahtzee_scores = 0

    for i in range(N): # Step 3
        outcome = yahtzee()
        if outcome == True:
            number_of_yahtzee_scores = number_of_yahtzee_scores + 1

    estimated_result = number_of_yahtzee_scores/N # step 4
    all_results_100.append(estimated_result)

print(all_results_100)

[0.08, 0.04, 0.03, 0.01, 0.03, 0.04, 0.06, 0.03, 0.03, 0.06, 0.03, 0.02, 0.01, 0.08, 0.05, 0.02, 0.02, 0.01, 0.02, 0.05, 0.04, 0.05, 0.05, 0.05, 0.04, 0.05, 0.08, 0.07, 0.06, 0.07, 0.03, 0.05, 0.03, 0.04, 0.03, 0.06, 0.04, 0.03, 0.08, 0.07, 0.06, 0.08, 0.05, 0.06, 0.04, 0.06, 0.08, 0.06, 0.03, 0.05, 0.08, 0.02, 0.02, 0.04, 0.06, 0.06, 0.05, 0.04, 0.06, 0.04, 0.06, 0.02, 0.05, 0.07, 0.05, 0.02, 0.02, 0.05, 0.05, 0.08, 0.06, 0.03, 0.05, 0.02, 0.01, 0.02, 0.05, 0.09, 0.1, 0.03, 0.09, 0.02, 0.05, 0.03, 0.03, 0.03, 0.04, 0.05, 0.06, 0.09, 0.04, 0.07, 0.05, 0.04, 0.03, 0.05, 0.08, 0.03, 0.07, 0.03]

plt.hist(all_results_100)

(array([ 4., 12., 18., 13., 20., 14.,  6.,  9.,  3.,  1.]),
 array([0.01 , 0.019, 0.028, 0.037, 0.046, 0.055, 0.064, 0.073, 0.082,
        0.091, 0.1  ]),
 <a list of 10 Patch objects>)

png

N=1000

# 11/18/2019 lecture slide #7
N=1000 # step 1
all_results_1000 = []

for j in range(100): # step 5
    np.random.seed(j) # step 2

    number_of_yahtzee_scores = 0

    for i in range(N): # Step 3
        outcome = yahtzee()
        if outcome == True:
            number_of_yahtzee_scores = number_of_yahtzee_scores + 1

    estimated_result = number_of_yahtzee_scores/N # step 4
    all_results_1000.append(estimated_result)

print(all_results_1000)
plt.hist(all_results_1000)

[0.047, 0.044, 0.043, 0.042, 0.04, 0.043, 0.044, 0.046, 0.037, 0.044, 0.049, 0.036, 0.035, 0.045, 0.035, 0.04, 0.048, 0.044, 0.052, 0.043, 0.041, 0.048, 0.047, 0.04, 0.054, 0.047, 0.053, 0.043, 0.043, 0.05, 0.043, 0.04, 0.046, 0.049, 0.051, 0.037, 0.05, 0.05, 0.051, 0.046, 0.044, 0.053, 0.048, 0.044, 0.058, 0.046, 0.05, 0.056, 0.038, 0.056, 0.051, 0.052, 0.042, 0.044, 0.045, 0.05, 0.047, 0.046, 0.044, 0.047, 0.036, 0.04, 0.036, 0.034, 0.043, 0.053, 0.038, 0.053, 0.055, 0.045, 0.057, 0.04, 0.045, 0.04, 0.038, 0.041, 0.046, 0.051, 0.058, 0.053, 0.04, 0.053, 0.047, 0.041, 0.044, 0.045, 0.044, 0.053, 0.046, 0.046, 0.055, 0.042, 0.048, 0.044, 0.055, 0.056, 0.047, 0.042, 0.049, 0.047]

(array([ 6.,  5., 11., 11., 16., 20.,  8., 13.,  4.,  6.]),
 array([0.034 , 0.0364, 0.0388, 0.0412, 0.0436, 0.046 , 0.0484, 0.0508,
        0.0532, 0.0556, 0.058 ]),
 <a list of 10 Patch objects>)

png

# step 6 - calculating the standard deviation
standard_dev_100 = np.std(all_results_100)
print("Standard deviation for N=100:",standard_dev_100)

standard_dev_1000 = np.std(all_results_1000)
print("Standard deviation for N=1000:",standard_dev_1000)

Standard deviation for N=100: 0.020961631615883335
Standard deviation for N=1000: 0.005692547760010451

# calculating the confidence interval
mean_100 = np.mean(all_results_100) 
print(mean_100)

mean_1000 = np.mean(all_results_1000) 
print(mean_1000)

0.04690000000000001
0.04592999999999999

lower_bound_of_confidence_interval_100 = mean_100 - 1.96*standard_dev_100
upper_bound_of_confidence_interval_100 = mean_100 + 1.96*standard_dev_100

# if we use N=100
print("95% of the time, values will fall within ",lower_bound_of_confidence_interval_100,",",upper_bound_of_confidence_interval_100)

95% of the time, values will fall within  0.0058152020328686715 , 0.08798479796713135

lower_bound_of_confidence_interval_1000 = mean_1000 - 1.96*standard_dev_1000
upper_bound_of_confidence_interval_1000 = mean_1000 + 1.96*standard_dev_1000

# if we use N=1000
print("95% of the time, values will fall within ",lower_bound_of_confidence_interval_1000,",",upper_bound_of_confidence_interval_1000)

95% of the time, values will fall within  0.034772606390379504 , 0.05708739360962048

3.4.1. Compact version

def monte_carlo_run(N,number_of_repeats):
    all_results = []

    for j in range(number_of_repeats): # step 5
        np.random.seed(j) # step 2

        number_of_yahtzee_scores = 0

        for i in range(N): # Step 3
            outcome = yahtzee()
            if outcome == True:
                number_of_yahtzee_scores = number_of_yahtzee_scores + 1

        estimated_result = number_of_yahtzee_scores/N # step 4
        all_results.append(estimated_result)

    return all_results

results = monte_carlo_run(N=1000, number_of_repeats=1000)

mean = np.mean(results)
std = np.std(results)
plt.hist(results)
plt.axvline(x=mean,color="black",label="mean")
plt.axvline(x=mean-1.96*std,color="red",label="Lower bound of conf. int.")
plt.axvline(x=mean+1.96*std,color="purple",label="Upper bound of conf. int.")
plt.legend()
plt.show()

png

3.5. Monty Hall problem

# identify a random door with prize
doors = [1,2,3]
prize = np.random.choice(doors)
print("The door with prize:", prize)

The door with prize: 1

# make a choice
choice = np.random.choice(doors)
print("The door we chose:", choice)

The door we chose: 2

set(doors)

{1, 2, 3}

doors_with_no_prize_and_unchosen = set(doors)-set([choice,prize])
print(doors_with_no_prize_and_unchosen)

{3}

the_door_opened_by_host = np.random.choice(list(doors_with_no_prize_and_unchosen))
print(the_door_opened_by_host)

#  Strategy 1: we stick with current one
if prize == choice:
    print("You won")
else:
    print("You lost")

You lost

# Strategy 2: we change our mind and choose the other one
new_choice = set(doors) - set([the_door_opened_by_host]) - set([choice])
print(new_choice)
new_choice_int = list(new_choice)[0]

# we stick with current one
if prize == new_choice_int:
    print("You won")
else:
    print("You lost")

{1}
You won

3.5.1. Compact version of Monty Hall problem

def monty_hall(strategy):
    # identify a random door with prize
    doors = [1,2,3]
    prize = np.random.choice(doors)
    
    # make a choice
    choice = np.random.choice(doors)
    
    doors_with_no_prize_and_unchosen = set(doors)-set([choice,prize])
    
    the_door_opened_by_host = np.random.choice(list(doors_with_no_prize_and_unchosen))

    # if the function returns True, that means, you win
    #                       if False, you lose
    if strategy == 1: # stick with current choice
        return prize == choice
    
    elif strategy == 2: # change the choice
        new_choice = set(doors) - set([the_door_opened_by_host]) - set([choice])
        new_choice_int = list(new_choice)[0]
        
        return prize == new_choice_int

3.5.2. Test two strategies

monty_hall(1)

True

N = 1000
count_wins = 0

for i in range(N):
    result = monty_hall(1)
    
    if result == True:
        count_wins = count_wins + 1
print("Strategy 1")
print(100*count_wins/N)

Strategy 1
35.5

N = 1000
count_wins = 0

for i in range(N):
    result = monty_hall(2)
    
    if result == True:
        count_wins = count_wins + 1
print("Strategy 2")
print(100*count_wins/N)

Strategy 2
69.9