Module 09. Uncertainty in Models
Lecture Date 1: October 27, 2021 - Wednesday
Lecture Date 2: November 1, 2021 - Monday Slides-1
Lecture Date 3: November 3, 2021 - Wednesday Slides-2
Lecture Date 4: November 8, 2021 - Monday Slides-3
Lecture Date 5: November 10, 2021 - Wednesday
Lecture Date 6: November 15, 2021 - Monday Slides-4
Uncertainty is an inherent feature of many real world systems ranging from the weather to population behavior. In these series of lectures, we will try to understand why uncertainty exists and how to account for it. We will develop Python code to represent some simple uncertainties. We will also talk about more advanced uncertainty cases where equal chances is no longer the case. Finally, we will investigate the concept of Monte Carlo Simulation. Throughout these lectures, we will heavily use NumPy
and Matplotlib
.
Table of contents:
1. Random Numbers and the Concept of Equal Chances
import random
import numpy as np
import matplotlib.pyplot as plt
1.1. Default random number generation [0,1)
random.random() # built-in python random number generator
0.06307777120650893
np.random.random() # numpy random number generator (preffered)
0.3906705873305315
np.random.random(5) # 5 values to be generated
array([0.65348309, 0.27415441, 0.01372371, 0.71346053, 0.54695058])
np.random.random((2,3))
array([[0.61078621, 0.07792854, 0.11373331],
[0.04665458, 0.92737203, 0.3861151 ]])
1.2. Random Integer Generation
np.random.randint(5) # generates a number between 0 (inc) and 5 (exl)
3
np.random.randint(10,20) # generate a number between 10 and 20
14
np.random.randint(5,size=10)
array([3, 4, 0, 0, 3, 1, 4, 3, 1, 1])
np.random.randint(10,20,size=10)
array([10, 15, 16, 10, 10, 13, 15, 12, 19, 15])
np.random.randint(0,5,size=(3,4))
array([[4, 4, 0, 2],
[1, 3, 4, 4],
[0, 0, 4, 4]])
1.3. The concept of equal chances
np.random.uniform(0.,1.) # equal to np.random.random()
0.4921918549158054
np.random.uniform(-500.0, 500.0) # a random float between -500 and +500
389.21373647640064
np.random.uniform(0.0, 10.0, size=100) # 100 random floats between 0 and 10
array([7.73123491, 7.72278679, 7.84984113, 3.64597633, 3.09166542,
2.1372517 , 4.56584416, 4.55322728, 1.49055408, 7.80140388,
8.32915531, 6.80373627, 2.62093904, 3.28699214, 3.16933746,
9.47962056, 4.59755548, 9.93780824, 8.50222742, 0.67033916,
0.7680997 , 3.89928015, 5.10750796, 0.23213316, 7.22972843,
5.45403522, 0.14627873, 2.31024876, 3.64709778, 4.94125794,
8.9890525 , 4.96857027, 7.82692512, 3.82440752, 6.08652705,
3.15458908, 6.77813816, 2.34337811, 2.47766461, 4.47326407,
7.16964651, 5.03041095, 2.27432346, 2.58464947, 8.31960573,
6.1079084 , 4.06609776, 9.62319325, 5.44977887, 9.72312348,
3.35883305, 1.91267416, 1.78270747, 4.72766223, 4.33097185,
8.65538898, 2.61953746, 4.05091517, 9.78499707, 1.64899334,
2.74467793, 4.71359807, 8.08675696, 0.89846275, 4.35739942,
3.7082654 , 4.31793983, 5.00649306, 8.50431795, 8.02075301,
4.72356718, 2.33185499, 7.92272907, 8.60834256, 7.22456524,
5.34957216, 2.64242313, 7.09629589, 7.40626942, 0.07142494,
5.50727335, 4.06497208, 3.82767186, 0.47236227, 1.8954008 ,
9.9604546 , 4.94970934, 6.63181853, 1.11178444, 9.53742493,
3.4794558 , 8.80279326, 5.51649455, 6.99048747, 6.50757072,
8.6984726 , 8.81226183, 8.86915319, 6.22003787, 3.14325275])
np.random.uniform(0.0, 10.0, size=(3,3)) # 3x3 matrix populated w/ random floats between 0 and 10
array([[1.97134968, 9.8719721 , 3.25614689],
[3.63576096, 7.91214218, 3.48837422],
[9.66394753, 8.91367194, 3.30569732]])
1.4. Shuffling and permutation
cards_diamonds = [ str(i)+"_diamonds" for i in range(1,11)]
cards_spades = [str(i)+"_spades" for i in range(1,11)]
cards_hearts = [ str(i)+"_hearts" for i in range(1,11)]
cards_clubs = [ str(i)+"_clubs" for i in range(1,11)]
all_cards = np.concatenate([cards_diamonds, cards_spades, cards_hearts, cards_clubs])
print(all_cards)
['1_diamonds' '2_diamonds' '3_diamonds' '4_diamonds' '5_diamonds'
'6_diamonds' '7_diamonds' '8_diamonds' '9_diamonds' '10_diamonds'
'1_spades' '2_spades' '3_spades' '4_spades' '5_spades' '6_spades'
'7_spades' '8_spades' '9_spades' '10_spades' '1_hearts' '2_hearts'
'3_hearts' '4_hearts' '5_hearts' '6_hearts' '7_hearts' '8_hearts'
'9_hearts' '10_hearts' '1_clubs' '2_clubs' '3_clubs' '4_clubs' '5_clubs'
'6_clubs' '7_clubs' '8_clubs' '9_clubs' '10_clubs']
shuffled_cards = np.random.permutation(all_cards) # does not change the original array
shuffled_cards
array(['6_hearts', '10_hearts', '4_diamonds', '8_spades', '10_clubs',
'9_clubs', '2_hearts', '2_spades', '7_hearts', '10_diamonds',
'10_spades', '3_spades', '1_clubs', '8_clubs', '5_hearts',
'4_spades', '1_hearts', '7_spades', '9_spades', '1_spades',
'5_clubs', '3_clubs', '3_hearts', '2_clubs', '1_diamonds',
'5_diamonds', '6_spades', '9_diamonds', '6_diamonds', '4_clubs',
'8_diamonds', '2_diamonds', '7_diamonds', '9_hearts', '4_hearts',
'3_diamonds', '5_spades', '7_clubs', '6_clubs', '8_hearts'],
dtype='<U11')
np.random.shuffle(all_cards) # does change the original array
all_cards
array(['9_hearts', '5_clubs', '3_diamonds', '1_diamonds', '6_hearts',
'8_hearts', '3_clubs', '1_clubs', '9_diamonds', '5_spades',
'6_clubs', '2_diamonds', '10_diamonds', '2_hearts', '3_hearts',
'1_spades', '10_clubs', '2_spades', '8_clubs', '4_hearts',
'1_hearts', '7_clubs', '10_hearts', '6_diamonds', '10_spades',
'8_spades', '9_clubs', '4_spades', '7_diamonds', '3_spades',
'4_diamonds', '8_diamonds', '6_spades', '5_diamonds', '7_spades',
'9_spades', '2_clubs', '4_clubs', '7_hearts', '5_hearts'],
dtype='<U11')
2. The Concept of Unequal Chances
import numpy as np
import matplotlib.pyplot as plt
2.1. The traffic example
# easy but unpractical solution
chance = np.random.uniform(0.0, 100.0)
if chance <= 15:
print("Turn left")
elif 15 < chance <= 95:
print("Go straigt")
else:
print("Turn right")
Turn left
def direction():
chance = np.random.uniform(0.0, 100.0)
if chance <= 15:
print("Turn left")
elif 15 < chance <= 95:
print("Go straigt")
else:
print("Turn right")
for i in range(10):
direction()
Turn left
Go straigt
Go straigt
Go straigt
Go straigt
Turn left
Go straigt
Go straigt
Turn right
Go straigt
# more practical solution
direction = [1,2,3]
probabilities = [0.15, 0.8, 0.05]
selected_direction = np.random.choice(direction, p=probabilities)
if selected_direction == 1:
print("Turn left")
elif selected_direction == 2:
print("Go straight")
else:
print("Turn right")
Go straight
def direction2():
direction = [1,2,3]
probabilities = [0.15, 0.8, 0.05]
selected_direction = np.random.choice(direction, p=probabilities)
if selected_direction == 1:
print("Turn left")
elif selected_direction == 2:
print("Go straight")
else:
print("Turn right")
for i in range(10):
direction2()
Go straight
Turn left
Go straight
Go straight
Go straight
Go straight
Go straight
Go straight
Go straight
Go straight
2.2. Pick a number[1,10] exercise (from Fall 2019 semester)
# numbers picked in the classroom
picked_numbers = [9, 8, 4, 7, 9, 8, 7, 6, 8, 8, 7, 4, 7, 4, 2, 7, 4, 1, 2, 3, 9]
plt.hist(picked_numbers)
plt.show()
# numbers picked in by random number generator
randomly_picked = np.random.randint(1,11, size=len(picked_numbers))
plt.hist(randomly_picked)
plt.show()
len(randomly_picked)
21
# what if random number generator picks more
randomly_picked = np.random.randint(1,11, size=100)
plt.hist(randomly_picked)
plt.show()
2.3. Normal distribution
# generate 10 thousand numbers sampled from a normal distribution with mean=0, std deviation=0.5
norm1 = np.random.normal(0, 0.5, size=10000)
plt.hist(norm1, bins=50)
plt.axvline(x=0, color='k')
plt.show()
# as you can see in the histogram below, values are mostly dispersed 3 times greater than the standard deviation
# generate 10 thousand numbers sampled from a normal distribution with mean=5, std deviation=10
norm2 = np.random.normal(5, 10, size=10000)
plt.hist(norm2, bins=50)
plt.axvline(x=5, color='k')
plt.show()
2.4. Poisson distribution
# generate 16 numbers sampled from a poisson distribution with lamda (intensity)=100
poiss1 = np.random.poisson(100, size=16)
poiss1
array([102, 93, 90, 107, 100, 119, 107, 120, 113, 87, 114, 104, 98,
86, 100, 97])
# the mean is close to the intensity value we gave in the beginning
poiss1.mean()
102.3125
plt.hist(poiss1)
plt.axvline(x=100, color='k')
plt.show()
# if we created more values with the same parameters, we would see a smoother distribution
poiss2 = np.random.poisson(100, size=1000)
plt.hist(poiss2)
plt.axvline(x=100, color='k')
plt.show()
3. Monte Carlo Simulation
3.1. Coin flipping example
import numpy as np
import matplotlib.pyplot as plt
# two possible values
# 0: Heads
# 1: Tails
np.random.choice([0,1])
0
# five coin flips
np.random.choice([0,1],size=5)
array([0, 0, 0, 0, 1])
# 10 runs
observations = np.random.choice([0,1],size=10)
observations
array([1, 0, 0, 0, 0, 1, 1, 0, 1, 0])
odds_tails = 100 * observations.sum() / len(observations) # number of times we observed 1's / total tosses
odds_heads = 100 - odds_tails
print(odds_tails)
print(odds_heads)
40.0
60.0
runs = [1, 10, 100, 1000, 10000]
for run in runs:
observations = np.random.choice([0,1],size=run)
odds_tails = 100 * observations.sum() / len(observations) # number of times we observed 1's / total tosses
odds_heads = 100 - odds_tails
print("After ",run, " runs - odds of seeing heads is ", odds_heads)
After 1 runs - odds of seeing heads is 100.0
After 10 runs - odds of seeing heads is 70.0
After 100 runs - odds of seeing heads is 45.0
After 1000 runs - odds of seeing heads is 54.0
After 10000 runs - odds of seeing heads is 50.42
3.1.1. Coin flipping example with sequential combinations
one_sequence = np.random.choice([0,1],size=3)
print(one_sequence)
[0 1 0]
3.1.2. Slow running code
counter = 0
num_of_runs = 10
for i in range(num_of_runs):
one_sequence = np.random.choice([0,1],size=3)
if one_sequence.sum() == 0:
counter = counter + 1
p=counter/num_of_runs
print("Number of runs:", num_of_runs, " - P(Heads,Heads,Heads)", p)
Number of runs: 10 - P(Heads,Heads,Heads) 0.1
# copy/pasted from above - slow version
counter = 0
num_of_runs = 1000000
for i in range(num_of_runs):
one_sequence = np.random.choice([0,1],size=3)
if one_sequence.sum() == 0:
counter = counter + 1
p=counter/num_of_runs
print("Number of runs:", num_of_runs, " - P(Heads,Heads,Heads)", p)
Number of runs: 1000000 - P(Heads,Heads,Heads) 0.124826
all_runs = np.random.randint(0,2,size=(10,3))
print(all_runs)
[[0 0 0]
[0 1 1]
[1 0 0]
[0 1 1]
[1 1 1]
[0 0 1]
[0 0 0]
[0 0 1]
[1 0 0]
[0 1 1]]
counts = all_runs.sum(1)
zero_counts = counts==0
zero_counts.sum()
2
p = zero_counts.sum() / 10
print(p)
0.2
3.1.3. Faster running code
# faster version
num_of_runs = 100000000
entire_runs = np.random.choice([0,1],size=(num_of_runs,3))
entire_runs_count = entire_runs.sum(1)
count_of_HHH = (entire_runs_count == 0).sum()
p=count_of_HHH/num_of_runs
print("Number of runs:", num_of_runs, " - P(Heads,Heads,Heads)", p)
Number of runs: 100000000 - P(Heads,Heads,Heads) 0.12500933
3.2. The Pi number example
r = 1
# first location randomly set
x = np.random.uniform(-r,r)
y = np.random.uniform(-r,r)
print(x,y)
-0.4664022628726703 -0.9065451138461
import math
# if the distance to the center is lower than radius, it means the dart is within the circle
length = math.sqrt(x ** 2 + y ** 2)
print(length)
1.0194876724369872
# many runs....
runs = [10, 100, 1000, 10000, 100000]
for run in runs:
count = 0
for i in range(run):
x = np.random.uniform(-r,r)
y = np.random.uniform(-r,r)
length = math.sqrt(x ** 2 + y ** 2)
if length < r:
count = count + 1
p = count / run
# divided it by 4 to estimate the pi number
pi = p * 4
#print("Ratio of darts inside circle:", p)
print("Estimated pi value for ",run, " runs: ", pi)
Estimated pi value for 10 runs: 2.8
Estimated pi value for 100 runs: 2.8
Estimated pi value for 1000 runs: 3.088
Estimated pi value for 10000 runs: 3.1192
Estimated pi value for 100000 runs: 3.14476
3.3. Yahtzee example
def yahtzee():
number_of_dice_kept = 0
value_we_keep = 0
for i in range(3):
dice = np.random.randint(1,7,size=5-number_of_dice_kept)
if value_we_keep > 0:
for die in dice:
if die == value_we_keep:
number_of_dice_kept = number_of_dice_kept + 1
else:
counts = np.zeros(6)
for die in dice:
counts[die-1] += 1
number_of_dice_kept = int(counts.max())
if number_of_dice_kept == 1:
value_we_keep = 0
number_of_dice_kept = 0
else:
value_we_keep = counts.argmax() + 1
# option 1
return number_of_dice_kept == 5
# Option 2
# if number_of_dice_kept == 5:
# return True
# else:
# return False
# monte carlo simulation
N=100000
number_of_yahtzee_scores = 0
for i in range(N):
outcome = yahtzee()
if outcome == True:
number_of_yahtzee_scores = number_of_yahtzee_scores + 1
print("Ratio:", number_of_yahtzee_scores/N)
print("Percentage:", 100 * number_of_yahtzee_scores/N)
Ratio: 0.04506
Percentage: 4.506
3.4. Determining number of runs (N)
N=100
# 11/18/2019 lecture slide #7
N=100 # step 1
all_results_100 = []
for j in range(100): # step 5
np.random.seed(j) # step 2
number_of_yahtzee_scores = 0
for i in range(N): # Step 3
outcome = yahtzee()
if outcome == True:
number_of_yahtzee_scores = number_of_yahtzee_scores + 1
estimated_result = number_of_yahtzee_scores/N # step 4
all_results_100.append(estimated_result)
print(all_results_100)
[0.08, 0.04, 0.03, 0.01, 0.03, 0.04, 0.06, 0.03, 0.03, 0.06, 0.03, 0.02, 0.01, 0.08, 0.05, 0.02, 0.02, 0.01, 0.02, 0.05, 0.04, 0.05, 0.05, 0.05, 0.04, 0.05, 0.08, 0.07, 0.06, 0.07, 0.03, 0.05, 0.03, 0.04, 0.03, 0.06, 0.04, 0.03, 0.08, 0.07, 0.06, 0.08, 0.05, 0.06, 0.04, 0.06, 0.08, 0.06, 0.03, 0.05, 0.08, 0.02, 0.02, 0.04, 0.06, 0.06, 0.05, 0.04, 0.06, 0.04, 0.06, 0.02, 0.05, 0.07, 0.05, 0.02, 0.02, 0.05, 0.05, 0.08, 0.06, 0.03, 0.05, 0.02, 0.01, 0.02, 0.05, 0.09, 0.1, 0.03, 0.09, 0.02, 0.05, 0.03, 0.03, 0.03, 0.04, 0.05, 0.06, 0.09, 0.04, 0.07, 0.05, 0.04, 0.03, 0.05, 0.08, 0.03, 0.07, 0.03]
plt.hist(all_results_100)
(array([ 4., 12., 18., 13., 20., 14., 6., 9., 3., 1.]),
array([0.01 , 0.019, 0.028, 0.037, 0.046, 0.055, 0.064, 0.073, 0.082,
0.091, 0.1 ]),
<a list of 10 Patch objects>)
N=1000
# 11/18/2019 lecture slide #7
N=1000 # step 1
all_results_1000 = []
for j in range(100): # step 5
np.random.seed(j) # step 2
number_of_yahtzee_scores = 0
for i in range(N): # Step 3
outcome = yahtzee()
if outcome == True:
number_of_yahtzee_scores = number_of_yahtzee_scores + 1
estimated_result = number_of_yahtzee_scores/N # step 4
all_results_1000.append(estimated_result)
print(all_results_1000)
plt.hist(all_results_1000)
[0.047, 0.044, 0.043, 0.042, 0.04, 0.043, 0.044, 0.046, 0.037, 0.044, 0.049, 0.036, 0.035, 0.045, 0.035, 0.04, 0.048, 0.044, 0.052, 0.043, 0.041, 0.048, 0.047, 0.04, 0.054, 0.047, 0.053, 0.043, 0.043, 0.05, 0.043, 0.04, 0.046, 0.049, 0.051, 0.037, 0.05, 0.05, 0.051, 0.046, 0.044, 0.053, 0.048, 0.044, 0.058, 0.046, 0.05, 0.056, 0.038, 0.056, 0.051, 0.052, 0.042, 0.044, 0.045, 0.05, 0.047, 0.046, 0.044, 0.047, 0.036, 0.04, 0.036, 0.034, 0.043, 0.053, 0.038, 0.053, 0.055, 0.045, 0.057, 0.04, 0.045, 0.04, 0.038, 0.041, 0.046, 0.051, 0.058, 0.053, 0.04, 0.053, 0.047, 0.041, 0.044, 0.045, 0.044, 0.053, 0.046, 0.046, 0.055, 0.042, 0.048, 0.044, 0.055, 0.056, 0.047, 0.042, 0.049, 0.047]
(array([ 6., 5., 11., 11., 16., 20., 8., 13., 4., 6.]),
array([0.034 , 0.0364, 0.0388, 0.0412, 0.0436, 0.046 , 0.0484, 0.0508,
0.0532, 0.0556, 0.058 ]),
<a list of 10 Patch objects>)
# step 6 - calculating the standard deviation
standard_dev_100 = np.std(all_results_100)
print("Standard deviation for N=100:",standard_dev_100)
standard_dev_1000 = np.std(all_results_1000)
print("Standard deviation for N=1000:",standard_dev_1000)
Standard deviation for N=100: 0.020961631615883335
Standard deviation for N=1000: 0.005692547760010451
# calculating the confidence interval
mean_100 = np.mean(all_results_100)
print(mean_100)
mean_1000 = np.mean(all_results_1000)
print(mean_1000)
0.04690000000000001
0.04592999999999999
lower_bound_of_confidence_interval_100 = mean_100 - 1.96*standard_dev_100
upper_bound_of_confidence_interval_100 = mean_100 + 1.96*standard_dev_100
# if we use N=100
print("95% of the time, values will fall within ",lower_bound_of_confidence_interval_100,",",upper_bound_of_confidence_interval_100)
95% of the time, values will fall within 0.0058152020328686715 , 0.08798479796713135
lower_bound_of_confidence_interval_1000 = mean_1000 - 1.96*standard_dev_1000
upper_bound_of_confidence_interval_1000 = mean_1000 + 1.96*standard_dev_1000
# if we use N=1000
print("95% of the time, values will fall within ",lower_bound_of_confidence_interval_1000,",",upper_bound_of_confidence_interval_1000)
95% of the time, values will fall within 0.034772606390379504 , 0.05708739360962048
3.4.1. Compact version
def monte_carlo_run(N,number_of_repeats):
all_results = []
for j in range(number_of_repeats): # step 5
np.random.seed(j) # step 2
number_of_yahtzee_scores = 0
for i in range(N): # Step 3
outcome = yahtzee()
if outcome == True:
number_of_yahtzee_scores = number_of_yahtzee_scores + 1
estimated_result = number_of_yahtzee_scores/N # step 4
all_results.append(estimated_result)
return all_results
results = monte_carlo_run(N=1000, number_of_repeats=1000)
mean = np.mean(results)
std = np.std(results)
plt.hist(results)
plt.axvline(x=mean,color="black",label="mean")
plt.axvline(x=mean-1.96*std,color="red",label="Lower bound of conf. int.")
plt.axvline(x=mean+1.96*std,color="purple",label="Upper bound of conf. int.")
plt.legend()
plt.show()
3.5. Monty Hall problem
# identify a random door with prize
doors = [1,2,3]
prize = np.random.choice(doors)
print("The door with prize:", prize)
The door with prize: 1
# make a choice
choice = np.random.choice(doors)
print("The door we chose:", choice)
The door we chose: 2
set(doors)
{1, 2, 3}
doors_with_no_prize_and_unchosen = set(doors)-set([choice,prize])
print(doors_with_no_prize_and_unchosen)
{3}
the_door_opened_by_host = np.random.choice(list(doors_with_no_prize_and_unchosen))
print(the_door_opened_by_host)
3
# Strategy 1: we stick with current one
if prize == choice:
print("You won")
else:
print("You lost")
You lost
# Strategy 2: we change our mind and choose the other one
new_choice = set(doors) - set([the_door_opened_by_host]) - set([choice])
print(new_choice)
new_choice_int = list(new_choice)[0]
# we stick with current one
if prize == new_choice_int:
print("You won")
else:
print("You lost")
{1}
You won
3.5.1. Compact version of Monty Hall problem
def monty_hall(strategy):
# identify a random door with prize
doors = [1,2,3]
prize = np.random.choice(doors)
# make a choice
choice = np.random.choice(doors)
doors_with_no_prize_and_unchosen = set(doors)-set([choice,prize])
the_door_opened_by_host = np.random.choice(list(doors_with_no_prize_and_unchosen))
# if the function returns True, that means, you win
# if False, you lose
if strategy == 1: # stick with current choice
return prize == choice
elif strategy == 2: # change the choice
new_choice = set(doors) - set([the_door_opened_by_host]) - set([choice])
new_choice_int = list(new_choice)[0]
return prize == new_choice_int
3.5.2. Test two strategies
monty_hall(1)
True
N = 1000
count_wins = 0
for i in range(N):
result = monty_hall(1)
if result == True:
count_wins = count_wins + 1
print("Strategy 1")
print(100*count_wins/N)
Strategy 1
35.5
N = 1000
count_wins = 0
for i in range(N):
result = monty_hall(2)
if result == True:
count_wins = count_wins + 1
print("Strategy 2")
print(100*count_wins/N)
Strategy 2
69.9