# Trump/Clinton poll simulation

Find this notebook on the web at
<a class="quarto-xref" href="https://resampling-stats.github.io/latest-python/testing_counts_1.html#nte-trump_clinton">Note <span>21.6</span></a>.

What is the probability that a sample outcome such as the one actually observed
(840 Trump, 660 Clinton) would occur by chance if Clinton is “really”
ahead, that is, if Clinton has 50 percent (or more) of the support? To
restate in sharper statistical language: What is the probability that
the observed sample, or one even more favorable to Trump, would occur if
Trump’s share of support in the universe is 50 percent or below?

Here is a procedure that responds to that question:

1.  Create a benchmark universe with one ball marked “Trump” and another
    marked “Clinton.”
2.  Draw a ball, record its marking, and replace it. (We sample with
    replacement to simulate the practically infinite population of U.S.
    voters.)
3.  Repeat step 2 a total of 1500 times and count the number of “Trump”s.
    If the count is 840 or greater, record “Y”; otherwise, record “N.”
4.  Repeat steps 2 and 3 perhaps 1000 or 10,000 times, and count the
    number of “Y”s. The proportion of “Y”s among all trials estimates the
    probability that 840 or more Trump choices would occur if the universe
    is “really” half or more in favor of Clinton.
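The four steps can be sketched in Python as below. This is a minimal illustration, not the notebook's own implementation: the function name `one_trial` and the choice of 1000 trials are assumptions made for the example. Each trial records the letter “Y” or “N” exactly as the written procedure describes.

```python
import numpy as np

rnd = np.random.default_rng()

# Step 1: the benchmark universe, one "Trump" ball and one "Clinton" ball.
universe = np.array(['Trump', 'Clinton'])

def one_trial():
    # Hypothetical helper. Steps 2 and 3: draw 1500 times with replacement,
    # count the "Trump"s, and record "Y" if the count is 840 or more.
    draws = rnd.choice(universe, size=1500)
    n_trump = np.sum(draws == 'Trump')
    return 'Y' if n_trump >= 840 else 'N'

# Step 4: repeat the trial many times and count the "Y"s.
n_trials = 1000  # assumed number of trials for this sketch
results = np.array([one_trial() for _ in range(n_trials)])
n_yes = np.sum(results == 'Y')

print('Number of Y:', n_yes, 'of', n_trials)
print('Estimated probability:', n_yes / n_trials)
```

Storing the letters mirrors the written procedure; storing each trial's Trump count instead gives the same estimate and also lets us draw a histogram of the counts.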

Before we come to the simulation, we need some new code to tune our
histograms (see <a class="quarto-xref" href="https://resampling-stats.github.io/latest-python/probability_theory_3.html#sec-on-histograms"><span>Section 12.15.2</span></a>). We are
going to set the bins for the histogram using advanced ranges.

<div __quarto_custom="true" __quarto_custom_context="Block" __quarto_custom_id="27" __quarto_custom_type="Callout">
<div __quarto_custom_scaffold="true">

Advanced ranges

</div>
<div __quarto_custom_scaffold="true">

So far (<a class="quarto-xref" href="https://resampling-stats.github.io/latest-python/resampling_with_code.html#sec-ranges"><span>Section 5.9</span></a>) we have used
`np.arange` to make regular sequences of integers. For example, to make
an array of the sequential integers from 3 through 12, we could use:</div></div>

In [None]:
import numpy as np

np.arange(3, 13)

Sometimes we want to be able to specify a step size — the gap between
the numbers in the sequence. In the sequence above, the gap (step)
between each number is 1. We might want some other step size. To create
a sequence of integers from 3 through 33 in steps of 5, we could write:

In [None]:
np.arange(3, 34, step=5)

Read this as “give me the sequence (range) of numbers, starting at 3, up
to but not including 34, in steps of 5.”
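To see the effect of the excluded stop value, compare a stop of 33 with a stop of 34. This is a short check that assumes only NumPy; the variable names are illustrative.

```python
import numpy as np

# The stop value is excluded, so stopping at 33 leaves 33 out.
without_33 = np.arange(3, 33, step=5)
# A stop of 34 (or any stop from 34 through 38) includes 33.
with_33 = np.arange(3, 34, step=5)

print(without_33)  # [ 3  8 13 18 23 28]
print(with_33)     # [ 3  8 13 18 23 28 33]
```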

So far we have used integers as the start, stop and step values, but we
could also use floating point values. For example, to get a sequence of
values starting at 0.1 up to and including 0.9, in steps of 0.2:

In [None]:
np.arange(0.1, 1, step=0.2)
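NumPy's documentation cautions that with a non-integer step, floating-point rounding can make the length of the `np.arange` result hard to predict. When you know how many values you want, `np.linspace` is an alternative: it takes a count rather than a step, and includes the stop value by default. A minimal comparison:

```python
import numpy as np

# Floating-point step: the stop value 1 is excluded, giving five values.
by_step = np.arange(0.1, 1, step=0.2)

# np.linspace takes the number of values instead of a step, and includes
# the stop value (here 0.9) by default.
by_count = np.linspace(0.1, 0.9, num=5)

print(by_step)
print(by_count)
# The two agree up to floating-point rounding.
print(np.allclose(by_step, by_count))  # True
```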

With that background, we can proceed with the Python implementation of
the simulation procedure.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

rnd = np.random.default_rng()

# Number of repeats we will run.
n = 10_000

# Make an integer array to store the counts.
trumps = np.zeros(n, dtype=int)

for i in range(n):
    votes = rnd.choice(['Trump', 'Clinton'], size=1500)
    trumps[i] = np.sum(votes == 'Trump')

# Integer bins from 670 through 830 in steps of 5.
plt.hist(trumps, bins=np.arange(670, 831, 5))
plt.title('Number of Trump voters of 1500 in null-world simulation')

# How often >= 840 Trump votes in random draw?
k = np.sum(trumps >= 840)
# As a proportion of simulated resamples.
kk = k / n

print('Proportion of trials with 840 or more Trump votes:', kk)
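The histogram's bin edges run from 670 through 830 in steps of 5, and can be made either with `np.arange` or with Python's built-in `range`. The sketch below checks that the two agree and shows that values outside the outermost edges fall in no bin, so the 840-or-more tail does not appear in the plot at all; the code counts it separately with `np.sum`. The sample values passed to `np.histogram` are made up for illustration, and `np.histogram` stands in here for the binning rule that `plt.hist` also follows.

```python
import numpy as np

# Bin edges for the histogram: 670, 675, ..., 830.
edges = np.arange(670, 831, 5)
print(edges[0], edges[-1], len(edges))  # 670 830 33

# np.arange gives the same edges as Python's built-in range.
print(np.array_equal(edges, list(range(670, 831, 5))))  # True

# 33 edges define 32 bins. Values below the first edge or above the last
# edge are not counted; the last bin includes its right edge (830).
counts, _ = np.histogram([650, 700, 750, 830, 840], bins=edges)
print(len(counts), counts.sum())  # 32 3
```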

The value for `kk` is our estimate of the probability that Trump’s
“victory” in the sample would occur by chance if he really had 50 percent
of the support or less. In this case, our probability estimate is less
than 1 in 10,000 (< 0.0001).
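As a cross-check on the simulation, and using a different technique from the resampling approach of this notebook, the exact probability under a 50-50 universe can be computed from the binomial distribution with Python's integer arithmetic. With a 50 percent chance per voter, every one of the 2 to the power 1500 possible vote sequences is equally likely, so the probability is the number of sequences with 840 or more Trump votes divided by that total. A universe in which Trump has less than half the support makes 840 or more Trump votes even less likely, so the 50-50 case gives the largest probability allowed by the null hypothesis. The variable names below are illustrative.

```python
from math import comb

n_voters = 1500   # sample size
threshold = 840   # observed number of Trump votes

# Number of vote sequences with at least 840 Trump votes, out of 2**1500.
tail_count = sum(comb(n_voters, k) for k in range(threshold, n_voters + 1))

# Integer true division rounds the exact ratio to the nearest float.
p_exact = tail_count / 2 ** n_voters

print('Exact P(840 or more Trump votes):', p_exact)
print('Below 1 in 10,000?', p_exact < 0.0001)  # True
```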