We suggest that you start by ignoring the pairing of the before and after samples, and that is what we will do here. Then we will extend the treatment to take the pairing into account.
```python
import numpy as np
import pandas as pd

rnd = np.random.default_rng()

df = pd.read_csv('data/hamilton.csv')
before = np.array(df['score_before'])
after = np.array(df['score_after'])

observed_diff = np.mean(after) - np.mean(before)

# Let us start with a permutation test.
both = np.concatenate([before, after])
n_before = len(before)

# Samples in the null world.
n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    shuffled = rnd.permuted(both)
    fake_before = shuffled[:n_before]
    fake_after = shuffled[n_before:]
    fake_diff = np.mean(fake_after) - np.mean(fake_before)
    results[i] = fake_diff

# We are interested in fake differences that are larger
# in magnitude than the observed difference (hence "abs").
# Here we have no prior hypothesis about which direction the difference
# will go.
k = np.sum(np.abs(results) >= np.abs(observed_diff))
kk = k / n_trials
print('Permutation p null-world abs >= abs observed:', kk)
```
Permutation p null-world abs >= abs observed: 0.2554
```python
# Next a bootstrap test.
n_after = len(after)  # Of course, in our case, this will be == n_before
results = np.zeros(n_trials)
for i in range(n_trials):
    fake_before = rnd.choice(both, size=n_before)
    fake_after = rnd.choice(both, size=n_after)
    fake_diff = np.mean(fake_after) - np.mean(fake_before)
    results[i] = fake_diff

k = np.sum(np.abs(results) >= np.abs(observed_diff))
kk = k / n_trials
print('Bootstrap p null-world abs >= abs observed:', kk)
```
Bootstrap p null-world abs >= abs observed: 0.2217
Finally we take the pairing into account. We have some reason to think that the patients (or cars) vary in some substantial way in their level of depression (or their tendency to break down), but we believe that the patients' response to treatment (or the difference between the mechanics) is the value of interest.
In that case, we are interested in the differences within the pairs. In the null world, where there is no difference between before and after (or between mechanics A and B), we could flip the members of each before / after (A / B) pair and still be in the same world.
Notice that flipping the before / after or A / B in the pair just changes the sign of the difference.
So we will simulate the effect of flipping the values in each pair by choosing a random sign for the pair, where -1 means the pair is flipped, and 1 means the pair is in its original order. We recalculate the mean difference with these random signs (flips) applied, and these will be our values in the null world.
```python
# A test of paired difference with sign flips for the null world.
differences = after - before
observed_mdiff = np.mean(differences)
n_both = len(differences)
results = np.zeros(n_trials)
for i in range(n_trials):
    # Choose random signs to perform random flips of the pairs.
    signs = rnd.choice([-1, 1], size=n_both)
    # Do flips.
    fake_differences = signs * differences
    # Calculate mean difference and store result.
    results[i] = np.mean(fake_differences)

k = np.sum(np.abs(results) >= np.abs(observed_mdiff))
kk = k / n_trials
print('Sign-flip p null-world abs >= abs observed:', kk)
```
Sign-flip p null-world abs >= abs observed: 0.0271
Notice that the sign-flip test, in which we preserve the information about the patients / cars, is much more convincing than the permutation or bootstrap tests above, where we choose to ignore that information.
This can occur when the values within the pairs (rows) are similar to each other, but less similar across different pairs (rows).
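To see the effect of pairing on sensitivity, here is a minimal sketch with simulated toy data (not the scores from the study above): each subject gets a large, stable baseline plus a small but consistent treatment effect, and we compare the unpaired permutation test to the paired sign-flip test on the same numbers.

```python
import numpy as np

rnd = np.random.default_rng(42)

# Toy data: large between-subject variation, small consistent effect.
n = 10
baseline = rnd.normal(50, 10, size=n)            # varies a lot by subject
before = baseline + rnd.normal(0, 1, size=n)
after = baseline + 2 + rnd.normal(0, 1, size=n)  # true effect of +2

differences = after - before
observed_mdiff = np.mean(differences)

n_trials = 10_000

# Unpaired permutation test: shuffle all values together, ignoring pairs.
both = np.concatenate([before, after])
results_perm = np.zeros(n_trials)
for i in range(n_trials):
    shuffled = rnd.permuted(both)
    results_perm[i] = np.mean(shuffled[n:]) - np.mean(shuffled[:n])
p_perm = np.sum(np.abs(results_perm) >= np.abs(observed_mdiff)) / n_trials

# Paired sign-flip test: randomly flip the sign of each pair's difference.
results_flip = np.zeros(n_trials)
for i in range(n_trials):
    signs = rnd.choice([-1, 1], size=n)
    results_flip[i] = np.mean(signs * differences)
p_flip = np.sum(np.abs(results_flip) >= np.abs(observed_mdiff)) / n_trials

print('Unpaired permutation p:', p_perm)
print('Paired sign-flip p:', p_flip)
```

The unpaired test must contend with the large between-subject variation, so the small treatment effect drowns in the shuffled differences; the sign-flip test works only with the within-pair differences, where that variation has been subtracted away.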
A sample of 20 batteries is tested, and the average lifetime is 28.85 months. Establish a 95 percent confidence interval for the true average value. The sample values (lifetimes in months) are listed below.
We use the “bootstrap” technique of drawing many bootstrap re-samples with replacement from the original sample, and observing how the re-sample means are distributed.
```python
# The "lifetimes" array contains the 20 sample values above.
import matplotlib.pyplot as plt

n_lifetimes = len(lifetimes)
results = np.zeros(n_trials)
for i in range(n_trials):
    # Draw 20 lifetimes from "lifetimes", randomly and with replacement.
    fake_lifetimes = rnd.choice(lifetimes, size=n_lifetimes)
    # Find the average lifetime of the 20.
    fake_mean = np.mean(fake_lifetimes)
    # Keep score.
    results[i] = fake_mean

plt.hist(results, bins=25)
plt.title('Bootstrap distribution of mean battery lifetimes')
plt.xlabel('Bootstrap mean battery lifetime')

mean_limits = np.quantile(results, [0.025, 0.975])
print('95% percent limits for mean lifetimes:', mean_limits)
```
95% percent limits for mean lifetimes: [27.65 29.95]
Derive a 95 percent confidence interval for the sample mean. Are there enough measurements to produce a satisfactory answer?
```python
n_density = len(density)
n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    fake_density = rnd.choice(density, size=n_density)
    results[i] = np.mean(fake_density)

plt.hist(results, bins=25)
plt.title('Bootstrap distribution of density means')
plt.xlabel('Bootstrap density means')

mean_limits = np.quantile(results, [0.025, 0.975])
print('95% percent limits for density mean:', mean_limits)
```
95% percent limits for density mean: [0.0185 0.022 ]
The observed correlation coefficient between voter participation and spread is moderate and negative. Is this more negative than what might occur by chance, if no correlation exists in some underlying population from which this sample was taken?
Create two groups of paper cards: 25 with participation rates, and 25 with the spread values. Arrange the cards in pairs in accordance with the table, and compute the correlation coefficient between the participation and spread variables.
Shuffle one of the sets, say that with participation, and compute correlation between shuffled participation and spread.
Repeat step 2 many times, say 1,000. Compute the proportion of the trials in which the correlation was at least as negative as that for the original data.
Note A.6: Notebook: Voter participation in 1844 election
```python
n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    # Shuffle the participation rates.
    shuffled = rnd.permuted(participation)
    # Compute re-sampled correlation.
    fake_r = np.corrcoef(shuffled, spread)[0, 1]
    # Keep the value in the results.
    results[i] = fake_r

plt.hist(results, bins=25)
plt.title('Distribution of shuffled correlations')
plt.xlabel('Correlation with shuffled participation')

# Count the trials when result <= observed.
k = np.sum(results <= actual_r)
# Compute the proportion of such trials.
kk = k / n_trials
print('Proportion of shuffled r <= observed:', np.round(kk, 2))
```
Proportion of shuffled r <= observed: 0.03
End of notebook: Voter participation in 1844 election
From this we may conclude that the voter participation rates probably are negatively related to the vote spread in the election. The actual value of the correlation (-.425) is unlikely to be explained by chance alone: in our Monte Carlo simulation of the null hypothesis, a correlation that negative is found only about 3 percent of the time.
```python
# The sum of products approach.
actual_sop = np.sum(homeruns * strikeout)
n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    shuffled_runs = rnd.permuted(homeruns)
    fake_sop = np.sum(shuffled_runs * strikeout)
    results[i] = fake_sop

plt.hist(results, bins=25)
plt.title('Distribution of shuffled sum of products')
plt.xlabel('Sum of products for shuffled homeruns')

k = np.sum(results >= actual_sop)
kk = k / n_trials
print('p shuffled sum of products >= actual:', np.round(kk, 3))
```
p shuffled sum of products >= actual: 0.003
Interpretation: In 10,000 simulations, random shuffling very rarely produced a sum of products as high as the observed value. Therefore, we conclude that random chance could not reasonably be responsible for the observed degree of correlation.
End of notebook: Homeruns and strikeout sum of products.
Again, we are looking at the correlation of home runs and strikeouts for major-league baseball players. This time we will use the correlation coefficient (\(r\)) measure.
Note A.8: Notebook: Homeruns and strikeout correlation
```python
# The correlation approach.
actual_r = np.corrcoef(homeruns, strikeout)[0, 1]
n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    shuffled_runs = rnd.permuted(homeruns)
    fake_r = np.corrcoef(shuffled_runs, strikeout)[0, 1]
    results[i] = fake_r

plt.hist(results, bins=25)
plt.title('Distribution of shuffled r')
plt.xlabel('r for shuffled homeruns')

k = np.sum(results >= actual_r)
kk = k / n_trials
print('p shuffled r >= actual:', np.round(kk, 3))
```
p shuffled r >= actual: 0.003
Interpretation: a correlation coefficient as high as the observed value (.62) occurred only about 0.3% of the time by chance. Hence, provisionally, we choose to reject chance as an explanation for such a high value of the correlation coefficient.
Notice that we get the same answer for the correlation coefficient as we did for the sum of products. In fact, the two must give the same answer (apart from random variation in the permutations): when we compare different orderings of the same two sequences, the means and standard deviations of both sequences stay fixed, so the correlation coefficient is an increasing function of the sum of products, and the two tests of association are equivalent.
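This equivalence can be checked directly. Here is a minimal sketch with simulated toy sequences standing in for the homeruns and strikeout data: across many shuffles of one sequence against the other, the shuffled r values and the shuffled sums of products line up exactly, because r is an increasing linear function of the sum of products when means and standard deviations are fixed.

```python
import numpy as np

rnd = np.random.default_rng(0)

# Toy sequences (not the real homeruns / strikeout data).
x = rnd.normal(10, 3, size=20)
y = rnd.normal(50, 15, size=20)

# For any reordering of x against a fixed y, mean(x), mean(y), std(x)
# and std(y) do not change, so
#   r = (sum(x * y) / n - mean(x) * mean(y)) / (std(x) * std(y))
# is an increasing linear function of the sum of products sum(x * y).
n_shuffles = 1_000
sops = np.zeros(n_shuffles)
rs = np.zeros(n_shuffles)
for i in range(n_shuffles):
    shuffled = rnd.permuted(x)
    sops[i] = np.sum(shuffled * y)
    rs[i] = np.corrcoef(shuffled, y)[0, 1]

# The shuffled r values and shuffled sums of products are perfectly
# (linearly) related, so the two statistics order the shuffles
# identically, and any tail count based on one matches the other.
print('Correlation of r with sum of products:', np.corrcoef(sops, rs)[0, 1])
```

Because the two statistics sort the shuffled trials in exactly the same order, the count of trials at or beyond the observed value is the same whichever statistic we use.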
End of notebook: Homeruns and strikeout correlation
Interpretation: The observed correlation (.42) between the exchange rate and the money supply is seldom exceeded in random shufflings of these data. Thus, the observed result 0.42 cannot reasonably be explained by chance alone, and we conclude that it is statistically surprising.