3 What is probability?
“Uncertainty, in the presence of vivid hopes and fears, is painful, but must be endured if we wish to live without the support of comforting fairy tales.” — Bertrand Russell (1945, p. xiv).
3.1 Introduction
The central concept for dealing with uncertainty is probability. Hence we must inquire into the “meaning” of the term probability. (The term “meaning” is in quotes because it can be a confusing word.)
You have been using the notion of probability all your life when drawing conclusions about what you expect to happen, and in reaching decisions in your public and personal lives.
You wonder: Will the kick from the 45-yard line go through the uprights? How much oil can you expect from the next well you drill, and what value should you assign to that prospect? Will you make money if you invest in tech stocks for the medium term, or should you spread your investments across the stock market? Will the next SpaceX launch end in disaster? Your answers to these questions rest on the probabilities you estimate.
And you act on the basis of probabilities: You pay extra for a low-interest loan if you think that interest rates are going to go up. You bet heavily on a poker hand if there is a high probability that you have the best hand. A hospital decides not to buy another ambulance when the administrator judges that there is a low probability that all the other ambulances will ever be in use at once. NASA decides whether or not to send off the space shuttle this morning as scheduled.
The idea of probability is essential when we reason about uncertainty, and so this chapter discusses what is meant by such key terms as “probability,” “chance,” “sample,” and “universe.” It discusses the nature and the usefulness of the concept of probability as used in this book, and it touches on the source of basic estimates of probability that are the raw material of statistical inferences.
3.2 The “Meaning” of “Probability”
Probability is difficult to define (Feller 1968), but here is a useful informal starting point:
A probability is a number from 0 through 1 that reflects how likely it is that a particular event will happen.
Any particular stated probability is an assertion that indicates how likely you believe it is that an event will occur.
If you give an event a probability of 0 you mean that you are certain it will not happen. If you give probability 1 to an event, you mean you are certain that it will happen. For example, if I give you one card from a deck that you know contains only the standard 52 cards — before you look at the card, you can give probability 0 to the card being a joker, because you are certain the pack does not contain any joker cards. If I then select only the 13 spades from that deck, and give you a card from that selection, you will say there is probability 1 that the card is a black card, because all the spades are black cards.
A probability estimate of .2 indicates that you think there is twice as great a chance of the event happening as if you had estimated a probability of .1. This is the rock-bottom interpretation of the term “probability,” and the heart of the concept.
A given probability may be expressed in terms of probability, odds, or chances, and I shall use all three terms to help familiarize you with them.
Let us say we think there is a probability of 0.1 that it will rain tomorrow.
We can restate this probability by saying there is a one in 10 chance that it will rain tomorrow (\(1 / 10 = 0.1\)). Giving the chances as 1 in 10, or 2 in 20, or 10 in 100, is the same as saying the probability is 0.1.
If we multiply the probability by 100 we get the percent chance — another way of saying the probability. Here we have a \(0.1 \times 100 = 10\)% chance of rain. We could also say that the chances of rain are 10 in 100.
Odds are still another way of expressing probability. Here we think of our outcome of interest — a day with rain — and compare it to our outcome that is not of interest — a day without rain. Our probability of 0.1 means that we expect one day with rain in every 10 days, and therefore, one day with rain for every nine days without rain. We can express the 0.1 probability of rain as odds of 1 to 9 (for a rainy day), or 9 to 1 against a rainy day.
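To make these conversions concrete, here is a minimal sketch in Python; the variable names are ours, purely for illustration:

```python
# Express the same probability as chances, percent chance, and odds.
p = 0.1  # probability of rain tomorrow

chances = 1 / p             # a "1 in 10" chance
percent = p * 100           # a 10% chance
odds_against = (1 - p) / p  # odds of 9 to 1 against a rainy day

print(f"1 in {chances:.0f} chance")
print(f"{percent:.0f}% chance")
print(f"{odds_against:.0f} to 1 against")
```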
“Likelihood” is a term related to “probability” but is not a complete synonym for it — it has a specific and technical meaning in probability and statistics.
The idea of probability arises when you are not sure about what will happen in an uncertain situation. For example, you may lack information and therefore can only make an estimate. If someone asks you your name, you do not use the concept of probability to answer; you know the answer with a very high degree of surety. To be sure, there is some chance that you do not know your own name, but for all practical purposes you can be quite sure of the answer. If someone asks you who will win tomorrow’s baseball game, however, there is a considerable chance that you will be wrong no matter what you say. Whenever there is a reasonable chance that your prediction will be wrong, the concept of probability can help you.
The concept of probability helps you to answer the question, “How likely is it that…?” The purpose of the study of probability and statistics is to help you make sound appraisals of statements about the future, and good decisions based upon those appraisals. The concept of probability is especially useful when you have a sample from a larger set of data — a “universe” — and you want to know the probability of various degrees of likeness between the sample and the universe. (The universe of events you are sampling from is also called the “population,” a concept to be discussed below.) Perhaps the universe of your study is all high school graduates in 2018. You might then want to know, for example, the probability that the universe’s average SAT (university entrance) score will not differ from your sample’s average SAT by more than some arbitrary number of SAT points — say, ten points.
We have said that a probability statement is about the future. Well, usually. Occasionally you might state a probability about your future knowledge of past events — that is, “I think I’ll find out that…” — or even about the unknown past. (Historians use probabilities to measure their uncertainty about whether events occurred in the past, and the courts do, too, though the courts hesitate to say so explicitly.)
Sometimes one knows a probability, such as in the case of a gambler playing black on an honest roulette wheel, or an insurance company issuing a policy on an event with which it has had a lot of experience, such as a life insurance policy. But often one does not know the probability of a future event. Therefore, our concept of probability must include situations where extensive data are not available.
All of the many techniques used to estimate probabilities should be thought of as proxies for the actual probability. For example, if Mission Control at Space Central simulates what should and probably will happen in space if a valve is turned aboard a spacecraft just now being built, the test result on the ground is a proxy for the real probability of what will happen when the crew turns the valve in the planned mission.
In some cases, it is difficult to conceive of any data that can serve as a proxy. For example, the director of the CIA, Robert Gates, said in 1993 “that in May 1989, the CIA reported that the problems in the Soviet Union were so serious and the situation so volatile that Gorbachev had only a 50-50 chance of surviving the next three to four years unless he retreated from his reform policies” (The Washington Post, January 17, 1993, p. A42). Can such a statement be based on solid enough data to be more than a crude guess?
The conceptual probability in any specific situation is an interpretation of all the evidence that is then available. For example, a wise biomedical worker’s estimate of the chance that a given therapy will have a positive effect on a sick patient should be an interpretation of the results of not just one study in isolation, but of the results of that study plus everything else that is known about the disease and the therapy. A wise policymaker in business, government, or the military will base a probability estimate on a wide variety of information and knowledge. The same is even true of an insurance underwriter who bases a life-insurance or shipping-insurance rate not only on extensive tables of long-time experience but also on recent knowledge of other kinds. Each situation asks us to make a choice of the best method of estimating a probability — whether that estimate is objective (from a frequency series) or subjective (from the distillation of other experience).
3.3 The nature and meaning of the concept of probability
It is confusing and unnecessary to inquire what probability “really” is. (Indeed, the terms “really” and “is,” alone or in combination, are major sources of confusion in statistics and in other logical and scientific discussions, and it is often wise to avoid their use.) Various concepts of probability — which correspond to various common definitions of the term — are useful in particular contexts. This book contains many examples of the use of probability. Work with them will gradually develop a sound understanding of the concept.
There are two major concepts and points of view about probability — frequency and degrees of belief. Each is useful in some situations but not in others. Though they may seem incompatible in principle, there is almost never confusion about which is appropriate in a given situation.
Frequency: The probability of an event can be said to be the proportion of times that the event has taken place in the past, usually based on a long series of trials. Insurance companies use this when they estimate the probability that a thirty-five-year-old teacher will die during a period for which he wants to buy an insurance policy. (Notice this shortcoming: Sometimes you must bet upon events that have never or only infrequently taken place before, and so you cannot reasonably reckon the proportion of times they occurred one way or the other in the past.)
Degree of belief: The probability that an event will take place or that a statement is true can be said to correspond to the odds at which you would bet that the event will take place. (Notice a shortcoming of this concept: You might be willing to accept a five-dollar bet at 2-1 odds that your team will win the game, but you might be unwilling to bet a hundred dollars at the same odds.)
See Barnett (1982, chap. 3) for an in-depth discussion of different approaches to probability.
The connection between gambling and immorality or vice makes some people uneasy about gambling examples. On the other hand, the immediacy and consequences of the decisions that the gambler has to make give the subject a special tang. There are several reasons why statistics books use so many gambling examples — and especially tossing coins, throwing dice, and playing cards:
- Historical: The theory of probability began with gambling examples of dice analyzed by Cardano, Galileo, and then by Pascal and Fermat.
- Generality: These examples are not related to any particular walk of life, and therefore they can be generalized to applications in any walk of life. Students in any field — business, medicine, science — can feel equally at home with gambling examples.
- Sharpness: These examples are particularly stark, and unencumbered by the baggage of particular walks of life or special uses.
- Universality: Many other texts use these same examples, and therefore the use of them connects up this book with the main body of writing about probability and statistics.
Often we’ll begin with a gambling example and then consider an example in one of the professional fields — such as business and other decision-making activities, biostatistics and medicine, social science and natural science — and everyday living. People in one field often can benefit from examples in others; for example, medical students should understand the business decision-making examples in terms of medical practice, as well as the biostatistical examples. And social scientists should understand the decision-making aspects of statistics if they have any interest in the use of their work in public policy.
3.4 Back to Proxies
Example of a proxy: The “probability risk assessments” (PRAs) that are made for the chances of failures of nuclear power plants are based, not on long experience or even on laboratory experiment, but rather on theorizing of various kinds — using pieces of prior experience wherever possible, of course. A PRA can cost a nuclear facility many millions of dollars.
Another example: If a manager of a high-street store looks at the sales of a particular brand of smart watches in the last two Decembers, and on that basis guesses how likely it is that she will run out of stock if she orders 200 smart watches, then the last two years’ experience is serving as a proxy for future experience. If a sales manager just “intuits” that the odds are 3 to 1 (a probability of .75) that the main local competitor will not meet a price cut, then all her past experience summed into her intuition is a proxy for the probability that it will really happen. Whether any proxy is a good or bad one depends on the wisdom of the person choosing the proxy and making the probability estimates.
How does one estimate a probability in practice? This involves practical skills not very different from the practical skills required to estimate with accuracy the length of a golf shot, the number of carpenters you will need to build a house, or the time it will take you to walk to a friend’s house; we will consider elsewhere some ways to improve your practical skills in estimating probabilities. For now, let us simply categorize and consider in the next section various ways of estimating an ordinary garden variety of probability, which is called an “unconditional” probability.
3.5 The various ways of estimating probabilities
Consider the probability of drawing an even-numbered spade from a deck of poker cards (consider the queen as even and the jack and king as odd). Here are several general methods of estimation, where we define each method in terms of the operations we use to make the estimate:
Experience.
The first possible source for an estimate of the probability of drawing an even-numbered spade is the purely empirical method of experience. If you have watched card games casually from time to time, you might simply guess at the proportion of times you have seen even-numbered spades appear — say, “about 1 in 15” or “about 1 in 9” (which is almost correct) or something like that. (If you watch long enough you might come to estimate something like 6 in 52.)
General information and experience are also the source for estimating the probability that the sales of a particular brand of smart watch this December will be between 200 and 250, based on sales the last two Decembers; that your team will win the football game tomorrow; that war will break out next year; or that a United States astronaut will reach Mars before a Chinese astronaut. You simply put together all your relevant prior experience and knowledge, and then make an educated guess.
Observation of repeated events can help you estimate the probability that a machine will turn out a defective part or that a child can memorize four nonsense syllables correctly in one attempt. You watch repeated trials of similar events and record the results.
Data on the mortality rates for people of various ages in a particular country in a given decade are the basis for estimating the probabilities of death, which are then used by the actuaries of an insurance company to set life insurance rates. This is systematized experience — called a frequency series.
No frequency series can speak for itself in a perfectly objective manner. Many judgments inevitably enter into compiling every frequency series — deciding which frequency series to use for an estimate, choosing which part of the frequency series to use, and so on. For example, should the insurance company use only its records from last year, which will be too few to provide as much data as is preferable, or should it also use death records from years further back, when conditions were slightly different, together with data from other sources? (Of course, no two deaths — indeed, no events of any kind — are exactly the same. But under many circumstances they are practically the same, and science is only interested in such “practical” considerations.)
Given that we have to use judgment in probability estimates, the reader may prefer to talk about “degrees of belief” instead of probabilities. That’s fine, just as long as it is understood that we operate with degrees of belief in exactly the same way as we operate with probabilities; the two terms are working synonyms.
There is no logical difference between the sort of probability that the life insurance company estimates on the basis of its “frequency series” of past death rates, and the manager’s estimates of the sales of smart watches in December, based on sales in that month in the past two years.1
The concept of a probability based on a frequency series can be rendered almost useless when all the observations are repetitions of a single magnitude — for example, the case of all successes and zero failures of space-shuttle launches prior to the Challenger shuttle tragedy in 1986; in those data alone there was almost no basis to estimate the probability of a shuttle failure. (Probabilists have made some rather peculiar attempts over the centuries to estimate probabilities from the length of a zero-defect time series — such as the fact that the sun has never failed to rise (foggy days aside!) — based on the undeniable fact that the longer such a series is, the smaller the probability of a failure; see e.g. Whitworth (1897, xix–xli). However, one surely has more information on which to act when one has a long series of observations of the same magnitude rather than a short series.)
Simulated experience.
A second possible source of probability estimates is empirical scientific investigation with repeated trials of the phenomenon. This is an empirical method even when the empirical trials are simulations. In the case of the even-numbered spades, the empirical scientific procedure is to shuffle the cards, deal one card, record whether or not the card is an even-numbered spade, replace the card, and repeat the steps a good many times. The proportion of times an even-numbered spade comes up is a probability estimate based on a frequency series.
You might reasonably ask why we do not just count the number of even-numbered spades in the deck of fifty-two cards — using the sample space analysis you see below. No reason at all. But that procedure would not work if you wanted to estimate the probability of a baseball batter getting a hit or a lighter producing flame.
Some varieties of poker are so complex that experiment is the only feasible way to estimate the probabilities a player needs to know.
The resampling approach to statistics produces estimates of most probabilities with this sort of experimental “Monte Carlo” method. More about this later.
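As a preview of that method, here is one way the even-numbered spade experiment might look in Python with NumPy. The encoding of the deck is our own choice for this sketch; following the text, we count the queen (rank 12) as even, and the jack (11) and king (13) as odd:

```python
import numpy as np

rng = np.random.default_rng()

# Build a 52-card deck: 4 suits, each with ranks 1 (ace) through 13 (king).
deck = [(suit, rank)
        for suit in ["spade", "heart", "diamond", "club"]
        for rank in range(1, 14)]

n_trials = 100_000
hits = 0
for _ in range(n_trials):
    # Deal one card at random, record the result, replace, and repeat
    # (as in the procedure described in the text).
    suit, rank = deck[rng.integers(len(deck))]
    if suit == "spade" and rank % 2 == 0:
        hits += 1

# The proportion of even-numbered spades estimates the probability.
print(hits / n_trials)  # close to 6/52, about 0.115
```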
Sample space analysis and first principles.
A third source of probability estimates is counting the possibilities — the quintessential theoretical method. For example, by examination of an ordinary die one can determine that there are six different numbers that can come up. One can then determine that the probability of getting (say) either a “1” or a “2,” on a single throw, is 2/6 = 1/3, because two among the six possibilities are “1” or “2.” One can similarly determine that there are two possibilities of getting a “1” plus a “6” out of thirty-six possibilities when rolling two dice, yielding a probability estimate of 2/36 = 1/18.
Estimating probabilities by counting the possibilities has two requirements: 1) that the possibilities all be known (and therefore limited), and few enough to be studied easily; and 2) that the probability of each particular possibility be known, for example, that the probabilities of all sides of the dice coming up are equal, that is, equal to 1/6.
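The counting can itself be handed to the computer. This short sketch enumerates the sample spaces for the two dice questions above:

```python
from itertools import product

# Probability of a "1" or a "2" on a single throw of one die:
# two favorable faces among six equally likely ones.
faces = range(1, 7)
print(sum(1 for face in faces if face in (1, 2)) / 6)  # 2/6 = 1/3

# Probability of a "1" plus a "6" when rolling two dice:
# enumerate all 36 equally likely outcomes and count the favorable ones.
outcomes = list(product(faces, repeat=2))
favorable = [o for o in outcomes if sorted(o) == [1, 6]]
print(len(favorable) / len(outcomes))  # 2/36 = 1/18
```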
Mathematical shortcuts to sample-space analysis.
A fourth source of probability estimates is mathematical calculations. (We will introduce some probability calculation rules in Chapter 9.) If one knows by other means that the probability of a spade is 1/4 and the probability of an even-numbered card is 6/13, one can use probability calculation rules to calculate that the probability of turning up an even-numbered spade is 6/52 (that is, 1/4 × 6/13). (This is the multiplication rule, introduced in Section 8.12.) If one knows that the probability of a spade is 1/4 and the probability of a heart is 1/4, one can then calculate that the probability of getting a heart or a spade is 1/2 (that is, 1/4 + 1/4). (We are using the addition rule from Section 8.7.) The point here is not the particular calculation procedures, which we will touch on later, but rather that one can often calculate the desired probability on the basis of already-known probabilities.
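The arithmetic of those two rules can be verified exactly, for instance with Python's fractions module:

```python
from fractions import Fraction

p_spade = Fraction(1, 4)   # probability of a spade
p_even = Fraction(6, 13)   # probability of an even-numbered card
p_heart = Fraction(1, 4)   # probability of a heart

print(p_spade * p_even)    # 3/26, equal to 6/52: the multiplication rule
print(p_spade + p_heart)   # 1/2: the addition rule
```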
It is possible to estimate probabilities with mathematical calculation only if one knows by other means the probabilities of some related events. For example, there is no possible way of mathematically calculating that a child will memorize four nonsense syllables correctly in one attempt; empirical knowledge is necessary.
Kitchen-sink methods.
In addition to the above four categories of estimation procedures, the statistical imagination may produce estimates in still other ways such as a) the salesman’s seat-of-the-pants estimate of what the competition’s price will be next quarter, based on who-knows-what gossip, long-time acquaintance with the competitors, and so on, and b) the probability risk assessments (PRAs) that are made for the chances of failures of nuclear power plants based, not on long experience or even on laboratory experiment, but rather on theorizing of various kinds — using pieces of prior experience wherever possible, of course. Any of these methods may be a combination of theoretical and empirical methods.
As an example of an organization struggling with kitchen-sink methods, consider the estimation of the probability of failure for the tragic flight of the Challenger shuttle, as described by the famous physicist and Nobel laureate Richard Feynman. This is a very real case that includes just about every sort of complication that enters into estimating probabilities.
…Mr. Ullian told us that 5 out of 127 rockets that he had looked at had failed — a rate of about 4 percent. He took that 4 percent and divided it by 4, because he assumed a manned flight would be safer than an unmanned one. He came out with about a 1 percent chance of failure, and that was enough to warrant the destruct charges.
But NASA [the space agency in charge] told Mr. Ullian that the probability of failure was more like 1 in \(10^5\).
I tried to make sense out of that number. “Did you say 1 in \(10^5\)?”
“That’s right; 1 in 100,000.”
“That means you could fly the shuttle every day for an average of 300 years between accidents — every day, one flight, for 300 years — which is obviously crazy!”
“Yes, I know,” said Mr. Ullian. “I moved my number up to 1 in 1000 to answer all of NASA’s claims — that they were much more careful with manned flights, that the typical rocket isn’t a valid comparison, etcetera.”
But then a new problem came up: the Jupiter probe, Galileo, was going to use a power supply that runs on heat generated by radioactivity. If the shuttle carrying Galileo failed, radioactivity could be spread over a large area. So the argument continued: NASA kept saying 1 in 100,000 and Mr. Ullian kept saying 1 in 1000, at best.
Mr. Ullian also told us about the problems he had in trying to talk to the man in charge, Mr. Kingsbury: he could get appointments with underlings, but he never could get through to Kingsbury and find out how NASA got its figure of 1 in 100,000 (Feynman and Leighton 1988, 179–80).
Feynman tried to ascertain more about the origins of the figure of 1 in 100,000 that entered into NASA’s calculations. He performed an experiment with the engineers:
…“Here’s a piece of paper each. Please write on your paper the answer to this question: what do you think is the probability that a flight would be uncompleted due to a failure in this engine?”
They write down their answers and hand in their papers. One guy wrote “99-44/100% pure” (copying the Ivory soap slogan), meaning about 1 in 200. Another guy wrote something very technical and highly quantitative in the standard statistical way, carefully defining everything, that I had to translate — which also meant about 1 in 200. The third guy wrote, simply, “1 in 300.”
Mr. Lovingood’s paper, however, said:
“Cannot quantify. Reliability is judged from:
- past experience
- quality control in manufacturing
- engineering judgment”
“Well,” I said, “I’ve got four answers, and one of them weaseled.” I turned to Mr. Lovingood: “I think you weaseled.”
“I don’t think I weaseled.”
“You didn’t tell me what your confidence was, sir; you told me how you determined it. What I want to know is: after you determined it, what was it?”
He says, “100 percent” — the engineers’ jaws drop, my jaw drops; I look at him, everybody looks at him — “uh, uh, minus epsilon!”
So I say, “Well, yes; that’s fine. Now, the only problem is, WHAT IS EPSILON?”
He says, “\(10^{-5}\).” It was the same number that Mr. Ullian had told us about: 1 in 100,000.
I showed Mr. Lovingood the other answers and said, “You’ll be interested to know that there is a difference between engineers and management here — a factor of more than 300.”
He says, “Sir, I’ll be glad to send you the document that contains this estimate, so you can understand it.”
Later, Mr. Lovingood sent me that report. It said things like “The probability of mission success is necessarily very close to 1.0” — does that mean it is close to 1.0, or it ought to be close to 1.0? — and “Historically, this high degree of mission success has given rise to a difference in philosophy between unmanned and manned space flight programs; i.e., numerical probability versus engineering judgment.” As far as I can tell, “engineering judgment” means they’re just going to make up numbers! The probability of an engine-blade failure was given as a universal constant, as if all the blades were exactly the same, under the same conditions. The whole paper was quantifying everything. Just about every nut and bolt was in there: “The chance that a HPHTP pipe will burst is \(10^{-7}\).” You can’t estimate things like that; a probability of 1 in 10,000,000 is almost impossible to estimate. It was clear that the numbers for each part of the engine were chosen so that when you add everything together you get 1 in 100,000. (Feynman and Leighton 1988, 182–83).
We see in the Challenger shuttle case very mixed kinds of inputs to actual estimates of probabilities. They include frequency series of past flights of other rockets, judgments about the relevance of experience with that different sort of rocket, adjustments for special temperature conditions (cold), and much, much more. There also were complex computational processes in arriving at the probabilities that were made the basis for the launch decision. And most impressive of all, of course, are the extraordinary differences in estimates made by various persons (or perhaps we should talk of various statuses and roles) which make a mockery of the notion of objective estimation in this case.
Working with different sorts of estimation methods in different sorts of situations is not new; practical statisticians do so all the time. We argue that we should make no apology for doing so.
The concept of probability varies from one field of endeavor to another; it is different in the law, in science, and in business. The concept is most straightforward in decision-making situations such as business and gambling; there it is crystal-clear that one’s interest is entirely in making accurate predictions so as to advance the interests of oneself and one’s group. The concept is most difficult in social science, where there is considerable doubt about the aims and values of an investigation. In sum, one should not think of what a probability “is” but rather how best to estimate it. In practice, neither in actual decision-making situations nor in scientific work — nor in classes — do people experience difficulties estimating probabilities because of philosophical confusions. Only philosophers and mathematicians worry — and even they really do not need to worry — about the “meaning” of probability.2
3.6 The relationship of probability to other magnitudes
An important argument in favor of approaching the concept of probability as an estimate is that an estimate of a probability often (though not always) is the opposite side of the coin from an estimate of a physical quantity such as time or space.
For example, uncertainty about the probability that one will finish a task within 9 minutes is another way of labeling the uncertainty that the time required to finish the task will be less than 9 minutes. Hence, if estimation is appropriate for time in this case, it should be equally appropriate for probability. The same is true for the probability that the quantity of smart watches sold will be between 200 and 250 units.
Hence the concept of probability, and its estimation in any particular case, should be no more puzzling than is the “dual” concept of time or distance or quantities of smart watches. That is, lack of certainty about the probability that an event will occur is not different in nature from lack of certainty about the amount of time or distance in the event. There is no essential difference between asking whether the next part to emerge from the machine will be 2 inches in length, asking what the length of the next part will be, and asking the length of the part that just emerged (if it has not yet been measured).
The information available for the measurement of (say) the length of a car or the location of a star is exactly the same information that is available with respect to the concept of probability in those situations. That is, one may have ten disparate observations of a car’s length which then constitute a probability distribution, and the same for the altitude of a star in the heavens.
In a book of puzzles about probability (Mosteller 1987, problem 42), this problem appears: “If a stick is broken in two at random, what is the average length of the smaller piece?” This particular puzzle does not even mention probability explicitly, and no one would feel the need to write a scholarly treatise on the meaning of the word “length” here, any more than one would do so if the question were about an astronomer’s average observation of the angle of a star at a given time or place, or the average height of boards cut by a carpenter, or the average size of a basketball team. Nor would one write a treatise about the “meaning” of “time” if a similar puzzle involved the average time between two bird calls. Yet a rephrasing of the problem reveals its tie to the concept of probability, to wit: What is the probability that the smaller piece will be (say) more than half the length of the larger piece? Or, what is the probability distribution of the sizes of the shorter piece?
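Both forms of the question yield to simulation. Here is a minimal sketch; the number of simulated sticks is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng()

# Break a stick of length 1 at a uniformly random point, many times over.
breaks = rng.uniform(0, 1, size=1_000_000)
smaller = np.minimum(breaks, 1 - breaks)
larger = np.maximum(breaks, 1 - breaks)

# Average length of the smaller piece.
print(smaller.mean())  # close to 0.25

# Probability that the smaller piece is more than half the larger piece.
print(np.mean(smaller > larger / 2))  # close to 1/3
```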
The duality of the concepts of probability and physical entities also emerges in Whitworth’s discussion (1897) of fair betting odds:
…What sum ought you fairly give or take now, while the event is undetermined, in exchange for the assurance that you shall receive a stated sum (say $1,000) if the favourable event occur? The chance of receiving $1,000 is worth something. It is not as good as the certainty of receiving $1,000, and therefore it is worth less than $1,000. But the prospect or expectation or chance, however slight, is a commodity which may be bought and sold. It must have its price somewhere between zero and $1,000. (p. xix.)
…And the ratio of the expectation to the full sum to be received is what is called the chance of the favourable event. For instance, if we say that the chance is 1/5, it is equivalent to saying that $200 is the fair price of the contingent $1,000. (p. xx.)…
The fair price can sometimes be calculated mathematically from a priori considerations: sometimes it can be deduced from statistics, that is, from the recorded results of observation and experiment. Sometimes it can only be estimated generally, the estimate being founded on a limited knowledge or experience. If your expectation depends on the drawing of a ticket in a raffle, the fair price can be calculated from abstract considerations: if it depend upon your outliving another person, the fair price can be inferred from recorded statistics: if it depend upon a benefactor not revoking his will, the fair price depends upon the character of your benefactor, his habit of changing his mind, and other circumstances upon the knowledge of which you base your estimate. But if in any of these cases you determine that $300 is the sum which you ought fairly to accept for your prospect, this is equivalent to saying that your chance, whether calculated or estimated, is 3/10... (p. xx.)
It is indubitable that along with frequency data, a wide variety of other information will affect the odds at which a reasonable person will bet. If the two concepts of probability stand on a similar footing here, why should they not be on a similar footing in all discussion of probability? I can think of no reason that they should not be so treated.
Scholars write about the “discovery” of the concept of probability in one century or another. But is it not likely that even in pre-history, when a fisherperson was asked how long the big fish was, s/he sometimes extended her/his arms and said, “About this long, but I’m not exactly sure,” and when a scout was asked how many of the enemy there were, s/he answered, “I don’t know for sure...probably about fifty.” The uncertainty implicit in these statements is the functional equivalent of probability statements. There simply is no need to make such heavy work of the probability concept as the philosophers and mathematicians and historians have done.
3.7 What is “chance”?
The study of probability focuses on events with randomness — that is, events about which there is uncertainty whether or not they will occur. And the uncertainty refers to your knowledge rather than to the event itself. For example, consider this physical illustration with a remote control. The remote control has a front end that should point at the TV that it controls, and a back end that will usually be pointing at me, the user of the remote control. Call the front the TV end, and the back the sofa end of the remote control.
I spin the remote control like a baton twirler. If I hold it at the sofa end and attempt to flip it so that it turns only half a revolution, I can be almost sure that I will correctly get the TV end and not the sofa end. And if I attempt to flip it a full revolution, again I can almost surely get the sofa end successfully. It is not a random event whether I catch the sofa end or the TV end (here ignoring those throws when I catch neither end) when doing only half a revolution or one revolution. The result is quite predictable in both these simple maneuvers so far.
When I say the result is “predictable,” I mean that you would not bet with me about whether this time I’ll get the TV or the sofa end. So we say that the outcome of my flip aiming at half a revolution is not “random.”
When I twirl the remote control so little, I control (almost completely) whether the sofa end or the TV end comes down to my hand; this is the same as saying that the outcome does not occur by chance.
The terms “random” and “chance” implicitly mean that you believe that I cannot control or cannot know in advance what will happen.
Whether this twirl will be the rare time I miss, however, should be considered chance. Though you would not bet at even odds on my catching the sofa end versus the TV end if there is to be only a half or one full revolution, you might bet — at (say) odds of 50 to 1 — that I will make a mistake and get it wrong, or drop it. So the very same flip can be seen as random or determined depending on what aspect of it we are looking at.
Of course you would not bet against me about my not making a mistake, because the bet might cause me to make a mistake purposely. This “moral hazard” is a problem that emerges when a person buys life insurance and may commit suicide, or when a boxer may lose a fight purposely. The people who stake money on those events say that such an outcome is “fixed” (a very appropriate word) and not random.
Now I attempt more difficult maneuvers with the remote control. I can do \(1\frac{1}{2}\) flips pretty well, and two full revolutions with some success — maybe even \(2\frac{1}{2}\) flips on a good day. But when I get much beyond that, I cannot determine very well whether I’ll get the sofa or the TV end. The outcome gradually becomes less and less predictable — that is, more and more random.
If I flip the remote control so that it revolves three or more times, I can hardly control the process at all, and hence I cannot predict well whether I’ll get the sofa end or the TV end. With 5 revolutions I have absolutely no control over the outcome; I cannot predict the outcome better than 50-50. At that point, getting the sofa end or the TV end has become a completely random event for our purposes, just like flipping a coin high in the air. So at that point we say that “chance” controls the outcome, though that word is just a synonym for my lack of ability to control and predict the outcome. “Chance” can be thought to stand for the myriad small factors that influence the outcome.
We see the same gradual increase in randomness with increasing numbers of shuffles of cards. After one shuffle, a skilled magician can know where every card is, and after two shuffles there is still much order that s/he can work with. But after (say) five shuffles, the magician no longer has any power to predict and control, and the outcome of any draw can then be thought of as random chance.
At what point do we say that the outcome is “random” or “pure chance” as to whether my hand will grasp the TV end, the sofa end, or some other spot? There is no sharp boundary to this transition. Rather, the transition is gradual; this is the crucial idea, and one that I have not seen stated before.
Whether or not we refer to the outcome as random depends upon the twirler’s skill, which influences how predictable the event is. A baton twirler or juggler might be able to do ten flips with a non-random outcome; if the twirler is an expert and the outcome is highly predictable, we say it is not random but rather is determined.
Again, this shows that the randomness is not a property of the physical event, but rather of a person’s knowledge and skill.
3.8 What Do We Mean by “Random”?
We have defined “chance” and “random” as the absence of predictive power and/or explanation and/or control. Here we should not confuse the concepts of determinacy-indeterminacy and predictable-unpredictable. What matters for decision purposes is whether you can predict. Whether the process is “really” determinate is largely a matter of definition and labeling, an unnecessary philosophical controversy for our purposes (and perhaps for any other purpose).3
The remote control in the previous demonstration becomes unpredictable — that is, random — even though it still is subject to similar physical processes as when it is predictable. I do not deny in principle that these processes can be “understood,” or that one could produce a machine that would — like a baton twirler — make the course of the remote control predictable for many turns. But in practice we cannot make the predictions — and it is the practical reality, rather than the principle, that matters here.
When I flip the remote control half a turn or one turn, I control (almost completely) whether it comes down at the sofa end or the TV end, so we do not say that the outcome is chance. Much the same can be said about what happens to the predictability of drawing a given card as one increases the number of times one shuffles a deck of cards.
Consider, too, a set of fake dice that I roll. Before you know they are fake, you assume that the probabilities of various outcomes are a matter of chance. But after you know that the dice are loaded, you no longer assume that the outcome is chance. This illustrates how the probabilities you work with are influenced by your knowledge of the facts of the situation.
Admittedly, this way of thinking about probability takes some getting used to. Events may appear to be random, but in fact, we can predict them — and vice versa. For example, suppose a magician does a simple trick with dice such as this one:
The magician turns her back while a spectator throws three dice on the table. He is instructed to add the faces. He then picks up any one die, adding the number on the bottom to the previous total. This same die is rolled again. The number it now shows is also added to the total. The magician turns around. She calls attention to the fact that she has no way of knowing which of the three dice was used for the second roll. She picks up the dice, shakes them in her hand a moment, then correctly announces the final sum.
Method: When the spectator rolls the dice, he gets three numbers, one from each of the three dice. Call these numbers \(a\), \(b\) and \(c\). Then he chooses one die — it doesn’t matter which, but let’s say he chooses the third die, with value \(c\). He adds the bottom of the third die to the total. Here’s the trick: opposite faces of a standard die always add up to 7; 1 is opposite 6, 2 is opposite 5, and 3 is opposite 4. So the total is now \(a + b + 7\). Then the spectator rolls the third die again, to get a new number \(d\). The total is now \(a + b + 7 + d\). When the magician turns round she can see what \(a\) and \(b\) and \(d\) are, so to get the right final total, she just needs to add 7 (Gardner 1985, p. 259). Ben Sparks does a nice demonstration of the trick on the Numberphile YouTube channel.
The point here is that, until you know the trick, you (the magician) cannot predict the final sum, so the magician and the spectator consider the result as random. If you do know the trick, you can predict the result, and it is not random. Whether something is “random” or not, depends on what you know.
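A short simulation confirms that the rule behind the trick works on every trial; the variable names follow the description above:

```python
import numpy as np

rng = np.random.default_rng()

for _ in range(10_000):
    a, b, c = rng.integers(1, 7, size=3)  # the spectator's three dice
    total = a + b + c                     # he adds the faces
    total += 7 - c                        # bottom of the chosen die: opposite
                                          # faces of a die always sum to 7
    d = rng.integers(1, 7)                # the chosen die is rolled again
    total += d
    # The magician sees only a, b and d, and announces their sum plus 7.
    assert total == a + b + d + 7
```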
Consider the distributions of heights of various groups of living things (including people). When we consider all living things taken together, the shape of the overall distribution — many individuals at the tiny end where the viruses are found, and very few individuals at the tall end where the giraffes are — is determined mostly by the distribution of species that have different mean heights. Hence we can explain the shape of that distribution, and we do not say that is determined by “chance.” But with a homogeneous cohort of a single species — say, all 25-year-old human females in the U.S. — our best description of the shape of the distribution is “chance.” With situations in between, the shape is partly due to identifiable factors — e.g. age — and partly due to “chance.”
Or consider the case of a basketball shooter: What causes her or him to make (or not make) a basket this shot, after a string of successes? Much must be ascribed to chance variation. But what causes a given shooter to be very good or very poor relative to other players? For that explanation we can point to such factors as the amount of practice or natural talent.
Again, all this has nothing to do with whether the mechanism is “really” chance, unlike the arguments that have been raging in physics for a century. That is the point of the remote control demonstration. Our knowledge and our power to predict the outcome shift gradually from non-chance (that is, “determined”) to chance (“not determined”), even though the same sort of physical mechanism produces each throw of the remote control.
Earlier I mentioned that when we say that chance controls the outcome of the remote control flip after (say) five revolutions, we mean that there are many small forces that affect the outcome. The effect of each force is not known, and each is independent of the other. None of these forces is large enough for me (as the remote control twirler) to deal with, or else I would deal with it and be able to improve my control and my ability to predict the outcome. This concept of many small influences — “small” meaning in practice those influences whose effects cannot be identified and allowed for — that affect the outcome and are independent of each other is important in statistical inference. For example, as we will see later, when we add many unpredictable deviations together, and plot the distribution of the result, we end up with the famous and very common bell-shaped normal distribution — this striking result comes about because of a mathematical phenomenon called the Central Limit Theorem.4
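You can watch the bell shape emerge with a short simulation: add together many small, independent deviations and plot the distribution of the sums. This sketch assumes Matplotlib is available for the histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

# Each outcome is the sum of 100 small, independent influences,
# each spread uniformly between -1 and 1.
sums = rng.uniform(-1, 1, size=(100_000, 100)).sum(axis=1)

plt.hist(sums, bins=50)  # the familiar bell shape of the normal distribution
plt.show()
```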
3.9 Randomness from the computer
We now have the idea of random variation as being variation we cannot predict. For example, when we flip the remote control through many rotations, we can no longer easily predict which end will land in our hand. We can call the result of any particular flip random, because we cannot predict whether the result will be the TV end or the sofa end.
We still know some things about the result — it will be one of two options — TV or sofa (unless we drop it). But we cannot predict which. We say the result of each flip is random if we cannot do anything to improve our prediction of 50% for TV (or sofa) end on the next flip.
We are not saying the result is random in any deep, non-deterministic sense — we are only saying we can treat the result as random, because we cannot predict it.
Now consider getting random numbers from the computer, where the numbers can either be 0 or 1. This is rather like tossing a fair coin, where the results are 0 and 1 rather than “heads” and “tails”.
When we ask the computer for a random choice between 0 and 1, we accept it is random-enough, or random-like, if we can’t do anything to predict which of 0 or 1 we will get on any one trial. We can’t do better than guessing that the next value will be, say, 0, and whichever number we guess, we will only ever have a 50% chance of being correct. We are not saying the computer is giving truly random numbers, in the sense that they are fundamentally not deterministic; it is only giving us numbers we cannot distinguish from truly random numbers, because we cannot in practice do anything to predict them. The technical term for random numbers from the computer is therefore pseudo-random — meaning, like random numbers, in the sense that they are effectively unpredictable. Effectively unpredictable means there is no practical way for you, or even a very powerful computer, to do anything to improve your prediction of the next number in the series.
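The “pseudo” becomes visible if you start the computer's generator from the same state, the seed, more than once. This NumPy sketch uses an arbitrary seed value:

```python
import numpy as np

# Two generators started from the same seed give the same "random" 0s and 1s:
# deterministic in fact, but unpredictable if you do not know the seed.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

print(rng_a.integers(0, 2, size=10))  # a sequence of 0s and 1s
print(rng_b.integers(0, 2, size=10))  # exactly the same sequence
```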
3.10 The philosophers’ dispute about the concept of probability
Those who call themselves “objectivists” or “frequentists” and those who call themselves “personalists” or “Bayesians” have been arguing for hundreds or even thousands of years about the “nature” of probability. The objectivists insist (correctly) that any estimation not based on a series of observations is subject to potential bias, from which they conclude (incorrectly) that we should never think of probability that way. They are worried about the perversion of science, the substitution of arbitrary assessments for value-free data-gathering. The personalists argue (correctly) that in many situations it is not possible to obtain sufficient data to avoid considerable judgment. Indeed, if a probability is about the future, some judgment is always required — about which observations will be relevant, and so on. They sometimes conclude (incorrectly) that the objectivists’ worries are unimportant.
As is so often the case, the various sides in the argument have different sorts of situations in mind. As we have seen, the arguments disappear if one thinks operationally with respect to the purpose of the work, rather than in terms of properties, as mentioned earlier.
Here is an example of the difficulty of focusing on the supposed properties of the mechanism or situation: The mathematical theorist asserts that the probability of a die falling with the “5” side up is 1/6, on the basis of the physics of equally-weighted sides. But if one rolls a particular die a million times, and it turns up “5” less than 1/6 of the time, one surely would use the observed proportion as the practical estimate. The probabilities of various outcomes with cheap dice may depend upon the number of pips drilled out on a side. In 20,000 throws of a red die and 20,000 throws of a white die, the proportions of 3’s and 4’s were, respectively, .159 and .146, .145 and .142 — all far below the expected proportion of .167. That is, 3’s and 4’s occurred about 11 percent less often than if the dice had been perfectly formed, a difference that could make a big difference in a gambling game (Bulmer 1979, 18).
It is reasonable to think of both the engineering method (the theoretical approach) and the empirical method (experimentation and data collection) as two alternative ways to estimate a probability. The two methods use different processes and different proxies for the probability you wish to estimate. One must adduce additional knowledge to decide which method to use in any given situation. It is sensible to use the empirical method when data are available. (But use both together whenever possible.)
In view of the inevitably subjective nature of probability estimates, you may prefer to talk about “degrees of belief” instead of probabilities. That’s fine, just as long as it is understood that we operate with degrees of belief in exactly the same way as we operate with probabilities. The two terms are working synonyms.
Most important: One cannot sensibly talk about probabilities in the abstract, without reference to some set of facts. The topic then loses its meaning, and invites confusion and argument. This also is a reason why a general formalization of the probability concept does not make sense.
3.11 The relationship of probability to the concept of resampling
There is no generally agreed definition of the concept of the resampling method in statistics. Unlike some other writers, I prefer to apply the term to problems in both pure probability and statistics. The following set of examples may illustrate:
1. Consider asking about the number of hits one would expect from a 0.250 (25 percent) batter in a 400 at-bat season. One would call this a problem in “probability.” The sampling distribution of the batter’s results can be calculated by formula or produced by Monte Carlo simulation.
2. Now consider examining the number of hits in a given batter’s season, and asking how likely that number (or fewer) is to occur by chance if the batter’s long-run batting average is 0.250. One would call this a problem in “statistics.” But just as in example (1) above, the answer can be calculated by formula or produced by Monte Carlo simulation. And the calculation or simulation is exactly the same as used in (1).
Here the term “resampling” might be applied to the simulation with considerable agreement among people familiar with the term, but perhaps not by all such persons.
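Both problems call for the same Monte Carlo procedure, as in this minimal sketch; the threshold of 80 hits in the second question is our own, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng()

# Problem (1): the sampling distribution of hits for a 0.250 batter
# over a 400 at-bat season.
seasons = rng.binomial(n=400, p=0.25, size=100_000)
print(seasons.mean())  # about 100 hits on average

# Problem (2): how likely is a season of (say) 80 hits or fewer,
# if the long-run batting average really is 0.250?
print(np.mean(seasons <= 80))
```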
3. Next consider an observed distribution of distances that a batter’s hits travel in a season with 100 hits, with an observed mean of 150 feet per hit. One might ask how likely it is that a sample of 10 hits drawn with replacement from the observed distribution of hit lengths (with a mean of 150 feet) would have a mean greater than 160 feet, and one could easily produce an answer with repeated Monte Carlo samples. Traditionally this would be called a problem in probability.
4. Next consider that a batter gets 10 hits with a mean of 160 feet, and one wishes to estimate the probability that the sample would be produced by a distribution as specified in (3). This is a problem in statistics, and by 1996 it had become common statistical practice to treat it with a resampling method. The actual simulation would, however, be identical to the work described in (3).
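In code, problems (3) and (4) reduce to the same resampling loop. Because we do not have the real distances, this sketch stands in for them with a made-up, purely hypothetical array of 100 hit lengths averaging about 150 feet:

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical stand-in for the observed 100 hit distances (mean near 150 feet).
observed = rng.normal(loc=150, scale=40, size=100)

# Draw many samples of 10 hits, with replacement, from the observed
# distribution, and see how often the sample mean exceeds 160 feet.
samples = rng.choice(observed, size=(10_000, 10), replace=True)
means = samples.mean(axis=1)
print(np.mean(means > 160))
```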
Because the work in (4) and (2) differs only in that question (4) involves measured data while question (2) involves counted data, there seems no reason to discriminate between the two cases with respect to the term “resampling.” With respect to the pairs of cases (1) and (2), and (3) and (4), there is no difference in the actual work performed, though there is a difference in the way the question is framed. I would therefore urge that the label “resampling” be applied to (1) and (3) as well as to (2) and (4), to bring out the important fact that the procedure is the same as in resampling questions in statistics.
One could easily produce examples like (1) and (2) for cases that are similar except that the drawing is without replacement.5 And one could adduce the example of prices in different state liquor control systems (see Section 12.15) which is similar to cases (3) and (4) except that sampling without replacement seems appropriate. Again, the analogs to cases (2) and (4) would generally be called “resampling.”
The concept of resampling is defined in a more precise way in Section 8.9.
3.12 Conclusion
We define “chance” as the absence of predictive power and/or explanation and/or control.
When the remote control rotates more than three or four turns I cannot control the outcome — whether TV or sofa end — with any accuracy. That is to say, I cannot predict much better than 50-50 with more than four rotations. So we then say that the outcome is determined by “chance.”
As to those persons who wish to inquire into what the situation “really” is: I hope they agree that we do not need to do so to proceed with our work. I hope all will agree that the outcome of flipping the remote control gradually becomes unpredictable (random) though still subject to similar physical processes as when predictable. I do not deny in principle that these processes can be “understood”; certainly one can develop a machine (or a baton twirler) that will make the outcome predictable for many turns. But this has nothing to do with whether the mechanism is “really” something one wants to say is influenced by “chance.” This is the point of the demonstration with the sofa and TV ends of the remote control. The outcome traverses from non-chance (determined) to chance (not determined) in a smooth way even though the physical mechanism that produces the revolutions remains much the same over the traverse.
At one time, some writers believed there was a difference between “objectively sharply defined” and “objectively vague” probabilities. Raiffa (1968) gives a clear example of why this is not so:
Suppose you are confronted with two options. In option 1, you must toss coin 1 (which is fair and true), guess heads or tails, and win $1.00 if you match and lose $1.00 if you fail to match. In option 2, you have a 50-50 chance of getting coin 2, which has two heads, or of getting coin 3, which has two tails. Not knowing whether you are tossing coin 2 or 3, you must call, toss, and get the payoffs as in option 1. With option 1, the probability of the toss coming out heads is .5; with option 2, the same probability is either 0 or 1, and since the chance of each in turn is .5, the probability of heads is ultimately .5 once again. Nothing is to be gained by saying that one .5 is sharply defined and that the other is fuzzy. Of course, if, and this is a big “if,” you could experiment with the coin you will toss before you are obliged to declare, then the two options are manifestly asymmetrical. Barring this privilege, the two options are equivalent (Raiffa 1968, 108).↩︎
This does not mean that I think that people should confine their learning to what they need in their daily work. Having a deeper philosophical knowledge than you ordinarily need can help you deal with extraordinary problems when they arise.↩︎
The idea that our aim is to advance our work in improving our knowledge and our decisions, rather than to answer “ultimate” questions about what is “really” true is in the same spirit as some writing about quantum theory. In 1930 Ruark and Urey wrote: “The reader who feels disappointed that the information sought in solving a dynamical problem on the quantum theory is [only] statistical … should console himself with the thought that we seldom need any information other than that which is given by the quantum theory.” (1930, 622).↩︎
The Central Limit Theorem is an interesting mathematical result that proves something you can show for yourself by simulation — that if we take means of many values drawn from any shape of distribution, and then look at the distribution of the resulting means, it will be close to the normal (bell-curve) distribution. If you are interested in a technical (mathematical) explanation of this result, see the Wikipedia page on the Central Limit Theorem.↩︎
One example of drawing without replacement is the sampling version of Ronald Fisher’s permutation test — see (Fisher 1935; Fisher 1960, chap. II, section 5).↩︎