FIRST THOUGHTS ABOUT RESAMPLING FOR WORK AND LEARNING
Julian L. Simon and Peter Bruce
Probability theory and its offspring, inferential
statistics, constitute perhaps the most frustrating branch of
human knowledge.
Right from its beginnings in the seventeenth century, the
great mathematical discoverers knew that the probabilistic way of
thinking -- which we'll call "prob-stats" for short -- offers
enormous power to improve our decisions and the quality of our
lives. Prob-stats can aid a jury's deliberation about whether to
find guilty a person charged with murder...reveal if a new drug
boosts survival from a cancer...help steer a spacecraft to
Saturn...inform the manager when to take a pitcher out of the
baseball game...aid a wildcatter to calculate how much to invest
in an oil well...and a zillion other good things, too.
Yet until very recently, when the resampling method came
along, scholars were unable to convert this powerful body of
theory into a tool that laypersons could and would use freely in
daily work and personal life. Instead, only professional
statisticians feel themselves in comfortable command of the prob-
stats way of thinking. The most frequent applications are by
social and medical scientists, who know that prob-stats is
indispensable to their work yet too often fear and misuse it.
It was about 1615 that Italian gamblers brought to Galileo
Galilei a problem in the game of three dice. The theorists of
the day had figured as equal the chances of getting totals of 9
and 10 (also 11 and 12), because there are the same number of
ways (six) of making those points -- for example, a nine can be
126, 135, 144, 234, 225, and 333. But players had found that in
practice 10 is made more often than 9. How come?
Galileo then invented the device of the "sample space" of
possible outcomes. He colored three dice white, gray, and black,
and systematically listed every possible permutation. Other
theorists -- and, later in the century, even Gottfried Leibniz --
instead lumped together into a single category the various
possible ways of getting (say) a 3, 3, and 4 to make 10. That is,
they listed combinations rather than permutations, and different
combinations contain different numbers of permutations: 3-3-4 can
appear in only three orders, whereas 1-3-6 can appear in six.
Galileo's analysis confirmed the gamblers' empirical
results. Ten does come up more frequently than 9, because there
are 27 permutations that add to 10 whereas there are only 25
permutations that add to 9.
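Galileo's enumeration is easy to reproduce today. The sketch below is our own illustration in Python, not RESAMPLING STATS. It lists all 216 ordered outcomes of three distinguishable dice. For each total it then tallies both the combinations the other theorists counted and the permutations Galileo counted.

```python
from collections import Counter
from itertools import product

# Galileo's sample space: every ordered outcome of a white, a gray and a
# black die, 6 * 6 * 6 = 216 equally likely cases.
sample_space = list(product(range(1, 7), repeat=3))

# Permutations: how many ordered outcomes give each total.
permutations = Counter(sum(roll) for roll in sample_space)

# Combinations: the older tally, where 3-3-4 and 3-4-3 count as one way.
combinations = Counter(sum(c) for c in {tuple(sorted(roll)) for roll in sample_space})

for total in (9, 10, 11, 12):
    print(total, combinations[total], "combinations,", permutations[total], "permutations")
```

Each of the four totals has six combinations. But 9 and 12 have 25 permutations each, and 10 and 11 have 27. Dividing by 216, the chance of a 9 is 25/216 (about 11.6 percent) and the chance of a 10 is 27/216 (12.5 percent).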
The use of repeated trials to learn what the gamblers wanted
to know illustrates the power of the resampling method -- which
we can simply call "simulation" or "experimentation" here. And
with sufficient repetitions, one can arrive at as accurate an
answer as desired. Not only is the resampling method adequate,
but in the case of three dice it was a better method than
deductive logic, because it gave the more correct answer. Though
the only logic needed was enumeration of the possibilities, it
was too difficult for the doctors of the day. The powers of a
Galileo were necessary to produce the correct logic.
Even after Galileo's achievement, the powers of Blaise
Pascal and Pierre Fermat were needed to analyze correctly, with
the multiplication rule, another such problem - the chance of at
least one ace in four dice throws. (This problem, presented by
the Chevalier de Méré, is considered the origin of probability
theory.) For lesser mathematical minds, the analysis was too
difficult. Yet ordinary players were able to discern the correct
relative probabilities, even though the differences in
probabilities are slight in both the Galileo and Pascal-Fermat
problems. Simulation's effectiveness is its best argument.
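To show how slight the edge is, here is a sketch in Python (our own, not from the original). It computes the chance of at least one ace in four throws in two ways: first with the multiplication rule, then by brute repetition. The seed and the 100,000 trials are arbitrary choices.

```python
import random

# Multiplication rule: no ace on one throw has chance 5/6, and the throws are
# independent, so no ace in four throws has chance (5/6) ** 4.
exact = 1 - (5 / 6) ** 4


def simulate_at_least_one_ace(trials, rng):
    """Estimate P(at least one ace in four throws) by repeated trials."""
    hits = 0
    for _ in range(trials):
        throws = [rng.randint(1, 6) for _ in range(4)]
        if 1 in throws:  # the "ace" is the face showing one pip
            hits += 1
    return hits / trials


rng = random.Random(1654)  # fixed seed so the run can be repeated exactly
estimate = simulate_at_least_one_ace(100_000, rng)
print(f"exact {exact:.4f}, simulated {estimate:.4f}")
```

The exact value, 1 - 625/1296 = 671/1296, is about 0.518. That is only a shade better than an even bet, so a player needs many plays to notice the advantage.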
One might rejoin that the situation is different after
Galileo, Pascal, Fermat and their descendants have invented
analytic methods to handle such problems correctly. Why not use
already existing analytic methods instead of resampling?
The existence of a correct algorithm does not imply that it
will be used appropriately, however. And a wrongly-chosen
algorithm is far worse than no algorithm at all -- as the
Chevalier's pocketbook attested. In our own day, decades of
experience have proven that "pluginski" -- the memorization of
formulas that one cannot possibly understand intuitively -- may
enable one to survive examinations, but does not provide usable
scientific tools.
Prob-stats is the bane of the large numbers of students who
consider the statistics course a painful rite of passage -- like
fraternity paddling -- on the way to an academic degree. Among
those who study it, at the end of the semester most happily put
prob-stats out of their minds forever.
The statistical community has made valiant attempts to
ameliorate this sad situation. Great statisticians have
struggled to find interesting and understandable ways to teach
prob-stats. Learned committees and professional associations
have wrung their hands in despair, and spent millions of dollars
creating television series and textbooks.
Despite successes, these campaigns to promote prob-stats
have largely failed. The enterprise smashes up against an
impenetrable wall - the body of complex algebra and tables that
only a rare expert understands right down to the foundations.
For example, almost no one can write the formula for the "Normal"
distribution that is at the heart of most statistical tests.
Even fewer understand its meaning. Yet without such
understanding, there can be only rote learning.
Almost every student of probability and statistics simply
memorizes the rules. Most users of prob-stats select their
methods blindly, understanding little or nothing of the basis for
choosing one method rather than another. This often leads to
wildly inappropriate practices, and contributes to the damnation
of statistics. That's why many have concluded that probabilistic
statistical inference is too difficult and that students should
be taught mainly descriptive statistics, rather than how to draw
inferences probabilistically, which is really the heart of
statistics.
The new resampling method, in combination with the personal
computer, promises to change all this. Resampling may finally
realize the great potential of statistics and probability.
Resampling estimates probabilities by numerical experiments
instead of with formulae -- by flipping coins or picking numbers
from a hat, or with the same operations simulated on a computer.
And the computer language and program RESAMPLING STATS performs these
operations in a transparently clear and simple fashion.
Leading mathematical statisticians now accept resampling as
theoretically sound. And controlled studies show that people
ranging from engineers and scientists down to seventh graders
quickly learn to solve more problems correctly with resampling
than with conventional methods. Furthermore,
in contrast to the older conventional statistics, which is a
painful and humiliating experience for most students at all
levels, the published studies show that students enjoy resampling
statistics.
THE REAPPEARANCE OF RESAMPLING IN THE HISTORY OF STATISTICS
Resampling returns to a very old tradition. In ancient
times, mathematics in general, and statistics in particular,
developed from the needs of governments and rich men to count
their armies and flocks, and to enumerate the taxpayers and their
possessions. Up until the beginning of the twentieth century,
the term "statistic" meant "state-istics", the number of
something the "state" was interested in -- soldiers, births, or
what-have-you. Even today, the term "statistic" usually means
the quantity of something, such as the important data for the
United States in the Statistical Abstract of the United States.
These numbers are now known as "descriptive statistics," in
contrast to "inferential statistics," the science that tells us
how reliable a set of descriptive statistics is.
Another stream of thought appeared by way of gambling in
France in the 17th century. Throughout history people had
learned about the odds in gambling games by experimental trial-
and-error experience. To find the chance of a given hand
occurring in a card game, a person would deal out a great many
hands and count the proportion of times that the hand in question
occurred. That was the resampling method, plain and simple.
Then in the year 1654, the French nobleman Chevalier de Méré
asked the great mathematician and philosopher Blaise Pascal to
help him deduce what the odds ought to be in some gambling games.
Pascal, the famous Pierre Fermat, and others went on from there
to develop analytic probability theory, and Jacob Bernoulli and
Abraham de Moivre initiated the formal theory of statistics. The
experimental method disappeared into mathematical obscurity
except for its use when a problem is too difficult to be answered
theoretically, as happens from time to time in the development of
statistical tests -- for example, the development of the famous
t-test by "Student," the pen-name of William S. Gosset -- and
the World War II "Monte Carlo" simulations for complex military
"operations research" problems such as how best to search for
submarines with airplanes.
Later on, these two streams of thought -- descriptive
statistics and probability theory -- joined together. Users of
descriptive statistics wondered about the accuracy of the data
originating from both sample surveys and experiments. Therefore,
statisticians applied the theory of probability to assessing the
accuracy of data, and thereby created the theory of inferential
statistics.
Resampling is best understood by seeing it being learned,
and Chapter III-1 transcribes a class in action. Here is a brief
snippet from a typical class, whether of seventh-graders or PhDs in
statistics:
The instructor says "Good day" and immediately asks, "What
are the chances that, if I have four children, three of them
will be girls?" Someone says "Put a bunch of kids into
a hat and pick out four at a time". After the laughs, Teach
says, "Nice idea, but it might be a bit tough to do...Other
suggestions?"
Someone else says, "Have four kids and see what you get."
Teach praises this idea because it points toward learning from
experiment, one of the key methods of science. Then s/he adds,
"But let's say you have four children once. Will that be enough
to give you an acceptable answer?" So they discuss how big a
sample is needed, which brings out the important principle of
variability in the samples you draw. Teach then notes that to
have (say) a hundred families, it could take quite some time,
plus some energy and money, so it doesn't seem to be practical at
the moment. "Another suggestion?"
Someone suggests taking a survey of families with four
children. Teach applauds this idea, too, because it focuses on
getting an answer by going out and looking at the world. But
what if a faster answer is needed?
A student wonders if it is possible to "do something that is
like having kids. Put an equal number of red and black balls in
a pot, and pull four of them out. That would be like a family."
This kicks off discussion about how many balls are needed, and
how they should be drawn, which brings out some of the main
concepts in probability -- sampling with or without replacement,
independence, and the like.
Then somebody wonders whether the chance of having a girl
the first time you have a child is the same as the chance of a
red ball from an urn with equal numbers of red and black balls, an
important question indeed. This leads to discussion of whether
50-50 is a good approximation. This brings up the question of
the purpose of the estimate, and the instructor suggests that a
clothing manufacturer wants to know how many sets of matched
girls' dresses to make. For that purpose, 50-50 is okay, the
class says.
Coins are easier to use than balls, all concur. Someone
wonders whether four coins flipped once give the same answer as
one coin flipped four times. Eventually all agree that trying it
both ways is the best way to answer the question.
Teach commissions one student to orchestrate the rest of the
class in a coin-flipping exercise. Then the question arises: Is
one sample of (say) thirty coin-flip "families" enough? So the
exercise is repeated several times, and the class is impressed
with the variability from one sample of thirty to the next. Once
again the focus is upon variability, perhaps the most important
idea inherent in prob-stats.
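The classroom exercise translates directly into a short program. This sketch is in Python, and the function names are ours. We read "three girls" as exactly three of the four children, and we treat girls and boys as 50-50, as the class agreed. The five repetitions and the seed are arbitrary.

```python
import random


def girls_in_family(rng, children=4):
    """One simulated family: each child is a fair coin flip, 1 = girl."""
    return sum(rng.randint(0, 1) for _ in range(children))


def share_with_three_girls(rng, families=30):
    """One class exercise: the share of `families` with exactly three girls."""
    return sum(girls_in_family(rng) == 3 for _ in range(families)) / families


rng = random.Random(30)
# Repeat the thirty-family exercise five times to see the sample-to-sample spread.
class_samples = [share_with_three_girls(rng) for _ in range(5)]
print("five samples of 30 families:", class_samples)

# A very large number of families settles near the exact value 4/16 = 0.25.
long_run = share_with_three_girls(rng, families=100_000)
print("100,000 families:", round(long_run, 3))
```

The five thirty-family shares typically scatter noticeably around 0.25, which is the variability the class sees. The 100,000-family figure lands close to the exact value.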
Or another example: Teach asks, "What are the chances that
basketball player Magic Johnson, who averages 47 percent success
in shooting, will miss 8 of his next 10 shots?" The class shouts
out joking suggestions such as "Go watch Magic," and "Try it
yourself on the court." Teach responds, "Excellent ideas, good
scientific thinking, but not feasible now. What else could we
do?"
The students discuss a coin-flipping experiment, but someone
complains that flipping coins or dealing cards would be
wearisome. Aha! Now Teach breaks out the computer and suggests
doing the task faster, more accurately, and more pleasurably with
the following computer instructions:
REPEAT 100            ' obtain a hundred simulation trials
GENERATE 10 1,100 A   ' generate 10 numbers randomly between 1 and 100
COUNT A 1,53 B        ' count the number of misses in the trial
                      ' (Magic's shooting average is 47% hits, 53% misses)
SCORE B Z             ' record the result of the trial
END                   ' end the repeat loop for a single trial
COUNT Z 8,10 K        ' count the number of trials with 8 or more misses
DIVIDE K 100 KK       ' convert the count to a proportion of the 100 trials
PRINT KK              ' display the estimated probability
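For readers without RESAMPLING STATS, the same program can be sketched line for line in Python. This is our translation, not part of the RESAMPLING STATS package. Each comment names the command it stands in for.

```python
import random


def magic_trials(trials, rng):
    """Python mirror of the RESAMPLING STATS program: one miss count per trial."""
    z = []                                            # the SCORE vector Z
    for _ in range(trials):                           # REPEAT 100 ... END
        a = [rng.randint(1, 100) for _ in range(10)]  # GENERATE 10 1,100 A
        b = sum(1 <= x <= 53 for x in a)              # COUNT A 1,53 B (misses)
        z.append(b)                                   # SCORE B Z
    return z


rng = random.Random(10)
z = magic_trials(100, rng)
k = sum(8 <= b <= 10 for b in z)                      # COUNT Z 8,10 K
print(k, "of", len(z), "trials had 8 or more misses")
```

Dividing k by the number of trials gives the estimated probability, as the DIVIDE step does.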
The histogram (Figure 1) shows the results of 100 trials, and
Figure 2 shows the results of 1000 trials. The amount of
variability obviously diminishes as the number of trials
increases, an important lesson.
Figures 1 and 2
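The lesson of Figures 1 and 2 can also be checked numerically. This Python sketch (ours; names, seed, and the 100 repetitions are arbitrary) runs the whole Magic Johnson experiment many times at 100 trials and again at 1000 trials. It then compares how widely the resulting estimates scatter.

```python
import random
import statistics


def estimate_eight_plus(trials, rng):
    """Share of simulated 10-shot sets with 8 or more misses (53% miss rate)."""
    count = 0
    for _ in range(trials):
        misses = sum(rng.random() < 0.53 for _ in range(10))
        count += misses >= 8
    return count / trials


rng = random.Random(1000)
spread = {}
for trials in (100, 1000):
    # Run the whole experiment 100 times at this size and measure the scatter.
    estimates = [estimate_eight_plus(trials, rng) for _ in range(100)]
    spread[trials] = statistics.stdev(estimates)
    print(trials, "trials: standard deviation of the estimates =", round(spread[trials], 4))
```

The scatter shrinks roughly in proportion to one over the square root of the number of trials. Ten times as many trials should therefore cut it to roughly a third.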
Then Teach asks: "If you see Magic Johnson miss 8 of 10
shots after he has returned from an injury, should you think that
he is in a shooting slump?" Now the probability problem has been
inverted into a problem in statistical inference -- testing the
hypothesis that Magic is in a slump. And with proper
interpretation the same computer program yields the appropriate
answer -- about 8 percent of the time Magic will miss 8 or more
shots out of 10, even if he is not in a slump. So you probably
don't take him out or stop him from shooting. Understanding this
sort of variability over time is the key to Japanese quality
control.
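This particular question also has a simple analytic answer, so the binomial formula makes a handy check on the simulation. The sketch below (Python, our own) sums the probabilities of exactly 8, 9, and 10 misses when each shot is missed independently with probability 0.53.

```python
from math import comb


def prob_at_least(k, n, p):
    """Exact binomial P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Chance of 8 or more misses in 10 shots when the long-run miss rate is 53%.
p_eight_plus = prob_at_least(8, 10, 0.53)
print(round(p_eight_plus, 4))  # prints 0.0791
```

The exact figure is about 7.9 percent. Any single run of 100 simulated trials can easily land a percentage point or two away from it.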
Now the instructor changes the question again and asks: "If
you observe a player -- call him Houdini -- succeed with 47 of
100 shots, how likely is it that if you were to observe the same
player take a great number of shots -- a thousand or ten thousand
-- his long-run average would turn out to be 53 per cent or
higher?" A sample of 47 baskets out of 100 shots could come from
players of quite different "true" shooting percentages.
Resampling can help us make transparent several different
approaches to this problem in "inverse probability".
Clearly, we need to have some idea of how much variation
there is in samples from shooters like Houdini. If we have no
other information, we might reasonably proceed as if the 47/100
sample is our best estimate of Houdini's "true" shooting
percentage. We could take repeated samples from a 47% shooting
machine to estimate how great the variation is among shooters
with long-run averages in that vicinity, from which we could
estimate the likelihood that the "true" average is 53%. (This is
the well-known "confidence interval" approach. In truth, the
logic is a bit murky, but that is seldom a handicap in daily
practice.)
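A sketch of the 47% shooting machine in Python follows. The 10,000 repetitions, the 95 percent band, and the seed are our illustrative choices. The code shows the simulation the paragraph describes, not a full treatment of confidence intervals.

```python
import random

rng = random.Random(47)
TRIALS = 10_000

# Treat the observed 47/100 as Houdini's best-estimate rate and draw many
# 100-shot samples from a "47% shooting machine".
baskets = sorted(sum(rng.random() < 0.47 for _ in range(100)) for _ in range(TRIALS))

# Middle 95% of the simulated samples: a rough percentile-style interval.
low = baskets[int(0.025 * TRIALS)]
high = baskets[int(0.975 * TRIALS) - 1]

# How often does a 47% shooter turn in 53 or more baskets out of 100?
share_53_plus = sum(b >= 53 for b in baskets) / TRIALS
print(f"middle 95% of samples: {low} to {high} baskets")
print(f"share of samples with 53 or more: {share_53_plus:.3f}")
```

Suppose 53 or more shows up only rarely from a 47% machine. Then a "true" average of 53% looks correspondingly unlikely. That is the (admittedly murky) step from sample variation to a statement about Houdini.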
Alternatively, we might be interested in a particular
shooting percentage -- say, Houdini's lifetime average before a
shoulder injury. In such a case, we might want to know whether
the 47 for 100 is just a spell of below-average shooting, or an
indication that the injury has affected his play. In this
situation, we could repeatedly sample from a 53% shooting machine
to see how likely a 47/100 sample is. Using this "hypothesis-
testing" approach, if we find that the 47/100 sample is very
unusual, we conclude that the injury is hampering Houdini; if
not, not.
Consider still another possibility: If Houdini is a rookie
with no history in the league, we might want to apply additional
knowledge about how often 53% shooters are encountered in the
league. Here we might bring in information about the
"distribution" of the averages of other players, or of other
rookies, to learn the likelihood of a 47/100 sample in light of
such a distribution -- a "Bayesian" approach to the matter.
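The Bayesian version can also be done by resampling, provided we supply a league distribution. The text gives none, so the prior in this Python sketch is entirely HYPOTHETICAL: rookies' true rates are assumed to be spread evenly from 40% to 60%. The sketch draws a rookie from that distribution and simulates 100 shots. It keeps only the rookies who, like Houdini, make exactly 47, then asks how many of those have a true rate of 53% or more.

```python
import random

rng = random.Random(286)

# HYPOTHETICAL prior: the text gives no league data, so we simply assume that
# rookies' true shooting rates are spread evenly over 40%, 41%, ..., 60%.
league_rates = [r / 100 for r in range(40, 61)]

accepted = []
while len(accepted) < 1_000:
    true_rate = rng.choice(league_rates)                  # draw a rookie from the league
    made = sum(rng.random() < true_rate for _ in range(100))
    if made == 47:                                        # keep rookies who match Houdini
        accepted.append(true_rate)

share_53_plus = sum(r >= 0.53 for r in accepted) / len(accepted)
print(f"matching rookies with a true rate of 53% or more: {share_53_plus:.3f}")
```

With a different, more realistic league distribution, the answer would change. That dependence on outside knowledge is exactly what distinguishes this approach from the other two.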
The resampling approach to problems like this one helps
clarify the problem. Because there are no formulae to fall back
upon, you are forced to think hard about how best to proceed.
Forgoing these crutches may make the problem at hand seem
confusing and difficult, which is sometimes distressing. But in
the long run it is also the better way, because it forces you to
come to terms with the subtle nature of such problems rather than
sweeping these subtle difficulties under the carpet. You will
then be in a better position to choose a step-by-step logical
procedure which fits the circumstances.
To repeat, in the absence of black-box computer programs and
cryptic tables, the resampling approach forces you to directly
address the problem at hand. Then, instead of asking "Which
formula should I use?" students ask such questions as "Why is
something 'significant' if it occurs 4% of the time by chance,
yet not 'significant' if a random process produces it 8% of the
time?"
A DEFINITION AND GENERAL PROCEDURE FOR RESAMPLING
Let us define resampling to include problems in inferential
statistics, as well as problems in probability, with this
"operational definition": Using the entire set of data you have
in hand, or using the given data-generating mechanism (such as a
die) that is a model of the process you wish to understand,
produce new samples of simulated data, and examine the results of
those samples. That's it in a nutshell. In some cases, it may
also be appropriate to amplify this procedure with additional
assumptions.
Problems in pure probability may at first seem different in
nature from problems in statistical inference. But the same
logic as stated in this definition applies to both varieties of
problems. The only difference is that in probability problems
the "model" is known in advance -- say, the model implicit in a
deck of poker cards plus a game's rules for dealing and counting
the results -- rather than the model being assumed to be best
estimated by the observed data, as in resampling statistics.
The steps used in solving the particular problems in the
previous section have been chosen to fit the specific facts. We
can also describe a more general procedure which simulates what
we are doing when we estimate a probability using resampling
operations.
Step A. Construct a simulated "universe" of cards or dice
or some other randomizing mechanism whose composition is similar
to the universe whose behavior we wish to describe and
investigate. The term "universe" refers to the system that is
relevant for a single simple event. For example:
a) A coin with two sides, or two sets of random numbers "1-
105" and "106-205", simulates the system that produces a single
male or female birth, when we are estimating the probability of
three girls in the first four children. Notice that in this
universe the probability of a girl remains the same from trial
event to trial event -- that is, the trials are independent --
demonstrating a universe from which we sample with replacement.
b) An urn containing a hundred balls, 47 red and 53 black,
simulates the system that produces 47 percent baskets.
Hard thinking is required in order to determine the
appropriate "real" universe whose properties interest you.
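The two versions of universe (a) differ only in the chance of a girl. This Python sketch (names are ours) builds both and checks the long-run share of girls. We ASSUME that 1-105 stands for a boy and 106-205 for a girl. That matches the familiar figure of roughly 105 boys born per 100 girls, but the text does not say which set is which.

```python
import random


def coin_birth(rng):
    """Universe (a), version 1: a fair coin, girl and boy equally likely."""
    return "girl" if rng.randint(0, 1) == 1 else "boy"


def number_birth(rng):
    """Universe (a), version 2: a number from 1 to 205.

    ASSUMPTION: 1-105 stands for a boy and 106-205 for a girl.
    """
    return "boy" if rng.randint(1, 205) <= 105 else "girl"


rng = random.Random(205)
girl_share = {}
for universe in (coin_birth, number_birth):
    # Each draw leaves the universe unchanged: sampling with replacement.
    girls = sum(universe(rng) == "girl" for _ in range(100_000))
    girl_share[universe.__name__] = girls / 100_000
    print(universe.__name__, round(girl_share[universe.__name__], 3))
```

The coin gives a girl about half the time. The 205-number universe gives one about 100/205, or roughly 48.8 percent, of the time.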
Step(s) B. Specify the procedure that produces a pseudo-
sample which simulates the real-life sample in which we are
interested. That is, specify the procedural rules by which the
sample is drawn from the simulated universe. These rules must
correspond to the behavior of the real universe in which you are
interested. To put it another way, the simulation procedure must
produce simple experimental events with the same probabilities
that the simple events have in the real world. For example:
a) In the case of three daughters in four children, you can
draw a card and then replace it if you are using a deck of red
and black cards. Or if you are using a random-numbers table, the
random numbers automatically simulate replacement. Just as the
chances of having a boy or a girl do not change depending on the
sex of the preceding child, so we want to ensure through
replacement that the chances do not change each time we choose
from the deck of cards.
b) In the case of Magic Johnson's shooting, the procedure
is to consider the numbers 1-47 as "baskets", and 48-100 as
"misses".
Recording the outcome of the sampling must be indicated as
part of this step, e.g. "record 'yes' if girl or basket, 'no' if
a boy or a miss."
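Why replacement matters shows up most clearly with a very small urn. This Python sketch uses a HYPOTHETICAL six-ball urn (three red, three black), not one from the text. It draws four balls with and without replacement and estimates the chance of exactly three reds each way. With replacement the exact answer is 4/16 = 0.25. Without replacement only 3 of the 15 possible sets of four balls contain three reds, so the chance drops to 0.2.

```python
import random


def draw_four(urn, rng, replace):
    """Draw four balls, putting each one back first if `replace` is True."""
    if replace:
        return [rng.choice(urn) for _ in range(4)]
    return rng.sample(urn, 4)  # without replacement: each ball drawn at most once


def share_three_red(urn, replace, trials, rng):
    hits = sum(draw_four(urn, rng, replace).count("red") == 3 for _ in range(trials))
    return hits / trials


small_urn = ["red"] * 3 + ["black"] * 3  # hypothetical six-ball urn
rng = random.Random(4)
with_replacement = share_three_red(small_urn, True, 50_000, rng)
without_replacement = share_three_red(small_urn, False, 50_000, rng)
print(round(with_replacement, 3), round(without_replacement, 3))
```

With a large urn or a random-number generator the two procedures nearly agree. The smaller the urn, the more drawing without replacement distorts the chances.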
Step(s) C. If several simple events must be combined into a
composite event, and if the composite event was not described in
the procedure in step B, describe it now. For example:
a) For the three girls in four children, the procedure for
each simple event of a single birth was described in step B. Now
we must specify repeating the simple event four times, and
counting whether the outcome is or is not three girls.
b) In the case of Magic Johnson's ten shots, we must draw
ten numbers to make up a sample of shots, and examine whether
there are 8 or more misses.
Recording of "three girls" or "not three girls",
and "8 or more misses" or "7 or fewer", is part of this step.
This record indicates the results of all the trials and is the
basis for a tabulation of the final result.
Step(s) D. Calculate the probability of interest from the
tabulation of outcomes of the resampling trials. For example:
the proportions of a) "yes" and "no", and b) "8 or more" and "7
or fewer", estimate the probabilities of the composite events
described in step C.
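Steps A through D can be sketched as one small, general routine. This is our own illustration in Python, not RESAMPLING STATS. The function name `estimate` and its parameters are hypothetical, and only sampling with replacement is handled. The two calls reproduce examples (a) and (b), reading (a) as exactly three girls.

```python
import random
from typing import Callable, List


def estimate(universe: Callable[[random.Random], int],
             sample_size: int,
             is_event: Callable[[List[int]], bool],
             trials: int,
             rng: random.Random) -> float:
    """Steps B-D as one loop; step A is the `universe` function passed in."""
    yes = 0
    for _ in range(trials):
        # Step B: draw simple events from the simulated universe, with replacement.
        sample = [universe(rng) for _ in range(sample_size)]
        # Step C: combine them into the composite event and record yes or no.
        yes += is_event(sample)
    # Step D: the proportion of "yes" records estimates the probability.
    return yes / trials


rng = random.Random(0)
# Step A (a): a coin, 1 = girl and 0 = boy; the event is exactly three girls.
p_three_girls = estimate(lambda r: r.randint(0, 1), 4,
                         lambda s: sum(s) == 3, 20_000, rng)
# Step A (b): numbers 1-100, with 48-100 counted as misses; the event is 8+ misses.
p_eight_misses = estimate(lambda r: r.randint(1, 100), 10,
                          lambda s: sum(x >= 48 for x in s) >= 8, 20_000, rng)
print(round(p_three_girls, 3), round(p_eight_misses, 3))
```

Other problems need only a different universe, sample size, or event. As the text notes, other arrangements of the steps would serve equally well.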
There is indeed more than one way to skin a cat (ugh!). And
there is always more than one way to correctly estimate a given
probability. Therefore, when reading through the list of steps
used to estimate a given probability, please keep in mind that a
particular list is not sacred or unique; other sets of steps will
also do the trick.
Under the supervision of Kenneth Travers and Simon at the
University of Illinois during the early 1970s, PhD candidates
Carolyn Shevokas and David Atkinson studied how well students
learned resampling, working with experimental and control groups
of junior college and four-year college students. Both found
that with resampling methods -- even without the help of computer
simulation -- students produce a larger number of correct answers
to numerical problems than do students taught with conventional
methods. Furthermore, attitude tests as well as teacher
evaluations showed that students enjoy the subject much more, and
are much more enthusiastic about it, than do students taught
conventional methods.
It is an exciting experience to watch graduate engineers or
high-school boys and girls as young as 7th grade re-invent from
scratch the resampling substitutes for the conventional tests
that drive college students into confusion and despair. Within
six or nine hours of instruction, students are generally able to
handle problems usually dealt with only in advanced university
courses.
The computer-intensive resampling method also provides a
painless and attractive introduction to the use of computers. And
it can increase teacher productivity in the school and university
systems while giving students real hands-on practice.
Monte Carlo methods have long been used to teach
conventional methods. But resampling has nothing in common with
this teaching of conventional "parametric" statistics. Rather,
resampling is an entirely different method, and one of its
strengths is that it does not depend upon the assumption that the
data resemble the "Normal" distribution. Resampling should be
the method of choice for dealing with a wide variety of everyday
statistical problems -- perhaps most of them.
To repeat, the purpose of resampling is not to teach
conventional statistics. Rather than supplementing the
conventional thinking that dominated the field until the past
decade, or serving as an aid to teaching it, resampling breaks
with that thinking completely.
For those in academia and business who may use statistics in
their work but who will never study conventional analytic methods
to the point of practical mastery -- that is, almost everyone --
resampling is a functional and easily-learned alternative. But
resampling is not intended to eliminate analytic methods for
those who would be mathematical statisticians. For them,
resampling can help to understand analytic methods better. And
it may be especially useful for the introduction to statistics of
mathematically-disadvantaged students. (The method is in no way
intellectually inferior to analytic methods, however; it is
logically satisfactory as well as intuitively compelling.)
Though we and the mathematical statisticians who have
written about resampling have an identical intellectual
foundation, they and we are pointed in different directions.
They see their work as intended mainly for complex and difficult
problems; we view resampling as a tool for all (or almost all)
tasks in prob-stats. Our interest is in providing a powerful
tool that researchers and decision-makers, rather than
statisticians, can use with small chance of error and with sound
understanding of the process.
Like all innovations, resampling has encountered massive
resistance. The resistance has largely been conquered with
respect to mathematical statistics and advanced applications.
But instruction in the use of resampling at an introductory
level, intended for simple as well as complex problems, still
faces a mix of apathy and hostility.
CONCLUSION
Estimating probabilities with conventional mathematical
methods is often so complex that the process scares many people.
And properly so, because the difficulties lead to frequent
errors. The statistical profession has long expressed grave
concern about the widespread use of conventional tests whose
foundations are poorly understood. The recent ready availability
of statistical computer packages that can easily perform
conventional tests with a single command, irrespective of whether
the user understands what is going on or whether the test is
appropriate, has exacerbated this problem. This has led teachers
to emphasize descriptive statistics and even ignore inferential
statistics.
Probabilistic analysis is crucial, however. Judgments about
whether to allow a new medicine on the market, or whether to re-
adjust a screw machine, require more than eyeballing the data to
assess chance variability. But until now, the practice and
teaching of probabilistic statistics, with its abstruse structure
of mathematical formulas cum tables of values based on
restrictive assumptions concerning data distributions -- all of
which separate the user from the actual data or physical process
under consideration -- have not kept pace with recent
developments in the practice and teaching of descriptive
statistics.
Beneath every formal statistical procedure there lies a
physical process. Resampling methods allow one to work directly
with the underlying physical model by simulating it. The term
"resampling" refers to the use of the given data, or a data
generating mechanism such as a die, to produce new samples, the
results of which can then be examined.
The resampling method enables people to obtain the benefits
of statistics and probability theory without the shortcomings of
conventional methods, because it is free of mathematical formulas
and restrictive assumptions and is easy to understand and use,
especially in conjunction with the computer language and program
RESAMPLING STATS.
page # teachbk II-1tool May 7, 1996