CHAPTER II-3
THE SAGA OF RESAMPLING STATISTICS
I'll try to liven up the story a bit by telling it as a
drama.
Too much book-learning, too little understanding. The
students had swallowed but not digested a bundle of statistical
ideas -- taught by professors who valued fancy mathematics even
if useless or wrong -- which now misled them.
It was the spring of 1967 at the University of Illinois, a
class in research methods in business with four graduate students
working toward the PhD degree. I required each student to start
and finish an empirical research project as a class project. Now
the students were presenting their work in class. Each used
wildly wrong statistical tests to analyze their data.
"Why do you use the technique of seemingly-unrelated
regressions?" I asked Moe Taher (names here are fictitious).
"I want to be up-to-date," said Taher.
"How much statistics have you studied?" I asked.
"Two undergraduate and three graduate courses," Taher
answered proudly.
I cradled my head in my hands, frustrated because a simple
count of the positive and negative cases in Taher's sample was
enough to reveal a clear-cut conclusion. The fancy method the
student used was window-dressing, and wrong at that. It was the
same story with the other three students.
All had had several courses in statistics. But when the
time came to apply even the simplest statistical ideas and tests
in their research projects, they were lost. Their courses had
plainly failed to equip them with the simplest usable statistical
tools.
I wondered: How could I teach the students to distill the
meaning from their data? Simple statistical methods suffice in
most cases. But by chasing after the latest sophisticated
fashions the students overlook these simple methods, and instead
use unsound methods.
I remembered trying to teach a friend a concept in
elementary statistics by illustrating it with some coin flips. I
wondered: Given that the students' data had a random element,
could not the data be "modeled" with coins or cards or random
numbers, doing away with any need for complicated formulas?
Next class I shelved the scheduled topics, and tried out
some problems using the resampling method (though that label had
not yet been invented). First I had the students estimate the
chance of getting two pairs in a poker hand by dealing out hands.
Then I asked them the chances of getting three girls in a four-
child family. After they recognized that they did not know the
correct formula, I demanded an answer anyway. After some other
interesting ideas -- of the sort illustrated later -- one of the
students eventually suggested flipping coins.
With that the class was off to the races. Soon the students
were inventing ingenious ways to get answers -- and sound answers
-- to very subtle questions in probability and statistics by
flipping coins and using random numbers. The students were
excited, and so was I.
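The four-child problem the students solved by flipping coins can
be sketched in a few lines of modern code. This is a minimal
illustration in Python, not the RESAMPLING STATS language
described later in this chapter; the trial count and function
name are my own choices.

```python
import random

def three_girls_in_four(trials=100_000, seed=0):
    # Each child is a "coin flip": 1 stands for a girl, 0 for a boy.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        girls = sum(rng.randint(0, 1) for _ in range(4))
        if girls == 3:
            hits += 1
    return hits / trials

print(three_girls_in_four())  # hovers near the exact answer 4/16 = 0.25
```

No formula is needed: the simulated proportion settles close to
the exact binomial answer as the number of trials grows.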
Then it was natural to wonder: Could even children learn
this powerful way of dealing with the world's uncertainty? And
might it be possible that young people who had not yet been
influenced by formula-type methods would pick up these simulation
methods even faster than the graduate students?
Max Beberman, the inventive guru of the "new math", then
headed the mathematics department in the University High School.
In literally four minutes, Beberman agreed that the method had
promise, and asked me if I would be willing to spend some hours
with a class of volunteer juniors and seniors. This quick
acceptance surprised me, because in the prior weeks I had shown
the method to several colleagues in various departments
(including mathematics) but had been received with a thundering
lack of enthusiasm. This was to be repeated again and again:
The most creative mathematicians and scientists respond favorably
to the method, whereas the more humdrum professors tend to be
unenthusiastic or hostile.
The dozen high-school kids in Uni High's special math course
had a ball. In six class hours they were able to discover
solutions and generate correct numerical answers for the entire
range of problems ordinarily found in a semester-long university
introductory statistics class. Furthermore, the students loved
the work.
Together with Allen Holmes, the regular teacher of the
class, I published the results in The Mathematics Teacher. The
article generated a bit of discussion, but it petered out over
the next few years.
Burning with the zealot's fire, I presented the new method
to any group or class that would listen. The response was
generally cool. The most curious experience was when, in the
spring of 1969, I was teaching at Hebrew University in Jerusalem.
Louis Guttman, the famous psychometrician, found the concept
interesting, and invited me to lecture on it to a statistics
workshop. The first part of the lecture was a disaster. The
audience looked blank and uncomprehending, and I broke into a
cold sweat. Later it came out that the Israeli audience did not
know the game of poker, from which I drew several examples, which
is why they did not understand what I was saying.
Over the objections of the Random House editor (one of the
few such battles I've ever won), my 1969 Basic Research Methods
in Social Science included five chapters detailing various
applications. Not only did I hope to reach some working
researchers who teach statistics (as distinguished from
mathematical statisticians), but I wanted to stake out the ground
for the future moment when the statistics profession would
finally come to these methods - as I believed it inevitably
would.
The method did not sweep into the high schools and
universities and research laboratories like a tidal wave - or
even a trickle. Of course, entirely new scientific ideas often
take decades to penetrate people's thinking. And there were some
signs of progress. Nevertheless, progress was almost
imperceptible.
After developing one of the applications of resampling -- a
powerful substitute for the t-test in the tradition of the
"exact" permutation test (also called the "randomization" test),
based on an idea by R. A. Fisher and worked out by E. J. G.
Pitman -- I sent it to a journal for publication. The editor
referred me to two papers as predecessors. Neither Fisher nor
Pitman apparently had thought of sampling among the
permutations, but Meyer Dwass in 1957, and J. H. Chung and D.
Fraser in 1958, had discovered that development. One could argue
that those two papers could claim the main discovery of
resampling, though they limited themselves to just that single
application.
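The Dwass/Chung-Fraser development -- sampling among the
permutations rather than enumerating them all -- can be sketched
as follows. A hedged illustration in Python; the data and the
function name are invented for the example.

```python
import random

def permutation_test(a, b, trials=10_000, seed=0):
    # Monte Carlo permutation test: instead of enumerating every
    # rearrangement of the pooled data (the Fisher/Pitman "exact"
    # test), draw random shuffles and count how often the difference
    # in means is at least as extreme as the one observed.
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = (sum(pooled[:len(a)]) / len(a)
                - sum(pooled[len(a):]) / len(b))
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / trials  # two-sided p-value estimate

# Two small invented samples with clearly separated means:
p = permutation_test([10, 11, 12, 13], [1, 2, 3, 4])
print(p)  # small: such separation is unlikely under random shuffling
```

The shuffle plays the same role as dealing cards: each shuffle is
one rearrangement of the group labels, drawn at random.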
Years later I found that Alvan Feinstein had re-discovered
the Monte Carlo version of Fisher's permutation test in 1962
(???), though he did not make much fuss about it, and much later
Brian Manly (1991) also rediscovered it and made it the
centerpiece of a book on statistical inference in biology.
I did not bother to write up any more specific applications
for technical journals because I figured that they would not be
perceived as fundamentally new, given that the resampling idea is
the key discovery and the rest is elaboration. And the idea of
using the method across the board - which I saw as the central
idea - was not something that one could present in a technical
paper.
At about the same time there was a lengthy correspondence
with William Kruskal. I argued to him (and also in journal
publications) that resampling methods could fill all statistical
needs, and are sufficient as a body of knowledge, even if
traditional analytic methods have advantages to the professional
mathematician in providing additional insights. This observation
I consider the most important proposition about resampling as a
method. Kruskal did not accept those claims.
In the course of the correspondence, Kruskal asked if
resampling could handle the confidence intervals, and I proceeded
to show how it could be done with what is now known as the
"bootstrap"; a closely-related example was in the 1969 book.
This was the idea that ten years later, when independently stated
by Bradley Efron, took the world of mathematical statistics by
storm and is now regarded as one of the handful of great
twentieth-century discoveries in statistics.
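What such a bootstrap confidence interval looks like can be
sketched in modern terms as a percentile interval for a mean.
This is a hedged illustration in Python, not a reconstruction of
the original correspondence; the data, trial count, and names are
my own choices.

```python
import random

def bootstrap_ci(data, trials=10_000, level=0.90, seed=0):
    # Draw many resamples of the same size from the observed data,
    # with replacement; the spread of the resampled means gives an
    # interval estimate without any formula for the standard error.
    rng = random.Random(seed)
    n = len(data)
    means = sorted(
        sum(rng.choice(data) for _ in range(n)) / n
        for _ in range(trials)
    )
    lo = means[int(trials * (1 - level) / 2)]
    hi = means[int(trials * (1 + level) / 2) - 1]
    return lo, hi

lo, hi = bootstrap_ci([2, 4, 5, 7, 8, 9, 11, 12, 13, 15])
print(lo, hi)  # an interval bracketing the sample mean of 8.6
```

The observed sample itself serves as the "deck of cards" from
which new samples are dealt, with replacement.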
In the textbook on research methods in social science which
I published in 1969, I included five (?) chapters on resampling
methods, intending the chapters to be a basic compendium as well
as a device for staking out the field. The series editor who
worked closely on the research methods text with me was Hanan
Selvin, a sociologist who also was an accomplished statistician.
Amiable and broad-minded though Hanan was, I could never get
even him enthusiastic about resampling. He loved his formal
mathematics, though he never tried to dragoon me into offering a
conventional treatment of statistics. He would have preferred,
however, if I had omitted the resampling material.
My reason for using the bootstrap device only in the context
of sample size rather than confidence intervals (except in the
correspondence with Kruskal) was mainly that confidence intervals
were at that time (and perhaps still are to this date) almost
never seen in practice in the fields to which the book was mainly
addressed - sociology, business, and economics. I may also have
been deterred by the difficulty of interpretation of the concept
of confidence intervals in an introductory text. (Hanan Selvin
wrote perhaps the first - and still well-known - paper
criticising common use of significance tests. And he was even
less sympathetic toward confidence intervals, as I remember,
which I think also contributed toward my leaving out a treatment
of them in the book.)
It seemed to me then - and seems to me still - that the idea
of considering first a resampling test for all situations is the
radical idea, and the bootstrap itself is rather obvious once one
develops the resampling propensity. That
is why I did not consider it a huge discovery when I used it in
the context of choosing a sample size (1969, p. 000) or when I
set it out in detail in the context of confidence intervals in my
correspondence with William Kruskal.
After a presentation of the resampling method in 1972 or so
to the University of Illinois mathematics department seminar of
Joseph Doob -- by general agreement as good a probabilist as
there is on the face of the earth -- Doob said not a word. At
the end of the seminar I asked him: "Are you silent because you
find problems with the method?" Doob answered: "No theoretical
problems. My only question is whether you can teach teachers to
teach it." A prophetic statement.
An early difficulty with resampling had been that users and
students complained that dealing cards, flipping coins, and
consulting tables of random numbers gets tiresome. Therefore, in
1973, with the programming assistance of Dan Weidenfeld, I
developed the computer language called RESAMPLING STATS (earlier
called SIMPLE STATS). My method was to work through a series of
problems one by one, write down the steps needed to handle the
problem with non-computer methods, and then design a computer
command that would mimic the non-computer operation; by the time
I had worked through fifteen or so types of problems, I figured
that I had covered most of the necessary operations, at least for
a start. We then published a letter about it in American
Statistician.
Early in the 1970s I got in touch with Kenneth Travers, who
was responsible for secondary mathematics at the College of
Education at the University of Illinois. He liked the idea, and
we agreed that together we would organize systematic controlled
experimental tests of the method. And over the next several
years Travers served as PhD adviser to several students who did
just that. He also organized summer workshops at the University
of Illinois for high school teachers, and wrote texts with
co-authors.
He melded the resampling material with the conventional approach
as a tactical device, and kept clear of sharp statements of the
method. (He even refused to be a co-author with me and his two
PhD students of an article mentioned below, though he properly
could share the credit.)
Carolyn Shevokas's thesis (see Chapter 00) studied junior
college students who had little aptitude for mathematics. She
taught the resampling approach to two groups of students (one
with and one without computer), and taught the conventional
approach to a control group. She then tested the groups on
problems that could be done either analytically or by resampling.
Students taught with the resampling method were able to solve
more than twice as many problems correctly as students who were
taught the conventional approach.
David Atkinson taught the resampling approach and the
conventional approach to matched classes in general mathematics
at a small college (see Chapter 00). The students who learned
the resampling method did better on the final exam with questions
about general statistical understanding. They also did much
better solving actual problems, producing 73 percent more correct
answers than the conventionally-taught control group.
These experiments were (and are) strong evidence that
students who learn the resampling method are able to solve
problems better than are conventionally taught students. And
since then we have acquired a mass of corroborating evidence.
A book describing a range of applications seemed a possible
way to get the message out. So I wrote such a book in about
1973, using the chapters in the 1969 research methods text as the
base. But though I sent the typescript to dozens of publishers,
I could not find a taker - right up to 1992.
When doing empirical work I found a resampling approach
useful again and again. For example, when comparing elasticities
of consumption of cigarettes with respect to price (Lyon and
Simon, 1968) it was natural to do a resampling test comparing
states with higher and lower income levels. And in a complex
econometric paper on the effect of advertising expenditures on
sales (1969), I used a bootstrap procedure to decide whether
successive variables were likely to be meaningful. But - an
experience shared by many researchers in the 1980s - referees did
not comprehend the procedure and therefore I found it prudent or
required not to mention the resampling test in the final texts.
Over the years, I sent various materials to many of the
notables in statistics. Kruskal responded with extended
thoughtful correspondence. But others - such as Frederick
Mosteller,
who now says about the bootstrap that "There's no question but
that it's very, very important" (New York Times, Nov. 8, 1988,
C1, C6) - did not even acknowledge my letters.
The mainframe computer program Dan Weidenfeld wrote was not
interactive, and therefore an erroneous comma could force the
user to wait another day for another try, at which time another
comma might be out of order. No professional programmer seemed
able or interested enough to produce an interactive program
until a bright high school kid, Derek Kumar, came along. By
this time it was 1981, and Kumar wrote a lovely little program
for the Apple
computer. The lack of readily available computing power and tools
had been an additional obstacle. The advent of the PC has
changed that. (Later on at the University of Maryland, an
interactive program for the IBM PC was developed with the help of
Chad McDaniels and others; Carlos Puig has brought the program to
the state of the art.)
Then, in the late 1970s a great wave of work followed
Efron's initial publications on the bootstrap. Efron and other
mathematical statisticians focused on studying the properties of
the bootstrap, especially for advanced applications. In
contrast, my main point is that resampling could and perhaps
should be used for all (or almost all) probabilistic-statistical
applications, simple as well as complex.
Eric Noreen apparently reinvented the entire resampling idea
for himself in the context of accounting and related business
problems (1989), and has attracted attention to the idea with his
emphasis on the role of the computer, calling these "computer-
intensive methods".
Among applied statisticians interest has recently exploded,
in conjunction with the availability of easy, fast, and
inexpensive computer simulations. The bootstrap excited the most
interest at first, but across-the-board use of these methods now
seems at hand. An entire book has now appeared with this
message:
Basically, there is a computer-intensive alternative to
just about every conventional parametric and nonparametric
test. If the significance of a test statistic can be
assessed using conventional techniques, then its significance
can almost always be assessed using computer-intensive
techniques. The reverse is not true, however. (Eric Noreen,
Computer Intensive Methods for Testing Hypotheses, Wiley,
1989.)
Leading mathematical statisticians - starting with Doob, as
mentioned earlier - agree that the resampling method is logically
flawless and intellectually beyond reproach. But still there is
enormous resistance to introducing the method for everyday use,
and in teaching. In response I have made this still-standing
public offer:
I will stake $5,000 in a contest against any teacher
of conventional statistics, with the winner to be
decided by whose students get the larger number of both
simple or complex realistic numerical problems correct,
when teaching similar groups of students for a limited
number of class hours -- say, six or ten. And if I
should win, as I am confident that I will, I will
contribute the winnings to the effort to promulgate this
teaching method. (Here it should be noted that I am
far from being the world's best teacher, and I
certainly am not among the more charming. It is the material
that I have going for me, and not my personality or
teaching skills.)
Alas, no takers.
(This is not the sort of talk heard in academia every day,
and it turns off some conservative types, so I add, "The
intellectual history of probability and statistics began with
gambling
games and betting. Therefore, perhaps a lighthearted but very
serious offer would not seem inappropriate here.")
Though statisticians are busily exploring the properties of
the bootstrap, and applying it regularly to problems that are
difficult with conventional analysis, interest has been
concentrated on the method's properties rather than on its
instruction and use. Almost no one has made it her/his business
to take these ideas to teachers and students and introduce
resampling into the regular curriculum in high schools and
colleges. The American Statistical Association has recently moved
in this direction, as has the National Council of Teachers of
Mathematics, which urges that simulation be given as much
attention as analytics in teaching probability.
Like all innovations, this one has encountered massive
resistance. Many factors always militate against adoption of new
technology, including the accumulated intellectual and emotional
investment in existing methods. Early on, leading statisticians
either did not accept the idea, or ignored it; now they say that
the method is a great breakthrough, but should not be taught to
introductory students. Numerous technical journals rejected
articles on the method because it is too simple and lacks "real
mathematics"; for years publishers have turned down my book about
resampling on grounds that the ideas are sound but that there
would be no market because instructors would not accept them --
and the publishers may be right. The National Science Foundation
has rejected applications for grants in several categories, on
assorted grounds. School systems have simply been too
preoccupied with their usual business to be willing to develop
new
curricula. The American Statistical Association has invested
large amounts of money and effort in developing a video series
and printed materials to try to teach the old ways more
effectively.
There is no conspiracy. But individually, just about every
channel has been closed. This is despite the fact that no one
any longer denies the basic validity or the practical usefulness
of these ideas. Resistance stems from many roots.
ROOTS OF RESISTANCE TO RESAMPLING
Legions of instructors have an investment in their stocks of
conventional knowledge, their reputations, and their lecture
notes, which it is costly for them to replace with an unfamiliar
method. Some lack conviction that resampling is better than
"real" mathematics. Others reject simulation methods because
they find "real math" more aesthetic, and cannot or will not
recognize that most people do not share their mathematical
aptitudes and aesthetic tastes. Still others won't teach the
method because they feel that it is difficult to do eye-catching
"sophisticated" research with it that will be published well and
advance their careers. And some applied departments use analytic
statistics as a tool to weed out students who do not care for
mathematics.
Furthermore, resampling requires more spontaneity on the
part of the instructor than do conventional formulas. The
instructor must interact with the students as they invent anew
the appropriate methods for particular problems. Many instructors
are more comfortable simply handing down formulae from on high,
with the students scrambling to keep up and too frantic to ask
hard questions such as "Where do the data come from?" In many
schools, too, there has been the logistic problem that computers
are absent.
(Another difficulty is that my central scholarly interest is
the economics of population, which has absorbed most of my
energies over the years. And I have not been a card-carrying
statistician, which inevitably puts off the statistical
establishment.)
Over the years I have made a vast number of attempts, along
a great number of lines, to interest people in the subject. The
major jump has been being joined by Peter Bruce, a former foreign
service officer and recent MBA, who is now promoting the method
full time.
Commercial distribution of the computer program has been one of
the
dissemination methods we have worked on, not primarily to make
money but rather to use the power of the market mechanism to
reach persons who may be interested. To date, however, marketing
initiatives have not produced the revenue necessary to get the
enterprise flying. I have been financing this operation out of
savings, just because there has seemed no other way to give these
ideas a chance to be used.
Another reason for the lack of penetration into the
curriculum is the usual barrier against innovation -- the
conservatism of the instructors who have a huge investment in
their stock of conventional knowledge, their reputations, and
their lecture notes; this is discussed at greater length in
Chapter 00.
THE RELATIONSHIP OF RESAMPLING TO THE HISTORY OF STATISTICS
Resampling returns to a very old tradition. In ancient
times, mathematics in general, and statistics in particular,
developed from the needs of governments and rich persons to
number armies, flocks, and especially to count the taxpayers and
their possessions. Up until the beginning of the twentieth
century, the term "statistic" meant the number of something the
"state" was interested in -- soldiers, births, or what-have-you.
In many cases, the term "statistic" still means the number of
something; the most important statistics for the United States
are in the Statistical Abstract of the United States. These
numbers are now known as "descriptive statistics."
Another stream of thought appeared by way of gambling in
France in the 17th century. Throughout history people had
learned about the odds in gambling games by trial-and-error
experience. But in the year 1654, the French nobleman Chevalier
de Mere asked the great mathematician and philosopher Pascal to
help him determine what the odds ought to be in some gambling
games. Pascal, the famous Fermat, and others went on from there
to develop modern probability theory.
Later on these two streams of thought came together. People
wanted to know the accuracy of their descriptive statistics, not
only the descriptive statistics originating from sample surveys
but also the numbers arising from experiments. Therefore,
statisticians applied the theory of probability to the accuracy
of the data arising from sample surveys and experiments; this is
the theory of inferential statistics.
Later, probability theory began to be developed for another
context in which there is uncertainty -- decision-making.
Descriptive statistics like those used by insurance companies --
for example, the number of people per thousand in each age
bracket who die in a five-year period -- have for centuries been
used in deciding how much to charge for insurance policies.
The likelihoods usually can be estimated on the basis of a great
many observations with rather good precision without complex
calculation, and the main statistical task is gathering this
information.
In business and political decision-making situations,
however, one usually works with likelihoods that are based on
very limited information, often little better than guesses. The
question is how best to combine these guesses about various
likelihoods into an overall likelihood estimate. Therefore, in
the modern probabilistic theory of decision-making in business,
politics, and war, the emphasis is on methods of combining
estimates of probabilities which depend upon each other in
complicated ways in order to arrive at a desirable decision --
similar to the gambling games which were the origin of
probability and statistics.
Estimating probabilities with conventional mathematical
methods is often so complex that the process scares many people.
And properly so, because the difficulties lead to errors. The
statistical profession has expressed grave concern about the
widespread use of conventional tests whose foundations are poorly
understood. The ready availability of statistical computer
packages that can easily perform these tests with a single
command, irrespective of whether the user understands what is
going on or whether the test is appropriate, has exacerbated this
problem.
Probabilistic analysis is crucial, however. Judgments about
whether to allow a new medicine on the market, or whether to re-
adjust a screw machine, require more than eyeballing the data to
assess chance variability. But until now, the practice and
teaching of probabilistic statistics, with its abstruse structure
of mathematical formulas, tables of values, and restrictive
assumptions concerning data distributions -- all of which
separate the user from the actual data or physical process under
consideration -- has not kept pace with recent developments in
the practice and teaching of descriptive statistics.
Beneath every formal statistical procedure there lies a
physical process. Resampling methods allow one to work directly
with the underlying physical model by simulating it. The term
"resampling" refers to the use of the given data, or a data
generating mechanism such as a die, to produce new samples, the
results of which can then be examined. The term "computer-
intensive methods" is also used to refer to techniques such as
these.
The resampling method enables people to obtain the benefits
of statistics and predictability without the shortcomings of
conventional methods, because it is free of mathematical formulas
and restrictive assumptions. Hence the method seemed to have
extraordinary promise. Not only did I think so, but so did some
of my readers.
THE FUTURE?
What will happen? Eventually progress will win out, as
always. The question is: How long will it take? How many more
bad analyses will be done in science and business because black-
box formulae are misused? How many more students will suffer and
be turned off of this extraordinarily valuable tool of thinking
and acting? More about this in Chapter 00.
AFTERNOTE:
IMAGINED DIALOGUE WITH A STATISTICIAN RE RESAMPLING STATS
U: Is resampling theoretically acceptable?
S: It has its practical drawbacks, but there is nothing
wrong with it theoretically.
U: Was that always the opinion of the profession?
S: [Laughs]. To tell the truth, earlier it was the
opposite. People said it was OK practically, but no good
theoretically.
U: What about Simon's claim to some priority with
resampling and bootstrap?
S: Ridiculous. Efron invented the bootstrap way back
before 1979.
U: What about Simon's claim to have written and taught this
stuff back in the 1960s?
S: That wasn't bootstrap, just Monte Carlo probability,
which is old stuff.
U: Have you looked at Simon's writings?
S: No, I don't need to.
U: How come you don't need to?
S: No one cites it. If it was really original it would
have been picked up, used, and now cited.
U: Then how come you know that it is just old-stuff Monte
Carlo probability?
S: It must be....
And on and on.