CHAPTER I-6
THE RELATIONSHIP BETWEEN PROBABILITY-MATHEMATICS
AND STATISTICAL INFERENCE
This chapter distinguishes between a) the theory of
probability and its practical estimation on the one hand, and b)
inferential statistics on the other.
Here is a brief statement of the relationship between the
two bodies of knowledge: Inferential statistics equals
probability models and calculations plus a set of rules for which
probability model to use to fit particular situations, plus a set
of principles of interpretation of the results of the
manipulation of the probability model.
The term "probability theory" refers to situations in which
you know the nature of the system you are working with, and you
wish to estimate the probability that the system will produce one
or more particular events. For example, you can assume you know
from the start the nature of a deck of bridge cards, and you want
to estimate say the probability that such a deck with 13 spades
among 52 cards will produce ten spades in the first thirteen
cards dealt.
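That deck question can be answered exactly with the hypergeometric formula. A minimal Python sketch (the function name and defaults are my own, not from the text):

```python
from math import comb

def prob_spades(k, hand=13, spades=13, deck=52):
    """Exact hypergeometric chance of exactly k spades among the
    first `hand` cards dealt from a `deck`-card deck containing
    `spades` spades."""
    return comb(spades, k) * comb(deck - spades, hand - k) / comb(deck, hand)

# chance of exactly ten spades in the first thirteen cards dealt
p10 = prob_spades(10)
```

The same arithmetic could instead be produced by dealing shuffled decks repeatedly, but with so rare an event the closed formula is the more practical route.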
In contrast, the term "inferential statistics" refers to
situations in which you do not know the nature of the system you
are dealing with, and you want to infer the nature of the system
from the evidence in hand. For example, someone may deal 10
spades to you in the first 13 cards, and you -- not knowing what
kind of deck it is -- want to estimate how likely it is that the
deck has only 13 spades among the 52 cards, or that in fact it
has a larger proportion of spades.
To put it another way, in an inferential-statistics
situation we want to characterize aspects of an unknown system;
the mean and the median are examples of parameters that we wish
to infer about an unknown system. In contrast, probability
theory tells us about the probability of particular occurrences
within systems whose parameters we already know.
Probability theory clearly is relevant to situations such as
gambling with cards or dice where the physical nature of the
system is known. It is also relevant to such business situations
as life insurance, where the overall probabilities of dying at
each age are well known from a great deal of prior experience.
(Business situations in which one does not know the structure of
the situation but is prepared to assume the nature of the
structure can similarly be dealt with using probability theory.)
Inferential statistical thinking is particularly relevant
for scientific investigations. In much of science the researcher
tries to determine the nature of an unknown system from the
evidence that s/he collects about it. It also is relevant to most
decision-making. We may therefore define inferential statistics
as the quantification of uncertainty. Whenever we are uncertain,
and are willing to bring data to bear or otherwise quantify the
simple probabilities, inferential statistics is relevant.
Some writers distinguish between the role of chance in a)
measurement and estimation, and b) theory. The end of the
spectrum called "probability" tends to apply to theoretical uses -
for example, in genetics, and in oligopoly theory in economics.
The end of the spectrum we call "statistics" tends to apply to
measurement and estimation - for example, discriminating between
hypotheses, and putting reliability bounds on estimates. Quantum
theory seems to be a mix of the two.
A probability question asks: With system A, what might
happen? A statistics question asks: What caused the results Z
which did happen? You answer the statistics question by matching
the results produced by known systems A, B, C, ... to the results Z
to see which come closer and which are less close.
Though in statistics we want to know if results Z were
produced by system A, the operational question asks whether
system A often does produce results like Z. The distinction
between "was produced" and "does produce" is often very
subtle.
Problems referred to as "probability" in standard texts have
these two steps in common:
1. Stipulate one or more universes (populations), which may
be a generating mechanism such as a die or a population such as
the residents of the United States.
2. Describe possible samples from the stipulated
universe(s) in terms of their likelihoods.
All problems in statistical inference also contain kernels
of probability problems which include the above two steps. In
addition, problems in statistics also include a third step:
3a. If a test of a hypothesis: Compare the observed data
against the results of step 2 to see how frequently the observed
sample or one that is even more "surprising" arises.
3b. If an investigation of confidence limits: Find the
boundaries which partition the results of step 2, at the tail(s)
of chosen size (say five percent), into the most surprising
results and those which are less surprising.
In addition, in problems of statistical inference the
decision about which universe to stipulate in step 1 can be very
complex because it is likely to be influenced by the purpose for
which the work is being done as well as the scientific styles and
tastes of the statistician and researcher. This implies that the
choice of universe can seem arbitrary and hence is often
controversial.
Also, the decision about which comparisons to make between
the observed sample and the probabilistic results in step 2 can
be both complex and controversial in problems of inference. This
is another way in which inferential work is less "objective" and
"mechanical" than are problems in probability; that is, the
calculations are more straightforward in problems that are only
probabilistic rather than also inferential.
The close connection between the two sorts of problems can
be seen in the fact that the statistics question - What is the
likelihood that this sample Z comes from a universe that has
properties X and Y? - is answered with the same computation as
the probability question: What is the likelihood that a universe
with properties X and Y will produce a sample like Z?
As an example of the relationship between probability and
statistics, consider the first published case of statistical
inference, by John Arbuthnot. As we shall see at greater length
in Chapter 00, Arbuthnot in 1712 observed that year after year
the number of boys born in London was larger than the number of
girls, and he wanted to know if he could properly infer from the
sample that this is a natural law. He proceeded sensibly by
considering how likely it would be to see a larger proportion of
boys 82 years in a row if the probability is "really" .5 for each
sex.
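The kernel of Arbuthnot's reasoning is a one-line probability computation; here it is written out in Python (a sketch of the arithmetic only, of course not how Arbuthnot worked):

```python
# If boys and girls were equally likely to predominate in any given
# year, the chance that boys lead for 82 years in a row is one in 2^82:
p = 0.5 ** 82
```

The vanishing smallness of that number is what licensed Arbuthnot's inference from the sample to a "natural law".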
The terms "probability" and "statistical inference" are used
as labels in a great variety of ways and do not correspond to any
neat differences. For example, the field known as "statistical
mechanics" in physics has nothing to do with statistical
inference or any other topic usually known as "statistics", but
is exactly the sort of problem usually treated under the label
"probability". And it is an open question whether the first use
of the Normal distribution - in astronomy, to decide whether
certain observations of stars should be considered flukes or not
- should be considered probability or statistics, as it is also
unclear whether forecasts based on statistical data - say,
weather forecasts - are exercises in one or the other discipline.
Here is an example of a case that is hard to classify: The
Bureau of Standards sends out small quantities of test reagents
to laboratories, along with a statement of the likelihood that
the reagent is within certain boundaries of some property. The
boundaries are derived from a sample of the main reservoir of the
reagent. This plus-or-minus statement can be seen as a question
asked about the probability that a known universe (but known from
perhaps only 10 observations) will produce a specimen with a
given mean and standard deviation. Or one can see the prediction
as inferred from the sample evidence of the ten observations of
the reservoir.
Perhaps we should think of statistics as meta-probability,
though some probability questions are not statistics. Or we can
say that all statistics questions are probabilistic, but not all
probability questions are statistical.
Probability is a very easy subject compared to statistics.
Probability is almost purely mathematical, in the sense that the
probabilist usually need not worry about the purpose of the work,
or the design of the study, but need simply answer the question
as posed: the probability that the spaceship will hit the moon,
or that the factory will leak pollution into Bhopal, or that
three of four machines will cease to function today.
Furthermore, the mathematics of probability is easy to
teach, if one agrees that simulation is an admissible technique.
This may be seen in how supposedly challenging problems in
probability are easy to do correctly with simulation. (See
Chapter 00.)
Statistics is the opposite of probability in those respects.
Even more important, the issue of purpose is at the root of every
problem in statistics. Should one do a one-tail test or a two-
tail test? The answer must depend upon the purpose of the
investigation.
An example: Four black FBI agents were sitting together at
a table in a restaurant, as were four white FBI agents, and the
white agents were served in reasonable time but the black agents
were not. The court sought to answer the question: Was there
probably a pattern here, or was the outcome likely to happen by chance? It
is easy to consider these data a standard probability problem
turned around. But is it possible to say anything sensible about
this event without knowing anything more about the context - how
many seating patterns have been observed, and so on? I think
not. In this respect, statistics is quite different from
probability. And teaching or publishing only the simple
mathematics of this problem profoundly misleads the reader, the
court (if it is involved), or the student.
Unlike probability, statistics is inevitably intertwined
with - or is a handmaiden to - research methods, and issues like
choice of design. For example, should one use a paired-
comparisons setup or not?
Some writers have considered probability problems to be
deduction and statistical inference to be induction (see
Chapter I-2 on this concept), on the grounds that the postulated
universe is known in the former but inferred in the latter. For
example, the likely behavior of a sample of incomes drawn from
the U. S. population can be considered a deduction, and an
estimate of the population's incomes from a sample considered
induction. However, I consider both to be exercises in induction
because they arrive at conclusions on the basis of insufficient
knowledge; a probabilistic statement is made without the micro-
knowledge of the random-selection process that would permit
perfect selection. Only where all necessary information is
available to draw a conclusion with certainty can one consider
the activity to be deduction, in this view.
For a non-statistical example, a prisoner escapes through
cell bars that have been bent. Did the prisoner bend the bars
with his arms? Sherlock Holmes might put together a set of clues
that would allow him to draw a conclusion for sure - an act of
deduction. But if we have imperfect knowledge of the prisoner's
strength and of the strength necessary to bend the bars, we can
only induce an imperfect conclusion.
If we draw a sample of prisoners, and test their abilities
to bend bars of the thickness of those in the escapee's cell, and
if we have no reason to believe that the escapee was of other
than average strength, we might say something about the
probability that he bent the bars with his arms. This would seem
to differ from Holmes' "deduction" only in its supposed
certainty. (Incidentally, whether such a statement should be
considered part of the study of "probability" or of "statistics"
is unclear, illustrating the lack of clarity in the boundary
between them.)
The term "inverse probability" has caused so much confusion
and controversy over the past two centuries that it may be wisest
to forego using it. (For an enlightening discussion of the
topic, see Stigler, 1986.)
Just as every problem in statistics contains a kernel of a
problem in probability (as in the Arbuthnot example earlier,
which will be discussed at greater length in Chapter 00), just
about every problem in probability can be imagined to have a dual
problem in statistics. For example, a problem in probability may
ask: What is the probability that if a firm reassigns managers
in 30 cities by lottery (allowing managers to draw the cities
they are now in), two or more managers will draw the cities that
they are now in? One can turn around this situation and ask:
The firm conducted its lottery and observed 7 matches out of 30.
Is there any reason to believe that the lottery is fixed? The
latter is a statistics problem which is handled by computing how
likely such a result is to occur by chance, though as is the case
with all statistical problems, more is involved in the
interpretation than the probability calculation.
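Both versions of the lottery question rest on the same Monte Carlo work. A minimal sketch, assuming the managers draw cities as a uniform random permutation (the function name and trial count are my own choices):

```python
import random

def match_tail(n_cities=30, threshold=2, trials=10_000, seed=3):
    """Estimate the chance that a random assignment of managers to
    `n_cities` cities produces at least `threshold` managers who
    draw the city they are now in (fixed points of the permutation)."""
    rng = random.Random(seed)
    cities = list(range(n_cities))
    count = 0
    for _ in range(trials):
        rng.shuffle(cities)
        matches = sum(1 for i, c in enumerate(cities) if i == c)
        if matches >= threshold:
            count += 1
    return count / trials

p_two = match_tail(threshold=2)   # the probability question
p_seven = match_tail(threshold=7)  # the statistics question: 7 matches
```

The two calls differ only in where the comparison line is drawn; the simulation itself is identical, which is the point of the duality.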
Returning to the black and white FBI agents: First, one can
see how this is a standard probability problem turned around.
One would explore this question by asking: If there are four
officers of each color in the universe and four are chosen
randomly, what is the probability that the result will be four
blacks (and therefore also four whites)? That is a pure question
in probability.
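The arithmetic behind that pure probability question can be written out directly. A sketch, in which I frame the event as choosing at random which four of the eight agents are served late (the framing is my simplification):

```python
from math import comb

# If 4 of the 8 agents are picked at random to be served late, the
# chance that the 4 picked are exactly the 4 black agents is 1/C(8,4).
p_all_black = comb(4, 4) * comb(4, 0) / comb(8, 4)  # 1/70, about 0.014
```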
Second, and more interesting: as noted earlier, one cannot
say anything sensible about this event without knowing more about
the context - how many seating patterns have been observed, and
so on. It is on this point that statistics parts company with
probability, and that teaching or publishing only the simple
mathematics of the problem misleads the reader, the court (if it
is involved), or the student.
THE RELATIONSHIP OF PROBABILITY TO THE CONCEPT OF RESAMPLING
There is no all-agreed definition of the concept of the
resampling method in statistics. Unlike some other writers, I
prefer to apply the term to problems in both pure probability and
to those in statistics. This set of examples may illustrate:
1. Consider asking about the number of hits one would expect
from a .250 (25 percent) batter in a 400 at-bat season. One
would call this a problem in "probability". The answer can be
calculated by formula or produced by Monte Carlo simulation.
2. Now consider examining the number of hits in a given
batter's season, and asking how likely that number (or fewer) is
to occur by chance if the batter's long-run batting average is
.250. One would call this a problem in "statistics". But just
as in example (1) above, the answer can be calculated by formula
or produced by Monte Carlo simulation. And the calculation or
simulation is exactly the same as used in (1).
Here the term "resampling" might be applied to the
simulation with considerable agreement among people familiar with
the term, but perhaps not by all such persons.
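Examples (1) and (2) can be sketched with one and the same Monte Carlo simulation. A minimal Python sketch; the observed season total of 85 hits used for question (2) is a hypothetical figure of my own, not one from the text:

```python
import random

def simulate_seasons(avg=0.25, at_bats=400, trials=5_000, seed=4):
    """Season hit totals for a long-run .250 hitter, simulated by
    Monte Carlo: each at-bat is a hit with probability `avg`."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(at_bats) if rng.random() < avg)
            for _ in range(trials)]

seasons = simulate_seasons()
# problem (1): the expected number of hits in a season
expected_hits = sum(seasons) / len(seasons)
# problem (2): chance of a season of 85 or fewer hits
# (85 is a hypothetical observed total, assumed for illustration)
p_low_season = sum(1 for h in seasons if h <= 85) / len(seasons)
```

Problem (1) reads off the center of the simulated distribution; problem (2) reads off a tail of the very same distribution.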
3. Next consider an observed distribution of distances that
a batter's hits travel in a season with 100 hits, with an
observed mean of 150 feet per hit. One might ask how likely it
is that a sample of 10 hits drawn with replacement from the
observed distribution of hit lengths (with a mean of 150 feet)
would have a mean greater than 160 feet, and one could easily
produce an answer with repeated Monte Carlo samples.
Traditionally this would be called a problem in probability.
4. Next consider that a batter gets 10 hits with a mean of
160 feet, and one wishes to estimate the probability that the
sample would be produced by a distribution as specified in (3).
This is a problem in statistics, and by 1996, common statistical
practice would treat it with a bootstrap technique - called
"resampling" by all. The actual bootstrap simulation would,
however, be identical to the work described in (3).
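A minimal bootstrap sketch of (3) and (4), with a synthetic season of hit distances standing in for the observed data (the uniform distances and all names here are my own assumptions):

```python
import random

def bootstrap_mean_tail(distances, n=10, cutoff=160.0,
                        trials=10_000, seed=5):
    """Share of bootstrap resamples (size `n`, drawn WITH replacement
    from the observed `distances`) whose mean exceeds `cutoff`.
    The same computation answers both question (3) and question (4)."""
    rng = random.Random(seed)
    over = 0
    for _ in range(trials):
        sample = [rng.choice(distances) for _ in range(n)]
        if sum(sample) / n > cutoff:
            over += 1
    return over / trials

# a synthetic season of 100 hit distances with a mean near 150 feet,
# standing in for the observed distribution
rng = random.Random(0)
season = [rng.uniform(100.0, 200.0) for _ in range(100)]
p_over_160 = bootstrap_mean_tail(season)
```

Whether one labels the call a probability calculation, as in (3), or a bootstrap test, as in (4), the code executed is identical.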
Because the work in (4) and (2) differs only in question (4)
involving measured data and question (2) involving counted data,
there seems no reason to discriminate between the two cases with
respect to the term "resampling". With respect to the pairs of
cases (1) and (2), and (3) and (4), there is no difference in the
actual work performed, though there is a difference in the way
the question is framed. I would therefore urge that the label
"resampling" be applied to (1) and (3) as well as to (2) and (4),
to bring out the important fact that the procedure is the same as
in resampling questions in statistics.
One could easily produce examples like (1) and (2) for cases
that are similar except that the drawing is without replacement,
as in the sampling version of Fisher's permutation test - for
example, a tea taster. And one could adduce the example of
prices in different state liquor control systems (see Chapter 8)
which is similar to cases (3) and (4) except that sampling
without replacement seems appropriate. Again, the analogs to
cases (2) and (4) would generally be called "resampling".
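The without-replacement analog can be sketched for the tea-tasting case. A minimal Python sketch of the sampling version of Fisher's permutation test, assuming eight cups, four prepared milk-first, and a taster who identifies all four correctly (the numbers are the classic setup, assumed here for illustration):

```python
import random

def tea_test_tail(observed_correct=4, cups=8, milk_first=4,
                  trials=10_000, seed=6):
    """Sampling version of Fisher's permutation (tea-tasting) test:
    how often does a chance guess, drawn WITHOUT replacement, pick
    at least `observed_correct` of the true milk-first cups?"""
    rng = random.Random(seed)
    labels = [1] * milk_first + [0] * (cups - milk_first)  # 1 = milk first
    hits = 0
    for _ in range(trials):
        guess = rng.sample(range(cups), milk_first)  # no replacement
        if sum(labels[i] for i in guess) >= observed_correct:
            hits += 1
    return hits / trials

p = tea_test_tail()  # chance of a perfect score by guessing alone
```

Apart from `random.sample` replacing `random.choice`, the machinery is the same as in the with-replacement examples above, which is why the label "resampling" fits here too.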
The concept of resampling is defined more precisely in
Chapter 00. Fuller discussion may be found in Chapter 00.