A LESSON IN RESAMPLING STATISTICS
Julian L. Simon
Teacher ("T"): Good morning. Let's talk about poker. What
is the chance of getting a pair of two cards of the same
denomination -- two fives, say, or two queens -- in a hand of
five cards dealt to you?
Student Abel: 1 in 5.
T: What do you mean by "1 in 5"?
Students: [Silence]
T: You mean that every single time you deal five hands you
can expect to get a pair?
Doug: One in five times on the average.
T: How sure are you that it's one in five?
Abel: Well, it seems to me that I usually get a pair about
every five times.
T: What would you say if I told you it's not one in five,
but instead the chances are 1 in 2?
Becky: I'd say "Prove it".
T: Who said "Prove it?"
Becky: Me -- I say it's about one in twenty.
T: So we've got a variety of views here -- one in twenty,
one in five, one in two. How would you go about finding out
who's right?
Becky: Ask an expert.
T: Well, that's one possibility. Getting advice from
people who know a lot about a subject is always a wise first
tactic. But how would you know for sure whether the so-called
expert knows what she or he is talking about? Finding an expert
who is really an expert is not easy unless you are an expert
yourself.
Let's assume that you don't have a tested expert handy. How
would you go about finding a reliable answer on your own?
Charlie: Calculate from how many cards are in the deck, and
how many cards you have. Use a formula.
T: Okay, how exactly should we calculate? Does anyone here
know what the right formula is?
[Silence]
T: Does that mean that we are stuck? Is there anything we
can do if we don't know the formula? And by the way, people
often think they know the right formula but don't, and therefore
calculate the wrong answer. That is a very big danger unless you
are a skilled mathematician.
Is there anything we can do now?
Charlie: Deal some hands.
T: Deal some hands? That's a wild and radical idea.
[Laughter] What do you mean?
Charlie: Deal some cards.
T: Give us an example of what you mean.
Charlie: Play poker and keep track.
T: Let's be more specific. How would you do it?
Charlie: Okay, deal five cards --
T: Well, just by coincidence I brought a few cards with me.
[Dumps thirty decks of cards on the table.] Pass them around.
Charlie, tell us exactly what to do with the cards. You're the
boss. Stand up here in front and give us instructions.
[Charlie gets up and comes up front]
Charlie: Okay, you students [laughter] this is what we're
going to do. Everybody deal out five cards.
Becky: Do we shuffle the deck first?
T: Good question. Should they shuffle, Charlie?
Charlie: First shuffle the deck and then deal five cards.
[Students shuffle and deal a hand.]
Charlie: How many of you have a pair?
[Students raise their hands if they have a pair.]
T: [To Charlie] Now what?
Charlie: We can say that the chances are 7 out of 12 [the
number who have a pair among the 12 students] that you get a
pair.
T: [To the class] Does that do it? Is that our answer?
Abel: Next time we might get a different number of pairs.
T: Why is that?
Abel: Because the results differ from deal to deal.
T: Very important. Very, very important. The difference
from trial to trial is one of the key ideas in probability and
statistics -- the idea of random variability. The results
vary from one event to the next. A large proportion of the
world's mistakes in business, sports, and politics occur because
people do not recognize random variability for what it is, and
instead attach some meaning to the pattern in one particular
trial.
So what should we do about the random variability?
Becky: Deal the cards again and again, and mark down the
results.
T: So we must keep track of the results. Alright, Becky,
you're in charge now, tell us what to do.
Becky: Everybody shuffle your cards.
Doug: Do we have to shuffle the cards? How about just
dealing a second hand from the deck? Would it make a difference
whether we do that, or instead shuffle the deck and deal out five
cards from the entire shuffled deck?
[Becky is silent.]
T: What do you all think? Does it make a difference
whether we simply deal a second hand from the unshuffled deck, or
shuffle and start again?
[Some hubbub, various voices and opinions]
T: So there is a difference of opinion. How should we
settle the difference of opinion?
[Silence]
T: We can't answer every question at once. Let's assume
for the moment that it doesn't matter, but let's also agree that
we will settle the question later by the best possible method --
that is, try it out both ways. May I have your permission to
postpone?
Of course if we do replace the five cards in the hand we
deal, and use the entire deck, Doug's comment is very important,
because if you replace the cards and don't shuffle them you have
a big problem.
Now what, Becky?
Becky: Deal another hand.
Charlie: Wait a minute. How many people are playing in
this game? You could have like five people playing, or three
people playing. Wouldn't that make a difference?
T: You say the chances might be different if you had five
people playing or three people. That's a very interesting
question. But let's put that aside for the moment, and go on
with what we were doing.
Essie: Shuffle them up and do it again.
T: How many times are we going to do this, Essie?
Essie: Everyone should deal ten hands.
T: You're the boss, Becky, tell people what to do.
Becky: Everybody, ten times, deal a hand, see if you have a
pair, write down what you get. Do the whole thing ten times.
[Much dealing and writing]
Becky: Each of you tell me how many pairs you got.
[Gets the results and writes them on the board.]
T: So what's the answer, Becky?
Becky: The chances are 55 out of 120.
T: What's that as a fraction, and as a probability?
Abel: Eleven twenty-fourths, or about 46 per cent.
T: Are 120 hands enough?
Foxey: Yeah.
T: Well, 120 hands might be enough. Obviously it depends
on how accurate you want to be, right? If we had more time, we
could deal out another 120 hands, and compare the result. If
there wasn't much difference we could be satisfied. Or we could
do it again and again. And sooner or later we would get enough
accuracy to safely play poker with, which is what we are
interested in here.
So that's how you could go about finding out the chances of
getting one pair or two pair or a royal flush in poker. If you
tried to figure it out mathematically it might take you a lot
longer to learn what you need to know. You might have to wait a
few years until you go to college and then take two courses or
six courses in probability theory, then work out the formula, and
even then there would still be a fair chance you would wind up
with the wrong formula. But with the method you all have just
worked out, you're going to get a very good answer.
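The dealing procedure the class worked out can be sketched for a
computer as well. What follows is only a minimal sketch in Python,
not part of the original lesson. It assumes that "a pair" means at
least two cards of the same denomination anywhere in the five-card
hand; the names has_pair and estimate_pair_chance are made up for
this illustration.

```python
import random

RANKS = "A23456789TJQK"
SUITS = "CDHS"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards


def has_pair(hand):
    """True if at least two cards in the hand share a denomination."""
    ranks = [card[0] for card in hand]
    return len(set(ranks)) < len(ranks)


def estimate_pair_chance(n_hands, rng):
    """Shuffle, deal five cards, check for a pair; repeat n_hands times."""
    pairs = 0
    for _ in range(n_hands):
        deck = DECK[:]        # start every trial from the full deck
        rng.shuffle(deck)     # Charlie's rule: shuffle first
        hand = deck[:5]       # then deal five cards
        if has_pair(hand):
            pairs += 1
    return pairs / n_hands


if __name__ == "__main__":
    rng = random.Random()
    print("120 hands:   ", estimate_pair_chance(120, rng))
    print("12,000 hands:", estimate_pair_chance(12_000, rng))
```

Running it several times shows the 120-hand estimate moving around
from run to run much more than the 12,000-hand estimate, which is
the same point the class met when asking whether 120 hands are
enough.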
Now, what are the chances of getting a seven in two throws
of the dice? Of course you've all lived very sheltered lives
and none of you have ever seen a pair of dice before, right?
[laughter] So what are the chances of throwing a seven?
[Silence]
Doug: Throw the dice and see.
T: Good move, Doug. Throw the dice once, and then what?
Doug: Write down what happens.
T: Then what?
Doug: Do it again.
T: Alright, Doug, you're the boss. You get it done.
Narrator: Doug runs the class experiment, which we won't
show to save time.
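For readers who want to see the experiment the narrator skips, here
is a rough sketch in Python. It is an illustration only, and it
assumes the question means one throw of a pair of dice whose spots
add up to seven. The function name estimate_seven_chance is
hypothetical.

```python
import random


def estimate_seven_chance(n_throws, rng):
    """Throw two dice n_throws times; return the share of throws totalling seven."""
    sevens = 0
    for _ in range(n_throws):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # one throw of the pair
        if total == 7:
            sevens += 1                                # write down a "yes"
    return sevens / n_throws


if __name__ == "__main__":
    print(estimate_seven_chance(1000, random.Random()))
```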
T: Now let's consider a different kind of problem. Let's
say that somebody comes along and says, what are the chances if I
have four children that three of those children will be girls?
How would you go about finding that out?
Foxey: Shuffle up a bunch of kids and deal out four.
[Laughter]
T: Sounds fine in theory, but it might be a bit difficult
to actually carry out... How about some other suggestions?
Essie: Have four kids and see what you get.
T: Sounds good. But let's say you have four children once.
Is that going to be enough to give you a decent answer?
Charlie: No. You need more families.
T: How many families do we need?
Charlie: How about a hundred families?
T: So you're going to produce a hundred families. That's
reasonable. But it could take you a little while to have a
hundred families, a little strength and energy and money. So we
scratch our heads and say, hold on here. Producing a hundred
families is a very sensible idea, but it doesn't seem to be
practical at the moment.
Another suggestion?
Doug: Take a survey.
T: What do you mean by "take a survey"?
Doug: You go around and ask people who have four children
how many are girls.
T: Super idea. Absolutely super. A survey is a terrific
idea because it focuses us on trying to get an answer to a
problem like this one by going out and looking at the world
instead of just trying to do mathematics. Nothing wrong with
mathematics, but there's always a great deal to be said for
trying to get the answer by going out into the world and looking.
How many families are you going to survey, Doug?
Doug: A hundred.
T: Any particular families?
Doug: Families with four children.
T: What are you going to ask the hundred families?
Doug: How many of your children are girls?
T: You're going to find a hundred families that have four
kids, and ask each one how many are girls. Sounds good. Any
problems?
Essie: It's going to take a lot of time to find a hundred
families with four kids.
T: Yes, but it's a lot quicker than growing a hundred
families. I'll bet if the twelve of you went out now, by the end
of the day you could find a hundred families with four children
and you could get a pretty good answer to this.
T: Let's try it. Okay, teacher?
Regular class teacher: We have some other things we have to
do today, unfortunately.
T: Okay, but let's remember that we could try it, and as
scientists that would be an excellent way to do it.
T: Is there another way we can tackle the problem? What
else can we do? Let's say that some businessperson comes in here
and says, "I'm going to give you a thousand dollars if you can
come up with a pretty good answer inside of one hour." You don't
have time to take a survey. What would you do?
Think about it for a few minutes. Keep in mind that a good
solution might be worth a thousand bucks. That should be enough
to make you think.
Foxey: You can think about your friends' families that
have four kids, and count how many of them have three girls.
T: Terrific idea. That's like taking a survey, but a lot
faster. Maybe that will get you the thousand dollars.
Without in any way being critical of that terrific idea,
let's ask how else might you go about it. Think back to the
first problems we solved with poker and dice.
Charlie: Simulation.
T: Simulation? What's a simulation?
Charlie: You take something like a four-sided die or
something like that.
T: In other words, you want to do something here in the
classroom which is like having kids. Can somebody get more
specific?
Essie: We could put an equal number of red and black balls
in a pot, and pull four of them out. That would be like a
family.
T: Does that make sense?
Several students: Yeah.
T: Essie, how many balls are you going to put in the pot?
Essie: Four of each.
T: How about if we put in two of each -- two red and two
black -- and you reach in and you mush them around and take out
four.
Essie: That wouldn't work.
T: Why not?
Essie: Because you'd have to have at least three red ones.
T: Exactly. So you couldn't possibly get three red ones if
you only had four balls, two red and two black. How about if you
only had six balls in there?
Essie: That wouldn't work, either.
T: Why wouldn't it work?
Essie: Because you couldn't have a combination of all
girls.
T: That's right. If every combination isn't possible,
there obviously is something wrong. Now what about four red and
four black?
George: The chance of getting four girls would still be
pretty small.
T: Let's see what is going on when we only have a few balls
in the pot. What is the chance of having a girl the first time
you have a child?
Class voices: Fifty-fifty. One in two. Fifty percent.
etc.
T: If you have four red and four black balls, what is the
chance of getting one red one? Becky?
Becky: Fifty per cent.
T: What is the chance of having a girl the second time a
real family has a child?
Becky: Fifty per cent again, I guess.
T: Now, what is the chance of drawing a red ball from a pot
that starts with four red and four black, after you draw a red
ball?
Doug: Three in seven, which is less than fifty percent.
T: Right you are, Doug. So you can see why we can't have a
pot with just three red and three black, or 4 and 4, or 10 and 10,
for the same sort of reason.
Foxey: But if we have a big pot of both red and black balls,
it would almost be okay, wouldn't it?
T: You're right, Foxey. That would be a very satisfactory
approximation. But we would need a lot of balls.
Is there some other method we could use to get around this
problem?
Let's try someone we haven't heard from lately. George, what
would you do? How would you go about it? What are you going to
put into the pot and how are you going to deal with it?
George: How about putting just two balls in, one red and one
black, and put the ball back after you draw it?
T: Bingo. You've got it exactly. We call this "sampling
with replacement", meaning that we put the ball back each time to
keep the chance of drawing a red one the same.
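The arithmetic behind Doug's "three in seven" and Foxey's big pot
can be checked with a short Python sketch. This is a cross-check
only: it uses counting formulas that the lesson itself deliberately
does without, and the function names are made up for the
illustration.

```python
from fractions import Fraction
from math import comb


def three_red_without_replacement(red, black):
    """Chance of exactly 3 red in 4 draws when drawn balls are NOT put back."""
    return Fraction(comb(red, 3) * comb(black, 1), comb(red + black, 4))


def three_red_with_replacement():
    """Chance of exactly 3 red in 4 draws from one red and one black ball,
    putting the ball back after every draw (each draw is 1/2 red)."""
    return comb(4, 3) * Fraction(1, 2) ** 4


if __name__ == "__main__":
    # After one red is drawn from 4 red + 4 black, 3 red remain among 7 balls.
    print("second draw red after a red, 4+4 pot:", Fraction(3, 7))
    for n in (4, 10, 1000):
        p = three_red_without_replacement(n, n)
        print(f"{n} red + {n} black, no replacement: {float(p):.4f}")
    print("1 red + 1 black, with replacement:",
          float(three_red_with_replacement()))
```

As the pot grows, the no-replacement figure creeps toward the
with-replacement figure, which is why a big pot is a satisfactory
approximation and two balls with replacement need no approximation
at all.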
George, tell us exactly how we would go about making an
estimate of the chances of getting three girls in four children
using just the two balls.
George: Draw a ball, and write down what color it is.
Repeat that four times. Count the number of red balls. If the
number is "3", write down "yes", otherwise write down "no".
T: Is once through enough?
George: Do the whole operation about a hundred times.
T: Does that make sense, class?
Class voices: Yeah, yes, okay...
T: That procedure would work quite well. But we don't have
any balls. Essie, you suggested the balls. Is there any way
that we could use this thing instead? [Holds up a quarter.]
Essie: I suppose we could flip a coin and the head could be
like red, like a girl, and the tail like black.
T: Absolutely. And a coin will be easier to think about
later on. So -- how would we do it with a coin?
George: Flip the coin, Teach.
T: [Flips]. Heads. Now what?
George: Record it.
T: You do it, George. Now what are we going to do next?
George: Do it four times.
T: Okay, do it, George.
[Does it]
T: What happened?
George: Two and two.
T: What does that mean?
George: It means we didn't get three girls.
T: Now what?
George: We've got to do it a lot of times.
T: Can you get the class to help you, George? Yes? Then go
ahead and do it. Come on up here and do it. I suggest you put
the results on the blackboard.
George [comes up to front]: Everybody take a coin, flip
it, write down what you get, and do that four times.
[All do it]
George: What did you get? Abel? [Writes on board] Becky?
[Etc.]
T: What do the results say, George?
George: The results say that 2 out of 12 times we get three
girls.
Charlie: What happens if we get four girls? Do we count
that?
Narrator: Here there is discussion about whether four girls
should be counted. T ends by emphasizing that the decision
should be made with an eye to the purpose for which the estimate
is being developed.
T: Let's continue. Do we have enough trials?
Essie: With only 12, we might get different results next
time.
T: Okay, how many more trials should we do?
Essie: Let's do a hundred altogether.
T: Okay, let's let George do it. [A couple of students
groan at the joke.]
Narrator: George presides over a hundred trials and
compiles the results from each student on the blackboard.
T: What do we do with the results, Essie?
Essie: We count the number of yes's and make a ratio.
T: A ratio of what?
Essie: The ratio of yes's to yes's plus no's, because we
want to know what proportion of all the times we get yes, right?
So we compute the ratio of the yes's to all the times we tried,
all the families we had. And that will be our answer.
T: Sounds good to me. When the guy with the thousand bucks
comes storming in here and says, "Have you got my answer?" we can
say, "Ah yes," very coolly. And we'll be a thousand dollars
richer.
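Essie's worry that 12 trials "might get different results next
time" can itself be tried out. The Python sketch below is an
illustration under stated assumptions, not the class's own
procedure: it treats heads as a girl with a fair coin, repeats the
whole experiment many times, and measures how much the ratio of
yes's wanders. The names has_three_girls, estimate, and spread are
hypothetical.

```python
import random
import statistics


def has_three_girls(rng):
    """One family: flip a coin four times, counting heads as girls."""
    girls = sum(1 for _ in range(4) if rng.random() < 0.5)
    return girls == 3


def estimate(n_families, rng):
    """Essie's ratio: yes's divided by yes's plus no's."""
    yes = sum(1 for _ in range(n_families) if has_three_girls(rng))
    return yes / n_families


def spread(n_families, n_experiments, rng):
    """How much the estimate wanders from one whole experiment to the next."""
    estimates = [estimate(n_families, rng) for _ in range(n_experiments)]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    rng = random.Random()
    for n in (12, 100, 1000):
        print(n, "families per experiment, spread:",
              round(spread(n, 200, rng), 3))
```

The spread shrinks as the number of families per experiment grows,
so the choice between 12 and 100 trials comes down to how much
wobble in the answer one is willing to accept.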
Foxey: I have a question. Do an equal number of boys and
girls get born? Are boys fifty per cent?
T: That is an important question. And the answer is "No."
About 105 boys are born for every 100 girls, or 106 or 104,
depending on the country. Now I ask you, Foxey, is the fact
that the ratio is, say, 105 to 100, rather than 100 to 100, a
difference big enough to spoil our method here?
Foxey: No.
T: Why not?
Foxey: Because 100 to 100 might be close enough.
T: Yes, you are right that we're interested in getting an
answer which we can consider close enough for what we want to do.
In practical life we're never interested in getting a perfectly
accurate answer, because there is never a perfectly accurate
answer. That is, the question is only whether 100 to 100 rather
than 105 to 100 is good enough for our purposes here. But that
means we've got to ask what our purposes are here.
Maybe we should ask the person who's offering to give you a
thousand dollars, "What do you want this estimate for?" And if
this person says, "Well I want to go into business making boys'
clothes and girls' clothes," then probably an answer which is off
by as much as would be caused by 100-100 instead of 105-100
wouldn't cause much harm. If we were trying to aim a rocket at
the moon, however, this procedure might cause us to be off target
by thousands of miles. In that case we would be sensible to pay
more attention to the accuracy and carry out the procedure a bit
differently. So it is crucial always to know just how much
accuracy we need.
Let's say that the 105 to 100 isn't all that much of a
problem for our purposes, and assume it's fifty-fifty for
convenience.
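How much the fifty-fifty shortcut costs can be worked out directly.
The Python sketch below is only a rough check: it assumes births
are independent of one another and uses a formula rather than the
lesson's trial-and-count method. The function name
chance_exactly_three_girls is made up for the illustration.

```python
from fractions import Fraction


def chance_exactly_three_girls(p_girl):
    """Chance of exactly three girls in four independent births."""
    # The one boy can be the first, second, third, or fourth child.
    return 4 * p_girl ** 3 * (1 - p_girl)


if __name__ == "__main__":
    even = chance_exactly_three_girls(Fraction(1, 2))
    # 105 boys per 100 girls means 100 girls in every 205 births.
    skewed = chance_exactly_three_girls(Fraction(100, 205))
    print(f"fifty-fifty: {float(even):.4f}")
    print(f"105 to 100:  {float(skewed):.4f}")
    print(f"difference:  {float(even - skewed):.4f}")
```

The two figures differ by roughly one percentage point, which is
small next to the wobble in a hundred-family estimate. Whether that
is close enough still depends on the purpose, as the teacher says.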
We're doing terrifically. The only problem is that this
card-shuffling and coin-flipping takes time, and in more complex
problems it would take even more time. So let's speed up the
work with a handy-dandy card-dealer and coin-flipper called a
computer, this machine here. We're going to make this machine do
the same thing that we did with our coins. But we've got to tell
this machine some special words to get it to do what we want it
to do, because it is not as smart as you kids are.
Let's get the computer to flip coins for us, or rather, to
do something which "simulates" flipping coins, which in turn
simulates having children. Of course the machine doesn't really
flip coins. Rather, it only deals with symbols like numbers and
letters. So let's let "1" be a girl, and "2" be a boy.
Before we begin to write a program, we've got to do the
really hard stuff, like figuring out how to turn the machine on.
Narrator: Here we briefly show how to insert a floppy disk,
find the "On" switch, and call up the program RESAMPLING STATS
with the command "Stats". The students also are shown how to
begin with the main menu [show] and get a file [show] and then
edit a file [show cursor movement] and afterwards how to run the
file from the main menu. They are also shown that there is a
tutorial for them to study when they are alone.
T: We first give the computer a command that tells it to
make numbers. The command we use to make numbers is "generate."
[show GENERATE on screen]
You must spell each of these commands exactly, and provide
it exactly the information it requires. If you write "yenerate"
or "venerate" the machine isn't going to understand you, although
if we wanted to, we could write a program that would correctly
read most of our errors. But ordinarily the computer is very,
very specific. You've got to get it right. But if you get the
commands right, the computer won't make a mistake. So it's a
pretty good deal -- you do your part correctly, and the machine
will do its part correctly.
We want to generate four numbers, "1"s and "2"s, chosen
randomly just like flipping a coin. So we look in the Manual, or
on this "Quick List," which tells us that the first number we
write after "generate" specifies to the computer how many numbers
to generate randomly, using a random-number device inside the
computer that works like a lottery.
How many numbers do we need?
Doug: A hundred.
T: That might be the number of families we want to create.
But first we must tell the computer how many children in one
family, just as in our first step when working with coins we
decided how many times to flip a coin to get one family.
Foxey: Four numbers.
T: Okay, we write "GENERATE (4)"
The Manual tells us that the next part of the GENERATE
command is the numbers the computer is going to make for us.
Let's make it one's and two's, but it could be "zero's" and
"one's" or whatever. So we're going to randomly generate four
numbers that are either "1" or "2".
Now we must put these numbers someplace so that we can keep
track of them. We tell the computer to put them in a little slot
someplace, and we'll call that slot "A", a special location in
the computer. So we write "GENERATE (4) (1,2) (A)".
Up until now I have been putting parentheses around what we
call the "parameters" of the command. The Apple program requires
that we do that. But for the IBM program the parentheses are not
necessary, and a space between the parameters is sufficient to do
what we call "delimit" each parameter. From here on I'll leave
off the parentheses for convenience.
Now we must tell the computer to count how many girls are
born. The next command logically is called "count". The Manual
says that we must first tell the computer where to count. So we
tell the computer to look in location A where we had put the
result from the previous step.
Next we tell the computer what to count in A -- the number
of "1"s for girls -- and where to put the result of the COUNT,
which we decide will be location J. The command then is COUNT A
1 J.
These actions by the computer simulate what we do with
coins. We have now constructed one family with those two
commands.
We must keep a record of this result, so we put it on a
scoreboard inside the computer with the command SCORE. We must
tell the computer where to put the score. (I always call the
scoreboard Z.) We've also got to tell the computer where to look
for the result -- the Scalar J where we had stashed the result.
So -- SCORE J Z.
You said we need not just one trial "family" but a hundred
families. So we've got to tell the computer to carry out this
whole operation a bunch of times. We order the computer to REPEAT
a hundred times the commands that make one family. We put the
REPEAT command at the beginning of the commands for a single
trial, along with the number of repetitions we want, and then we
use the command END to finish a repetition.
You don't need to know this word, but just for the fun of it
we have just completed a "loop", which makes sense because the
machine goes round and round that loop a hundred times between
REPEAT and END.
When the machine has gone around the loop a hundred times it
stops, because we told it how many times to go around. Okay? So
now we've got the results of a hundred families. Right?
After we have completed our hundred families we need to
check the record on our scoreboard. We COUNT among the hundred
yes's (that is, 3's) and no's (that is, numbers other than 3) how
many yes's there are. We put the answer in K and PRINT it.
Now we can extract our result from the machine. So we tell
the machine to PRINT the result. In this case the word PRINT
tells the machine to show the result on the screen. We could
also print on paper. So let's actually print. [show PRINT.]
We want to know if we got three girls. So we have our
scoreboard show the number of girls -- zero, one, two, three, or
four -- in each and every family. Of course we especially want to
know how many families have three girls.
Now we must tell the computer to RUN the program. Let's.
The program is doing it, it's going through the loops right
now.
Now we can look at Z to see the number of girls in each
family. And we can look at K to see how many families out of the
hundred had three and exactly three girls.
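For readers without the RESAMPLING STATS program, the same steps
can be sketched in Python. This is a loose translation for
illustration, not the program shown in class: the variables a, j,
z, and k stand in for the locations A, J, Z, and K, and the
function name run_families is made up.

```python
import random
from collections import Counter


def run_families(n_families=100, rng=None):
    """Build n_families four-child families; 1 is a girl, 2 is a boy."""
    if rng is None:
        rng = random.Random()
    z = []                                           # the scoreboard Z
    for _ in range(n_families):                      # REPEAT ... END loop
        a = [rng.choice((1, 2)) for _ in range(4)]   # GENERATE four 1's or 2's into A
        j = a.count(1)                               # COUNT the 1's (girls) in A into J
        z.append(j)                                  # SCORE J on the scoreboard Z
    k = sum(1 for girls in z if girls == 3)          # COUNT the 3's in Z into K
    return z, k


if __name__ == "__main__":
    z, k = run_families()
    print(sorted(Counter(z).items()))   # families with 0, 1, 2, 3, 4 girls
    print(k, "of", len(z), "families had exactly three girls")
```

The Counter line plays the part of looking over the whole
scoreboard, and k is the single number the person with the thousand
dollars asked for.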
So far we have worked problems in "probability". Let's now
consider a problem in the sub-field of probability called
"statistics". First I'm going to tell you something you won't
believe. Professional baseball players do not suffer from
slumps, and professional basketball players do not have "hot
hands".
Anybody here ever hear of Larry Bird? Well, in the first
three games of the 1988 NBA playoff series between Boston and
Detroit, Larry Bird made baskets on only 20 of the 57 shots he
attempted. Everybody agreed that Bird
was in a slump. As the Washington Post said (May 30, 1988, p.
D4):
Larry Bird is so cold he couldn't throw a beach ball in
the ocean...
They fully expect Bird to come out of his
horrendous shooting slump...
It is safe to assume that if Bird doesn't shake out of
his slump Monday, it will be difficult and probably
even impossible for Boston [continue]
What does "slump" mean? If it means anything it means that
the chance of Bird scoring a basket at the end of that period is
lower than usual. And coaches and players usually conclude that
the player should take fewer shots than usual because he does not
have a "hot hand".
Narrator: In a regular class, the following ideas would be
drawn from the students by the instructor. For lack of time, the
instructor will simply lecture.
But did Bird really have a "cool" hand? That is, was his
shooting eye less good during this period than it usually is? Or
could that sequence of events have occurred just by chance, just
as if he were a coin, which cannot have a hot hand? The
coin's chance of success and failure stays the same from flip to
flip, even though gamblers feel that a coin or a set of dice is
hot or cold when it shows a long run of hits or misses. Therefore,
let's see just how unusual it would be for a coin that "succeeds"
48 percent of the time to show a "slump" like Bird's.
First we generate 57 numbers between 1 and 100.
GENERATE 57 1,100 A [show on screen, or printout]
Next, we count how many of those 57 shots were "baskets",
that is, were between 1 and 48 (remember that Bird is a 48
percent shooter on the average).
COUNT A 1,48 J
Next we score the result.
SCORE J Z
Then we repeat those operations 1000 times by putting a REPEAT
statement in front of those three operations that make up one
trial, and an END statement after them. Our program now looks
like this:
REPEAT 1000
    GENERATE 57 1,100 A
    COUNT A 1,48 J
    SCORE J Z
END
Afterwards, we count the number of trials in which the
result is fewer than 21 baskets.
COUNT Z < 21 K
Then we PRINT the result from K, and the results for the separate
trials in Z.
REPEAT 1000
    GENERATE 57 1,100 A
    COUNT A 1,48 J
    SCORE J Z
END
COUNT Z < 21 K
PRINT Z K
Now let's run our program and see what we get.
[Program runs. Show program]
The results suggest that in about four trials out of a
hundred, our simulated Larry Bird gets 20 or fewer baskets in 57
shots. That means that even if nothing changes in his shooting,
during one in every 25 series of 57 shots, on average, he would
shoot that poorly or worse. (This does not mean that the chances
are 24 in 25 that such an event did not happen by chance.
Rather, it means that in every hundred sets of 57 shots, we can
expect four to be that poor. Similarly, we can expect some
series to seem terrific when they also are occurring just by
chance without a change in the system.)
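The same experiment can be sketched in Python for anyone who wants
to rerun it. This is an illustrative translation, not the program
used in the lesson: simulate_slumps follows the REPEAT loop step by
step, and exact_slump_chance is an added cross-check using the
binomial formula, which the lesson does not use. Both function
names are hypothetical.

```python
import random
from math import comb


def simulate_slumps(n_trials=1000, shots=57, pct=48, cutoff=21, rng=None):
    """Share of trials in which a 48 percent shooter makes fewer than 21 of 57."""
    if rng is None:
        rng = random.Random()
    z = []                                                # the scoreboard Z
    for _ in range(n_trials):                             # REPEAT 1000
        a = [rng.randint(1, 100) for _ in range(shots)]   # GENERATE 57 1,100 A
        j = sum(1 for x in a if 1 <= x <= pct)            # COUNT A 1,48 J
        z.append(j)                                       # SCORE J Z
    k = sum(1 for baskets in z if baskets < cutoff)       # COUNT Z < 21 K
    return k / n_trials


def exact_slump_chance(shots=57, p=0.48, cutoff=21):
    """Binomial chance of fewer than `cutoff` baskets in `shots` tries."""
    return sum(comb(shots, m) * p ** m * (1 - p) ** (shots - m)
               for m in range(cutoff))


if __name__ == "__main__":
    print("simulated:", simulate_slumps())
    print("formula:  ", round(exact_slump_chance(), 4))
```

With only 1000 trials the simulated share will differ somewhat from
run to run; raising n_trials narrows the gap between the simulated
figure and the formula.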
It would seem, then, that it would be a mistake for the
Celtics to tell Bird to do anything different after this cold
streak than ordinarily. Bird should take just as many shots as
usual, in his usual style, just as one continues to use a coin
even after it has come down heads a bunch of times in a row. In
other words, if it ain't broke, don't fix it.
Here we note the importance of the context in which we get
the data. The reason we are not impressed with a 4-in-100
probability, and continue to expect that in his upcoming games
Bird will have shooting success at his long-run average of 48
percent, is that Bird shoots hundreds and hundreds of shots each
year, and sooner or later he will have a set of 57 shots with
very poor results, a set of shots with very good results, and a
variety of other outcomes. But if this were a person for whom
we had no other information -- say, a high school basketball
player at the beginning of his first season -- then our best guess
would be that in the future he would shoot baskets at the rate of
20 in 57.
Understanding variability of this kind is the key to
Japanese quality control, taught to them by an American
statistician named W. Edwards Deming. And resampling is a remarkably
effective and easy tool to use in studying such quality control
in practical situations.
lesson 9-175 dir statwork August 9, 1992