CHAPTER III-4
THE PEDAGOGICAL USES OF PUZZLES
Students mostly come into an introductory statistics course
with fear, and sometimes with loathing. When the course is taught
the resampling way, the instructor's first task is to reassure the
students that they need not fear. Though the course is
difficult because inference is difficult, and though it will
require hard thinking, they will be able to understand all that
is taught, and most of them will finish the course having enjoyed
it and having been glad they took it. The instructor can
honestly say these things because they are demonstrably true, as
Chapter I-1 shows.
Baffling-though-simple probability puzzles can help students
see that simulation works. Our practice is to begin a course by
telling students that the most important element in the course is
to learn the habit of saying "Try it" when faced with a problem
in probability or statistics, and then actually to do so. We
then continue with the famous two-heads puzzle, in connection
with a one-question test that can be said - only half in jest -
to increase the class's IQ. Later puzzles include the three
ships and Monty Hall.
HOW IQ'S DOUBLE IN TEN MINUTES
On Tuesday, September 12, 1995, the Style section of The
Washington Post presented an IQ test, many of whose questions
were mathematical.
Ever wonder just how smart you are? Gary Gruber...has
constructed this short test - just 12 questions - to challenge
your intelligence and help you determine whether you really have
the smarts. There is no time limit.
The first question was as follows (except that I substitute 20
blue and 20 brown socks for 40 and 40):
Suppose I have 20 blue socks and 20 brown socks in a drawer. If
I reach into the drawer without looking at the socks, what is the
smallest number of socks I must take out to make sure that I have
a pair of socks of the same color?
On September 13 I read that question to an introductory
statistics class at the University of Maryland, and asked the
students to write down and hand in their answers. The
distribution of answers was as follows: 5 "1's"; 7 "2's"; 12
"3's"; 1 "4"; 9 "40's". Even before we know the right answer, the
great dispersion among the answers proves conclusively that many
of the students got the wrong answer.
Without further ado, I a) took out a deck of 80 playing
cards, half red and half black, b) shuffled them, and c) began to
lay them one by one out on the light table so that they showed on
the screen. "Tell me to stop when we get a pair", I said.
I dealt a red, then another red.
"Stop", someone hollered. So I replaced the two cards,
shuffled again, and dealt out a red, a black, then another red.
"Stop", I heard. So I repeated the operation, and again,
ten times.
At this point I again asked the class the answer to the
original question. All except one confused kid said "Three", and
he quickly changed his mind. A miracle! We have moved from a
total of 12 correct answers out of 34 to a total of 33 correct
answers out of 34. Speaking analogically, we might say that the
collective IQ of the class has more than doubled in less than ten
minutes.
Please note that the experiment not only led people to reach
the correct answer inductively - which is plenty of benefit by
itself - but it also led most of them to understand why the
correct answer is what it is, a bonus.
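The card-dealing demonstration can also be mimicked on a computer. What follows is only a minimal sketch in Python, not part of the classroom procedure described above; the function name draws_until_pair and the fixed seed are illustrative assumptions. Each trial shuffles a drawer of 20 blue and 20 brown socks and draws until two socks of the same color are in hand.

```python
import random

def draws_until_pair(n_blue=20, n_brown=20, rng=random):
    """Draw socks one at a time (no replacement) until two match in color.

    Returns the number of socks drawn, or None if no pair is possible.
    """
    drawer = ["blue"] * n_blue + ["brown"] * n_brown
    rng.shuffle(drawer)
    seen = set()
    for count, sock in enumerate(drawer, start=1):
        if sock in seen:      # this color is already in hand: a pair
            return count
        seen.add(sock)
    return None               # only happens if each color has a single sock

rng = random.Random(0)        # fixed seed so the run is repeatable (an assumption)
trials = [draws_until_pair(rng=rng) for _ in range(1000)]
worst_case = max(trials)      # most socks ever needed to be sure of a pair
print("Most socks ever needed:", worst_case)
```

Because there are only two colors, no trial can run past the third sock, which is the point the class sees after a handful of deals.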
Now consider what would have happened if immediately after
reading the question to the class I had given the following
instruction: "Before answering the question, test out any answer
by experimenting a number of times with a set of cards or other
devices that can be likened to a drawer full of socks, or even
with an actual drawer of 80 socks", and made available to them a
variety of materials including playing cards.
Since then this experiment has been repeated with a slightly
more sophisticated design in other classes. The numbers of
socks are changed to 20 and 20 so that one deck of cards will
suffice. The questions are written and presented on paper or by
overhead projector in order to ensure that no one can claim that
s/he misheard or that the problem was wrongly stated. Index
cards are given out with spaces numbered 1-4 for students to write
their answers on.
The instructor opens with "We're now going to double the
smarts of this class", following on the language in the statement
of the socks problem. Then the socks problem is presented, and
the students are instructed to write down their answers. Then
without further discussion the following problem is presented:
I toss two coins into the air and catch them on my palm. I
look at the two of them and say, "One of the coins shows a head.
What is the probability that the other shows a head, too?"
The students are asked to write down their answers. Then
the class is asked for their answers orally, and when most or all
have indicated answers - perhaps by raising their hands in a big
class - the instructor tells them that most or all are wrong, and
ends with "Now what are you going to do?"
Usually silence follows. So the instructor says: "What
would you do if you knew that in ten minutes someone is going to
come through that door and give a thousand dollars to all who can
tell her or him the right answer?" After some banter about
bribing the instructor for the right answer, either someone
spontaneously says "Try it", or the instructor induces that
response. The instructor distributes coins, and tells the
students to write down their new answers on line three. And when
they have finished, s/he polls the students and shows that most
now have reached the correct answer.
Incidentally, Marilyn vos Savant published the same problem
in Parade, as follows:
A shopkeeper says she has two new baby beagles to show you, but
she doesn't know whether they're male, female, or a pair. You
tell her that you want only a male, and she telephones the fellow
who's giving them a bath. "Is at least one a male?" she asks
him. "Yes!" she informs you with a smile. What is the
probability that the other one is [also] a male?
vos Savant gave the answer as one in three, and many PhDs
wrote her to say - with great confidence - that she was all
wrong. It is a crucial part of the lesson we want the students
to learn that with the simulation method, students can obtain
correct answers to problems that baffle highly-trained
professionals when they attempt to address the problems with only
reason and mathematical deduction.
Another way to do this problem with simulation is with a
random number table; this also is shown to the students, as
follows: Consider a two-digit column of random numbers in Table
3, using odd numbers for females and even for males. The first
forty lines are sufficient to suggest the correct probability,
and also to make clear the mechanism: Two-female pairs, a fourth
of the cases, are excluded from the sample. And mixed pairs -
which give a "no" answer - are two-thirds of the remaining pairs,
whereas the only pairs that give "yes" answers - two males - are
only a third of the remaining pairs. So once more simulation
gets it right very quickly and easily, whereas the deductive
method of mathematical logic results in much confusion.
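The same exclusion mechanism can be sketched in code. The Python below is an illustration only, not the random-number-table procedure of the text; the function name estimate_other_head, the trial count, and the seed are assumptions. It tosses two coins, throws away the tosses with no head (the counterpart of the excluded two-female pairs), and reports how often the remaining tosses show two heads.

```python
import random

def estimate_other_head(n_trials=10000, seed=0):
    """Estimate P(both heads | at least one head) by simulation."""
    rng = random.Random(seed)
    kept = 0          # tosses in which at least one coin shows a head
    both_heads = 0    # of those, tosses in which both coins show heads
    for _ in range(n_trials):
        coins = [rng.choice("HT"), rng.choice("HT")]
        if "H" not in coins:
            continue  # two tails: "one of the coins shows a head" is false
        kept += 1
        if coins == ["H", "H"]:
            both_heads += 1
    return both_heads / kept

estimate = estimate_other_head()
print("Estimated probability the other coin is a head:", round(estimate, 3))
```

The estimate should settle near one in three, the answer vos Savant gave for the beagles.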
Next the instructor shows the group that the experiment can
also be done with two pairs of playing cards, each pair
containing one red ("heads") and one black ("tails"), choosing at
random one card from each pair.
Then the instructor says: "Now think again about the socks
problem, and write down an answer on line 4 of the index card".
The results of a typical class were as follows: INSERT
RESULTS
If we call zero answers correct "No smarts", and all answers
correct "Total smarts", can we not say that raising the score
from 2 of 15 correct to 10 of 15 correct - as was the case
in my spring class - more than doubles smarts? Or more soberly,
is it not legitimate to say that one can raise a group's IQ by
giving a specific instruction, or even the more general
instruction to obtain an answer to any mathematical question that
permits it (and many or most questions do) by using the
simulation process of experiment with actual physical objects?
At this point someone says: "But you haven't actually
raised people's IQ," or "You haven't really made them smarter."
Is that so? IQ is defined as the score on an IQ test. All
attempts to find some "real" entity that IQ supposedly represents
have been fruitless. So if one can raise a test score in this
fashion, one can just as legitimately claim to have raised IQ as
one can by special training of young children. But no need to
argue this linguistic point. The key finding is that this
procedure greatly increases people's ability to reach sound
solutions to problems in probability.
If one can raise IQs as markedly as can this device, two
important questions arise:
1. If it can be done this way for these questions, why not
in this or other ways for other questions?
2. If educating people to remember and practice the simple
instruction "Try it" can increase the proportion of correct
answers to this question - and to the entire range of questions
in probability and statistics (as it does; see Simon, Atkinson,
and Shevokas (1976); Simon and Bruce (1995)) - why do we not
teach people this method in addition to, if not as a substitute
for, conventional formulaic methods in statistics and
probability?
SOME OTHER CLASSIC PUZZLES
The Problem of Three Chests
Here is another problem that shows the power of simulation:
A Spanish treasure fleet of three ships was sunk at sea off
Mexico. One ship had a trunk of gold forward and another aft,
another ship had a trunk of gold forward and a trunk of silver
aft, while a third ship had a trunk of silver forward and another
trunk of silver aft. Divers just found one of the ships and a
trunk of silver in it. They are now taking bets about whether
the other trunk found on the same ship will contain silver or
gold. What are fair odds?
This is a restatement of a problem that Joseph Bertrand posed
late in the 19th century. In the Goldberg variation:
"Three identical boxes each contain two coins. In one
box both are pennies, in [the second] both are nickels,
and in the third there is one penny and one nickel.
A man chooses a box at random and takes out a coin. If the coin
is a penny, what is the probability that the other coin in the
box is also a penny?"
The following simulation arrives at the correct answer:
1. Construct three urns containing the numbers "7,7",
"7,8", and "8,8" respectively.
2. Choose an urn at random, and shuffle the numbers in it.
3. Choose the first element in the chosen urn's vector. If
"8", stop trial and make no further record. If "7", continue.
4. Record the second element in the chosen urn's vector on
the scoreboard.
5. Repeat steps 2-4 many times, and calculate the proportion
of "7's" on the scoreboard. (The answer should be about 2/3.)
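The five steps above can be sketched in Python as follows. This is a hedged illustration rather than the author's own program; the function name estimate_second_is_seven, the number of trials, and the seed are assumptions. As in the steps, "7" stands for a penny (or silver) and "8" for the other kind.

```python
import random

def estimate_second_is_seven(n_trials=10000, seed=0):
    """Follow steps 1-5: proportion of 7's on the scoreboard."""
    rng = random.Random(seed)
    urns = [[7, 7], [7, 8], [8, 8]]         # step 1: the three urns
    scoreboard = []
    for _ in range(n_trials):
        urn = list(rng.choice(urns))        # step 2: choose an urn (copy it)
        rng.shuffle(urn)                    #         and shuffle its contents
        if urn[0] != 7:                     # step 3: first element is an 8,
            continue                        #         so make no record
        scoreboard.append(urn[1])           # step 4: record the second element
    return scoreboard.count(7) / len(scoreboard)   # step 5

proportion_sevens = estimate_second_is_seven()
print("Proportion of 7's on the scoreboard:", round(proportion_sevens, 3))
```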
The three-door problem
The great-granddaddy of baffling-though-simple puzzles is the
famous problem of the three doors, long known to statisticians
but recently popularized as the Monty Hall game show problem in
Parade by vos Savant: The player faces three closed containers,
one containing a prize and two empty. After the player chooses,
s/he is shown that one of the other two containers is empty. The
player is now given the option of switching from her/his original
choice to the other closed container. Should s/he do so?
Answer: Switching doubles the chances of winning.
When this problem was published in the Sunday newspapers
across the U.S., the thousands of letters it drew - including a
good many from Ph.D.'s in mathematics - showed that logical
mathematical deduction fails badly in this case. Most people -
both laypersons and statisticians - arrive at the wrong answer.
Simulation, however - and hands-on simulation with physical
symbols, rather than computer simulation - is a surefire way of
obtaining and displaying the correct solution. Table 6-1 shows
such a simple simulation with a random-number table. Column 1
represents the box you choose, column 2 where the prize is.
Based on columns 1 and 2, column 3 indicates the box that the
"host" would now open and show to be empty. Lastly, column 4
scores whether the "switch" or "remain" strategy would be
preferable. A count of the number of winning cases for "switch"
and for "remain" gives the result sought.
Table 6-1
Not only is the best choice obvious with this simulation
method, but you are likely to understand quickly why switching is
better. No other mode of explanation or solution brings out this
intuition so well. And it is much the same with other problems
in probability and statistics. Simulation can provide not only
answers but also insight into why the process works as it does.
In contrast, formulas frequently produce obfuscation and
confusion for most non-mathematicians.
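The text stresses hands-on simulation for this puzzle, but the four columns described above can also be tallied by a short program. The Python below is a sketch under stated assumptions, not a reproduction of Table 6-1: the names play_round and tally are hypothetical, and when the player's first choice happens to hold the prize the host is assumed to open one of the two empty containers at random.

```python
import random

def play_round(rng):
    """Play one round and say which strategy would have won."""
    containers = [1, 2, 3]
    choice = rng.choice(containers)     # column 1: the box you choose
    prize = rng.choice(containers)      # column 2: where the prize is
    # column 3: the host opens an empty box other than the player's choice
    # (picked at random when two such boxes exist; this is an assumption)
    opened = rng.choice([c for c in containers if c != choice and c != prize])
    # the one box that is neither chosen nor opened is what "switch" gets
    other = next(c for c in containers if c != choice and c != opened)
    return "switch" if other == prize else "remain"   # column 4

def tally(n_rounds=10000, seed=0):
    """Return the shares of rounds won by "switch" and by "remain"."""
    rng = random.Random(seed)
    results = [play_round(rng) for _ in range(n_rounds)]
    return (results.count("switch") / n_rounds,
            results.count("remain") / n_rounds)

switch_share, remain_share = tally()
print("switch wins:", round(switch_share, 3))
print("remain wins:", round(remain_share, 3))
```

Printing a few individual rounds alongside the tally shows the mechanism: "remain" wins only when the first choice was right, which happens about one time in three.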
The Birthday Problem
We then move from the pure brain-teasers to a famous
examination question used in probability courses: What is the
probability that two or more people among a roomful of (say)
twenty-five people will have the same birthday? To obtain an
answer we need simply examine the first twenty-five numbers from
the random-number table that fall between "001" and "365" (the
number of days in the year), record whether or not there is a
duplication among the twenty-five, and repeat the process often
enough to obtain a reasonably stable probability estimate.
Pose the question to a mathematical friend of yours, then
watch her or him sweat for a while, and afterwards compare your
answer to hers/his. I think you will find the correct answer
very surprising. It is not unheard of for people who know how
this problem works to take advantage of their knowledge by making
and winning big bets on it. (See how a bit of knowledge of
probability can immediately be profitable to you by avoiding such
unfortunate occurrences?)
More specifically, these steps answer the question for the
case of twenty-five people in the room:
Step 1. Let three-digit random numbers "001-365" stand for
the 365 days in the year. (Ignore leap year for simplicity.)
Step 2. Examine for duplication among the first twenty-five
random numbers chosen "001-365". (Triplicates or higher-order
repeats are counted as duplicates here.) If there is one or more
duplicate, record "yes." Otherwise record "no."
Step 3. Repeat perhaps a thousand times, and calculate the
proportion of trials with a duplicate birthday among the
twenty-five people.
Here is the first experiment from a random-number table,
starting at the top left of the page of numbers: 021, 158, 116,
066, 353, 164, 019, 080, 312, 020, 353... (Note that 353 has
already appeared twice.)
This leads us into showing how one can handle problems like
the birthday problem with the computer. A program in the
language RESAMPLING STATS is amazingly simple. With the command
GENERATE, produce 25 numbers between "1" and "365" into a
location we can call A. Then determine whether any two people
have the same birthday with the MULTIPLES command which checks
whether the same number came up more than once, and put the
result in a location we can call B. Next, SCORE this result from
B into a vector we may call Z. REPEAT, say, 1000 times. After
the END of the loop, COUNT in the scoreboard Z the number of
samples out of the 1000 trials that had at least one birthday
shared by two or more people. This result is placed in K.
We then try the program written as follows.
REPEAT 1000             Do 1000 trials (experiments).
GENERATE 25 1,365 A     Generate 25 numbers randomly between 1 and
                        365, put them in A.
MULTIPLES A > 1 B       Looking in A, count the number of multiples
                        and put the result in B. We request
                        multiples > 1 because we are interested in
                        any multiple, whether it is a duplicate,
                        triplicate, etc. Had we been interested
                        only in duplicates, we would have put in
                        MULTIPLES A = 2 B.
SCORE B Z               Score the result of each trial to Z.
END                     End the loop for the trial, go back and
                        repeat the trial until all 1000 are
                        complete, then proceed.
COUNT Z > 0 K           Determine how many trials had at least one
                        multiple.
DIVIDE K 1000 KK        Convert to a proportion.
PRINT KK                Print the result.
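For readers without RESAMPLING STATS, the same logic can be sketched in Python. This is an illustrative translation, not an official equivalent of the program above; the function names and the seed are assumptions, and the variable names z, k and kk merely echo the RESAMPLING STATS locations. One simplification is labelled in the comments: instead of counting how many values repeat, each trial records only whether any repeat occurred, which is all that COUNT Z > 0 uses.

```python
import random

def birthday_trial(rng, n_people=25, n_days=365):
    """One trial: do any two of the n_people share a birthday?"""
    # GENERATE 25 1,365 A
    a = [rng.randint(1, n_days) for _ in range(n_people)]
    # Stand-in for MULTIPLES A > 1 B: True if any value repeats
    # (a simplification; the original counts the repeated values)
    return len(set(a)) < n_people

def estimate_shared_birthday(n_trials=1000, seed=0):
    """REPEAT the trial, SCORE the results, then COUNT and DIVIDE."""
    rng = random.Random(seed)
    z = [birthday_trial(rng) for _ in range(n_trials)]   # SCORE B Z
    k = sum(z)                                           # COUNT Z > 0 K
    return k / n_trials                                  # DIVIDE K 1000 KK

kk = estimate_shared_birthday()
print("Estimated probability of a shared birthday:", kk)  # PRINT KK
```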
Three Daughters Among Four Children
Now we are ready to demonstrate a realistic though simple
problem: What is the probability that exactly three of the four
children in a four-child family will be daughters?
The first step is to state that the approximate probability
that a single birth will produce a daughter is 50-50 (1 in 2).
This estimate is not strictly correct, because there are roughly
106 male children born to each 100 female children. But the
approximation is close enough for most purposes, and the 50-50
split simplifies the job considerably. (Such "false"
approximations are part of the everyday work of the scientist.
The appropriate question is not whether or not a statement is
"only" an approximation, but whether or not it is a good enough
approximation for your purposes.)
The probability that a fair coin will turn up heads is .50
or 50-50, close to the probability of having a daughter.
Therefore, flip a coin in groups of four flips, and count how
often three of the flips produce heads. (You must decide in
advance whether three heads means three girls or three boys.) It
is as simple as that.
In resampling estimation it is of the highest importance to
work in a careful, step-by-step fashion - to write down the steps
in the estimation, and then to do the experiments just as
described in the steps. Here is a set of steps that will lead
to a correct answer about the probability of getting three
daughters among four children:
Step 1. Using coins, let "heads" equal "boy" and "tails"
equal "girl."
Step 2. Throw four coins.
Step 3. Examine whether the four coins fall with exactly
three tails up. If so, write "yes" on a record sheet; otherwise
write "no."
Step 4. Repeat steps 2 and 3 perhaps two hundred times.
Step 5. Count the proportion "yes." This proportion is an
estimate of the probability of obtaining exactly 3 daughters in 4
children.
The first few experimental trials might appear in the record
sheet as follows:
Number of Tails    Yes or No
      1               No
      0               No
      3               Yes
      2               No
      1               No
      2               No
      .               .
      .               .
      .               .
The probability of getting three daughters in four births
could also be found with a deck of cards, a random number table,
a die, or with RESAMPLING STATS. For example, half the cards in
a deck are black, so the probability of getting a black card
("daughter") from a full deck is 1 in 2. Therefore, deal a card,
record "daughter" or "son," replace the card, shuffle, deal
again, and so forth for 200 sets of four cards. Then count the
proportion of groups of four cards in which you got exactly three
daughters.
A RESAMPLING STATS computer solution to the "3Girls" problem
mimics the above steps:
REPEAT 1000             Do 1000 trials.
GENERATE 4 1,2 A        Generate 4 numbers at random, either 1 or
                        2. This is analogous to flipping a coin 4
                        times to generate 4 heads or tails. We keep
                        these numbers in A, letting "1" represent
                        girls.
COUNT A = 1 B           Count the number of girls and put the
                        result in B.
SCORE B Z               Keep track of each trial result in Z.
END                     End this trial, repeat the experiment until
                        1000 trials are complete, then proceed.
COUNT Z = 3 K           Count the number of experiments where we
                        got exactly 3 girls, and put this result
                        in K.
DIVIDE K 1000 KK        Convert to a proportion.
PRINT KK                Print the results.
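A Python sketch of the same program is given here for comparison. It is an illustration under the text's 50-50 assumption, not part of the original; the function name, the parameter names, and the seed are hypothetical. With that assumption the binomial calculation gives 4/16 = .25, which is the figure the simulation should approximate.

```python
import random

def estimate_girls(n_trials=1000, n_children=4, target=3, seed=0):
    """Estimate P(exactly `target` girls among `n_children` births)."""
    rng = random.Random(seed)
    z = []
    for _ in range(n_trials):                                # REPEAT 1000
        a = [rng.randint(1, 2) for _ in range(n_children)]   # GENERATE 4 1,2 A
        b = a.count(1)                                       # COUNT A = 1 B (1 = girl)
        z.append(b)                                          # SCORE B Z
    k = z.count(target)                                      # COUNT Z = 3 K
    return k / n_trials                                      # DIVIDE K 1000 KK

kk = estimate_girls()
print("Estimated probability of exactly 3 girls in 4 births:", kk)  # PRINT KK
```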
Notice that the procedure outlined in the steps above would
have been different (though almost identical) if we asked about
the probability of three or more daughters rather than exactly
three daughters among four children. For three or more daughters
we would have scored "yes" on our scorekeeping pad for either
three or four tails, rather than for just three tails.
Likewise, in the computer solution we would have used the command
"COUNT Z >= 3 K."
It is important that, in this case, in contrast to what we
did in Example 6-1 (the introductory poker example), the card is
replaced each time so that each card is dealt from a full deck.
This method is known as sampling with replacement. One samples
with replacement whenever the successive events are independent;
in this case we assume that the chance of having a daughter
remains the same (1 girl in 2 births) no matter what sex the
previous births were [2]. But if the first card dealt is black
and is not replaced, the chance of the second card being
black would no longer be 26 in 52 (.50), but rather 25 in 51
(.49). If the first three cards are black and are not
replaced, the chance of the fourth card's being black would sink
to 23 in 49 (.47).
To push the illustration further, consider what would happen
if we used a deck of only six cards, half (3 of 6) black and half
(3 of 6) red, instead of a deck of 52 cards. If the chosen card
is replaced each time, the 6-card deck produces the same results
as a 52-card deck; in fact, a two-card deck would do as well.
But, if the sampling is done without replacement, it is
impossible to obtain 4 "daughters" with the 6-card deck because
there are only 3 "daughters" in the deck. To repeat, then,
whenever you want to estimate the probability of some series of
events where each event is independent of the others, you must
sample with replacement.
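The six-card illustration can be checked with a short Python sketch. It is offered only as an illustration; the function name four_black_share, the trial count, and the seed are assumptions. It deals four cards from a deck of three black and three red cards, once with replacement and once without, and counts how often all four are black ("daughters").

```python
import random

def four_black_share(deck, with_replacement, n_trials=10000, seed=0):
    """Share of four-card deals in which every card is black."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if with_replacement:
            # each card is dealt from the full deck, then put back
            hand = [rng.choice(deck) for _ in range(4)]
        else:
            # four different cards are taken from the deck
            hand = rng.sample(deck, 4)
        if hand.count("black") == 4:
            hits += 1
    return hits / n_trials

six_card_deck = ["black"] * 3 + ["red"] * 3
with_repl = four_black_share(six_card_deck, with_replacement=True)
without_repl = four_black_share(six_card_deck, with_replacement=False)
print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

Without replacement the count can never rise above zero, since the deck holds only three black cards; with replacement the share should come out near 1 in 16.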
REFERENCES
Simon, Julian L., Atkinson, David T., and Shevokas, Carolyn,
"Probability and Statistics: Experimental Results of a Radically
Different Teaching Method", American Mathematical Monthly, vol.
83, no. 9, Nov. 1976, pp. 733-739.
Simon, Julian L. and Peter C. Bruce, "Evaluations of
Teaching Introductory Statistics via Resampling", xerox, 1995.
The Washington Post, September 12, 1995, "Brain Teaser or
No-Brainer", no author, p. D5.
page # teachbk III-4puz May 9, 1996