The goal of this activity is to explain the differences between matched-pairs one-sample tests and two-sample tests.
First and foremost, one should understand that using two-sample procedures versus one-sample procedures (where one typically must subtract two values and test the collection of differences) can give very different results. Consider the problem of trying to determine if a certain Gateway computer is faster than a certain Dell computer. To analyze this question, the two computers are put through a collection of benchmark tests. If one computer completes these tests faster than the other, then one would consider that computer faster. Here are the times (in seconds) for six standard benchmark tests.
| Benchmark | A | B | C | D | E | F |
| | 1.12 | 1.73 | 1.04 | 1.87 | 1.47 | 2.10 |
| | 1.15 | 1.75 | 1.10 | 1.86 | 1.46 | 2.15 |
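One way to run both procedures on these times is sketched below, using only the Python standard library. It is a minimal illustration, not the activity's prescribed method: the names `row_1`, `row_2`, `paired_t`, and `two_sample_t` are made up for this example, the rows are named by position because the table does not label them, and the two-sample statistic is assumed to use the unpooled standard error (with equal sample sizes the pooled version gives the same value).

```python
from math import sqrt
from statistics import mean, stdev, variance

# Times in seconds for benchmarks A-F, copied from the table above.
# The rows are named by position only (an assumption of this sketch).
row_1 = [1.12, 1.73, 1.04, 1.87, 1.47, 2.10]
row_2 = [1.15, 1.75, 1.10, 1.86, 1.46, 2.15]


def paired_t(x, y):
    """One-sample t statistic computed on the differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def two_sample_t(x, y):
    """Two-sample t statistic for mean(y) - mean(x), unpooled standard error."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


t_paired = paired_t(row_1, row_2)
t_two = two_sample_t(row_1, row_2)

print(f"matched-pairs t = {t_paired:.2f} (df = {len(row_1) - 1})")
print(f"two-sample t    = {t_two:.2f}")
```

The matched-pairs statistic comes out near 1.94, just short of 2.015, the one-sided 5% critical value for 5 degrees of freedom. The two-sample statistic comes out near 0.10, nowhere close to any usual cutoff.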
You should be somewhat surprised by the results. The one-sample test is almost significant at the 5% level, while the two-sample test is not close to significant. How can this be? One test says there is no evidence of a difference in the times, and the other says there is nearly significant evidence of a difference. The key is the assumptions behind the two tests.

A two-sample test requires that the samples be independent, meaning roughly that they have no influence on one another. It is important to consider what the things in your samples are. Here they are times on benchmark tests, presumably chosen from the population of all benchmark tests. Since the benchmarks used for both computers are exactly the same, the samples are NOT independent.

Here is a way to sort out the meaning of these two results. The two-sample test can be viewed as reaching into two populations and selecting a random sample from each. When the two sample means are close together relative to the spread within each sample, the test finds no evidence that the population means differ. Our two samples had means of 1.555 and 1.578. If they had come from independent random samples from two populations, it would seem quite plausible that the two population means are equal. But our samples were not independent: they came from head-to-head competition between the two computers on the exact same benchmark tests. The Dell beat the Gateway on 4 of the 6 benchmarks in head-to-head competition, which suggests that the Dell is faster.
Not only that, but the head-to-head differences are all small and of similar size, so the population of differences appears to have a small positive mean with little spread, which yields a small p-value for the one-sample test. In this case, the one-sample test gives the correct result: the Dell computer is most likely faster than the Gateway.

The choice of test is usually settled by carefully analyzing what the population is (times on benchmark tests in our case) and what things make up the samples (6 times on the same six benchmarks here). If the things in the two samples are related to one another, the samples are usually NOT independent and a two-sample test is therefore NOT appropriate.
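
To see why the two procedures disagree on the same data, it can help to put the two test statistics side by side. The following is a worked sketch using the benchmark times above. It assumes the unpooled form of the two-sample statistic (with equal sample sizes the pooled form gives the same value), and the symbols $\bar{d}$, $s_d$, $s_1$, $s_2$ are notation introduced here for illustration: the mean and standard deviation of the six differences, and the standard deviations of the two rows of times.

$$
t_{\text{paired}} = \frac{\bar{d}}{s_d/\sqrt{n}} \approx \frac{0.0233}{0.0294/\sqrt{6}} \approx 1.94,
\qquad
t_{\text{two-sample}} = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{s_1^2/n + s_2^2/n}} \approx \frac{0.0233}{\sqrt{0.1777/6 + 0.1724/6}} \approx 0.10
$$

Both statistics share the same numerator, the 0.0233-second difference in mean times. The two-sample test divides it by a standard error of about 0.24, which reflects how much the benchmarks differ from one another (times range from about 1 to 2 seconds). The matched-pairs test divides it by about 0.012, the much smaller variation in the benchmark-by-benchmark differences.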

Think carefully about each of the problems below, decide which test is most appropriate, and then run that test. Give a brief explanation of why you chose that test in each case.
| car 1 | S = 10.2 feet | |
| car 2 | S = 9.3 feet | |
| car1 - car2 | S = 9.76 |
| male | 364 | 381 | 390 | 398 | 411 | 420 | 439 | 366 | 381 | 392 | 403 | 420 |
| female | 292 | 349 | 361 | 375 | 388 | 412 | 316 | 350 | 362 | 376 | 389 | 327 |
| Golfer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| round 1 | 89 | 90 | 87 | 95 | 86 | 81 | 102 | 105 | 83 | 88 |
| round 2 | 94 | 85 | 89 | 89 | 81 | 76 | 107 | 89 | 87 | 91 |