The goal of this activity is to explain the differences between matched-pairs one-sample tests and two-sample tests.
First and foremost, one should understand that a two-sample procedure
and a one-sample procedure (where one typically subtracts the paired values
and tests the collection of differences) can give very different results on the same data.
Consider the problem of trying to determine if a certain Gateway computer
is faster than a certain Dell computer. To analyze this question, the
two computers are put through a collection of benchmark tests. If one computer
completes these tests faster than the other, then one would consider that
computer faster. Here are the times (in seconds) for six standard benchmark
tests.
| Benchmark | A | B | C | D | E | F |
You should be somewhat surprised by the results. The one-sample test
(problem #2) is almost significant at the 5% level. How can this be? The
first test finds no evidence of a difference in the times, and the second
finds evidence of a difference that is nearly significant.
The key lies in the assumptions of the two tests. A two-sample test
requires that the samples be independent, or roughly that they have
no influence on one another. It is important to consider what the individual
observations in your samples are. Here, each observation is a time on a
benchmark test, with the benchmarks presumably chosen from the population of all benchmark tests.
Since the tests used for both computers are exactly the same, the
samples are NOT independent.
Here is a way to sort out the meaning of these two results:
The two-sample test can be viewed as reaching into two populations and
selecting random samples from both. When the two sample means
are close together relative to the spread within each sample, there is
little evidence that the two population means differ. Our
two-sample test had sample means of 1.555 and 1.578. If they came from
independent random samples from two populations, it would be quite plausible
that the two population means are equal.
Our samples were not independent; they came from head-to-head competition between the two computers on the exact same benchmark tests. The Dell beat the Gateway 5 out of 6 times in head-to-head competition, so it seems likely that the Dell is faster. Not only that, but the Dell beat the Gateway by about the same amount each time, so the population of differences appears to have a small positive mean with little variation, which yields a small p-value for the one-sample test.
In this case, the one-sample test gives the correct result: the Dell computer is most likely faster than the Gateway.
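The contrast between the two procedures can be sketched in a few lines of Python. The times below are hypothetical, invented only to mimic the pattern described here (large differences between benchmarks, small and consistent differences between computers); they are not the benchmark times from the activity. The function names `welch_t` and `paired_t` are illustrative, and the unequal-variance (Welch) form of the two-sample statistic is an assumption, since the text does not say which form was used.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical times in seconds, invented for illustration only;
# these are NOT the benchmark times from the activity.
gateway = [12.4, 30.1, 8.7, 21.5, 45.2, 16.3]
dell = [12.1, 29.9, 8.5, 21.6, 44.8, 16.0]


def welch_t(x, y):
    """Two-sample t statistic (unequal variances), treating x and y as independent."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def paired_t(x, y):
    """One-sample t statistic computed on the pairwise differences x[i] - y[i]."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


two_sample = welch_t(gateway, dell)
matched = paired_t(gateway, dell)

print(f"two-sample t    = {two_sample:.3f}")
print(f"matched-pairs t = {matched:.3f} (df = {len(gateway) - 1})")
```

Both statistics share the same numerator, the difference in sample means. Only the standard error changes: the two-sample version is dominated by how much the benchmarks differ from one another, while the matched-pairs version depends only on how much the differences vary.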
Choosing the right test usually comes down to
carefully analyzing what the population is (times on benchmark tests in
our case) and considering what makes up the samples (six times
on the same six benchmarks here). If the observations in the two samples are
related to one another, they are usually NOT independent and a two-sample
test is therefore NOT appropriate.
Think carefully about each of the problems below and decide which test
is most appropriate and then run that test. Give a brief explanation
of why you selected the test you chose in each case.
| car 1 | S = 10.2 feet | |
| car 2 | S = 9.3 feet | |
| car1 - car2 | S = 9.76 |
| Golfer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| round 1 | 89 | 90 | 87 | 95 | 86 | 81 | 102 | 105 | 83 | 88 |
| round 2 | 94 | 85 | 89 | 89 | 81 | 76 | 107 | 89 | 87 | 91 |
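For the golf scores, the arithmetic behind both candidate statistics might be sketched as follows. The scores are copied from the table; the direction of the subtraction (round 1 minus round 2) and the Welch form of the two-sample statistic are assumptions made for illustration. The sketch only computes the numbers; deciding which test is appropriate, and why, is still up to you.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Scores copied from the table above, one entry per golfer.
round1 = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88]
round2 = [94, 85, 89, 89, 81, 76, 107, 89, 87, 91]

# Matched-pairs view: one difference per golfer (round 1 minus round 2, an assumed direction).
diffs = [a - b for a, b in zip(round1, round2)]
paired_t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Two-sample view: treats the two rounds as independent samples (Welch form, an assumption).
se = sqrt(variance(round1) / len(round1) + variance(round2) / len(round2))
two_sample_t = (mean(round1) - mean(round2)) / se

print("differences:", diffs)
print(f"matched-pairs t = {paired_t:.3f} (df = {len(diffs) - 1})")
print(f"two-sample t    = {two_sample_t:.3f}")
```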