NAMES:
Two Sample Versus One Sample Tests
by Tom Linton
Work in groups of 1, 2 or 3 and turn in one paper per group.

The goal of this activity is to explain the differences between matched-pairs one-sample tests and two-sample tests.

First and foremost, one should understand that a two-sample procedure and a one-sample procedure (where one typically subtracts paired values and tests the collection of differences) can give very different results on the same data. Consider the problem of trying to determine whether a certain Dell computer is faster than a certain Gateway computer. To analyze this question, the two computers are put through a collection of benchmark tests. If one computer completes these tests faster than the other, then one would consider that computer faster. Here are the times (in seconds) for six standard benchmark tests.

Benchmark   A     B     C     D     E     F
Dell        1.12  1.73  1.04  1.87  1.47  2.10
Gateway     1.15  1.75  1.10  1.86  1.46  2.15

  1. Enter the Dell times in the list L1 and the Gateway times in the list L2. Run a 2-sample T-Test to decide if μ1 < μ2. Do NOT pool the data. Report the p-value of this test, the t-value, the two values of x̄ and the degrees of freedom.
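For anyone who wants to check the calculator output by hand, here is a minimal Python sketch of the unpooled (Welch) two-sample t statistic and its approximate degrees of freedom. The function name welch_t and the variable names are illustrative assumptions, not part of the original activity, and the p-value is left to the calculator because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, variance

dell = [1.12, 1.73, 1.04, 1.87, 1.47, 2.10]     # list L1 on the calculator
gateway = [1.15, 1.75, 1.10, 1.86, 1.46, 2.15]  # list L2 on the calculator


def welch_t(a, b):
    """Return (t, df) for the unpooled two-sample t-test of mean(a) vs mean(b).

    Hypothetical helper written for this sketch.
    """
    # Squared standard error contributed by each sample: s^2 / n
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(dell, gateway)
print(f"x-bar 1 = {mean(dell):.3f}, x-bar 2 = {mean(gateway):.3f}")
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

A negative t here points in the direction μ1 < μ2; how far it sits from zero is what the p-value measures.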

  2.  

     
     
     
     
     
     
     
     
     
     
     

  3. The 2-sample test result states clearly that differences in times similar to those in the table occur frequently just by chance. That is, there is little or no evidence that the Dell computer times are faster than the Gateway computer times (based on the 2-sample test). Notice that if the Dell times are faster than the Gateway times, then the differences "Gateway - Dell" should be positive. On your home screen enter the command L2 - L1 [STO>] L3, to store these differences in the list L3. Run a T-Test on L3 to test if μ > 0 (the same sort of test we ran above). Report the p-value, t-statistic, and value of x̄. Note that n = 6 means we have 5 degrees of freedom for this test.
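The same one-sample calculation on the differences can be sketched in Python as a cross-check on the calculator. The variable names below are assumptions made for this sketch; it computes only the t-statistic and leaves the p-value to the calculator.

```python
from math import sqrt
from statistics import mean, stdev

dell = [1.12, 1.73, 1.04, 1.87, 1.47, 2.10]     # list L1
gateway = [1.15, 1.75, 1.10, 1.86, 1.46, 2.15]  # list L2

# Equivalent of L2 - L1 [STO>] L3: one difference per benchmark
diffs = [g - d for d, g in zip(dell, gateway)]

n = len(diffs)
d_bar = mean(diffs)    # mean of the differences
s_d = stdev(diffs)     # sample standard deviation of the differences
t_stat = d_bar / (s_d / sqrt(n))  # one-sample t against a hypothesized mean of 0

print(f"n = {n}, df = {n - 1}")
print(f"x-bar of differences = {d_bar:.4f}, s = {s_d:.4f}, t = {t_stat:.3f}")
```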

  4.  

     
     
     
     
     
     
     
     
     

    You should be somewhat surprised by the results. The one-sample test is almost significant at the 5% level. How can this be? One test says there is no evidence of a difference in the times and the other says that there is fairly strong evidence of a difference. The key lies in the assumptions of the two tests. A two-sample test requires that the samples be independent, or roughly that they have no influence on one another. It is important to consider what the things in your sample are. The things in our samples are times on benchmark tests, presumably chosen from the population of all benchmark tests. Since the tests used for both computers are exactly the same, the samples are NOT independent.

    Here is a way to sort out the meaning of these two results. The two-sample test can be viewed as reaching into two populations and selecting a random sample from each. When the two values of x̄ are close together relative to the spread within each sample, there is no evidence that the two population means differ. Our two-sample test had sample means of 1.555 and 1.578. If they came from independent random samples from two populations, it would seem quite plausible that the two population means are equal.

    Our samples were not independent, but came from head-to-head competition between the 2 computers on the exact same benchmark tests. The Dell beat the Gateway 4 out of 6 times in head-to-head competition, and its two losses were by only 0.01 seconds each, so it seems likely that the Dell is faster. Not only that, but the differences vary far less than the times themselves, so the population of differences appears to have a small positive mean, which yields a much smaller p-value for the one-sample test. In this case, the one-sample test is the appropriate one, and it suggests that the Dell computer is most likely faster than the Gateway. The choice of test is frequently determined by carefully analyzing what the population is (times on benchmark tests in our case) and considering what things make up the samples (6 times on the same six benchmarks here).
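One way to see why the two tests disagree is to compare their standard errors. Both tests divide the same difference in means by a standard error, but the two-sample test uses the spread of the raw times while the one-sample test uses the spread of the differences. The Python sketch below illustrates this with the benchmark data; the variable names are assumptions made for this sketch.

```python
from math import sqrt
from statistics import mean, variance

dell = [1.12, 1.73, 1.04, 1.87, 1.47, 2.10]
gateway = [1.15, 1.75, 1.10, 1.86, 1.46, 2.15]
n = len(dell)
diffs = [g - d for d, g in zip(dell, gateway)]

# Standard error used by the unpooled two-sample test
se_two_sample = sqrt(variance(dell) / n + variance(gateway) / n)

# Standard error used by the one-sample test on the differences
se_paired = sqrt(variance(diffs) / n)

# Sample correlation between the two computers' times across benchmarks
m_d, m_g = mean(dell), mean(gateway)
cov = sum((d - m_d) * (g - m_g) for d, g in zip(dell, gateway)) / (n - 1)
corr = cov / sqrt(variance(dell) * variance(gateway))

print(f"difference in means = {m_g - m_d:.4f}")
print(f"two-sample SE = {se_two_sample:.4f}, paired SE = {se_paired:.4f}")
print(f"correlation between Dell and Gateway times = {corr:.4f}")
```

Because slow benchmarks are slow on both machines, the two lists are very highly correlated. Subtracting removes that shared benchmark-to-benchmark variation, which is why the standard error of the differences is so much smaller.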
If the things in the two samples are related to one another, they are usually NOT independent and a two-sample test is therefore NOT appropriate. Think carefully about each of the problems below and decide which test is most appropriate and then run that test. Give a brief explanation of why you selected the test you chose in each case.
     

  5. The braking ability was compared for two types of 1990 automobiles. Random samples of 64 automobiles of each type were tested. The recorded measurement, X, was the distance (in feet) required to stop when the brakes were applied at 40 miles per hour. The sample means and sample standard deviations for the two types of cars and for the differences are given below. You are to test the claim that car 1 has a longer braking distance than car 2.

     car 1:          x̄ = 112 feet   s = 10.2 feet
     car 2:          x̄ = 109 feet   s = 9.3 feet
     car 1 - car 2:  x̄ = 3 feet     s = 9.76 feet

     
     
     
     
     
     
     
     
     
     
     
  7. Marine biologists want to determine whether the lengths (in mm) of Dover sole (a fish) differ by gender. It is believed that the male fish are larger. The biologists net, measure and release 12 random sole of each gender. The lengths are given below.

male    364 381 390 398 411 420 439 366 381 392 403 420
female  292 349 361 375 388 412 316 350 362 376 389 327

 
 
 
 
 
 
 
 
  8. A golf coach believes that her golfers lower (improve) their scores in the second round of tournaments because they are more relaxed in the second round and have seen the course the day before. Here are the scores of the 10 team members in the two rounds of a recent tournament.

 
Golfer    1    2    3    4    5    6    7    8    9   10
round 1  89   90   87   95   86   81  102  105   83   88
round 2  94   85   89   89   81   76  107   89   87   91