Introductory Statistics Final Exam Review Problem Answers
Math 203 B, Tom Linton
The final exam is Wednesday, May 17, at 1 PM in Central Hall 317. The exam is open book; you may also use one page (both sides) of notes, your calculator, and your reading cards. Here are the answers to the even-numbered review problems.
1.68
  1. 0.052
  2. 0.5466
  3. 279.466
2.98
a) Put right distances in L1, right times in L2, left distances in L3 and left times in L4. I used boxes for right data and crosses for left data. The left image shows both data sets as scatterplots. Because the scales are quite different, the right data and left data are redisplayed alone as well in the middle and right images.
b) From the separate graphs, one sees little in terms of a pattern. The left data displays a slight positive association; the right data does also, but to a smaller degree. The right-handedness is apparent in the image displaying both data sets together. For similar distances (on the x-axis), the boxes all lie below the crosses, so the right times (for similar distances) are shorter.

c) For the right-hand data, y = 0.028X + 99.36 and r^2 = 0.093. For the left-hand data, y = 0.262X + 171.55 and r^2 = 0.101. Thus, the linear relationship accounts for only about 9.3% (right) and 10.1% (left) of the variation in time, which is very little. The left regression does a slightly better job of predicting time from distance. The plots are shown below.

d) Graphs (scatterplots with X = distance and Y = RESID) of the residuals for both right and left hand data are shown below. They show neither trend suggested by the text.
3.68
  1. explanatory variable = antioxidants (vitamins) or no antioxidants; response = colon cancer or no colon cancer.
  2. Randomly assign the subjects to 4 equal-sized groups of 216 each. Treatment 1 is beta carotene, treatment 2 is vitamins C and E, treatment 3 is all 3 vitamins, and treatment 4 is a placebo. After treatment, compare the onset of colon cancer among the groups.
  3. Using labels 000 to 863, the first 5 group 1 subjects are: 731, 253, 304, 470, 296.
  4. Neither the patients nor those treating the patients knows who is in what group.
  5. The observed differences could easily have occurred just by chance, so the study gives no evidence that the treatments affected cancer rates.
  6. Fruits and vegetables have lots of fiber, which is good for the digestive tract. As well, people who eat lots of fruits and vegetables may simply be healthier than average.
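The random assignment in part 2 can also be sketched in software. This is only an illustration: the seed and variable names are hypothetical, and because part 3 reads labels from a random digit table, the labels chosen here will not match the ones listed above.

```python
import random

# Labels 000 through 863, one per subject (864 subjects in all).
labels = [f"{i:03d}" for i in range(864)]

# Hypothetical seed, used only so the sketch is repeatable.
rng = random.Random(203)

# Shuffle the labels without replacement, then cut into 4 groups of 216.
shuffled = rng.sample(labels, k=len(labels))
treatments = ["beta carotene", "vitamins C and E", "all three vitamins", "placebo"]
groups = {t: shuffled[i * 216:(i + 1) * 216] for i, t in enumerate(treatments)}

for name, members in groups.items():
    print(name, len(members), members[:5])
```

Each subject lands in exactly one group, and every split into four groups of 216 is equally likely, which is what the random digit table accomplishes by hand.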
3.72 Randomly select 100 or so individuals who like hamburgers. Ask them to taste both burgers (using randomness to decide whether the McD's or Wendy's burger is tasted first) and rank the two on some quantitative scale.

4.62

  1. The numbers add up to 1, all are non-negative and all are less than 1.
  2. 0.43
  3. 0.96
  4. 0.28
  5. 0.72
4.64 Divide the total weights by 12 to get x-bar values, then normalcdf(750/12, 825/12, 65, 5/sqrt(12) ) = 0.9537
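As a cross-check away from the calculator, the same probability can be computed in Python. This is a sketch under the assumption that NormalDist from the standard statistics module is an acceptable stand-in for normalcdf; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

n = 12                 # number of items averaged
mu, sigma = 65, 5      # mean and std dev of a single item, as used above

# Sampling distribution of x-bar: same mean, std dev shrunk by sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Convert the total-weight bounds to bounds on x-bar.
low, high = 750 / n, 825 / n

prob = xbar_dist.cdf(high) - xbar_dist.cdf(low)
print(round(prob, 4))
```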

5.54

  1. mean = n*p = 15 * 0.25 = 3.75
  2. 1 - binomcdf(15, 0.25, 9) = 0.000795
  3. Using the normal approximation, mean = 1000*.25 = 250, std dev = sqrt(n*p*(1 - p) ) = 13.69. Thus, the answer is normalcdf(275, 9999, 250, 13.69) = 0.034.
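Parts 2 and 3 can be checked with a few lines of Python. This is a sketch: binom_cdf is a helper written here to mimic binomcdf, not a library function, and NormalDist from the standard library plays the role of normalcdf.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Part 2: P(X >= 10) when n = 15 and p = 0.25.
tail = 1 - binom_cdf(15, 0.25, 9)

# Part 3: normal approximation with n = 1000 and p = 0.25.
n, p = 1000, 0.25
mean = n * p
sd = sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mean, sd).cdf(275)

print(f"{tail:.6f}", round(sd, 2), round(approx, 3))
```

Using the upper tail of the normal curve directly avoids having to pick a large stand-in such as 9999 for the upper bound.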
6.70 The acceptance interval for this test is all x-bar values from 128 - 1.96*15/sqrt(72) to 128 + 1.96*15/sqrt(72), or the interval (124.54, 131.46).
  1. We want 1 - Prob(x-bar is in acceptance interval) when μ = 134, or 1 - normalcdf(124.54, 131.46, 134, 15/sqrt(72) ) = 0.9246.
  2. Repeat the command above but make the mean 122 instead of 134, so 1 - normalcdf(124.54, 131.46, 122, 15/sqrt(72) ) = 0.9246, the same power. Yes, the test detects a mean that is 6 away from 128 more than 92% of the time.
  3. The power gets higher as the mean moves farther from 128.
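The power calculations for this problem can be reproduced in Python. This is a sketch only: the function name power is mine, and the alternative mean 136 is a hypothetical value added to illustrate the last part. The endpoints are not rounded here, so the result can differ from 0.9246 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 128, 15, 72
se = sigma / sqrt(n)          # standard deviation of x-bar
z_star = 1.96                 # two-sided 5% critical value used above

# Acceptance interval for x-bar under H0.
low, high = mu0 - z_star * se, mu0 + z_star * se

def power(true_mean):
    """Probability that x-bar falls outside the acceptance interval."""
    d = NormalDist(true_mean, se)
    return 1 - (d.cdf(high) - d.cdf(low))

# 134 and 122 are from the problem; 136 is a hypothetical farther alternative.
for m in (134, 122, 136):
    print(m, round(power(m), 4))
```

Because the acceptance interval is symmetric about 128, alternatives the same distance above and below 128 give the same power.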
6.74
  1. Using split stems:
     2| 034
     2|
     3| 01124
     3| 6
     4| 3
     4|
  2. (26.061, 34.739)
  3. A Z-test of μ = 25 against μ > 25 has a p-value of 0.0074, giving very strong evidence against H0 and indicating that the untrained noses have a higher threshold.
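The interval and p-value for this problem can be checked in Python from the stemplot values. This is a sketch with one loud assumption: the population standard deviation is not stated in this answer key, and sigma = 7 is simply the value that reproduces the interval given above. Check it against the problem statement in the text.

```python
from math import sqrt
from statistics import NormalDist, mean

# Values read off the split-stem plot (stems are tens, leaves are ones).
data = [20, 23, 24, 30, 31, 31, 32, 34, 36, 43]

sigma = 7     # ASSUMPTION: not given in this answer key; it reproduces the interval
mu0 = 25      # null hypothesis value from the Z-test

xbar = mean(data)
se = sigma / sqrt(len(data))

# 95% z confidence interval for the mean.
z_star = NormalDist().inv_cdf(0.975)
ci = (xbar - z_star * se, xbar + z_star * se)

# One-sided Z-test of mu = 25 against mu > 25.
z = (xbar - mu0) / se
p_value = 1 - NormalDist().cdf(z)

print(round(ci[0], 3), round(ci[1], 3), round(p_value, 4))
```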
7.60
  1. The data come from non-independent samples; the same people were weighed before and after the program.
  2. There is extremely little chance that differences in weight loss were simply due to chance variation. The study provides strong evidence that the weight loss program worked.
  3. The t-statistic of 4.68 (with 46 degrees of freedom) is quite a bit larger than the critical t-value for p = 0.005, so the p-value is quite a bit lower than 0.005.
7.66
  1. A stem and leaf plot, using split stems 89, 90, 91 and 92, shows no severe deviations from normality. The t-procedures should give reliable results.
  2. The data have x-bar = 907.75 and Sx = 8.48. The 95% t-confidence interval is (903.23, 912.27).
  3. No. The confidence interval contains the value 910, so the data have an x-bar value that is in the 95% confidence interval centered at 910 (because 910 is in the 95% confidence interval centered at x-bar and these two intervals have the same width). Thus, the p-value of a two-tailed test will NOT be significant at the 5% level. In fact, the p-value is 0.305.
8.44
  1. A two-tailed one-proportion Z-test of H0: p = 0.7 has a p-value of 0.155, so there is little or no evidence that the athletes have a different graduation rate than the entire student population.
  2. By setting p1 = male athletes graduation rate and p2 = female athletes graduation rate and running a 2 proportion Z test of p1 = p2 against p1 < p2, we get a p-value of 0.0015, which gives very strong evidence that the male rate is lower than the female rate.
  3. To use 2-proportion Z procedures, all counts should be 5 or more. For female athletes we have 37 graduating and 8 not graduating; for males our counts are 58 and 44. The basketball counts are 4 and 4, which are both too low.
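The two-proportion test in part 2 can be reproduced from the counts quoted in part 3. This is a sketch that assumes the usual pooled standard error for testing p1 = p2; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

# Counts quoted in part 3: graduated / did not graduate.
male_grad, male_not = 58, 44
female_grad, female_not = 37, 8

n1 = male_grad + male_not          # male athletes
n2 = female_grad + female_not      # female athletes
p1, p2 = male_grad / n1, female_grad / n2

# Pooled proportion and standard error under H0: p1 = p2.
pooled = (male_grad + female_grad) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se
p_value = NormalDist().cdf(z)      # lower tail, since Ha is p1 < p2

# Rule of thumb from part 3: every count should be 5 or more.
counts_ok = all(c >= 5 for c in (male_grad, male_not, female_grad, female_not))

print(round(z, 2), round(p_value, 4), counts_ok)
```

Running the same count check on the basketball counts of 4 and 4 fails, which is the point made in part 3.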