Math 203, Introduction to Statistics,
Central College, Fall 1999 Exam 1 Review Sheet.
Tom Linton, http://www.central.edu/homepages/lintont/

Exam Particulars

The exam covers chapters 1 and 2 of the text (The Basic Practice of Statistics, by Moore); section 2.5 was skipped and will NOT be on the exam. The exam will be open book (the text plus one page of notes, 8.5 by 11, both sides), and calculators will be allowed. In fact, you will need a calculator that can compute the mean, median, correlation, residuals and least-squares regression equations. The focus of the exam will be on the interpretation or meaning of these statistical quantities rather than on their calculation, but a fair bit of calculation will be required (with the aid of calculators). Questions on the exam will be similar (at least in the mind of the professor) to the homework assignments; however, some of the exam questions will require knowledge from several sections of the text. As such, this exam is likely to be the most difficult part of the course so far.

A Piece of Advice

The fact that this exam is open book should NOT lower your desire to study for it. If you need to search through the text to answer the questions, you will likely NOT complete the exam in the 50-minute period. You should strive to know the material well enough that you don't really need your text, but can rely on it being there just in case. Rather than devoting time to memorizing formulas, write them on your one page of notes, or know where they are located in your text, and try to understand what these formulas mean or tell you.

Key Terms, Phrases, Results etc.

At this moment, the following terms, results, phrases etc. are, in my opinion, the most important of those covered in the first 2 chapters. You should understand them for the exam.
categorical versus quantitative variables, histograms, stemplots, time plots and trends, symmetric versus skewed distributions, outliers and influential data points, center and spread, mean, median and quartiles, boxplots, standard deviation, normal distribution, standard normal distribution, explanatory versus response variables, linear association, correlation, least squares regression, residuals, predicted versus actual values, slope and intercept of a line, causation, lurking variables.
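
As an illustration of how the standard normal distribution ties several of these terms together, here is a minimal Python sketch (not part of the course materials; the exam itself expects a calculator and the table in the text). It standardizes a value and finds the proportion of a normal distribution below it. The function names `z_score` and `proportion_below` are made up for this example, and the numbers are the bag weights from practice problem 5 below.

```python
from statistics import NormalDist  # standard library, Python 3.8+

def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

def proportion_below(x, mean, sd):
    """Proportion of a normal(mean, sd) distribution that falls below x."""
    return NormalDist().cdf(z_score(x, mean, sd))

# Bag weights from practice problem 5: mean 12 oz, standard deviation 0.4 oz.
mean, sd = 12.0, 0.4

# 11.6 and 12.4 are exactly one standard deviation below and above the mean.
within_one_sd = proportion_below(12.4, mean, sd) - proportion_below(11.6, mean, sd)

print(round(z_score(12.4, mean, sd), 2))  # 1.0
print(round(within_one_sd, 4))            # about 0.68, the "68" in the 68-95-99.7 rule
```

The same two functions apply to any normally distributed variable: standardize first, then look up the proportion, which is the calculation a normal table performs by hand.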

Practice Problems

If you can answer the following questions rather quickly, you should be well prepared for the exam. These problems cover most of the topics which will appear on exam 1.
  1. You are designing a study to investigate relationships between people (their lifestyles, habits and physical characteristics) and problems with vision. A key factor seems to be "exposure to sunshine".
    1. Give an example of both a categorical and a quantitative variable which you would analyze for each individual (patient) you study.
    2. Other studies suggest that wearing a hat (often) lowers the risk of a common eye problem (cataracts, which is roughly spots of opaqueness, or non-see-throughness in the eyes). You decide to ask patients in your study what percentage of the time they wore a hat when they were outside in sunshine, and measure the percentage of their eyes which suffer from cataracts.
      1. Which variable (time with hat on, or percentage of cataracts in the eye) is the response variable?
      2. Do the other studies suggest a positive or negative association between the explanatory and response variable?
      3. Upon analysis of the data on how often patients wore a hat when outside in sunshine, you see a distribution which is quite skewed to the right. Sketch a possible density curve for this distribution and mark the mean and median on your sketch. In particular, which is larger, the mean or the median?
  2. Bowlers in the Thursday night Pella league have scores which are normally distributed with a mean of 150 and a standard deviation of 17. The Saturday night league (also normally distributed) averages 132 with a standard deviation of 28. Sam bowls both Thursdays and Saturdays. One week Sam rolled a 180 on Thursday and a 173 on Saturday.
    1. On which night was Sam's score more exceptional (better in terms of the percentage of same-night bowlers whom Sam outbowled)?
    2. What percentage of bowlers would have outdone Sam's Thursday night score if the league's scores were instead normally distributed with a mean of 160 and a standard deviation of 15?
    3. If Sam wants to be better than 93% of all Thursday night bowlers, what bowling score would Sam need to roll?
  3. The age in years of each President (on the day they first took office) is shown in the time plot below. The Presidents appear on the X-axis in the order in which they assumed office (Washington =1, Clinton = 42, etc.).
[Figure: Excel chart, time plot of age when first taking office versus order of assuming office]
    1. Consider the first 8 Presidents (numbers 1 to 8), and the second group of 8 Presidents (9 to 16). Which group appears to have more spread in the variable A = age when assuming office?
    2. Calculate the mean age and standard deviation of age (when assuming office) for these 2 groups of Presidents. You should calculate two means and two standard deviations (one for each group). Are the means about the same? What about the standard deviations?
    3. Would you consider the 9th President (whose age was 68 when he assumed office) to be an outlier in the second group? Explain. Would age = 68 be an outlier in the first group? Explain.
    4. Make boxplots of both groups of 8 Presidents. Do the boxplots confirm your answer to part 1? How so?
    5. Make a stem and leaf plot of the ages of the last 20 Presidents (numbers 23 to 42). Use split stems. Is the distribution of ages fairly symmetric? Skewed one way or the other?
    6. Based on the stem and leaf plot, would you expect the mean or median age of the last 20 Presidents to be larger? Why? Calculate both the mean and the median of these 20 Presidents’ ages.
    7. Give the 5 number summary of the ages of the last 20 Presidents.
  4. Problem number 1.68 on pages 83 and 84.
  5. Packaged foods sold at supermarkets are not always the weight indicated on the package. Variability always crops up in the manufacturing and packaging process. Suppose that the weights of "12 oz" bags of Lays potato chips follow an approximately normal distribution with a mean of 12 ounces and a standard deviation of 0.4 ounces. If a 12 oz bag of Lays chips is selected at random, what are the chances that
    1. It weighs somewhere between 11.6 and 12.4 ounces?
    2. It weighs less than 11.5 ounces?
    3. It weighs more than 12.6 ounces?
  6. Data on the height in inches and shoe size of the members of the mathematics and computer science department at Central are given in the table below.
     Height (inches)    Shoe Size
     66                 7.5
     68                 9
     69                 9
     69                 10.5
     71                 10
     73                 11
     74                 11.5
    1. Calculate the mean height and mean shoe size.
    2. Make a scatterplot using height as X and shoe size as Y.
    3. Find the equation of the least-squares regression line for the scatterplot above and give the correlation, r for this plot.
    4. Now make a scatterplot using X = shoe size and Y = height. Find the least squares regression equation for this plot and the correlation, r.
    5. From your two regression lines, estimate the shoe size for someone whose height is 70 inches (two values, one from each regression). Then repeat in the other direction: estimate the height of someone whose shoe size is 5.
    6. Which of the estimates above do you have more confidence in? Why?
    7. Verify that both regression lines contain the point whose coordinates are the mean height and mean shoe size. Note: for the first regression the point is (mean ht, mean size), for the second regression the point is (mean size, mean ht.).
    8. For the first regression, which data point has the biggest residual (that is, the residual with the largest absolute value, whether positive or negative)?
    9. If you delete the data point (ht = 66, shoe size = 7.5) from the first regression, what is the new least squares regression equation, and the new correlation?
    10. Would you consider the point in part 9 to be influential? Why or why not?
  7. Problem number 2.86 on page 170.
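
For checking calculator work on the height and shoe size problem, the following Python sketch may help. It is only one way to do the computation, not the method required on the exam: it fits the least-squares line from the usual sums of squares and cross products. The function name `least_squares` is made up for this example; the data are the seven rows of the table above.

```python
from math import sqrt
from statistics import mean

# Data copied from the height / shoe size table.
heights = [66, 68, 69, 69, 71, 73, 74]
shoe_sizes = [7.5, 9, 9, 10.5, 10, 11, 11.5]

def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# First regression: X = height, Y = shoe size.
slope, intercept, r = least_squares(heights, shoe_sizes)

# Residual = actual value minus predicted value, one per data point.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, shoe_sizes)]

print(f"shoe size = {intercept:.3f} + {slope:.3f} * height, r = {r:.3f}")

# The fitted line passes through (mean height, mean shoe size).
print(abs(intercept + slope * mean(heights) - mean(shoe_sizes)) < 1e-9)

# Least-squares residuals sum to zero, up to rounding error.
print(abs(sum(residuals)) < 1e-9)
```

Calling `least_squares(shoe_sizes, heights)` instead gives the second regression (X = shoe size, Y = height). The correlation r comes out the same either way, but the slope and intercept do not, which is why the two lines give different estimates.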