Math 203, Introduction to Statistics,
Central College, Spring 2001 Exam 1 Review Sheet.
by Tom Linton and Wendy Weber

Key Terms, Phrases, Results etc.

At this moment, the following terms, results, and phrases are, in my opinion, the most important of those covered in the first two chapters. You should understand them for the exam.
categorical versus quantitative variables, histograms, stem plots, split stems, time plots and trends, symmetric versus skewed distributions, outliers and influential data points, center and spread, mean, median and quartiles, boxplots, standard deviation, normal distribution, standard normal distribution, density curve, explanatory versus response variables, linear association, correlation, least squares regression, residuals, slope and intercept of a line, causation, lurking variables.

Practice Problems

If you can answer the following questions rather quickly, you should be well prepared for the exam. These problems cover most of the topics which will appear on exam 1.
  1. Data on the height in inches and shoe size of the members of the mathematics and computer science department at Central College are given in the table below.
Height (inches)  66   58   69   69   71   73   74
Shoe size        7.5  9    9    10.5 10   11   11.5
    1. Calculate the median shoe size and also find the first quartile, Q1, for the height data.
    2. Calculate the mean height and mean shoe size.
    3. Make a scatter plot (call it plot 1) using height as X and shoe size as Y. Does the association seem to be positive, negative, or neither?
    4. Find the equation (call it Y1) of the least-squares regression line for the scatter plot above and give the correlation, r, for this plot.
    5. What percentage of the variation in shoe size is due to the linear relationship between height and shoe size?
    6. Now make a scatter plot (call it plot 2) using X = shoe size and Y = height. Find the least squares regression equation (call it Y2) for this plot and the correlation, r.
    7. Using Y1 and plot 1, estimate the shoe size for someone whose height is 70 inches and the height of someone with a shoe size of 5.
    8. Using Y2 and plot 2, estimate the shoe size for someone whose height is 70 inches and the height of someone with a shoe size of 5.
    9. Which of the four estimates above do you have more confidence in? Why?
    10. Verify that both regression lines contain the point whose coordinates are the mean height and mean shoe size. Note: for the first regression the point is (mean ht, mean size), for the second regression the point is (mean size, mean ht.).
    11. For the first regression, which data point has the biggest residual (largest in absolute value)?
    12. If you delete the data point (ht = 66, shoe size = 7.5) from the first regression, what is the new least squares regression equation, and the new correlation?
    13. Would you consider the point (66, 7.5) to be influential? Why or why not?
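
The regression parts of problem 1 can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original review sheet; the function name `least_squares` is made up for illustration, and the formulas are the usual ones (slope = Sxy/Sxx, intercept = mean of y minus slope times mean of x, r = Sxy/sqrt(Sxx*Syy)).

```python
from math import sqrt

# Data from the table in problem 1.
heights = [66, 58, 69, 69, 71, 73, 74]
shoe_sizes = [7.5, 9, 9, 10.5, 10, 11, 11.5]


def least_squares(x, y):
    """Return (slope, intercept, r) for the least squares line of y on x.

    Hypothetical helper written for this sketch.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # forces the line through the means
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Y1: shoe size predicted from height (plot 1).
slope1, intercept1, r1 = least_squares(heights, shoe_sizes)
# Y2: height predicted from shoe size (plot 2).
slope2, intercept2, r2 = least_squares(shoe_sizes, heights)

print("Y1 = %.3f + %.3f * X, r = %.3f" % (intercept1, slope1, r1))
print("Y2 = %.3f + %.3f * X, r = %.3f" % (intercept2, slope2, r2))
print("r squared = %.3f" % (r1 ** 2))
```

Swapping X and Y changes the regression line but not the correlation, which is worth keeping in mind when comparing the estimates from Y1 and Y2. To try the deletion question, remove the first entry from each list and call `least_squares` again.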

  2. Each of the plots below is drawn on the same scale (or window).
 
    1. Describe the direction of the association (positive, negative, neither) for each plot above, and its strength.
    2. Estimate the correlation (give a number) for each of the plots above.
    3. Draw in what you think is a good approximation to the least squares regression line for plot 2.
    4. On plot 2, does the fourth data point from the left have a positive or negative residual?
  3. Assume that the lengths (in inches) of trout in a given river are normally distributed. The Big Hole river has trout that average 11.3 inches in length with a standard deviation of 2.1 inches. Yellowstone river trout have lengths with μ = 14.2 inches and σ = 4.6 inches.
    1. What proportion of trout caught in the Big Hole river are between 10 and 12 inches?
    2. What percentage of Yellowstone river trout are at least 22 inches long?
    3. Which of the two rivers has a smaller proportion of trout that are 8 inches long or less?
    4. How long does a Yellowstone river trout need to be if it is in the top (i.e. longest) 3% of all Yellowstone river trout?
    5. Bill fishes the Big Hole river and catches a 15.6 inch trout. Babe fishes in the Yellowstone and catches a 23.5 inch trout. Whose trout, Bill's or Babe's, is more exceptional?
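
The trout questions all reduce to standardizing a value (z = (x - mean)/standard deviation) and reading an area under the normal curve. As a rough sketch of how one might check table lookups, the Python standard library's `statistics.NormalDist` can do the same calculations; the helper names below are invented for this example.

```python
from statistics import NormalDist

# Distributions as stated in the problem.
big_hole = NormalDist(mu=11.3, sigma=2.1)
yellowstone = NormalDist(mu=14.2, sigma=4.6)


def z_score(dist, value):
    """Number of standard deviations that value lies above the mean."""
    return (value - dist.mean) / dist.stdev


def proportion_between(dist, low, high):
    """Area under the density curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)


def proportion_at_least(dist, value):
    """Area under the density curve to the right of value."""
    return 1 - dist.cdf(value)


def cutoff_for_top(dist, fraction):
    """Value that separates the top fraction from the rest."""
    return dist.inv_cdf(1 - fraction)


print(proportion_between(big_hole, 10, 12))
print(proportion_at_least(yellowstone, 22))
print(big_hole.cdf(8), yellowstone.cdf(8))
print(cutoff_for_top(yellowstone, 0.03))
print(z_score(big_hole, 15.6), z_score(yellowstone, 23.5))
```

Answers from a printed standard normal table may differ slightly in the last digit because the table rounds z to two decimal places.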
  4. The game of golf is one where a low score is better than a high score, so as you get better, your score decreases. A golf instructor believes that more experience playing the game improves your score. To investigate this claim, she collects data from 10 of her students, namely the number of years each student has been playing golf, and the average score of each student.
    1. Which variable (years playing golf or average score) is the response variable and which is the explanatory variable?
    2. According to this golf instructor, would you expect a positive or negative association between these two variables? Explain.
    3. If the least squares regression line equation for the instructor's 10 data points was Y = 130 - 12.8*X, what is the meaning (in common-everyday language) of the slope -12.8?
    4. Using the regression equation from part 3, what is the predicted score for a golfer with 11 years of experience? Does this make sense? Explain.
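
A regression line can be evaluated at any X, whether or not that X is anywhere near the data. The short Python sketch below (the function name `predicted_score` is made up here) evaluates the instructor's line Y = 130 - 12.8*X at a few values so you can see what the slope does from one year to the next and judge for yourself which predictions are reasonable.

```python
def predicted_score(years):
    """Predicted average golf score from the line Y = 130 - 12.8*X."""
    return 130 - 12.8 * years


# Each additional year changes the prediction by the slope.
for years in (0, 1, 2, 5, 11):
    print(years, predicted_score(years))
```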
  5. To help analyze the long distance calling habits of students, 15 phone extensions in the dorms at Central College were surveyed for one week. The number of long distance calls made and the average length of these calls are given below.
 
Number of calls        6    5    20   7    19   11   8    5    6    6    9    5    6    21   6
Length of calls (min)  6.1  8.3  4.2  6.1  1.6  4.4  5.8  5.9  4.4  8.2  1.2  7.7  3.9  0.6  2.0
    1. In addition to the variables above, the school also collected information on the number of students in each room, the long distance company (MCI, Sprint, etc.) chosen by each room's residents and the gender of the students in each room. Which of these variables (including the two listed in the table) are quantitative? Which are categorical?
    2. Make a (well-labeled) histogram of the number of calls made data.
    3. Make a stem and leaf plot of the average length of calls data.
    4. One of the data sets above is best described by the five number summary, the other is best described by its mean and standard deviation. Decide which is which (explain your choice) and calculate the five number summary for that data set and the mean and standard deviation for the other.
    5. From your five number summary, make a boxplot of that data set.
    6. For the mean and standard deviation data set, how many of the observations fall within one standard deviation of the mean? How many should fall within one standard deviation according to the empirical guidelines?
    7. If some researcher believes that making more long distance calls lowers the average length of calls,
      1. Which variable (number of calls, or length of calls) is the explanatory variable and which is the response variable?
      2. If the researcher is correct, should a scatter plot show a negative or positive association?
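
The summary statistics for the phone data can be checked by computer. The sketch below is one possible way to do it in Python and is not part of the original sheet. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of observations is odd; calculators and software sometimes use slightly different quartile rules. The helper names are invented for this example, and the code computes both kinds of summary for both data sets, leaving the choice of which is appropriate to you.

```python
from statistics import mean, median, stdev

# Data from the table in the phone call problem.
calls = [6, 5, 20, 7, 19, 11, 8, 5, 6, 6, 9, 5, 6, 21, 6]
lengths = [6.1, 8.3, 4.2, 6.1, 1.6, 4.4, 5.8, 5.9, 4.4, 8.2,
           1.2, 7.7, 3.9, 0.6, 2.0]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles are medians of the lower and upper halves,
    excluding the overall median when len(data) is odd.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


def count_within_one_sd(data):
    """Count observations within one sample standard deviation of the mean."""
    m, s = mean(data), stdev(data)
    return sum(1 for x in data if m - s <= x <= m + s)


for name, data in (("calls", calls), ("lengths", lengths)):
    print(name, five_number_summary(data), mean(data), stdev(data),
          count_within_one_sd(data))
```

Here `stdev` is the sample standard deviation (dividing by n - 1), which matches the s computed by most statistics calculators.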
  6. The density curve shown below is for a normal distribution that has a mean and standard deviation that are both integers (whole numbers).
    1. What would you guess is the value of the mean for this distribution?
    2. Guess at the value of the standard deviation σ.
    3. Shade in the area that corresponds to the proportion of data from this distribution that are less than 4.