Math 105, Introduction to Statistics, Central College, Spring 2007
Exam 1 Review Sheet by Tom Linton 

Practice Problems

If you can answer the following questions fairly quickly, you should be well prepared for the exam.

  1. You decide to collect data related to students' preferences regarding the food service at Central College. Give an example of a categorical variable for this setting, and another example of a quantitative variable for this situation.
  2. The prices (in dollars) charged for a certain style of Chanel lipstick at 20 randomly selected stores in the Midwest are given in the table below. The data are numbered for your convenience and listed from smallest to largest. A histogram of this distribution is also shown.
Index   1     2     3     4     5     6     7     8     9     10
Price   5.23  5.42  5.93  6.16  6.30  6.55  6.65  6.80  7.00  7.05

Index   11    12    13    14    15    16    17    18    19    20
Price   7.13  7.21  7.67  7.80  7.81  7.89  8.85  8.97  9.07  9.63

[Histogram: lipstick prices]
    1. Use the histogram to describe the shape of this distribution. Do there appear to be any outliers?
    2. Based on the histogram and your answer to part 1, which numerical summary (mean and standard deviation OR 5-number summary) is best for this distribution?
    3. The mean is x-bar = $7.256 and the standard deviation is s = $1.217. Use the table of data values above to count the number of prices that are more than 1 standard deviation above the mean, and then convert this count into a proportion by dividing it by 20 (the number of individuals in our data set). What proportion of this data set lies more than 1 standard deviation above the mean?
    4. If the data above were distributed in a perfectly Normal manner (bell-shaped), what proportion of the data would lie more than 1 standard deviation above the mean according to the 68-95-99.7 rule? How close is your actual proportion from part 3 to this theoretical proportion?
    5. Calculate the 5-number summary for the lipstick price data above.
    6. How high, or how low, would a price have to be before the 1.5 × IQR rule labeled it as a suspected outlier?
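
The numerical work in problem 2 can be checked with a short Python sketch. It assumes the quartile convention in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data; other software may use a different convention and give slightly different quartiles. The helper names `five_number_summary` and `iqr_fences` are made up for this illustration.

```python
from statistics import mean, median, stdev

# Lipstick prices from the table in problem 2 (already sorted).
prices = [5.23, 5.42, 5.93, 6.16, 6.30, 6.55, 6.65, 6.80, 7.00, 7.05,
          7.13, 7.21, 7.67, 7.80, 7.81, 7.89, 8.85, 8.97, 9.07, 9.63]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the observations
    below and above the position of the overall median.
    Needs at least two observations.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half (excludes the middle value if n is odd)
    upper = data[-half:]   # upper half (excludes the middle value if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


def iqr_fences(q1, q3):
    """Return the (low, high) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


x_bar = mean(prices)
s = stdev(prices)          # sample standard deviation (divides by n - 1)
cutoff = x_bar + s         # one standard deviation above the mean

# Proportion of prices more than one standard deviation above the mean.
proportion_above = sum(p > cutoff for p in prices) / len(prices)

summary = five_number_summary(prices)
low_fence, high_fence = iqr_fences(summary[1], summary[3])

print(f"mean = {x_bar:.3f}, s = {s:.3f}")
print(f"proportion above mean + s = {proportion_above}")
print("five-number summary:", summary)
print(f"suspected outliers lie below {low_fence:.4f} or above {high_fence:.4f}")
```

Comparing `proportion_above` with the 16% that the 68-95-99.7 rule gives for the upper tail is one way to judge how close the data are to Normal.
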
  3. Assume that the lengths (in inches) of trout in a given river are Normally distributed. The Big Hole River has trout which average 11.3 inches in length and have a standard deviation of 2.1 inches. Yellowstone River trout have lengths with μ = 14.2 inches and σ = 4.6 inches.
    1. What proportion of trout in the Big Hole River are between 10 and 12 inches?
    2. What percentage of Yellowstone River trout are at least 22 inches long? Answer this part using both normalcdf() and Table A (you should get roughly the same answer).
    3. Which of the two rivers has a smaller proportion of trout that are less than or equal to 8 inches long?
    4. How long does a Yellowstone trout need to be if it is in the top (longest) 3% of all Yellowstone trout?
    5. A trout has a standardized length of z = 1.42. How long is this trout if it came from the Big Hole River? How long would it be if it came from the Yellowstone River?
    6. Bill fishes the Big Hole River and catches a 15.6-inch trout. Babe fishes in the Yellowstone and catches a 23.5-inch trout. Whose trout, Bill's or Babe's, is more exceptional? How do you know?
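
For checking calculator or Table A work on the trout problem, the same Normal calculations can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8 or later). This stands in for normalcdf() and is not the method the exam expects; the helper names below are invented for the illustration.

```python
from statistics import NormalDist

# Trout length models stated in the problem (inches).
big_hole = NormalDist(mu=11.3, sigma=2.1)
yellowstone = NormalDist(mu=14.2, sigma=4.6)


def proportion_between(dist, low, high):
    """Area under the Normal curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)


def proportion_at_least(dist, x):
    """Area under the Normal curve to the right of x."""
    return 1 - dist.cdf(x)


def z_score(dist, x):
    """Standardize x: number of standard deviations above the mean."""
    return (x - dist.mean) / dist.stdev


def from_z(dist, z):
    """Undo standardization: the length that has standardized value z."""
    return dist.mean + z * dist.stdev


p_between = proportion_between(big_hole, 10, 12)
p_long = proportion_at_least(yellowstone, 22)
p_short_big_hole = big_hole.cdf(8)
p_short_yellowstone = yellowstone.cdf(8)
top_3_percent_cutoff = yellowstone.inv_cdf(0.97)  # 97% of trout are shorter
length_big_hole = from_z(big_hole, 1.42)
length_yellowstone = from_z(yellowstone, 1.42)
z_bill = z_score(big_hole, 15.6)
z_babe = z_score(yellowstone, 23.5)

print(f"Big Hole, 10 to 12 inches: {p_between:.4f}")
print(f"Yellowstone, at least 22 inches: {p_long:.4f}")
print(f"At most 8 inches: Big Hole {p_short_big_hole:.4f}, "
      f"Yellowstone {p_short_yellowstone:.4f}")
print(f"Top 3% of Yellowstone trout start at {top_3_percent_cutoff:.2f} inches")
print(f"z = 1.42: Big Hole {length_big_hole:.2f}, Yellowstone {length_yellowstone:.2f}")
print(f"Bill z = {z_bill:.2f}, Babe z = {z_babe:.2f}")
```

Because Table A rounds z to two decimal places, hand answers may differ from these values in the third or fourth decimal.
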



  4. Yearly averages for the Dow Jones Industrial Average (an index of stocks) are given below. Make a time plot of the data and describe any trends or deviations from the trends that appear.
Year   Index
1988    2062
1989    2510
1990    2670
1991    2933
1992    3282
1993    3565
1994    3735
1995    4494
1996    5740
1997    7448
1998    8631
1999   10482
2000   10731
2001   10209
2002    9214
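
A rough way to preview the time plot before drawing it by hand is sketched below in Python. It prints one bar per year, scaled to the largest value, and computes the year-to-year percent change. The function names `text_time_plot` and `percent_changes` are made up for this sketch, and a real time plot would put years on the horizontal axis.

```python
# Dow Jones yearly averages from the table above.
years = list(range(1988, 2003))
index = [2062, 2510, 2670, 2933, 3282, 3565, 3735, 4494,
         5740, 7448, 8631, 10482, 10731, 10209, 9214]


def percent_changes(values):
    """Percent change from each value to the next one."""
    return [100 * (b - a) / a for a, b in zip(values, values[1:])]


def text_time_plot(xs, ys, width=50):
    """Return a crude sideways time plot: one bar of '#' per time point."""
    top = max(ys)
    lines = []
    for x, y in zip(xs, ys):
        bar = "#" * round(width * y / top)
        lines.append(f"{x} | {bar} {y}")
    return "\n".join(lines)


changes = percent_changes(index)

print(text_time_plot(years, index))
for year, change in zip(years[1:], changes):
    print(f"{year}: {change:+.1f}% from the previous year")
```

The sign of each entry in `changes` shows where the index rose and where it fell, which helps when describing the trend and any departures from it.
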

  5. To help analyze the long distance calling habits of students, 15 Central College students had their cell phone usage observed for one week. The number of long distance calls made and the average length of these calls are given below.
Number of calls            6    5    20   7    19   11   8    5    6    6    9    5    6    21   6
Avg length of calls (min)  6.1  8.3  4.2  6.1  1.6  4.4  5.8  5.9  4.4  8.2  1.2  7.7  3.9  0.6  2.0
    1. Make a (well-labeled) histogram of the number-of-calls data. Describe the shape of this data set. Are there any outliers?
    2. Make a stem plot of the average length of calls data. Describe the shape of this data set. Are there any outliers?
    3. One of the data sets above is best described by the five number summary, the other is best described by its mean and standard deviation. Decide which is which (explain your choice), and calculate the appropriate numerical summary for each data set.
    4. From your five number summary, make a boxplot of that data set.
    5. Make a scatterplot of X = number of calls, and Y = average length of a call. What is the correlation r?
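
The correlation asked for in the last part can be checked with the Python sketch below, which uses the definition of r as the sum of products of standardized values divided by n - 1. The function name `correlation` is just a label for this sketch; a calculator's regression output should agree up to rounding.

```python
from statistics import mean, stdev

# Data from problem 5: X = number of calls, Y = average call length (min).
calls = [6, 5, 20, 7, 19, 11, 8, 5, 6, 6, 9, 5, 6, 21, 6]
avg_length = [6.1, 8.3, 4.2, 6.1, 1.6, 4.4, 5.8, 5.9, 4.4, 8.2,
              1.2, 7.7, 3.9, 0.6, 2.0]


def correlation(xs, ys):
    """Correlation r: average product of standardized values (divide by n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(calls, avg_length)
print(f"r = {r:.3f}")
```
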
  6. The 5-number summary for a data set is min = 4, Q1 = 30, M = 35, Q3 = 55, max = 110. In addition, the mean is 46.
    1. What can you say about the shape of this data set?
    2. Does the 1.5 × IQR rule suggest that this data set has outliers? If so, which side (right, left, or both) has suspected outliers?
  7. The density curve shown below is for a Normal distribution that has a mean and standard deviation that are both integers (whole numbers).
    1. What would you guess is the value of the mean for this distribution? What is the standard deviation?
    2. Shade in the area that corresponds to the proportion of data from this distribution that are less than 4, and give a decent estimate of this area.
  8. For each variable pair below, guess whether the association is positive, negative, or near zero and explain your choice.
    1. X = number of years playing golf, Y = score on an 18 hole golf course (in golf, low scores are better).
    2. X = number of years you've been bowling, Y = length (in inches) of your left pinky finger.
    3. X = weight of a trout, Y = age of a trout.
    4. X = hours spent in training for a job, Y = errors made in first week on the job.
  9. In relation to the board game Monopoly, you are interested in predicting the rent (R) of a property with one house, from that property's purchase price (P). For example, it will cost you P = $280 to buy the property Marvin Gardens, and once you have all three yellow properties and one house on Marvin Gardens, you will collect a rent of R = $120 each time an opponent lands there. You decide to make a scatterplot for this situation.
    1. Which variable (P or R) is the explanatory variable and which variable (P or R) is the response variable?
    2. It turns out that the correlation for this scatterplot is r = 0.994. Describe what this means, either in words (in terms of association, shape, and strength) or with a possible plot. For the plot, you only need a handful of data points, say 5 or 6; don't worry about specific values for X or Y, just display the form and strength suggested by the correlation above.