Math 105 A, Introduction to Statistics, Central College,
Fall 2006
Exam 2 Review Sheet by Tom Linton

Practice Problems

If you can answer the following questions fairly quickly, you should be well prepared for the exam. These problems cover most of the topics that will appear on Exam 2 (chapters 5, 8, 9, 10, 11, and 12 in the Moore text).

  1. As people age they begin to experience hearing loss. A study was done to determine the "comfort level" of sound for people of different ages (i.e. the level of noise, measured in decibels, that people could listen to comfortably). The data are given in the table below.
    Age (years)              15   25   35   45   55   65   75   85
    Sound level (decibels)   56   57   64   64   68   74   78   85
    1. If we want to predict a person's age from their "comfort level" of sound, which variable would be X and which would be Y?
    2. If we want to predict a person's sound level from their age, which variable would be X and which would be Y?
    3. Make a scatter plot that would be appropriate for predicting a person's sound level from their age. Does the scatter plot suggest that a linear model is appropriate?
    4. Find the equation of the least-squares regression line for the scatter plot in part 3. Add the regression line to your scatter plot.
    5. What is the slope of your regression equation? In terms of age and sound level, what does the slope of this regression equation tell you?
    6. What is the y-intercept for this regression equation? For this data set, is the y-intercept meaningful? Why or why not?
    7. Find one of the data pairs above that has a positive residual.
    8. Estimate the sound level for someone who is 60 years old.
    9. Why would it be inappropriate to use the regression equation to predict the sound level of an infant?
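One way to check your calculator work on problem 1 is the short Python sketch below. Python is not part of the course materials, so treat this as an optional aid. The variable and function names (`ages`, `levels`, `least_squares`) are made up for illustration. The sketch uses the standard least-squares formulas, slope = Sxy / Sxx and intercept = y-bar minus slope times x-bar, with age as X and sound level as Y.

```python
# Data from the table in problem 1: X = age (years), Y = sound level (decibels).
ages = [15, 25, 35, 45, 55, 65, 75, 85]
levels = [56, 57, 64, 64, 68, 74, 78, 85]

def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares(ages, levels)

# Residual = observed value minus predicted value, one per data pair.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(ages, levels)]

# Predicted sound level for a 60-year-old (inside the observed age range).
predicted_60 = intercept + slope * 60

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"predicted level at age 60 = {predicted_60:.2f}")
for age, res in zip(ages, residuals):
    print(f"age {age}: residual {res:+.2f}")
```

Any data pair whose printed residual is positive lies above the regression line.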
  2. Decide if each of the quantities below is most likely a parameter or a statistic.
    1. The mean time spent sleeping last night by 4 students in this Intro Stats class.
    2. The mean number of items purchased by an SRS of 8 customers at a local HyVee store.
    3. The median age of all pennies currently in circulation in the United States.
    4. The standard deviation of the sample means, from all samples of size 8, from a given population.
  3. People who eat lots of fruits and vegetables have lower rates of colon cancer than those who eat little of these foods. Fruits and vegetables are rich in "antioxidants" such as vitamins A, C, and E. Will taking antioxidants help prevent colon cancer? A clinical trial studied this question with 864 people who were at risk of colon cancer. The subjects were divided into four groups: daily beta carotene, daily vitamins C and E, all three vitamins, and daily placebo. After four years, the researchers were surprised to find no statistically significant difference in colon cancer among the groups.
    1. Explain what the last sentence above means.
    2. Explain why this clinical trial is an experiment.
    3. What are the explanatory and response variables?
    4. Explain how you would use your calculator or Table B to assign subjects to the treatment groups.
    5. The study was double-blind. What does this mean?
    6. Suggest some lurking variables that could explain why people who eat lots of fruits and vegetables have lower rates of colon cancer.
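The random assignment asked for in problem 3 can also be imitated in software. The sketch below is only an illustration, not the method the researchers used: it assumes the 864 subjects are labeled 1 through 864, that the four groups are equal in size (216 each, which the problem does not state), and it uses an arbitrary seed so the result is repeatable. The idea matches the Table B procedure: label every subject, put the labels in random order, and take them in blocks.

```python
import random

# Assumptions for illustration: labels 1..864, four equal groups, arbitrary seed.
rng = random.Random(2006)
subjects = list(range(1, 865))
rng.shuffle(subjects)  # random order plays the role of reading labels from Table B

treatments = ["beta carotene", "vitamins C and E", "all three vitamins", "placebo"]
group_size = len(subjects) // len(treatments)  # 216 under the equal-size assumption

# The first 216 shuffled labels get the first treatment, the next 216 the second, and so on.
groups = {
    name: sorted(subjects[i * group_size:(i + 1) * group_size])
    for i, name in enumerate(treatments)
}

for name, members in groups.items():
    print(f"{name}: {len(members)} subjects, first few labels {members[:5]}")
```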
  4. Repetition, or using large enough groups for your treatments, is one of three key ingredients of a well-designed experiment. What are the other two key ingredients?

  5. Give an example where using a stratified random sample is appropriate.

  6. Name two types of sampling methods that can give unreliable results.

  7. In each of their games in the 1999 Major League Baseball season, the Minnesota Twins committed X = 0, 1, 2, 3, or 4 errors. The distribution of X = number of errors per game is skewed to the right, with X = 0 the most common value, X = 1 the second most common value, and so on down to X = 4 the least common value. Their average for the season was 0.84 errors per game with a standard deviation of 0.97 errors per game. Suppose we decide to select an SRS of n games (for the Twins in the 1999 season) and calculate the sample mean x-bar for the number of errors in those games. Let X-bar denote the collection of all possible x-bar values (that is, the sampling distribution X-bar).
    1. If n = 9, what is the mean of x-bar? What is the standard deviation of x-bar?
    2. How large would n have to be, before it was safe to assume that the distribution of X-bar values was approximately Normal?
    3. If we select an SRS of size 30, what is the approximate probability that x-bar < 0.8?
    4. If we consider the games played before the all-star break to be an SRS (there were 80 of these games), how likely is it that the Twins committed a total of 56 or fewer errors in these games?
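The calculations in the Twins problem all rest on two facts from the text: the mean of x-bar equals the population mean, and the standard deviation of x-bar is sigma divided by the square root of n. The sketch below is one possible way to check your answers with Python's standard library. The helper name `xbar_sd` is hypothetical, and the Normal approximation is used for n = 30 and n = 80 on the assumption that those samples are large enough for the central limit theorem to apply.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 0.84, 0.97  # errors per game: population mean and standard deviation

def xbar_sd(n):
    """Standard deviation of the sample mean for an SRS of size n."""
    return sigma / sqrt(n)

# Part 1: n = 9. The mean of x-bar is mu itself.
sd_9 = xbar_sd(9)

# Part 3: approximate P(x-bar < 0.8) for n = 30.
p_below_0_8 = NormalDist(mu, xbar_sd(30)).cdf(0.8)

# Part 4: 56 or fewer total errors in 80 games means x-bar <= 56/80 = 0.70.
p_56_or_fewer = NormalDist(mu, xbar_sd(80)).cdf(56 / 80)

print(f"n = 9: mean of x-bar = {mu}, sd of x-bar = {sd_9:.4f}")
print(f"n = 30: P(x-bar < 0.8) is about {p_below_0_8:.4f}")
print(f"n = 80: P(total <= 56) is about {p_56_or_fewer:.4f}")
```

Converting a question about a total into a question about x-bar (divide the total by n) is the key step in part 4.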
  8. The number of copies of the magazine Cosmopolitan that are sold daily at a convenience store is a random variable X which takes on the values 0, 1, 2, 3, 4, and 5. The distribution of X is mostly given in the table below.

    X       0      1      2      3      4      5
    P(X)    0.10   0.12   0.25   0.30   0.20   0.03
    1. What is the probability that X = 1?
    2. What is the probability that X is greater than or equal to 4?
    3. What is the probability that the store sells one or more copies of Cosmopolitan magazine on a randomly chosen day?
    4. What does it mean to say that daily sales of Cosmopolitan magazine at this store are independent?
    5. Assuming daily sales are independent at this store, what is the probability that this store sells 1 or more copies of Cosmopolitan magazine for three days in a row?
    6. For total sales in a 2-day period, there are four ways that this store can sell exactly 3 copies of the magazine. One of them is to sell 0 copies on day 1 and 3 copies on day 2. Find the other 3 ways to sell a total of 3 copies over a 2-day period, and calculate the probability that this store sells a total of exactly 3 copies over a 2-day period.
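The probability rules used in the Cosmopolitan problem (the complement rule, addition for disjoint events, and multiplication for independent days) can be sketched in a few lines of Python. The dictionary name `pmf` is made up for illustration, and the probabilities are taken from the table exactly as printed. The two-day and three-day calculations assume daily sales are independent, as the problem instructs.

```python
# Distribution of daily sales X, copied from the table as printed.
pmf = {0: 0.10, 1: 0.12, 2: 0.25, 3: 0.30, 4: 0.20, 5: 0.03}

# A legitimate distribution must sum to 1 (allowing for floating-point rounding).
total_probability = sum(pmf.values())

# Addition rule for disjoint outcomes: P(X >= 4) = P(4) + P(5).
p_at_least_4 = pmf[4] + pmf[5]

# Complement rule: P(X >= 1) = 1 - P(X = 0).
p_at_least_1 = 1 - pmf[0]

# Multiplication rule for three independent days.
p_three_days = p_at_least_1 ** 3

# Two-day total of 3: add up P(day 1 = a) * P(day 2 = 3 - a) for a = 0, 1, 2, 3.
p_total_3 = sum(pmf[a] * pmf[3 - a] for a in range(4))

print(f"sum of probabilities = {total_probability:.2f}")
print(f"P(X >= 4) = {p_at_least_4:.2f}")
print(f"P(X >= 1) = {p_at_least_1:.2f}")
print(f"P(X >= 1 on three days in a row) = {p_three_days:.3f}")
print(f"P(two-day total is exactly 3) = {p_total_3:.2f}")
```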
  9. You are the marketing director for a mail-order plant and seed company that has produced two catalogs, one for spring orders and one for fall orders. For each catalog sent to a potential customer, the customer's entry in a data file is Y if they ordered something and N if they did not (Y = yes, N = no). After mailing the spring and fall catalogs to a large collection of potential customers, you determine the probabilities of the buying patterns to be:

    Outcome (spring, fall):   YY     YN     NY     NN
    Probability:              0.30   0.10   0.05   0.55


    1. Let S denote buying from the spring catalog, and F denote buying from the fall catalog. Calculate P(S) and P(F). You might consider making a Venn diagram to help answer these questions.
    2. Explain what the event "S and F" represents, and calculate P(S and F).
    3. In words, what does P(F | S) represent? What is the value of P(F | S)?
    4. Are F and S independent? Explain.
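For the catalog problem, the sketch below shows one way to organize the arithmetic. The names (`outcomes`, `p_s`, `p_f`, and so on) are hypothetical. It uses the definition of conditional probability, P(F | S) = P(S and F) / P(S), and checks independence by comparing P(S and F) with P(S) times P(F).

```python
from math import isclose

# Probabilities of the (spring, fall) buying patterns from the table.
outcomes = {"YY": 0.30, "YN": 0.10, "NY": 0.05, "NN": 0.55}

# P(S): first letter is Y. P(F): second letter is Y.
p_s = sum(p for pattern, p in outcomes.items() if pattern[0] == "Y")
p_f = sum(p for pattern, p in outcomes.items() if pattern[1] == "Y")

# "S and F" means the customer bought from both catalogs.
p_s_and_f = outcomes["YY"]

# Conditional probability of a fall purchase given a spring purchase.
p_f_given_s = p_s_and_f / p_s

# S and F are independent only if P(S and F) equals P(S) * P(F).
independent = isclose(p_s_and_f, p_s * p_f)

print(f"P(S) = {p_s:.2f}, P(F) = {p_f:.2f}")
print(f"P(S and F) = {p_s_and_f:.2f}, P(S) * P(F) = {p_s * p_f:.2f}")
print(f"P(F | S) = {p_f_given_s:.2f}")
print(f"independent? {independent}")
```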
  10. Toot-Toot's all-you-care-to-eat restaurant charges $8.95 per customer to eat at the restaurant. They find that their expense per customer (including the amount of food eaten and their expenses for labor) has a distribution that is noticeably skewed to the right, with a mean of $8.20 and a standard deviation of $3.
    1. Explain what the law of large numbers says about Toot-Toot's customers and profits.
    2. If a couple (2 people entering the restaurant together) can be viewed as an SRS of size 2 from Toot-Toot's customer base, what are the mean and standard deviation of the sampling distribution of a couple's mean expense (that is, the average expense per customer, to Toot-Toot's, based on a sample of size 2)? Would it be safe to assume that the sampling distribution for a couple's mean expense has a Normal distribution? Explain.
    3. Assume that on a given day, 100 customers eat at Toot-Toot's. If we view these 100 customers as an SRS from the customer base, what is the probability that Toot-Toot's earns a profit on this day, i.e. what is P(x-bar ≤ 8.95)? What is the probability that Toot-Toot's averages at least $0.50 profit per customer on this day, i.e. what is P(x-bar ≤ 8.45)?
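For the restaurant problem, a small helper that returns the sampling distribution of x-bar for any n makes parts 2 and 3 quick to check. This is a sketch under the assumption that the central limit theorem justifies a Normal approximation when n = 100; the helper name `xbar_distribution` is made up here. For n = 2 the code reports only the mean and standard deviation, because a sample of size 2 from a skewed population gives no grounds for treating x-bar as Normal.

```python
from math import sqrt
from statistics import NormalDist

price = 8.95            # dollars charged per customer
mu, sigma = 8.20, 3.00  # mean and standard deviation of expense per customer

def xbar_distribution(n):
    """Normal approximation to the distribution of the mean expense for an SRS of size n."""
    return NormalDist(mu, sigma / sqrt(n))

# Part 2: only the mean and standard deviation are used for a couple (n = 2).
couple = xbar_distribution(2)
couple_mean, couple_sd = couple.mean, couple.stdev

# Part 3: with n = 100, use the Normal approximation for both probabilities.
day = xbar_distribution(100)
p_profit = day.cdf(price)                  # P(x-bar <= 8.95)
p_fifty_cent_profit = day.cdf(price - 0.50)  # P(x-bar <= 8.45)

print(f"couple: mean = {couple_mean:.2f}, sd = {couple_sd:.4f}")
print(f"P(x-bar <= 8.95) is about {p_profit:.4f}")
print(f"P(x-bar <= 8.45) is about {p_fifty_cent_profit:.4f}")
```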
  11. It is estimated that 50% of all computer chips manufactured are defective. Fortunately, inspections and other forms of quality control guarantee that only 5% of all legally marketed computer chips are defective. Unfortunately, some chips are stolen immediately after being produced (before inspections and other forms of quality control have been used). It is estimated that 1% of all computer chips on the market are stolen. Make a tree diagram to help analyze this situation. Your first branch should consider whether a chip is stolen or legally marketed. The second branch should correspond to whether the chip is defective or good. Find the probability that a randomly purchased chip is stolen, given that it is defective.
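The tree diagram in problem 11 translates directly into arithmetic: multiply along each branch, add the branches that end in "defective", then divide. The sketch below is one way to check your tree. It assumes, as the problem implies, that stolen chips are defective at the 50% manufacturing rate because they skip inspection. All variable names are hypothetical.

```python
# First branch: stolen or legally marketed.
p_stolen = 0.01
p_legal = 1 - p_stolen

# Second branch: defective rate on each path.
p_defective_given_stolen = 0.50  # assumption: stolen chips skip inspection
p_defective_given_legal = 0.05

# Multiply along each branch that ends in "defective".
p_stolen_and_defective = p_stolen * p_defective_given_stolen
p_legal_and_defective = p_legal * p_defective_given_legal

# Add the two branches to get the overall chance of a defective chip.
p_defective = p_stolen_and_defective + p_legal_and_defective

# Conditional probability: P(stolen | defective).
p_stolen_given_defective = p_stolen_and_defective / p_defective

print(f"P(defective) = {p_defective:.4f}")
print(f"P(stolen | defective) = {p_stolen_given_defective:.4f}")
```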