Math 105 A, Introduction to Statistics, Central College, Fall 2006
Exam 1 Review Sheet by Tom Linton
Practice Problems
If you can answer the following questions rather quickly, you should be well prepared for the exam.
- You decide to collect data related to students' preferences regarding the food service at Central College. Give an example of a categorical variable for this setting, and another example of a quantitative variable for this situation.
- The prices (in dollars) charged for a certain style of Chanel lipstick at 20 randomly selected stores in the Midwest are given in the table below. The data are numbered for your convenience, and listed from smallest to largest. A histogram of this distribution is also shown.
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| Price | 5.23 | 5.42 | 5.93 | 6.16 | 6.30 | 6.55 | 6.65 | 6.80 | 7.00 | 7.05 | 7.13 | 7.21 | 7.67 | 7.80 | 7.81 | 7.89 | 8.85 | 8.97 | 9.07 | 9.63 |
  - (a) Use the histogram to describe the shape of this distribution. Do there appear to be any outliers?
  - (b) Based on the histogram, and your answer to part (a), which numerical summary (mean and standard deviation OR 5-number summary) is best for this distribution?
  - (c) The mean is x̄ = $7.256 and the standard deviation is s = $1.217. Use the table of values to count the number of prices that are more than 1 standard deviation above the mean, and then convert this count into a proportion by dividing it by 20 (the number of individuals in our data set). What proportion of this data set lies more than 1 standard deviation above the mean?
  - (d) If the distribution above were perfectly bell shaped, what proportion of the data would lie more than one standard deviation above the mean according to the 68-95-99.7 rule? How close is your actual proportion from part (c) to this theoretical proportion?
  - (e) Calculate the 5-number summary for the lipstick price data above.
  - (f) How high or low would a price have to be before the 1.5 × IQR rule labeled it a suspected outlier?
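If you want to check your hand work on the counting, 5-number summary, and 1.5 × IQR parts of the lipstick problem, the following Python sketch may help. It is only one possible check, not an official solution. It assumes the quartile convention common in introductory texts (Q1 and Q3 are the medians of the lower and upper halves of the sorted data); software that uses a different convention can give slightly different quartiles. The function names are made up for this illustration.

```python
from statistics import median

# Lipstick prices from the table, already sorted from smallest to largest.
prices = [5.23, 5.42, 5.93, 6.16, 6.30, 6.55, 6.65, 6.80, 7.00, 7.05,
          7.13, 7.21, 7.67, 7.80, 7.81, 7.89, 8.85, 8.97, 9.07, 9.63]

MEAN, SD = 7.256, 1.217  # mean and standard deviation stated in the problem


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median left out of both halves when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def iqr_fences(q1, q3):
    """Return the (low, high) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Prices more than one standard deviation above the mean.
above = [p for p in prices if p > MEAN + SD]
proportion_above = len(above) / len(prices)

summary = five_number_summary(prices)
low_fence, high_fence = iqr_fences(summary[1], summary[3])

print("more than 1 SD above the mean:", above, "proportion:", proportion_above)
print("five-number summary:", summary)
print("suspected outliers lie below", low_fence, "or above", high_fence)
```

Compare the printed proportion with the 16% that the 68-95-99.7 rule predicts for a perfectly bell-shaped distribution (half of the 32% that falls outside one standard deviation).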
- Assume that the lengths (in inches) of trout in a given river are normally distributed. The Big Hole river has trout which average 11.3 inches in length and have a standard deviation of 2.1 inches. Yellowstone river trout have lengths with μ = 14.2 inches and σ = 4.6 inches.
  - (a) What proportion of trout caught in the Big Hole river are between 10 and 12 inches?
  - (b) What percentage of Yellowstone river trout are at least 22 inches long?
  - (c) Which of the two rivers has a smaller proportion of trout that are 8 inches long or less?
  - (d) How long does a Yellowstone trout need to be if it is in the top (longest) 3% of all Yellowstone trout?
  - (e) A trout has a standardized length of z = 1.42. How long is this trout if it came from the Big Hole river? How long would it be if it came from the Yellowstone river? Hint: think of a z-score as the number of standard deviations above or below average, or solve the formula z = (x - μ) / σ for x.
  - (f) Bill fishes the Big Hole river and catches a 15.6 inch trout. Babe fishes in the Yellowstone and catches a 23.5 inch trout. Whose trout, Bill's or Babe's, is more exceptional? How do you know?
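The trout questions all come down to standardizing with z = (x - μ) / σ and then reading areas under the normal curve. As a rough way to check table lookups, here is a sketch that uses `NormalDist` from Python's standard `statistics` module in place of a printed normal table. The helper names `z_score` and `length_from_z` are invented for this example, and answers read from a table with z rounded to two decimals may differ slightly in the last digit.

```python
from statistics import NormalDist

# Normal models for trout length (inches), using the values in the problem.
big_hole = NormalDist(mu=11.3, sigma=2.1)
yellowstone = NormalDist(mu=14.2, sigma=4.6)


def z_score(x, dist):
    """Standardize a length: z = (x - mu) / sigma."""
    return (x - dist.mean) / dist.stdev


def length_from_z(z, dist):
    """Solve z = (x - mu) / sigma for x."""
    return dist.mean + z * dist.stdev


# Proportion of Big Hole trout between 10 and 12 inches.
p_between = big_hole.cdf(12) - big_hole.cdf(10)

# Proportion of Yellowstone trout at least 22 inches long.
p_at_least_22 = 1 - yellowstone.cdf(22)

# Proportion of trout 8 inches or shorter in each river.
p_small_big_hole = big_hole.cdf(8)
p_small_yellowstone = yellowstone.cdf(8)

# Length that cuts off the longest 3% of Yellowstone trout (97th percentile).
top3_cutoff = yellowstone.inv_cdf(0.97)

# Lengths corresponding to z = 1.42 in each river.
big_hole_length = length_from_z(1.42, big_hole)
yellowstone_length = length_from_z(1.42, yellowstone)

# Standardized lengths of Bill's and Babe's trout.
bill_z = z_score(15.6, big_hole)
babe_z = z_score(23.5, yellowstone)

print(round(p_between, 4), round(p_at_least_22, 4))
print(round(p_small_big_hole, 4), round(p_small_yellowstone, 4))
print(round(top3_cutoff, 2), round(big_hole_length, 3), round(yellowstone_length, 3))
print(round(bill_z, 2), round(babe_z, 2))
```

The last line is the one to look at for the Bill and Babe question: the trout with the larger z-score is farther above its own river's average, measured in standard deviations.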
- Yearly averages for the Dow Jones Industrial Average (an index of
stocks) are given below. Make a time-plot of the data and describe any
trends or deviations from the trends that appear.
| Year | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 |
| Index | 2062 | 2510 | 2670 | 2933 | 3282 | 3565 | 3735 | 4494 | 5740 | 7448 | 8631 | 10482 | 10731 | 10209 | 9214 |
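A time-plot is normally drawn by hand or with plotting software. As a quick stand-in, the sketch below prints a crude text version (one row per year, bar length proportional to the index) and lists the year-to-year changes so that trends and deviations are easier to spot. The function name `text_time_plot` and the scale of 250 index points per `#` are arbitrary choices made for this illustration.

```python
# Yearly Dow Jones averages from the table above.
years = list(range(1988, 2003))
dow = [2062, 2510, 2670, 2933, 3282, 3565, 3735, 4494,
       5740, 7448, 8631, 10482, 10731, 10209, 9214]


def text_time_plot(xs, ys, unit=250):
    """Return one text row per year; each '#' stands for `unit` index points."""
    return [f"{x} {'#' * round(y / unit)} {y}" for x, y in zip(xs, ys)]


# Change from the previous year, keyed by the later year.
changes = {year: now - before
           for year, before, now in zip(years[1:], dow, dow[1:])}

# Years in which the yearly average fell.
declines = [year for year, change in changes.items() if change < 0]

# Year with the highest yearly average.
peak_year = years[dow.index(max(dow))]

for row in text_time_plot(years, dow):
    print(row)
print("years with a decline:", declines)
print("peak year:", peak_year)
```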
- To help analyze the long distance calling habits of students, 15 phone extensions in the dorms at Central College were surveyed for one week. The number of long distance calls made and the average length of these calls are given below.
| Number of calls | 6 | 5 | 20 | 7 | 19 | 11 | 8 | 5 | 6 | 6 | 9 | 5 | 6 | 21 | 6 |
| Avg Length of calls (min) | 6.1 | 8.3 | 4.2 | 6.1 | 1.6 | 4.4 | 5.8 | 5.9 | 4.4 | 8.2 | 1.2 | 7.7 | 3.9 | 0.6 | 2.0 |
  - (a) Make a (well-labeled) histogram of the number of calls made data. Describe the shape of this data set. Are there any outliers?
  - (b) Make a stem plot of the average length of calls data. Describe the shape of this data set. Are there any outliers?
  - (c) One of the data sets above is best described by the five number summary, the other is best described by its mean and standard deviation. Decide which is which (explain your choice), and calculate the appropriate numerical summary for each data set.
  - (d) From your five number summary, make a boxplot of that data set.
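To check your stem plot and numerical summaries for the phone data, a sketch along the following lines may be useful. It computes both kinds of summary for the relevant data set under one reading of the problem: that the call counts get the five-number summary and the average lengths get the mean and standard deviation. You still need to justify that choice from the shapes you see. The quartiles again assume the "median of each half" convention, and the function names are hypothetical.

```python
from statistics import mean, median, stdev

calls = [6, 5, 20, 7, 19, 11, 8, 5, 6, 6, 9, 5, 6, 21, 6]
lengths = [6.1, 8.3, 4.2, 6.1, 1.6, 4.4, 5.8, 5.9,
           4.4, 8.2, 1.2, 7.7, 3.9, 0.6, 2.0]


def stem_plot(values):
    """Text stem plot: stems are whole minutes, leaves are tenths of a minute."""
    stems = {}
    for v in sorted(values):
        stem, leaf = divmod(round(v * 10), 10)
        stems.setdefault(stem, []).append(leaf)
    return [f"{s} | {' '.join(str(leaf) for leaf in stems.get(s, []))}"
            for s in range(min(stems), max(stems) + 1)]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), quartiles as medians of the halves.

    Assumption: the overall median is left out of both halves when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]


calls_summary = five_number_summary(calls)
lengths_mean = mean(lengths)
lengths_sd = stdev(lengths)  # sample standard deviation (divides by n - 1)

for row in stem_plot(lengths):
    print(row)
print("five-number summary of calls:", calls_summary)
print("mean length:", round(lengths_mean, 3), "SD:", round(lengths_sd, 3))
```

The five values printed for the call counts are exactly what you need to draw the boxplot: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.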
- The 5-number summary for a data set is 4, 30, 35, 55, 110, while
the mean is 46.
  - (a) What can you say about the shape of this data set?
  - (b) Does the 1.5 × IQR rule suggest that this data set has outliers? If so, which side (right, left, or both) has suspected outliers?
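The 1.5 × IQR check needs only Q1 and Q3, so it can be done straight from the five-number summary. A minimal sketch of that arithmetic, with variable names chosen just for this example:

```python
# Five-number summary and mean given in the problem.
minimum, q1, med, q3, maximum = 4, 30, 35, 55, 110
mean_value = 46

iqr = q3 - q1                 # interquartile range
low_fence = q1 - 1.5 * iqr    # values below this are suspected outliers
high_fence = q3 + 1.5 * iqr   # values above this are suspected outliers

# Only the extremes are known, so test the minimum and maximum.
has_low_outlier = minimum < low_fence
has_high_outlier = maximum > high_fence

print("IQR:", iqr, "fences:", low_fence, high_fence)
print("suspected outlier on the left:", has_low_outlier)
print("suspected outlier on the right:", has_high_outlier)
print("mean minus median:", mean_value - med)
```

The last line is a hint for the shape question: a mean well above the median usually goes with a tail stretching to the right.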
- The density curve shown below is for a normal distribution that has a mean and standard deviation that are both integers (whole numbers).
  - (a) What would you guess is the value of the mean for this distribution? What is the standard deviation?
  - (b) Shade in the area that corresponds to the proportion of data from this distribution that are less than 4, and give a decent estimate of this area.
- For each variable pair below, guess whether the
association is positive or negative and explain your choice.
  - (a) X = number of years playing golf, Y = score on an 18 hole golf course (in golf, low scores are better).
  - (b) X = number of years you've been bowling, Y = average bowling score over last 5 games bowled.
  - (c) X = weight of a trout, Y = age of a trout.
  - (d) X = hours spent in training for a job, Y = errors made in first week on the job.
- In relation to the board game Monopoly, you are interested in
predicting the rent (R) of a property with one house, from that
property's purchase price (P). For example, it will cost you P = $280
to buy the property Marvin Gardens, and once you have all three yellow
properties and one house on Marvin Gardens, you will collect a rent of
R = $120 each time an opponent lands there. You decide to make a
scatterplot for this situation.
  - (a) Which variable (P or R) is the explanatory variable and which variable (P or R) is the response variable?
  - (b) It turns out that for the situation above, the correlation for this scatterplot is r = 0.994. Describe what this means, either in words (in terms of association, shape, and strength) or with a possible plot. For the plot you only need a handful of data points, say 5 or 6; don't worry about specific values for X or Y, just display the proper form and strength of the correlation above.
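To get a feel for what a correlation this close to 1 looks like, you can compute r for a handful of invented points. The six (x, y) pairs below are made up purely for illustration; they are not Monopoly prices or rents. The function `pearson_r` is a plain implementation of the usual correlation formula, written out here so nothing beyond the standard library is needed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation r = Sxy / sqrt(Sxx * Syy) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical points that rise together and hug a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.3, 5.8, 8.1, 9.9, 12.2]

r = pearson_r(xs, ys)
print("r =", round(r, 3))
```

Points like these, which climb steadily and stay nearly on one line, show the same kind of association (positive), form (linear), and strength (very strong) that r = 0.994 describes.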