Math 203, Introduction to Statistics, Central College, Spring 2001
Exam 1 Review Sheet
by Tom Linton and Wendy Weber
Key Terms, Phrases, Results etc.
At this moment, the following terms, results, phrases etc. are, in my opinion,
the most important of those covered in the first 2 chapters. You should
understand them for the exam.
categorical versus quantitative variables, histograms, stem plots,
split stems, time plots and trends, symmetric versus skewed distributions,
outliers and influential data points, center and spread, mean, median and
quartiles, boxplots, standard deviation, normal distribution, standard
normal distribution, density curve, explanatory versus response variables,
linear association, correlation, least squares regression, residuals,
slope and intercept of a line, causation, lurking variables.
Practice Problems
If you can answer the following questions rather quickly, you should be
well prepared for the exam. These problems cover most of the topics which
will appear on exam 1.
-
Data on the height in inches and shoe size of the members of the mathematics
and computer science department at Central College are given in the table
below.
| Height (inches) | 66 | 58 | 69 | 69 | 71 | 73 | 74 |
| Shoe Size | 7.5 | 9 | 9 | 10.5 | 10 | 11 | 11.5 |
-
Calculate the median shoe size and also find the first quartile, Q1, for
the height data.
-
Calculate the mean height and mean shoe size.
-
Make a scatter plot (call it plot 1) using height as X and shoe size as
Y. Does the association seem to be positive, negative, or neither?
-
Find the equation (call it Y1) of the least-squares regression line for
the scatter plot above and give the correlation, r, for this plot.
-
What percentage of the variation in shoe size is due to the linear relationship
between height and shoe size?
-
Now make a scatter plot (call it plot 2) using X = shoe size and Y = height.
Find the least squares regression equation (call it Y2) for this plot and
the correlation, r.
-
Using Y1 and plot 1, estimate the shoe size for someone whose height is
70 inches and the height of someone with a shoe size of 5.
-
Using Y2 and plot 2, estimate the shoe size for someone whose height is
70 inches and the height of someone with a shoe size of 5.
-
Which of the four estimates above do you have more confidence in? Why?
-
Verify that both regression lines contain the point whose coordinates are
the mean height and mean shoe size. Note: for the first regression the
point is (mean ht, mean size), for the second regression the point is (mean
size, mean ht.).
-
For the first regression, which data point has the biggest residual
(largest in absolute value)?
-
If you delete the data point (ht = 66, shoe size = 7.5) from the first
regression, what is the new least squares regression equation, and the
new correlation?
-
Would you consider the point (66, 7.5) to be influential? Why or why not?
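If you want to check your calculator work on the height and shoe size problem, the short Python sketch below fits both regression lines from the usual least-squares formulas. The function name least_squares and the variable names are made up for this illustration and are not from the course materials. It also checks that each line passes through the point of means and finds the data point with the largest residual for the first regression.

```python
from math import sqrt

# Data copied from the table in the problem.
height = [66, 58, 69, 69, 71, 73, 74]
shoe = [7.5, 9, 9, 10.5, 10, 11, 11.5]


def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x.

    Hypothetical helper written for this sketch.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Y1: shoe size predicted from height (plot 1).
b1, a1, r1 = least_squares(height, shoe)
# Y2: height predicted from shoe size (plot 2).
b2, a2, r2 = least_squares(shoe, height)

mean_height = sum(height) / len(height)
mean_shoe = sum(shoe) / len(shoe)

# Residual = observed shoe size minus the shoe size predicted by Y1.
residuals = [y - (a1 + b1 * x) for x, y in zip(height, shoe)]
largest = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print(f"Y1: shoe = {a1:.3f} + {b1:.3f} * height, r = {r1:.3f}")
print(f"Y2: height = {a2:.3f} + {b2:.3f} * shoe, r = {r2:.3f}")
print(f"r squared = {r1 ** 2:.3f}")
print(f"Y1 at the mean height: {a1 + b1 * mean_height:.3f} (mean shoe size {mean_shoe:.3f})")
print(f"Y2 at the mean shoe size: {a2 + b2 * mean_shoe:.3f} (mean height {mean_height:.3f})")
print(f"Y1 prediction at height 70: {a1 + b1 * 70:.2f}")
print(f"Y2 prediction at shoe size 5: {a2 + b2 * 5:.2f}")
print(f"Largest residual: height {height[largest]}, shoe size {shoe[largest]}")
```

Notice that the correlation is the same for both plots, while the two regression lines are different. To redo the fit with one point deleted, remove that pair from both lists and call least_squares again.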
-
Each of the plots below is drawn on the same scale (or window).
-
Describe the direction of the association (positive, negative, neither)
for each plot above, and its strength.
-
Estimate the correlation (give a number) for each of the plots above.
-
Draw in what you think is a good approximation to the least squares regression
line for plot 2.
-
On plot 2, does the fourth data point from the left have a positive or
negative residual?
-
Assume that the lengths (in inches) of trout in a given river are normally
distributed. The Big Hole river has trout which average 11.3 inches in
length and have a standard deviation of 2.1 inches. Yellowstone river trout
have lengths with μ = 14.2 inches and σ = 4.6 inches.
-
What proportion of trout caught in the Big Hole river are between 10 and
12 inches?
-
What percentage of Yellowstone river trout are at least 22 inches long?
-
Which of the two rivers has a smaller proportion of trout that are 8 inches
long or less?
-
How long does a Yellowstone river trout need to be if it is in the top
(i.e. longest) 3% of all Yellowstone river trout?
-
Bill fishes the Big Hole river and catches a 15.6 inch trout. Babe fishes
in the Yellowstone and catches a 23.5 inch trout. Whose trout, Bill's or
Babe's, is more exceptional?
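The trout questions can be checked without a normal table. The sketch below uses NormalDist from Python's standard statistics module. The variable names are chosen for this illustration only. Areas to the left come from cdf, the cutoff for the top 3% comes from inv_cdf, and the two catches are compared through their standardized values (z-scores).

```python
from statistics import NormalDist

# Normal models taken from the problem statement.
big_hole = NormalDist(mu=11.3, sigma=2.1)
yellowstone = NormalDist(mu=14.2, sigma=4.6)

# Proportion of Big Hole trout between 10 and 12 inches.
p_between = big_hole.cdf(12) - big_hole.cdf(10)

# Proportion of Yellowstone trout at least 22 inches long.
p_at_least_22 = 1 - yellowstone.cdf(22)

# Proportion of trout 8 inches long or less, in each river.
p_small_big_hole = big_hole.cdf(8)
p_small_yellowstone = yellowstone.cdf(8)

# Length that cuts off the longest 3% of Yellowstone trout.
cutoff_top_3 = yellowstone.inv_cdf(0.97)

# Standardized values (z-scores) for the two catches.
z_bill = (15.6 - big_hole.mean) / big_hole.stdev
z_babe = (23.5 - yellowstone.mean) / yellowstone.stdev

print(f"Big Hole, between 10 and 12 inches: {p_between:.4f}")
print(f"Yellowstone, at least 22 inches: {p_at_least_22:.4f}")
print(f"8 inches or less: Big Hole {p_small_big_hole:.4f}, Yellowstone {p_small_yellowstone:.4f}")
print(f"Top 3% cutoff for Yellowstone: {cutoff_top_3:.2f} inches")
print(f"z-scores: Bill {z_bill:.3f}, Babe {z_babe:.3f}")
```

Answers read from a printed table may differ slightly in the last digit, because the table rounds z to two decimal places.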
-
The game of golf is one where a low score is better than a high score,
so as you get better, your score decreases. A golf instructor believes
that more experience playing the game improves your score. To investigate
this claim, she collects data from 10 of her students, namely the number
of years each student has been playing golf, and the average score of each
student.
-
Which variable (years playing golf or average score) is the response variable
and which is the explanatory variable?
-
According to this golf instructor, would you expect a positive or negative
association between these two variables? Explain.
-
If the least squares regression line equation for the instructor's 10 data
points was Y = 130 - 12.8*X, what is the meaning (in common-everyday language)
of the slope -12.8?
-
Using the regression equation from part (c), what is the predicted score
for a golfer with 11 years of experience? Does this make sense? Explain.
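For the golf problem, a few lines of Python make the meaning of the slope and the danger of extrapolation easy to see. The function name predicted_score is made up for this sketch, and the years tried are arbitrary choices, not the instructor's data.

```python
def predicted_score(years):
    """Predicted average score from the line Y = 130 - 12.8*X."""
    return 130 - 12.8 * years


# Each additional year of experience changes the prediction by the slope.
change_per_year = predicted_score(1) - predicted_score(0)
print(f"Change in predicted score per extra year: {change_per_year:.1f}")

# Predictions for a few (arbitrarily chosen) years of experience.
for years in (1, 5, 11):
    print(f"{years} years: predicted score {predicted_score(years):.1f}")
```

The prediction for 11 years comes out negative, which is not a possible golf score. A regression line should not be trusted far outside the range of the data used to fit it.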
-
To help analyze the long distance calling habits of students, 15 phone
extensions in the dorms at Central College were surveyed for one week.
The number of long distance calls made and the average length of these
calls are given below.
| Number of calls | 6 | 5 | 20 | 7 | 19 | 11 | 8 | 5 | 6 | 6 | 9 | 5 | 6 | 21 | 6 |
| Length of calls (min) | 6.1 | 8.3 | 4.2 | 6.1 | 1.6 | 4.4 | 5.8 | 5.9 | 4.4 | 8.2 | 1.2 | 7.7 | 3.9 | 0.6 | 2.0 |
-
In addition to the variables above, the school also collected information
on the number of students in each room, the long distance company (MCI,
Sprint, etc.) chosen by each room's residents and the gender of the students
in each room. Which of these variables (including the two listed in the
table) are quantitative? Which are categorical?
-
Make a (well-labeled) histogram of the number of calls made data.
-
Make a stem and leaf plot of the average length of calls data.
-
One of the data sets above is best described by the five number summary,
the other is best described by its mean and standard deviation. Decide
which is which (explain your choice) and calculate the five number summary
for that data set and the mean and standard deviation for the other.
-
From your five number summary, make a boxplot of that data set.
-
For the mean and standard deviation data set, how many of the observations
fall within one standard deviation of the mean? How many should fall within
one standard deviation according to the empirical guidelines?
-
Suppose a researcher believes that making more long distance calls lowers
the average length of calls.
-
Which variable (number of calls, or length of calls) is the
explanatory variable and which is the response variable?
-
If the researcher is correct, should a scatter plot show a negative or
positive association?
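The numerical summaries for the phone call data can be checked with the sketch below, which computes both kinds of summary for both variables so you can still decide for yourself which description suits which data set. The helper names five_number_summary and count_within_one_sd are made up for this illustration. The quartiles are assumed to follow the common textbook rule: Q1 and Q3 are the medians of the observations below and above the overall median. Other software may use a slightly different rule.

```python
from statistics import mean, median, stdev

# Data copied from the table in the problem.
calls = [6, 5, 20, 7, 19, 11, 8, 5, 6, 6, 9, 5, 6, 21, 6]
length = [6.1, 8.3, 4.2, 6.1, 1.6, 4.4, 5.8, 5.9, 4.4, 8.2, 1.2, 7.7, 3.9, 0.6, 2.0]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumes the textbook rule: when the number of observations is odd,
    the median itself is left out of both halves.
    """
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return (s[0], median(lower), median(s), median(upper), s[-1])


def count_within_one_sd(data):
    """Count the observations within one sample standard deviation of the mean."""
    m, sd = mean(data), stdev(data)
    return sum(1 for v in data if m - sd <= v <= m + sd)


for name, data in (("Number of calls", calls), ("Length of calls", length)):
    print(name)
    print("  five number summary:", five_number_summary(data))
    print(f"  mean = {mean(data):.2f}, standard deviation = {stdev(data):.2f}")
    print(f"  within one standard deviation: {count_within_one_sd(data)} of {len(data)}")
```

For comparison, the 68-95-99.7 rule says that about 68% of the observations from a normal distribution fall within one standard deviation of the mean.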
-
The density curve shown below is for a normal distribution that has a mean
and standard deviation that are both integers (whole numbers).
-
What would you guess is the value of the mean for this distribution?
-
Guess at the value of the standard deviation σ.
-
Shade in the area that corresponds to the proportion of data from this
distribution that are less than 4.