Math 203 A, Introduction to Statistics,
Central College, Fall 2002
Exam 1 Review Sheet by Tom Linton and Wendy Weber
Key Terms, Phrases, Results etc. At this moment, the following
terms, results, phrases etc. are, in our opinion, the most important
of those covered in the first 2 chapters. You should understand them for
the exam.
categorical versus quantitative variables, histograms, stem plots,
split stems, time plots and trends, symmetric versus skewed distributions,
outliers and influential data points, center and spread, mean, median and
quartiles, boxplots, standard deviation, normal distribution, standard
normal distribution, density curve, explanatory versus response variables,
linear association, correlation, least squares regression, residuals,
slope and intercept of a line, causation, lurking variables.
Practice Problems If you can answer the following questions rather
quickly, you should be well prepared for the exam. These problems cover
most of the topics which will appear on exam 1.
-
You decide to collect data related to students' preferences regarding the
food service at Central College. Give an example of a categorical variable
for this setting, and another example of a quantitative variable for this
situation.
-
Data on the height in inches and shoe size of the members of the Mathematics
and Computer Science Department at Central College are given in the table
below.
| Height (inches) | 66 | 58 | 69 | 69 | 71 | 73 | 74 |
| Shoe Size | 7.5 | 9 | 9 | 10.5 | 10 | 11 | 11.5 |
-
Calculate the median shoe size and also find the first quartile, Q1, for
the height data.
-
Calculate the mean height and mean shoe size.
-
Make a scatter plot (call it plot 1) using height as X and shoe size as
Y. Does the association seem to be positive, negative, or neither?
-
Find the equation (call it Y1) of the least-squares regression line for
the scatter plot above and give the correlation, r, for this plot.
-
What percentage of the variation in shoe size is due to the linear relationship
between height and shoe size?
-
Now make a scatter plot (call it plot 2) using X = shoe size and Y = height.
Find the least squares regression equation (call it Y2) for this plot and
the correlation, r.
-
Using Y1 and plot 1, estimate the shoe size for someone whose height is
70 inches and then estimate the height of someone with a shoe size of 5.
-
Using Y2 and plot 2, estimate the shoe size for someone whose height is
70 inches and the height of someone with a shoe size of 5.
-
Which of the four estimates above do you have more confidence in? Why?
-
Verify that both regression lines contain the point whose coordinates are
the mean height and mean shoe size. Note: for the first regression the
point is (mean ht, mean size), for the second regression the point is (mean
size, mean ht.).
-
For the first regression, which data point has the biggest residual
(largest in absolute value)?
-
If you delete the data point (ht = 66, shoe size = 7.5) from the first
regression, what is the new least squares regression equation, and the
new correlation?
-
Would you consider the point (66, 7.5) to be influential? Why or why not?
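The regression questions above can be checked numerically. What follows is a minimal Python sketch, not a method required by the course (a calculator works just as well). The function name least_squares and all variable names are our own choices, and the data are copied from the height and shoe size table. It computes the slope, intercept, and correlation from the usual sums of deviations, then refits the line with the point (66, 7.5) removed.

```python
from math import sqrt

# Data copied from the height / shoe size table.
heights = [66, 58, 69, 69, 71, 73, 74]
shoes = [7.5, 9, 9, 10.5, 10, 11, 11.5]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # forces the line through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Plot 1: X = height, Y = shoe size.
slope1, intercept1, r1 = least_squares(heights, shoes)
# Plot 2: X = shoe size, Y = height.
slope2, intercept2, r2 = least_squares(shoes, heights)

# Fraction of the variation in shoe size explained by the linear relationship.
r_squared = r1 ** 2

# Residual = observed shoe size minus the shoe size predicted by Y1.
residuals = [y - (intercept1 + slope1 * x) for x, y in zip(heights, shoes)]

# Refit the first regression without the point (66, 7.5).
kept = [(x, y) for x, y in zip(heights, shoes) if (x, y) != (66, 7.5)]
slope3, intercept3, r3 = least_squares([x for x, _ in kept], [y for _, y in kept])

print(f"Y1 = {intercept1:.3f} + {slope1:.3f} X, r = {r1:.3f}, r^2 = {r_squared:.3f}")
print(f"Y2 = {intercept2:.3f} + {slope2:.3f} X, r = {r2:.3f}")
print(f"Without (66, 7.5): Y = {intercept3:.3f} + {slope3:.3f} X, r = {r3:.3f}")
```

Note that swapping X and Y changes the regression line but not the correlation, which is why the two plots are worth comparing.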
-
Each of the plots below is drawn on the same scale (or window).
-
Describe the direction of the association (positive, negative, neither)
for each plot above, and its strength.
-
Estimate the correlation (give a number) for each of the plots above.
-
Draw in what you think is a good approximation to the least squares regression
line for plot 2.
-
On plot 2, does the fourth data point from the left have a positive or
negative residual?
-
Assume that the lengths (in inches) of trout in a given river are normally
distributed. The Big Hole river has trout which average 11.3 inches in
length and have a standard deviation of 2.1 inches. Yellowstone river trout
have lengths with mean μ = 14.2 inches and standard deviation σ = 4.6 inches.
-
What proportion of trout caught in the Big Hole river are between 10 and
12 inches?
-
What percentage of Yellowstone river trout are at least 22 inches long?
-
Which of the two rivers has a smaller proportion of trout that are 8 inches
long or less?
-
How long does a Yellowstone river trout need to be if it is in the top
(i.e. longest) 3% of all Yellowstone river trout?
-
A trout has a standardized length of z = 1.42. How long is this trout if
it came from the Big Hole river? How long would it be if it came from the
Yellowstone river?
-
Bill fishes the Big Hole river and catches a 15.6 inch trout. Babe fishes
in the Yellowstone and catches a 23.5 inch trout. Whose trout, Bill's or
Babe's, is more exceptional?
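The trout questions all reduce to normal-curve calculations: areas to the left of a value, a cutoff for a given area, and conversions between lengths and z-scores. The sketch below uses Python's standard statistics.NormalDist in place of a normal table, so results may differ slightly from table answers because the table rounds z to two decimals. The helper names length_from_z and z_score are our own.

```python
from statistics import NormalDist

# Length distributions stated in the problem (inches).
big_hole = NormalDist(mu=11.3, sigma=2.1)
yellowstone = NormalDist(mu=14.2, sigma=4.6)

# Proportion between 10 and 12 inches: difference of two left-tail areas.
p_between = big_hole.cdf(12) - big_hole.cdf(10)

# Proportion at least 22 inches: one minus the left-tail area.
p_at_least_22 = 1 - yellowstone.cdf(22)

# Proportion 8 inches or less, for each river.
p_small_big_hole = big_hole.cdf(8)
p_small_yellowstone = yellowstone.cdf(8)

# Length that separates the longest 3% from the other 97%.
cutoff_top_3 = yellowstone.inv_cdf(0.97)

def length_from_z(dist, z):
    """Convert a standardized value back to a length: x = mean + z * sd."""
    return dist.mean + z * dist.stdev

def z_score(dist, x):
    """Standardize a length: z = (x - mean) / sd."""
    return (x - dist.mean) / dist.stdev

z_bill = z_score(big_hole, 15.6)     # Bill's Big Hole trout
z_babe = z_score(yellowstone, 23.5)  # Babe's Yellowstone trout

print(f"P(10 < X < 12), Big Hole: {p_between:.4f}")
print(f"P(X >= 22), Yellowstone: {p_at_least_22:.4f}")
print(f"P(X <= 8): Big Hole {p_small_big_hole:.4f}, Yellowstone {p_small_yellowstone:.4f}")
print(f"Top 3% cutoff, Yellowstone: {cutoff_top_3:.2f} inches")
print(f"z = 1.42: {length_from_z(big_hole, 1.42):.3f} (Big Hole), "
      f"{length_from_z(yellowstone, 1.42):.3f} (Yellowstone)")
print(f"z-scores: Bill {z_bill:.3f}, Babe {z_babe:.3f}")
```

Comparing z-scores, not raw lengths, is what makes catches from two different rivers comparable.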
-
The game of golf is one where a low score is better than a high score,
so as you get better, your score decreases. A golf instructor believes
that more experience playing the game improves your score. To investigate
this claim, she collects data from 10 of her students, namely the number
of years each student has been playing golf, and the average score of each
student. These 10 students have been playing golf for 0 to 4 years.
-
Which variable (years playing golf or average score) is the response variable
and which is the explanatory variable?
-
According to this golf instructor, would you expect a positive or negative
association between these two variables? Explain.
-
If the least squares regression line equation for the instructor's 10 data
points was Y = 130 - 12.8*X, what is the meaning (in common-everyday language)
of the slope -12.8? In common terms, what does the Y-intercept represent?
-
Using the regression equation from part (c), what is the predicted score
for a golfer with 11 years of experience? Does this make sense? Explain.
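As a small illustration of the extrapolation issue in the last part, the sketch below plugs a few values of X into the instructor's line Y = 130 - 12.8*X and flags which ones lie outside the 0 to 4 year range of her data. The function name predicted_score is our own.

```python
def predicted_score(years):
    """Predicted average score from the line Y = 130 - 12.8*X."""
    return 130 - 12.8 * years

for years in (0, 2, 4, 11):
    # The instructor's students had 0 to 4 years of experience.
    inside_data_range = 0 <= years <= 4
    note = "within the data" if inside_data_range else "extrapolation"
    print(f"{years:2d} years -> predicted score {predicted_score(years):6.1f} ({note})")
```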
-
To help analyze the long distance calling habits of students, 15 phone
extensions in the dorms at Central College were surveyed for one week.
The number of long distance calls made and the average length of these
calls are given below.
| Number of calls | 6 | 5 | 20 | 7 | 19 | 11 | 8 | 5 | 6 | 6 | 9 | 5 | 6 | 21 | 6 |
| Avg Length of calls (min) | 6.1 | 8.3 | 4.2 | 6.1 | 1.6 | 4.4 | 5.8 | 5.9 | 4.4 | 8.2 | 1.2 | 7.7 | 3.9 | 0.6 | 2.0 |
-
In addition to the variables above, the school also collected information
on the number of students in each room, the long distance company (MCI,
Sprint, etc.) chosen by each room's residents and the gender of the students
in each room. Which of these variables (including the two listed in the
table) are quantitative? Which are categorical?
-
Make a (well-labeled) histogram of the number of calls made data.
-
Make a stem plot of the average length of calls data.
-
One of the data sets above is best described by the five number summary,
the other is best described by its mean and standard deviation. Decide
which is which (explain your choice), and calculate the five number summary
for that data set and the mean and standard deviation for the other.
-
From your five number summary, make a boxplot of that data set.
-
For the mean and standard deviation data set, how many of the observations
fall within one standard deviation of the mean? How many should fall within
one standard deviation according to the empirical guidelines?
-
Suppose a researcher believes that making more long distance calls lowers
the average length of calls.
-
Which variable (number of calls, or average length of calls) is the
explanatory variable and which is the response variable?
-
If the researcher is correct, should a scatter plot show a negative or
positive association?
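For the summary-statistics parts of this problem, the following Python sketch computes both kinds of summary for both data sets, leaving the choice of which summary suits which data set to you. Two assumptions are ours: quartiles are taken as the medians of the lower and upper halves of the sorted data (excluding the overall median when the count is odd), and the standard deviation is the sample version with divisor n - 1. Other quartile conventions can give slightly different values.

```python
from statistics import mean, median, stdev

# Data copied from the phone call table.
calls = [6, 5, 20, 7, 19, 11, 8, 5, 6, 6, 9, 5, 6, 21, 6]
lengths = [6.1, 8.3, 4.2, 6.1, 1.6, 4.4, 5.8, 5.9, 4.4, 8.2, 1.2, 7.7, 3.9, 0.6, 2.0]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]            # observations below the median position
    upper = s[len(s) - half:]   # observations above the median position
    return (s[0], median(lower), median(s), median(upper), s[-1])

def within_one_sd(data):
    """Count observations within one sample standard deviation of the mean."""
    m, sd = mean(data), stdev(data)
    return sum(1 for x in data if m - sd <= x <= m + sd)

for name, data in (("calls", calls), ("lengths", lengths)):
    print(name)
    print("  five-number summary:", five_number_summary(data))
    print(f"  mean = {mean(data):.3f}, sd = {stdev(data):.3f}")
    # The empirical guideline says roughly 68% of observations from a
    # normal distribution fall within one standard deviation of the mean.
    print(f"  within one sd: {within_one_sd(data)} of {len(data)} "
          f"(68% of {len(data)} is about {0.68 * len(data):.1f})")
```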
-
The density curve shown below is for a normal distribution that has a mean
and standard deviation that are both integers (whole numbers).
-
What would you guess is the value of the mean for this distribution?
-
Guess at the value of the standard deviation σ.
-
Shade in the area that corresponds to the proportion of data from this
distribution that are less than 4.