Math 203, Introduction to Statistics
Central College, Fall 2000
Exam 1 Review Sheet
by Tom Linton and Wendy Weber
Key Terms, Phrases, Results etc.
The following terms, results, and phrases are, in my opinion, the most
important of those covered in the first two chapters. You should
understand them for the exam.
categorical versus quantitative variables, histograms, stem plots,
time plots and trends, symmetric versus skewed distributions, outliers
and influential data points, center and spread, mean, median and quartiles,
boxplots, standard deviation, normal distribution, standard normal distribution,
density curve, explanatory versus response variables, linear association,
correlation, least squares regression, residuals, predicted versus actual
values, slope and intercept of a line, causation, lurking variables.
Practice Problems
If you can answer the following questions rather quickly, you should be
well prepared for the exam. These problems cover most of the topics which
will appear on exam 1.
-
Data on the height in inches and shoe size of the members of the mathematics
and computer science department at Central are given in the table below.
| Height (inches) | 66 | 58 | 69 | 69 | 71 | 73 | 74 |
| Shoe Size | 7.5 | 9 | 9 | 10.5 | 10 | 11 | 11.5 |
-
Calculate the median shoe size and also find the first quartile, Q1, for
the height data.
-
Calculate the mean height and mean shoe size.
-
Make a scatterplot using height as X and shoe size as Y. Does the association
seem to be positive, negative, or neither?
-
Find the equation of the least-squares regression line for the scatterplot
above and give the correlation, r, for this plot.
-
What percentage of the variation in shoe size is due to the linear relationship
between height and shoe size?
-
Now make a scatterplot using X = shoe size and Y = height. Find the least
squares regression equation for this plot and the correlation, r.
-
Use your regression lines to make two estimates, one from each regression,
of the shoe size of someone whose height is 70 inches. Then make two
estimates, one from each regression, of the height of someone whose shoe
size is 5.
-
Which of the four estimates above do you have more confidence in? Why?
-
Verify that both regression lines contain the point whose coordinates are
the mean height and mean shoe size. Note: for the first regression the
point is (mean ht, mean size), for the second regression the point is (mean
size, mean ht.).
-
For the first regression, which data point has the biggest residual
(largest in absolute value)?
-
If you delete the data point (ht = 66, shoe size = 7.5) from the first
regression, what are the new least-squares regression equation and the
new correlation?
-
Would you consider the point deleted in the previous part to be influential?
Why or why not?
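
If you want to check your hand or calculator work on the regression parts above, the following is one possible sketch in Python. It uses the usual textbook formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x, r = Sxy / sqrt(Sxx * Syy)). The function name least_squares and the variable names are illustrative choices, not part of the course materials.

```python
from math import sqrt
from statistics import mean

# Data copied from the height / shoe size table.
heights = [66, 58, 69, 69, 71, 73, 74]
shoes = [7.5, 9, 9, 10.5, 10, 11, 11.5]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# First regression: X = height, Y = shoe size.
slope1, intercept1, r1 = least_squares(heights, shoes)
# Second regression: X = shoe size, Y = height.
slope2, intercept2, r2 = least_squares(shoes, heights)

print(f"shoe size = {intercept1:.3f} + {slope1:.3f} * height, r = {r1:.3f}")
print(f"height = {intercept2:.3f} + {slope2:.3f} * shoe size, r = {r2:.3f}")
print(f"r squared = {r1 ** 2:.3f}")

# Residual = actual - predicted, computed for the first regression.
residuals = [y - (intercept1 + slope1 * x) for x, y in zip(heights, shoes)]
largest = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print("largest residual (in absolute value) at", (heights[largest], shoes[largest]))
```

To explore the influential-point question, you could call least_squares again on the lists with one data point removed and compare the slope and r to the originals.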
-
Assume that the lengths (in inches) of trout in a given river are normally
distributed. The Big Hole river has trout which average 11.3 inches in
length and have a standard deviation of 2.1 inches. Yellowstone river trout
have lengths with μ = 14.2 inches and σ = 4.6 inches.
-
What is the probability that a random trout caught in the Big Hole river
is between 10 and 12 inches?
-
How likely is it for a Yellowstone river trout to be at least 22 inches
long?
-
Which of the two rivers has a smaller proportion of trout that are 8 inches
long or less?
-
How long does a Yellowstone river trout need to be if it is in the top
(i.e. longest) 3% of all Yellowstone river trout?
-
Bill fishes the Big Hole river and catches a 15.6 inch trout. Babe fishes
in the Yellowstone and catches a 23.5 inch trout. Whose trout, Bill's or
Babe's, is more exceptional?
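
The trout questions above are normally done by standardizing and using the standard normal table. As a way to check table lookups, here is a rough sketch using NormalDist from Python's standard statistics module. The variable names and the helper z_score are illustrative assumptions, and software values may differ slightly from table values because z is not rounded to two decimals.

```python
from statistics import NormalDist

# Normal models with the means and standard deviations given in the problem.
big_hole = NormalDist(mu=11.3, sigma=2.1)
yellowstone = NormalDist(mu=14.2, sigma=4.6)

# P(10 < X < 12) for a Big Hole trout: area under the curve between 10 and 12.
p_between = big_hole.cdf(12) - big_hole.cdf(10)

# P(X >= 22) for a Yellowstone trout: area to the right of 22.
p_at_least_22 = 1 - yellowstone.cdf(22)

# Proportion of trout that are 8 inches long or less, for each river.
p_small_big_hole = big_hole.cdf(8)
p_small_yellowstone = yellowstone.cdf(8)

# Length that cuts off the longest 3% of Yellowstone trout (97th percentile).
top_3_cutoff = yellowstone.inv_cdf(0.97)

def z_score(x, dist):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - dist.mean) / dist.stdev

z_bill = z_score(15.6, big_hole)
z_babe = z_score(23.5, yellowstone)

print(f"P(10 < X < 12), Big Hole: {p_between:.4f}")
print(f"P(X >= 22), Yellowstone: {p_at_least_22:.4f}")
print(f"P(X <= 8): Big Hole {p_small_big_hole:.4f}, Yellowstone {p_small_yellowstone:.4f}")
print(f"Top 3% cutoff, Yellowstone: {top_3_cutoff:.2f} inches")
print(f"z-scores: Bill {z_bill:.2f}, Babe {z_babe:.2f}")
```

Comparing the two z-scores is one reasonable way to decide whose catch is more exceptional, since each z-score measures a catch relative to its own river's distribution.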
-
To help analyze the long distance calling habits of students, 15 phone
extensions in the dorms at Central College were surveyed for one week.
The number of long distance calls made and the average length of these
calls are given below.
| Number of calls | 6 | 5 | 20 | 7 | 19 | 11 | 8 | 5 | 6 | 6 | 9 | 5 | 6 | 21 | 6 |
| Length of calls (min) | 6.1 | 8.3 | 4.2 | 6.1 | 1.6 | 4.4 | 5.8 | 5.9 | 4.4 | 8.2 | 1.2 | 7.7 | 3.9 | 0.6 | 2.0 |
-
In addition to the variables above, the school also collected information
on the number of students in each room, the long distance company (MCI,
Sprint, etc.) chosen by each room's residents and the gender of the students
in each room. Which of these variables (including the two listed in the
table) are quantitative? Which are categorical?
-
Make a (well-labeled) histogram of the number of calls made data.
-
Make a stem and leaf plot of the average length of calls data.
-
One of the data sets above is best described by the five number summary,
the other is best described by its mean and standard deviation. Decide
which is which (explain your choice) and calculate the five number summary
for that data set and the mean and standard deviation for the other.
-
From your five number summary, make a boxplot of that data set.
-
For the mean and standard deviation data set, how many of the observations
fall within one standard deviation of the mean? How many should fall within
one standard deviation according to the 68-95-99.7 rule?
-
Suppose a researcher believes that making more long distance calls lowers
the average length of calls.
-
Which variable (number of calls, or length of calls) is the
explanatory variable and which is the response variable?
-
If the researcher is correct, should a scatterplot show a negative or positive
association?
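
The numerical summaries asked for in the phone call problem can be checked with a short Python sketch like the one below. It assumes the common textbook convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of observations is odd); calculators and software sometimes use other conventions and can give slightly different quartiles. The function names are illustrative. The sketch prints both kinds of summary for both data sets, so deciding which summary suits which data set is still up to you.

```python
from statistics import mean, median, stdev

# Data copied from the phone call table.
calls = [6, 5, 20, 7, 19, 11, 8, 5, 6, 6, 9, 5, 6, 21, 6]
lengths = [6.1, 8.3, 4.2, 6.1, 1.6, 4.4, 5.8, 5.9, 4.4, 8.2, 1.2, 7.7, 3.9, 0.6, 2.0]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]        # observations below the median's position
    upper = s[(n + 1) // 2 :]  # observations above the median's position
    return s[0], median(lower), median(s), median(upper), s[-1]

def count_within_one_sd(data):
    """Count observations within one sample standard deviation of the mean."""
    m, sd = mean(data), stdev(data)
    return sum(1 for x in data if m - sd <= x <= m + sd)

for name, data in (("number of calls", calls), ("length of calls", lengths)):
    print(name)
    print("  five-number summary:", five_number_summary(data))
    print(f"  mean = {mean(data):.2f}, s = {stdev(data):.2f}")
    print("  within one s of the mean:", count_within_one_sd(data), "of", len(data))
```

Note that stdev divides by n - 1, which matches the sample standard deviation s used in most introductory texts.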
-
The density curve for the distribution of the variable X is shown below.
-
Is the distribution roughly symmetric? skewed left? or skewed right?
-
Is the mean of this distribution less than 9 or greater than 9?
-
Which is larger for this distribution, the mean or the median?
-
Which is larger, the probability that X < 5 or the probability that
X > 25? Recall that areas under the density curve can be used to describe
these probabilities. Try shading the two areas referred to above before
you attempt to answer this question.