Math 203, Introduction to Statistics,
Central College, Fall 1999 Exam 1 Review Sheet.
Tom Linton, http://www.central.edu/homepages/lintont/
Exam Particulars
The exam covers chapters 1 and 2 of the text (The Basic Practice of
Statistics, by Moore); section 2.5 was skipped and will NOT be on the exam.
The exam will be open book (the text plus one 8.5 by 11 page of notes,
both sides), and calculators will be allowed. In fact, you will need a
calculator that can compute the mean, median, correlation, residuals, and
least-squares regression equations. The focus of the exam will be on the
interpretation or meaning of these statistical quantities (as opposed to
their calculation), but a fair bit of calculation will be required (with
the aid of calculators). Questions on the exam will be similar (at least in
the mind of the professor) to the homework assignments; however, some of the
exam questions will require knowledge from several sections of the text.
As such, this exam is likely to be the most difficult part of the course
so far.
A Piece of Advice
The fact that this exam is open book should NOT lower your desire to study
for it. If you need to search through the text to help answer all of the
questions, you will likely NOT complete the exam in the 50-minute period.
You should strive to know enough of the material so that you really don't
need your text, but can rely on it being there just in case. Rather than
devoting time to memorizing formulas, write them on your one page of notes,
or know where the formulas are located in your text, and attempt to understand
what these formulas mean or tell you.
Key Terms, Phrases, Results, etc.
At this moment, the following terms, results, and phrases are, in my opinion,
the most important of those covered in the first two chapters. You should
understand them for the exam.
categorical versus quantitative variables, histograms, stemplots,
time plots and trends, symmetric versus skewed distributions, outliers
and influential data points, center and spread, mean, median and quartiles,
boxplots, standard deviation, normal distribution, standard normal distribution,
explanatory versus response variables, linear association, correlation,
least-squares regression, residuals, predicted versus actual values, slope
and intercept of a line, causation, lurking variables.
Practice Problems
If you can answer the following questions rather quickly, you should be
well prepared for the exam. These problems cover most of the topics which
will appear on exam 1.
- You are designing a study to investigate relationships between people
  (their lifestyles, habits, and physical characteristics) and problems with
  vision. A key factor seems to be "exposure to sunshine".
  - Give an example of both a categorical and a quantitative variable that
    you would record for each individual (patient) you study.
  - Other studies suggest that wearing a hat (often) lowers the risk of a
    common eye problem (cataracts, which are roughly spots of opaqueness, or
    non-see-throughness, in the eyes). You decide to ask patients in your
    study what percentage of the time they wore a hat when they were outside
    in sunshine, and measure the percentage of their eyes affected by
    cataracts.
  - Which variable (time with hat on, or percentage of cataracts in the eye)
    is the response variable?
  - Do the other studies suggest a positive or negative association between
    the explanatory and response variables?
  - Upon analysis of the data on how often patients wore a hat when outside
    in sunshine, you see a distribution that is quite skewed to the right.
    Sketch a possible density curve for this distribution and mark the mean
    and median on your sketch. In particular, which is larger, the mean or
    the median?
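To see how skewness separates the mean from the median, here is a minimal Python sketch. The values in `hat_pct` are made up for illustration (they are not data from any study); they pile up near zero with a long tail to the right.

```python
from statistics import mean, median

# Hypothetical percentages of time spent wearing a hat (illustration only).
# Most values are small, and a few large ones form a long right tail.
hat_pct = [0, 0, 5, 5, 10, 10, 15, 20, 40, 95]

hat_mean = mean(hat_pct)      # balance point of the data
hat_median = median(hat_pct)  # middle value of the sorted data

print("mean   =", hat_mean)
print("median =", hat_median)
```

In this made-up data set the few large values pull the mean toward the tail, while the median stays near the bulk of the observations.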
- Bowlers in the Thursday night Pella league have scores which are normally
  distributed with a mean of 150 and a standard deviation of 17. The Saturday
  night league (also normally distributed) averages 132 with a standard
  deviation of 28. Sam bowls both Thursdays and Saturdays. One week Sam
  rolled a 180 on Thursday and a 173 on Saturday.
  - On which night was Sam's score more exceptional (better in terms of the
    percentage of same-night bowlers whom Sam outbowled)?
  - What percentage of bowlers would have outdone Sam's Thursday night score
    if the league's scores were instead normally distributed with a mean of
    160 and a standard deviation of 15?
  - If Sam wants to be better than 93% of all Thursday night bowlers, what
    bowling score would Sam need to roll?
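The three bowling questions all come down to standardizing a score, z = (score - mean) / standard deviation, and reading areas under a normal curve. A minimal sketch in Python (3.8 or later, using the standard library's `statistics.NormalDist`) is shown here; a calculator's normal functions or Table A in the text do the same job. For the last part, the sketch assumes the original Thursday distribution (mean 150, standard deviation 17) is the one intended.

```python
from statistics import NormalDist

# League score distributions as stated in the problem.
thursday = NormalDist(mu=150, sigma=17)
saturday = NormalDist(mu=132, sigma=28)

# Standardize each of Sam's scores: z = (score - mean) / standard deviation.
z_thu = (180 - thursday.mean) / thursday.stdev
z_sat = (173 - saturday.mean) / saturday.stdev

# Fraction of same-night bowlers Sam outbowled (area to the left of the score).
beat_thu = thursday.cdf(180)
beat_sat = saturday.cdf(173)

# Altered Thursday league (mean 160, sd 15): fraction of bowlers scoring
# above Sam's 180 (area to the right of the score).
alt_thursday = NormalDist(mu=160, sigma=15)
above_180 = 1 - alt_thursday.cdf(180)

# Score at the 93rd percentile. Assumption: the original Thursday
# distribution (mean 150, sd 17) is the one meant in the last part.
score_93 = thursday.inv_cdf(0.93)

print(f"z Thursday = {z_thu:.2f}, z Saturday = {z_sat:.2f}")
print(f"outbowled on Thursday: {beat_thu:.1%}, on Saturday: {beat_sat:.1%}")
print(f"above 180 in the altered league: {above_180:.1%}")
print(f"score needed to beat 93% of Thursday bowlers: {score_93:.1f}")
```

The larger z-score marks the more exceptional night, since it places Sam further above that night's average in standard-deviation units.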
- The age in years of each President (on the day they first took office)
  is shown in the time plot below. The Presidents appear on the X-axis in
  the order in which they assumed office (Washington = 1, Clinton = 42, etc.).
  - Consider the first 8 Presidents (numbers 1 to 8) and the second group
    of 8 Presidents (9 to 16). Which group appears to have more spread in
    the variable A = age when assuming office?
  - Calculate the mean and standard deviation of age (when assuming office)
    for each of these 2 groups of Presidents. You should calculate two means
    and two standard deviations (one for each group). Are the means about
    the same? What about the standard deviations?
  - Would you consider the 9th President (whose age was 68 when he assumed
    office) to be an outlier in the second group? Explain. Would age = 68 be
    an outlier in the first group? Explain.
  - Make boxplots of both groups of 8 Presidents. Do the boxplots confirm
    your answer to the first part (which group has more spread)? How so?
  - Make a stem and leaf plot of the ages of the last 20 Presidents (numbers
    23 to 42). Use split stems. Is the distribution of ages fairly symmetric,
    or is it skewed one way or the other?
  - Based on the stem and leaf plot, would you expect the mean or the median
    age of the last 20 Presidents to be larger? Why? Calculate both the mean
    and the median of these 20 Presidents' ages.
  - Give the five-number summary of the ages of the last 20 Presidents.
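A five-number summary (minimum, first quartile, median, third quartile, maximum) can be sketched in Python as follows. This assumes the quartile rule in which each quartile is the median of the observations on one side of the overall median's position; calculators and software sometimes use slightly different rules, so small differences are possible. The `ages` list is hypothetical and is NOT the Presidents' data, which must be read from the time plot.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least 2 numbers.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when the count is odd, the middle observation is left out of
    both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical ages, for illustration only.
ages = [42, 46, 51, 52, 54, 55, 56, 60, 64]
summary = five_number_summary(ages)
print(summary)
```

These five numbers are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.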
- Problem number 1.68 on pages 83 and 84.
- Packaged foods sold at supermarkets are not always the weight indicated
  on the package. Variability always crops up in the manufacturing and
  packaging process. Suppose that the weights of "12 oz" bags of Lay's potato
  chips are approximately normally distributed with a mean weight of 12
  ounces and a standard deviation of 0.4 ounces. If a 12 oz bag of Lay's
  chips is selected at random, what are the chances that
  - It weighs somewhere between 11.6 and 12.4 ounces?
  - It weighs less than 11.5 ounces?
  - It weighs more than 12.6 ounces?
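Each of the three potato chip questions is an area under the normal curve with mean 12 and standard deviation 0.4. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist` (Table A or a calculator's normal functions should give matching answers up to rounding):

```python
from statistics import NormalDist

# Bag weights in ounces, as described in the problem.
bag = NormalDist(mu=12, sigma=0.4)

# Between 11.6 and 12.4 oz: area between two cutoffs
# (one standard deviation on either side of the mean).
p_between = bag.cdf(12.4) - bag.cdf(11.6)

# Less than 11.5 oz: area to the left of the cutoff.
p_below = bag.cdf(11.5)

# More than 12.6 oz: area to the right of the cutoff.
p_above = 1 - bag.cdf(12.6)

print(f"P(11.6 < weight < 12.4) = {p_between:.4f}")
print(f"P(weight < 11.5)        = {p_below:.4f}")
print(f"P(weight > 12.6)        = {p_above:.4f}")
```

The first probability is a useful check on the 68-95-99.7 rule, since the two cutoffs sit exactly one standard deviation from the mean.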
- Data on the height in inches and shoe size of the members of the
  mathematics and computer science department at Central are given in the
  table below.

  | Height (inches) | Shoe Size |
  | 66              | 7.5       |
  | 68              | 9         |
  | 69              | 9         |
  | 69              | 10.5      |
  | 71              | 10        |
  | 73              | 11        |
  | 74              | 11.5      |

  - Calculate the mean height and mean shoe size.
  - Make a scatterplot using height as X and shoe size as Y.
  - Find the equation of the least-squares regression line for the
    scatterplot above and give the correlation, r, for this plot.
  - Now make a scatterplot using X = shoe size and Y = height. Find the
    least-squares regression equation for this plot and the correlation, r.
  - From your regression lines, estimate (two values, one for each
    regression) the shoe size of someone whose height is 70 inches. Then
    estimate (again two values) the height of someone whose shoe size is 5.
  - Which of the estimates above do you have more confidence in? Why?
  - Verify that both regression lines contain the point whose coordinates
    are the mean height and mean shoe size. Note: for the first regression
    the point is (mean ht, mean size); for the second regression the point
    is (mean size, mean ht).
  - For the first regression, which data point has the biggest residual
    (largest positive or most negative, that is, the residual with the
    largest absolute value)?
  - If you delete the data point (ht = 66, shoe size = 7.5) from the first
    regression, what is the new least-squares regression equation, and what
    is the new correlation?
  - Would you consider the point deleted in the previous part to be
    influential? Why or why not?
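The regression parts of the height and shoe size problem can be checked with a short Python sketch. It is only one way to organize the arithmetic your calculator does. The helper `least_squares` is a name introduced here for illustration; it uses slope = Sxy / Sxx, intercept = (mean of y) - slope * (mean of x), and r = Sxy / sqrt(Sxx * Syy), where Sxx, Syy, and Sxy are sums of squared deviations and cross-products.

```python
from math import sqrt

# Data from the table above.
heights = [66, 68, 69, 69, 71, 73, 74]
shoes = [7.5, 9, 9, 10.5, 10, 11, 11.5]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy / sqrt(sxx * syy)

# First regression: X = height, Y = shoe size.
b1, a1, r1 = least_squares(heights, shoes)
# Second regression: X = shoe size, Y = height.
b2, a2, r2 = least_squares(shoes, heights)

# Residual = actual shoe size - predicted shoe size (first regression).
residuals = [y - (a1 + b1 * x) for x, y in zip(heights, shoes)]
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

# Refit the first regression without the point (66, 7.5).
kept = [(x, y) for x, y in zip(heights, shoes) if (x, y) != (66, 7.5)]
b3, a3, r3 = least_squares([x for x, _ in kept], [y for _, y in kept])

print(f"size on height: size = {a1:.3f} + {b1:.4f} * height, r = {r1:.3f}")
print(f"height on size: height = {a2:.3f} + {b2:.4f} * size, r = {r2:.3f}")
print(f"largest residual at ({heights[worst]}, {shoes[worst]}): "
      f"{residuals[worst]:.3f}")
print(f"without (66, 7.5): size = {a3:.3f} + {b3:.4f} * height, r = {r3:.3f}")
```

Swapping X and Y changes the slope and intercept but not r. Comparing the refitted line with the original one is a direct way to judge how influential the deleted point is.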
- Number 2.86 on page 170.