Name(s):
Hard Questions, Samples and Their Distributions
Tom Linton, Statistics, February 25, 2000
Imagine that you loved your job, were paid well for what you did, and one
day your boss asked you how many letters there were in a typical word of
an American textbook. Your answer would determine whether or not you remained
on your company's payroll (that is, a bad answer meant you were fired).
Finding the exact answer is straightforward: collect every American textbook,
count the number of letters in each word, and calculate the average. Unfortunately,
this task would take more time than anyone has and cost more money than
any employee could afford.
In contrast, imagine that your boss cooked a pot of soup and asked you
whether or not the soup was good. To answer the question, your boss gave
you a large bowl of the soup to taste. In this setting, it is clear that
one or two teaspoonfuls of the soup will suffice to answer the question.
You certainly do not need to consume the entire bowl to decide if the
recipe is a good one!
The main idea behind statistical inference is captured in the contrast
of the two situations above:
A small sample of a large data set should be adequate
to make accurate predictions about characteristics of the larger group.
Many predictions boil down to knowing the average value (or mean) of
a large population and the spread (or variance) of this data set. You'd
have a decent understanding of how many letters occur in a word of an
American textbook if you knew the average number of letters in such a word
and had a feel for how likely it was for a word to be 1, 2, 3, etc. letters
longer or shorter. The point of this exercise is to introduce the two most
commonly used formulas for estimation from samples, and to help convince
you that such formulas (called statistics) are themselves random
variables and therefore have their own distributions.
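As a concrete illustration, here is a minimal Python sketch of two such formulas. It assumes the two statistics meant are the sample mean and the sample standard deviation with the n - 1 denominator; the function names and the letter counts are hypothetical, chosen only for this example.

```python
import math


def sample_mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def sample_std_dev(values):
    """Sample standard deviation, using the n - 1 denominator (an assumption)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sample_mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


# Hypothetical letter counts from five randomly chosen words.
counts = [4, 3, 7, 5, 6]
print("mean:", sample_mean(counts))
print("std dev:", sample_std_dev(counts))
```

A different set of five words would give different values for both quantities, which is the sense in which a statistic is itself a random variable.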
- Select a textbook (preferably not a math text as these have a lot of symbols
  and formulas) and repeat the following procedure 5 times.
  - Open the book to a random page.
  - Close your eyes and point to a random word on the page.
  - Count the number of letters in that word.
Record your results in the table below and then add your results to the
class data set on the board.
Single Word Letter Counts
| Trial | 1 | 2 | 3 | 4 | 5 |
| Number of Letters | | | | | |
- Calculate the mean (average) number of letters in a word from the class
  data set.
- Record the standard deviation of the class data set below (Tom will calculate
  this).
- Now do a similar experiment, except this time when you point to a random
  word on the page, calculate the number of letters in that word and the
  eight words that follow it. From each set of 9 words, calculate their mean
  and standard deviation. Record your results below and add your data to
  the class data set on the board.
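For readers who want to try the 9-word procedure on electronic text, the following is a rough Python sketch. It assumes the passage is available as a string and that only alphabetic characters count as letters; `letter_count`, `nine_word_sample`, and the passage itself are illustrative choices rather than part of the exercise.

```python
import random
import statistics


def letter_count(word):
    """Count alphabetic characters only, ignoring punctuation and digits."""
    return sum(1 for ch in word if ch.isalpha())


def nine_word_sample(text, rng, size=9):
    """Pick a random starting word and return the letter counts of that word
    and the words that follow it, with their mean and standard deviation."""
    words = [w for w in text.split() if letter_count(w) > 0]
    if len(words) < size:
        raise ValueError("text is too short for a sample of this size")
    start = rng.randrange(len(words) - size + 1)
    counts = [letter_count(w) for w in words[start:start + size]]
    return counts, statistics.mean(counts), statistics.stdev(counts)


# Stand-in passage; any textbook paragraph could be used instead.
passage = (
    "A small sample of a large data set should be adequate to make "
    "accurate predictions about characteristics of the larger group."
)
counts, avg, sd = nine_word_sample(passage, random.Random())
print(counts, avg, sd)
```

Here `statistics.stdev` uses the n - 1 denominator, matching the usual sample standard deviation.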
9-Word Letter Counts and Statistics
| Trial | Letter Counts (4,3,7 etc.) | Average | Std Dev |
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
- Draw a picture of the class data set for 9-word average letter counts and
  then draw a density function that comes close to approximating this data
  set.
- Repeat the last question for the standard deviations from samples of size
  9.
- How do the mean and standard deviation of the samples of size 9 compare
  to the mean and standard deviation from parts (a) and (b)?
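The comparison asked for above can also be explored by simulation. The sketch below is only an illustration: `population` is a made-up list of word lengths, not class data, and the draws are independent, which consecutive words on a real page may not be. Under those assumptions, the averages of 9 words should center near the population mean while spreading out roughly one third as much as single words do (the population standard deviation divided by the square root of 9).

```python
import random
import statistics


def simulate_means(population, sample_size, trials, rng):
    """Return the mean of each of `trials` independent random samples."""
    means = []
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


rng = random.Random(0)

# Hypothetical word lengths standing in for the words of a textbook.
population = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10, 12]

singles = simulate_means(population, 1, 2000, rng)
nines = simulate_means(population, 9, 2000, rng)

print("population mean, sd:", statistics.mean(population), statistics.pstdev(population))
print("single words mean, sd:", statistics.mean(singles), statistics.stdev(singles))
print("9-word averages mean, sd:", statistics.mean(nines), statistics.stdev(nines))
print("population sd / 3:", statistics.pstdev(population) / 3)
```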
- The numbers below come from an exponential random variable X with mean
  equal to 4 and standard deviation equal to 2. The density for X is also
  shown. Note that this distribution is highly non-normal!
- Pick one number (at random) from each row and circle it. This is a sample
  of size 4. Calculate your sample's average and standard deviation. Repeat
  this procedure 5 times and then add your data to the class data set on
  the board.
Exponential Samples
| Trial | Average | Std Dev |
| 1 | | | 
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
- Draw a picture of the class data set below and add a density curve that
  comes close to summarizing the data. Does your density curve look like
  the density curve for X, or is it different? How do the means and standard
  deviations of the samples compare to the mean and standard deviation of
  X?
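A simulated version of this last experiment may help when checking the class picture. The sketch below assumes X is exponential with rate 1/4, so its mean is 4; an exponential variable with that mean has standard deviation 4, which the simulation inherits. The helper names, the seed, and the number of trials are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)
MEAN_X = 4.0  # mean of X as stated in the exercise


def draw():
    """One draw from an exponential distribution with mean MEAN_X (assumed)."""
    return rng.expovariate(1 / MEAN_X)


def sample_means(sample_size, trials):
    """Return the average of each of `trials` samples of `sample_size` draws."""
    return [
        statistics.mean([draw() for _ in range(sample_size)])
        for _ in range(trials)
    ]


means = sample_means(4, 5000)
print("mean of sample averages:", statistics.mean(means))
print("sd of sample averages:", statistics.stdev(means))

# Crude text histogram with bins of width 1; the last bin collects
# every average of 11 or more.
bins = [0] * 12
for m in means:
    bins[min(int(m), 11)] += 1
for i, count in enumerate(bins):
    print(f"{i:2d} | " + "#" * (count // 25))
```

The histogram of averages should look noticeably less lopsided than an exponential density, though with samples of only 4 it is still skewed to the right.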