Name(s):
Hard Questions, Samples and Their Distributions
Tom Linton, Statistics, February 25, 2000
Imagine that you loved your job, were paid well for what you did, and one day your boss asked you how many letters there were in a typical word of an American textbook. Your answer would determine whether or not you remained on your company's payroll (that is, a bad answer meant you were fired). Finding the exact answer is straightforward: collect every American textbook, count the number of letters in each word, and calculate the average. Unfortunately, this task would take more time than anyone has, and cost more money than any employee could afford.

In contrast, imagine that your boss cooked a pot of soup and asked you whether or not the soup was good. To answer the question, your boss gave you a large bowl of the soup to taste. In this setting, it is clear that one or two teaspoonfuls of the soup will suffice to answer the question. You certainly do not need to consume the entire bowl to decide if the recipe is a good one!

The main idea behind statistical inference is captured in the contrast of the two situations above:

A small sample of a large data set should be adequate to make accurate predictions about characteristics of the larger group.

Many predictions boil down to knowing the average value (or mean) of a large population and the spread (or variance) of this data set. You would have a decent understanding of how many letters occur in a word of an American textbook if you knew the average number of letters in such a word and had a feel for how likely it was for a word to be 1, 2, 3, etc. letters longer or shorter. The point of this exercise is to introduce the two most commonly used formulas for estimation from samples, and to help convince you that such formulas (called statistics) are themselves random variables and therefore have their own distributions.
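The worksheet does not write the two formulas out; presumably they are the sample mean and the sample standard deviation. As a sketch under that assumption (the symbols and the n - 1 divisor are notation chosen here, and some courses divide by n instead), for a sample x_1, ..., x_n:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

Because a new sample gives new values of x_1, ..., x_n, both results change from sample to sample, which is why each can be treated as a random variable.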

  1. Select a textbook (preferably not a math text, as these have a lot of symbols and formulas) and repeat the following procedure 5 times.
  2. Record your results in the table below and then add your results to the class data set on the board.
     
    Single Word Letter Counts
    Trial               1      2      3      4      5
    Number of Letters   ____   ____   ____   ____   ____
    (a) Calculate the mean (average) number of letters in a word from the class data set.


       
       
       
       
       
       

    (b) Record the standard deviation of the class data set below (Tom will calculate this).


       

    (c) Now do a similar experiment, except this time when you point to a random word on the page, count the number of letters in that word and in each of the eight words that follow it. From each set of 9 words, calculate the mean and standard deviation of the letter counts. Record your results below and add your data to the class data set on the board.

     
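    9-Word Letter Counts and Statistics
    Trial   Letter Counts (e.g., 4, 3, 7, ...)   Average   Std Dev
    1       ________________________________     _______   _______

    2       ________________________________     _______   _______

    3       ________________________________     _______   _______

    4       ________________________________     _______   _______

    5       ________________________________     _______   _______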
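The arithmetic for one 9-word trial can be checked with a short script. This is only a sketch: the function name `nine_word_stats` is made up for illustration, the passage is a sentence from this worksheet rather than a textbook, and the standard deviation uses the n - 1 divisor, which is an assumption about the intended formula.

```python
import re
import statistics


def nine_word_stats(text, start, size=9):
    """Return (letter_counts, mean, sample_std) for `size` consecutive words."""
    # Keep only alphabetic runs so punctuation is not counted as letters.
    words = re.findall(r"[A-Za-z]+", text)
    chosen = words[start:start + size]
    if len(chosen) < size:
        raise ValueError("not enough words after the starting position")
    counts = [len(w) for w in chosen]
    # statistics.stdev divides by n - 1 (sample standard deviation).
    return counts, statistics.mean(counts), statistics.stdev(counts)


# Stand-in passage (a sentence from this worksheet, not a textbook).
passage = ("A small sample of a large data set should be adequate to make "
           "accurate predictions about characteristics of the larger group.")

counts, avg, sd = nine_word_stats(passage, start=0)
print(counts, round(avg, 2), round(sd, 2))
```

Changing `start` plays the role of pointing at a different random word.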
         
    (d) Draw a picture of the class data set for 9-word average letter counts, and then draw a density function that comes close to approximating this data set.


       
       
       
       
       
       
       
       
       
       
       
       

    (e) Repeat the last question for the standard deviations from samples of size 9.










    (f) How do the mean and standard deviation of the samples of size 9 compare to the mean and standard deviation from parts (a) and (b)?

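One way to explore this comparison away from the classroom is by simulation. The sketch below is an illustration under stated assumptions, not part of the original exercise: it uses the word lengths of one sentence from this worksheet as a stand-in population (not real textbook data), and it draws the 9 words independently at random instead of taking 9 consecutive words from a page.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of word lengths (NOT real textbook data):
# the words of one sentence from this worksheet.
sentence = ("A small sample of a large data set should be adequate to make "
            "accurate predictions about characteristics of the larger group")
population = [len(w) for w in sentence.split()]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population sd (divides by N)

# Draw many samples of 9 words (with replacement); keep each mean and sd.
n, trials = 9, 2000
means, sds = [], []
for _ in range(trials):
    sample = random.choices(population, k=n)
    means.append(statistics.mean(sample))
    sds.append(statistics.stdev(sample))

print("population mean, sd:    ", round(pop_mean, 2), round(pop_sd, 2))
print("mean of 9-word averages:", round(statistics.mean(means), 2))
print("sd of 9-word averages:  ", round(statistics.stdev(means), 2))
print("population sd / sqrt(9):", round(pop_sd / n ** 0.5, 2))
print("mean of 9-word sds:     ", round(statistics.mean(sds), 2))
```

When the words are drawn independently, the 9-word averages should center near the population mean, and their spread should be close to one third of the population standard deviation, since sqrt(9) = 3. Consecutive words on a real page are not independent, so class data may behave somewhat differently.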

       
       
       
       
       

  3. The numbers below come from an exponential random variable X with mean equal to 4 and standard deviation equal to 2. The density for X is also shown. Note that this distribution is highly non-normal!

    (a) Pick one number (at random) from each row and circle it. This is a sample of size 4. Calculate your sample's average and standard deviation. Repeat this procedure 5 times, and then add your data to the class data set on the board.
    Exponential Samples
    Trial   Average   Std Dev
    1       _______   _______

    2       _______   _______

    3       _______   _______

    4       _______   _______

    5       _______   _______

    (b) Draw a picture of the class data set below and add a density curve that comes close to summarizing the data. Does your density curve look like the density curve for X, or is it different? How do the means and standard deviations of the samples compare to the mean and standard deviation of X?
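The class experiment can also be imitated by simulation. The sketch below assumes X is exponential with mean 4 and generates fresh random numbers instead of using the rows printed on the worksheet. For a true exponential distribution the standard deviation equals the mean, so the simulated X has standard deviation 4, which may not match the worksheet's table. The trial count and the text histogram are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MEAN_X = 4.0          # mean stated in the exercise
n, trials = 4, 5000   # sample size from the exercise; trial count is arbitrary

averages, sds = [], []
for _ in range(trials):
    # random.expovariate takes the rate, which is 1 / mean.
    sample = [random.expovariate(1 / MEAN_X) for _ in range(n)]
    averages.append(statistics.mean(sample))
    sds.append(statistics.stdev(sample))

print("mean of sample averages:", round(statistics.mean(averages), 2))
print("sd of sample averages:  ", round(statistics.stdev(averages), 2))
print("mean of sample sds:     ", round(statistics.mean(sds), 2))

# Crude text histogram of the sample averages (bin width 1).
# The last bin also collects every average of 12 or more.
bins = [0] * 13
for a in averages:
    bins[min(int(a), 12)] += 1
for i, count in enumerate(bins):
    print(f"{i:2d}-{i + 1:<2d} {'#' * (count // 50)}")
```

Under these assumptions the averages should center near 4 with a spread near 4 / sqrt(4) = 2, and the histogram should look less skewed than the density of X itself.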