Name(s)                                                               :
Sampling Distributions and Variability
Introduction to Statistics, Fall 2006, Tom Linton
Work in groups of size two.
In today's activity, we will gather information that lets us look at the shapes of the distributions of various statistics computed from samples. To do this, we will collect a large number of values of each statistic and display them all with a histogram or a stem-plot. We want to see what happens to the shape of the distribution of all possible x-bar values as we change the sample size, n, used to calculate each x-bar value, and how the standard deviation of the collection of all x-bar values changes with n. To get started, we need to make up a population of numbers. We will then draw our samples from this population.
  1. Here is a rather strange description of a random variable we will call Y (so that it doesn't look too much like the symbol for our sample mean, i.e. x-bar), and we will use Y to generate our population. This particular random variable was selected because its distribution is very far from "bell-shaped". Imagine rolling 2 dice (both assumed to be balanced or fair). Let d1 denote the larger of the 2 values rolled (in case of a tie, d1 can be either of the two equal values), and let d2 denote the smaller of the 2 values rolled. Set Y = d1^2 - d2^2. For example, if your 2 dice turn up 4 and 1, then Y = 16 - 1 = 15, and if your 2 dice turn up 5 and 5, then Y = 25 - 25 = 0. Since we don't have enough dice to go around, we will use our calculators to generate values of Y. Each group will need to generate 7 values of the random variable Y. To generate one value of Y, issue the command randInt(1,6,2) on your calculator and interpret the 2 values returned as the 2 face-up values of your dice (in this case, repeats are fine, as they simply correspond to rolling doubles). It is perfectly fine if different "rolls of the dice" produce the same value of Y (just as real dice can turn up the same values more than once). Fill in the table below with the output from your 7 rolls of 2 dice, and the corresponding values of the random variable Y. Once you have all 7 values of Y, proceed to the front of the room and record them for the rest of the class to see.


                      Roll 1  Roll 2  Roll 3  Roll 4  Roll 5  Roll 6  Roll 7
    randInt output    ______  ______  ______  ______  ______  ______  ______
    Value of Y        ______  ______  ______  ______  ______  ______  ______

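If you would like to check your calculator work on a computer, the dice rolls above can be imitated with a short program. The following is only a sketch in Python (it is not part of the calculator activity), and the function names y_from_dice and roll_y are made up for this illustration.

```python
import random

def y_from_dice(a, b):
    """Return Y = d1^2 - d2^2, where d1 is the larger face and d2 the smaller."""
    d1, d2 = max(a, b), min(a, b)
    return d1**2 - d2**2

def roll_y(rng):
    """Imitate randInt(1,6,2): roll two fair dice and return (faces, Y)."""
    faces = (rng.randint(1, 6), rng.randint(1, 6))
    return faces, y_from_dice(*faces)

rng = random.Random()  # unseeded, so every run gives different rolls
for roll in range(1, 8):
    faces, y = roll_y(rng)
    print(f"Roll {roll}: randInt output {faces}, Y = {y}")
```

The two worked examples from the text can be used as a check: y_from_dice(4, 1) gives 15 and y_from_dice(5, 5) gives 0.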

  2. Once the entire class is done recording their values of Y, enter the "population data" into L1 and make a histogram of this data (you do NOT need to copy this histogram, just look at it). Does the histogram look Normal? Is it symmetric or skewed (which direction)? How close to Normal does this population look (very close, sort of close, etc.)? For future reference record the left endpoint of the first bar in the histogram, and the right endpoint of the last bar in the histogram (like the min and max of this collection of numbers, as a crude measure of the spread).











  3. Using 1-VarStats, calculate the mean and standard deviation of the population (this time use σx, not Sx, for the standard deviation, because we are assuming we have the entire population in L1). Record these values below. Also record N = the number of individuals in the population.
  4.  

     

     

  5. Are the mean and standard deviation from (3) statistics or parameters?

  6.  


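The difference between σx and Sx on the calculator can also be seen on a computer. The sketch below assumes Python's standard statistics module, where pstdev divides by N (like σx) and stdev divides by N - 1 (like Sx). The population list is made up for illustration only; it is NOT the class data.

```python
import statistics

# Made-up stand-in for the class population in L1 (assumption: not real class data).
population = [15, 0, 35, 7, 20, 3, 0, 12, 27, 5]

N = len(population)                     # number of individuals in the population
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # divides by N, like the calculator's σx
s = statistics.stdev(population)        # divides by N - 1, like the calculator's Sx

print(f"N = {N}, mean = {mu:.2f}, σx = {sigma:.2f}, Sx = {s:.2f}")
```

For the same list of numbers, the N - 1 version is always a little larger than the N version, and the gap shrinks as N grows.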

    We will now draw several samples from our population and calculate x-bar and Sx for each sample. To do this, we will use the indices beside each individual in our population as our labels, and then use our calculator's randInt command (do NOT seed your calculator, however) to select our samples.

    We would use randInt(1,N,4) to select an SRS of size 4 (N is the total size of the population). Because randInt can return the same label more than once, and we do NOT want repeated individuals within a sample, we must replace any duplicate labels using repeated calls to the command randInt(1,N).

    Once we have an SRS of the correct number of labels, we take the individuals from the population corresponding to those labels and enter them into L2 (using the STAT Editor). We can then calculate their mean and standard deviation by running 1-VarStats on L2. For example, if your randInt command returns the values 38, 12, 9, 2, then you will use the individuals (from the population) next to labels 38, 12, 9 and 2 for your calculations. Do NOT simply average the numbers returned by your randInt command (those are just labels, not individuals from the population).

    You need to get the individuals in your sample into L2. One way is to read off the individual values from your STAT Editor (scroll down to the entries numbered 38, 12, 9, and 2 in L1), record them in the table below, and then type them directly into L2. Better yet, in your STAT Editor screen, clear out the values in L2, and simply type L1(38) as the first entry for L2, L1(12) as the second entry, and so on. Each time you press ENTER, your calculator will display the corresponding individual from L1 in the appropriate spot in L2 (you can copy it into the table below). Then you can run 1-VarStats L2 and record the mean (x-bar) and standard deviation of your sample (use Sx, because now L2 does NOT contain the entire population, only a sample from it).
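    The label-then-look-up procedure described above can be sketched on a computer as follows. This is only an illustration in Python, not the calculator method itself: the helper name draw_srs and the population values are made up, and random.sample is used because it picks labels with no repeats, so no duplicate removal is needed.

```python
import random
import statistics

def draw_srs(population, n, rng):
    """Pick n distinct labels from 1..N, then look up the individual at each label."""
    labels = rng.sample(range(1, len(population) + 1), n)  # no repeated labels
    individuals = [population[k - 1] for k in labels]      # plays the role of L1(k)
    return labels, individuals

# Made-up stand-in for the class population in L1 (assumption: not real class data).
population = [15, 0, 35, 7, 20, 3, 0, 12, 27, 5]

rng = random.Random()  # unseeded, as on the calculator
labels, individuals = draw_srs(population, 2, rng)
x_bar = statistics.mean(individuals)   # average the individuals, NOT the labels
s_x = statistics.stdev(individuals)    # sample standard deviation, like Sx

print(f"labels {labels}, individuals {individuals}, x-bar = {x_bar:.1f}, Sx = {s_x:.1f}")
```

    Note that two different labels can point to individuals with the same value of Y; that is fine, since they are still different individuals.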
     

  7. Select 7 SRS's of size 2 from the population (it is OK if your second, third, fourth, etc. samples contain individuals used in earlier samples, but no single sample can contain the same individual twice) and calculate the mean and standard deviation of each sample. Record the information below and then copy your x-bar and Sx values (rounded to one decimal place) to the stem-plots of class data on the board.


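Samples of size 2
Sample   SRS labels    Individuals    x-bar    Sx
1        __________    ___________    _____    _____
2        __________    ___________    _____    _____
3        __________    ___________    _____    _____
4        __________    ___________    _____    _____
5        __________    ___________    _____    _____
6        __________    ___________    _____    _____
7        __________    ___________    _____    _____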

  1. Did every group get the same x-bar values in their samples?


  2.  
     

  3. Look at the sample standard deviations from the class. Note that these are standard deviations computed from samples, but they do NOT measure the spread between different x-bar values; they are standard deviations calculated from the different Y-values in each sample from our population. We will see below where the standard deviation σ/√n comes into play. The class's values of Sx should bunch up around the population standard deviation (which you found in question 3). Does the class stem-plot of Sx values "bunch up" around the population standard deviation?








  4. Look at the stem-plot of the class's x-bar values. Just by looking, estimate the center of this data collection. Is this data more or less spread out than the population (look at the min and max as a crude measure)?







  5. Does this data set look more or less Normal (that is "bell shaped") than the population?








  6. Now select 5 SRS's of size 4 from the original population and calculate their means. Record the information below and copy your x-bar values to the class data set on the board (rounded to one decimal place). You can use the process described above for entering the individuals in your sample into L2, and running 1-VarStats on L2 to calculate your sample means. That is, record your labels below, then go to your STAT Editor screen and clear out the entries in L2. You can enter each of the new individuals in these samples into L2 by typing L1(k), where k is one of your labels returned by your randInt command, on each of the first 4 lines of L2.
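    Samples of size 4
    Sample   Labels        Individuals    sample mean
    1        __________    ___________    ___________
    2        __________    ___________    ___________
    3        __________    ___________    ___________
    4        __________    ___________    ___________
    5        __________    ___________    ___________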
        
  7. Look at the stem-plot of x-bar values from samples of size 4. This is an estimate of the distribution of x-bar (except now we have n = 4 instead of n = 2). Does this stem-plot look more Normal than the population?

  8.  




     
     
     

  9. Is this data set fairly symmetric? Is it more or less symmetric than the data set from samples of size 2? How about the original population?

  10.  




     
     
     

  11. Is the center of this data set about the same as the center of the population?

  12.  

     



     
     

  13. How does the spread of the collection of x-bar values from samples of size 4 compare to the spread of the population? How about the spread of the x-bar values from samples of size 2?








  14. Finally, collect 3 samples of size 9, and record each sample mean in the class data set on the board.
  15.  
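Samples of size 9
Sample   Labels        Individuals    sample mean
1        __________    ___________    ___________
2        __________    ___________    ___________
3        __________    ___________    ___________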
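After the class data are on the board, the whole activity can be repeated many times on a computer to see how the spread of the x-bar values changes with n. The following is a sketch in Python under stated assumptions: the population is simulated as 70 values of Y (10 groups of 7, which is a guess at the class size), each sample size is repeated 2000 times, and the function names are made up for this illustration.

```python
import math
import random
import statistics

def y_from_dice(a, b):
    """Return Y = d1^2 - d2^2, where d1 is the larger face and d2 the smaller."""
    d1, d2 = max(a, b), min(a, b)
    return d1**2 - d2**2

def sd_of_xbars(population, n, reps, rng):
    """Draw `reps` SRSs of size n and return the standard deviation of their means."""
    xbars = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.pstdev(xbars)

rng = random.Random()  # unseeded
# Hypothetical class population: 10 groups x 7 values of Y each (an assumption).
population = [y_from_dice(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(70)]
sigma = statistics.pstdev(population)  # population standard deviation, like σx

for n in (2, 4, 9):
    simulated = sd_of_xbars(population, n, 2000, rng)
    # Each SRS is drawn without replacement from a small population, so the
    # simulated value tends to fall a little below sigma / sqrt(n).
    print(f"n = {n}: SD of x-bar values = {simulated:.2f}, "
          f"sigma/sqrt(n) = {sigma / math.sqrt(n):.2f}")
```

Compare the printed values with what you saw in the class stem-plots for samples of size 2, 4, and 9.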