Name(s):                                                   :
Introduction to Statistics Activity on SRS's from the TI-83
Tom Linton, February 23, 2000
The random digits table (Table B in the text) can be used to select an SRS of size n from any population that has been labeled with numbers. However, the process has its drawbacks unless you have exactly 10, 100, or 1000 individuals in your population (so you don't have to skip invalid entries in the table). Today we will look at the notion of stratified random samples, learn how to use the TI-83 to quickly select our SRS's, and explore some of the reasons why we use random samples.

A simple random sample (SRS) of size n is a sample chosen in such a way that all groups of n individuals from our population have an equal chance of being selected. Samples which are random tend to agree with the characteristics of the population from which they are chosen. This limits the impact of under-representation and other problems that may occur from other sampling techniques. If a population consists of several different types of individuals, say 42% with brown hair, 33% with blonde hair, 20% with black hair and 5% with red hair, then a random sample (assuming it is large enough) from this population should come close to matching these percentages. The reason is that every individual is equally likely to be included, so about 42% of the chosen individuals will have brown hair (because 42% of the population does and everyone is equally likely to be included), about 33% will have blonde hair (since 33% of the population does), and so on.

A random sample tends to reproduce the characteristics of the population in a smaller scale.

However, even random samples can give poor matches to population characteristics. We might actually pick 10 persons from the population above, all of whom have red hair. It isn't very likely, but it could happen. Certainly, we should expect our samples to be slightly different from the population, and we must realize that different samples will have different characteristics. It is unlikely that any two samples (of a large size) will exactly match the population, or exactly match one another.
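The claim that a reasonably large random sample tends to mirror the population, while a small one may not, can be checked with a quick simulation. The following is a minimal Python sketch and is not part of the calculator activity. The population size of 10,000, the sample sizes, and the seed are arbitrary assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical population of 10,000 people with the hair-color mix from the text.
POPULATION_SIZE = 10_000  # assumed size, not given in the activity
MIX = {"brown": 0.42, "blonde": 0.33, "black": 0.20, "red": 0.05}

population = []
for color, share in MIX.items():
    population.extend([color] * round(share * POPULATION_SIZE))

def sample_shares(n, rng):
    """Draw an SRS of size n (without replacement) and return each color's share."""
    counts = Counter(rng.sample(population, n))
    return {color: counts[color] / n for color in MIX}

rng = random.Random(2000)  # arbitrary seed so the run is repeatable
for n in (10, 500):
    shares = sample_shares(n, rng)
    print(n, {color: f"{share:.0%}" for color, share in shares.items()})

# Chance that 10 people drawn one at a time are all red-haired, treating the
# draws as independent (a close approximation for a large population).
p_all_red = MIX["red"] ** 10
print(f"P(all 10 red) is about {p_all_red:.1e}")
```

The sample of 500 should land close to the 42/33/20/5 split, while the sample of 10 can miss it noticeably. The all-red sample is possible but has a chance of roughly 1 in 10 trillion.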
 

  1. Let's explore the typical variation among random samples, and how well they match the population, using a population with two types of individuals: 33% who prefer diet pop and 67% who prefer regular pop. Our make-believe population will be the numbers from 1 to 100. Individuals 1 to 33 like diet pop and individuals 34 to 100 prefer regular pop. Note: 33 / 100 = 33% prefer diet and 67 / 100 = 67% prefer regular pop. If we selected 10 random individuals from this population, we'd expect 3 or 4 to prefer diet pop and the rest (6 or 7) to prefer regular. However, different samples will have different numbers that prefer diet or regular pop. We'll use our TI-83's to generate our random samples of size 10. To make sure that different groups select different samples, we need to seed our calculator's random number generator.

    (a) Pick a number (not a nice round one like 7500, but a messy one) from 5000 to 10000 and record it here: ____________. This number is your seed value.

    (b) On your home screen, type in the number you selected above, then the [STO>] key. Now press [MATH] [left-arrow] (to select the probability sub-menu, [PRB]), select [1:rand], and finally press [ENTER]. This command tells your calculator to start generating random numbers starting at the location specified by your seed value chosen in part (a).

    (c) Several of the commands we use today come from the [PRB] (probability) sub-menu of the [MATH] menu, so you should remember how to get to this sub-menu. Next, we want our calculator to randomly select 10 numbers, each having a value from 1 to 100, and store them in the list L1. This is accomplished with the following command (don't execute it yet):
    randInt(1,100,10)[STO>][2nd]L1
      The randInt command is located on the probability sub-menu of the [MATH] menu. This command selects 10 numbers (the third parameter) with values from 1 (the first parameter) to 100 (the second parameter) and places them into the list L1.
       
    (d) If you wanted to store 15 numbers from 0 to 47 in L2, what command would you use? Make sure that everyone in your group understands the answer to this question. You can quickly select an SRS of size n from any population using the randInt command!

       
       
       
       
       
       
       
       

    (e) One potential problem with using the randInt command to generate samples is that it can produce the same number (or numbers) more than once, so the sample wouldn't have n distinct individuals in it. When this happens, you can enter another randInt command without the third parameter, and keep pressing [ENTER] until you have enough individuals in your sample. For example, the command randInt(1,100) will generate a single random value from 1 to 100. If you then press [ENTER], the calculator will generate another single random number from 1 to 100. Execute the randInt command from part (c) above and then open your statistical editor to look at the values in your sample. If your sample has repeated values, generate new individuals for your sample as described until you have a sample of 10 distinct individuals. Record your sample below, sorting it from smallest (number 1) to largest (number 10).

    Sample 1
     1    2    3    4    5    6    7    8    9    10
    ___  ___  ___  ___  ___  ___  ___  ___  ___  ___

     
    (f) Count the number of individuals in your first sample that prefer diet pop. Call this count D1 and record the value in the table below, following part (g).

       
       
       

    (g) Now, each group should generate a total of 5 samples of size 10 (so you need 4 more samples) and count the number of individuals in each sample that prefer diet pop. After you generate a sample and store it in L1, you can have the calculator sort the sample by executing the command SortA(L1); the SortA command is on the [OPS] sub-menu of the [LIST] menu. Record the number of individuals from each sample that prefer diet pop in the table below, and add your data to the class data set on the board. Be sure to resolve any problems from numbers that appear more than once in your samples.
    Diet Pop Counts
     D1   D2   D3   D4   D5
    ___  ___  ___  ___  ___

     
     
    (h) Record the counts for the class data set below (how many samples had D = 0, 1, 2, 3 etc. individuals that preferred diet pop).
    Diet Pop Counts
    D =            0    1    2    3    4    5    6    7    8    9    10
    Class Totals  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___
    (i) Comment on the variation in the sample counts and how well the random samples did in reproducing the population characteristic of 33% preferring diet pop. You should note that a sample of size 10 is not really large enough to give a good representation of this population.
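
The sampling experiment in this problem can also be simulated away from the calculator. The sketch below is a Python stand-in for the TI-83 steps, under stated assumptions: `random.randint` plays the role of randInt, the helper names `draw_sample` and `diet_count` are made up for illustration, and the seed 7351 is just an arbitrary messy number. Python's generator is not the TI-83's, so the same seed will not reproduce calculator output.

```python
import random
from collections import Counter

def draw_sample(rng, size=10, low=1, high=100):
    """Mimic randInt(low,high,size), then replace duplicates one draw at a time."""
    # First pass: draws with replacement, so repeats are possible.
    values = [rng.randint(low, high) for _ in range(size)]
    sample = sorted(set(values))          # drop repeated individuals
    # Keep drawing single values (like randInt(1,100) and ENTER) until full.
    while len(sample) < size:
        extra = rng.randint(low, high)
        if extra not in sample:
            sample.append(extra)
    return sorted(sample)                 # plays the role of SortA(L1)

def diet_count(sample, diet_max=33):
    """Individuals 1..33 prefer diet pop; count how many are in the sample."""
    return sum(1 for x in sample if x <= diet_max)

rng = random.Random(7351)                 # arbitrary seed, assumed for the example
counts = [diet_count(draw_sample(rng)) for _ in range(1000)]  # 1000 samples

tally = Counter(counts)
for d in range(11):
    print(f"D = {d:2d}: {tally[d]} samples")
print("average D:", sum(counts) / len(counts))
```

Over many samples the average of D should settle near 3.3 (33% of 10), even though individual samples spread widely around that value. In Python, `random.sample(range(1, 101), 10)` would draw 10 distinct labels in a single step, which avoids the duplicate problem entirely.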


       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       

While one can expect a truly random sample to be a decent representation of the population, there are some situations where you'd like to avoid certain samples that possess undesirable characteristics. Suppose that a school is located near the intersection of a lower-class neighborhood and an upper-class suburb of St. Louis, and that 30% of the students at this school come from the lower-class neighborhood and 70% come from the suburb. The school board is considering a proposal to change the starting and ending times of the school day, and they want to know how the parents of the students at this school feel about this proposal. They believe that parents from the lower-class neighborhood and parents from the suburb may have different opinions on this proposal. This situation calls for a sample of parents that closely agrees with the makeup of the student body (30% lower-class and 70% suburb). If we simply select 100 sets of parents at random, the sample would likely contain a split close to this 30-70 level, but splits like 40-60 or even 60-40 are possible, though not very likely. In this case, the school board can stratify the population of parents (break it into groups) and select a fixed number of parents from each stratum. The strata are the groups you break the population into; here, our strata are lower-class and suburb. If we wanted a sample of 100 parents, we would take 30 parents from the lower-class stratum and 70 parents from the suburb stratum, thus forcing our sample to match the split in the student body.
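
As a rough illustration of the idea, here is a minimal Python sketch of a stratified random sample of 100 sets of parents with a 30-70 split. The stratum sizes (300 and 700 sets of parents), the labels, the function name `stratified_sample`, and the seed are all assumptions made up for the example.

```python
import random

def stratified_sample(strata, total, rng):
    """Take an SRS from each stratum, sized in proportion to the stratum.

    strata maps a stratum name to its list of labeled individuals.
    Sizes are rounded, so in general they may not sum exactly to `total`.
    """
    population_size = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        n = round(total * len(members) / population_size)
        result[name] = sorted(rng.sample(members, n))  # distinct labels only
    return result

# Hypothetical strata: parents labeled 1..300 and 1..700 (assumed sizes).
strata = {
    "lower-class": list(range(1, 301)),
    "suburb": list(range(1, 701)),
}

rng = random.Random(12345)  # arbitrary seed for a repeatable run
chosen = stratified_sample(strata, 100, rng)
for name, labels in chosen.items():
    print(name, len(labels), labels[:5], "...")
```

Because each stratum is sampled separately, the 30-70 split is fixed in advance, and only the choice of individuals within each stratum is left to chance.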
  2. Suppose that the school has 1200 students, 362 from the lower-class neighborhood and 838 from the suburb (roughly a 30-70 split). The school board can afford to interview 35 sets of parents, so the board wants to select a stratified random sample of 35 sets of parents.
    (a) How many sets of parents should be chosen from the lower-class neighborhood, so that about 30% of the parents come from this neighborhood?

       
       
       
       
       
       
       

    (b) How many sets of parents should be selected from the suburb?

       
       
       
       
       
       
       
       
       

    (c) Assume that the parents of the 362 lower-class students have been labeled with numbers from 1 to 362 and the parents of the suburban students have been labeled with numbers from 1 to 838. We can use the randInt command (twice) to select our stratified sample. However, we should seed our calculator's random number generator with a specific seed value and record it (so that everyone obtains the same answer and I can grade it). Since this is problem number 2, seed your calculator with the value 2 (issue the command 2 [STO>] rand [ENTER]). Now use the randInt command (twice, with different parameters each time) to select the stratified random sample of parents (if duplicates appear, fix them) and record the results below.

       

      Lower-class parent numbers in sample:
       
       
       

      Suburban parent numbers in sample: