Name(s):
Introduction to Statistics Activity on SRS's from
the TI-83
Tom Linton, February 23, 2000
The random digits table (Table B in the text) can be used to select an
SRS of size n from any population that has been labeled with numbers.
However, the process has its drawbacks unless you have exactly 10, 100,
or 1000 individuals in your population (so you don't have to skip invalid
entries in the table). Today we will look at how to use the TI-83 to
quickly select our SRS's, explore some of the reasons why we use random
samples, and introduce the notion of stratified random samples.
A simple random sample (SRS) of size n is a sample chosen
in such a way that all groups of n individuals from our population
have an equal chance of being selected. Samples which are random tend to
agree with the characteristics of the population from which they are chosen.
This limits the impact of under-representation and other problems that
may occur from other sampling techniques. If a population consists of several
different types of individuals, say 42% with brown hair, 33% with blonde
hair, 20% with black hair and 5% with red hair, then a random sample (assuming
it is large enough) from this population should come close to matching
these percentages. The reason is that every individual is equally likely
to be included, so about 42% of the chosen individuals will have brown
hair (because 42% of the population does and everyone is equally likely
to be included), about 33% will have blonde hair (since 33% of the population
does), and so on.
A random sample tends to reproduce the characteristics
of the population on a smaller scale.
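The hair-color example can be checked with a short simulation. The sketch below is only an illustration: the population size of 10,000, the sample size of 1,000, and the seed are assumptions that do not come from the activity; only the four percentages do.

```python
import random
from collections import Counter

# Population shares taken from the text; the population size (10,000)
# and the sample size (1,000) are assumptions made for illustration.
shares = {"brown": 0.42, "blonde": 0.33, "black": 0.20, "red": 0.05}
population = [color for color, share in shares.items()
              for _ in range(round(share * 10_000))]

rng = random.Random(2000)  # arbitrary seed so the run is repeatable

# random.sample draws without replacement, so every group of 1,000
# individuals is equally likely to be chosen (an SRS).
sample = rng.sample(population, 1_000)

counts = Counter(sample)
sample_shares = {color: counts[color] / len(sample) for color in shares}
for color in shares:
    print(f"{color:7s} population {shares[color]:.0%}   "
          f"sample {sample_shares[color]:.1%}")
```

With a sample this large, the sample percentages should land within a few points of the population percentages, though they will rarely match exactly.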
However, even random samples can give poor matches to population characteristics.
We might actually pick 10 persons from the population above, all of whom
have red hair. It isn't very likely, but it could happen. We should
certainly expect our samples to be slightly different from the population,
and we must realize that different samples will have different characteristics.
It is unlikely that any two samples (of a large size) will exactly match
the population, or exactly match one another.
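To get a feel for how unlikely an all-red-haired sample of 10 is, here is a rough calculation. It assumes the population is large enough that each pick is red-haired with probability about 0.05 regardless of the earlier picks; that assumption is ours, not part of the activity.

```python
# Chance that all 10 sampled people have red hair, assuming each pick
# is red-haired with probability 0.05 independently of the others
# (a reasonable approximation only for a very large population).
p_red = 0.05
n = 10
p_all_red = p_red ** n
print(f"{p_all_red:.2e}")  # prints 9.77e-14
```

So the event is possible, but it would essentially never be seen in practice.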
1. Let's explore the typical variation and population-matching power among
random samples from a population with two types of individuals, say 33%
who prefer diet pop and 67% who prefer regular pop. Our make-believe population
will be the numbers from 1 to 100. Individuals 1 to 33 like diet pop and
individuals 34 to 100 prefer regular pop. Note: there are 33 / 100
= 33% who prefer diet and 67 / 100 = 67% that prefer regular pop. If we
selected 10 random individuals from this population, we'd expect 3 or 4
to prefer diet pop and the rest (6 or 7) to prefer regular. However, different
samples will have different numbers that prefer diet or regular pop. We'll
use our TI-83's to generate our random samples of size 10. To make sure
that different groups select different samples, we need to seed
our calculator's random number generator.
(a) Pick a number (not a nice round one like 7500, but a messy one) from 5000
to 10000 and record it here: __________. This number is your seed value.
(b) On your home screen, type in the number you selected above, then press the [STO>]
key. Now press [MATH] [left-arrow] (to select the probability
sub-menu, [PRB]), select [1:rand], and finally press
[ENTER]. This command tells your calculator to start generating random
numbers starting at the location specified by your seed value
chosen in part (a).
(c) Several of the commands we use today will come from the [PRB]
(probability) sub-menu of the [MATH] menu. You should remember
how to get to this sub-menu. Next, we want to have our calculator select
10 numbers (randomly), each having a value from 1 to 100, and store them
in the list L1. This is accomplished with the following command (don't
execute it yet):
randInt(1,100,10)[STO>][2nd]L1
The randInt command is located on
the probability sub-menu of the [MATH] menu. This command will
select 10 numbers (the third parameter in the command) with values from
1 (the first parameter) to 100 (the second parameter) and place them into
the list L1.
(d) If you wanted to store 15 numbers from 0 to 47 in L2, what command
would you use? Make sure that everyone in your group understands the answer
to this question. You can quickly select an SRS of size n from any
population using the randInt command!
(e) One potential problem with using the randInt command to
generate samples is that it can produce the same number (or numbers) more
than once (so the sample wouldn't have n distinct individuals in
it). When this happens, you can enter another randInt command
without the third parameter, and keep pressing [ENTER] until you
have enough individuals in your sample. For example, the command randInt(1,100)
will generate a single random value from 1 to 100. If you then press [ENTER],
the calculator will generate another single random number from 1 to 100.
Execute the randInt command from part (c) above
and then load your statistical editor to look at the values in your sample.
If your sample has repeated values, generate new individuals for your sample
as described until you have a sample of size 10. Record your sample below,
sorting it from smallest (number 1) to largest (number 10).
Sample 1
|  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
|    |    |    |    |    |    |    |    |    |    |
(f) Count the number of individuals in your first sample that prefer diet pop.
Call this count D1 and record the value in the table below, following part
(g).
(g) Now, each group should generate a total of 5 samples (so you need to do
4 more samples) of size 10 and count the number of individuals in each sample
that prefer diet pop. After you generate a sample and store it in L1, you
can have the calculator sort the sample by executing the command SortA(L1)
(the SortA command is on the [OPS] sub-menu of the [LIST] menu).
Record the number of individuals from each sample
that prefer diet pop in the table below, and add your data to the class
data set on the board. Be sure to resolve any problems from numbers that
appear more than once in your samples.
Diet Pop Counts
| D1 | D2 | D3 | D4 | D5 |
|    |    |    |    |    |
(h) Record the counts for the class data set below (how many samples had
D = 0, 1, 2, 3, etc. individuals that preferred diet pop).
Diet Pop Counts
| D =          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Class Totals |   |   |   |   |   |   |   |   |   |
(i) Comment on the variation in the sample counts and how well the random samples
did in reproducing the population characteristics of 33% preferring diet
pop. You should note that a sample of size 10 is not really large enough
to give a good representation of this population.
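For anyone who wants to repeat the diet pop experiment away from the calculator, the steps above can be sketched in Python. This is an analogue, not a reproduction: Python's random number generator is not the TI-83's, so the same seed will not give the same samples. The seed 7321 and the helper names draw_sample and diet_count are made up for this illustration.

```python
import random
from collections import Counter

def draw_sample(rng, size=10, low=1, high=100):
    """Draw labels one at a time and redraw any duplicates,
    much like repeating randInt(1,100) until the sample is full."""
    chosen = []
    while len(chosen) < size:
        value = rng.randint(low, high)  # both endpoints are included
        if value not in chosen:
            chosen.append(value)
    return sorted(chosen)  # ascending order, like SortA(L1)

def diet_count(sample, diet_max=33):
    """Individuals labeled 1 to 33 prefer diet pop."""
    return sum(1 for label in sample if label <= diet_max)

rng = random.Random(7321)  # hypothetical "messy" seed from 5000 to 10000

# Five samples of size 10, giving the counts D1 through D5.
counts = [diet_count(draw_sample(rng)) for _ in range(5)]
print("D1..D5:", counts)

# Tally how many samples produced each value of D.
tally = Counter(counts)
for d in range(11):
    print(f"D = {d}: {tally[d]}")
```

Changing range(5) to a larger number plays the role of pooling the class data set, and the tally then shows how widely D spreads around the expected 3 or 4.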
While one can expect a truly random sample to be a decent representation
of the population, there are some situations where you'd like to avoid
certain samples that possess undesirable characteristics. Suppose that
a school is located near the intersection of a lower-class neighborhood
and an upper-class suburb of St. Louis and that 30% of the students at
this school come from the lower-class neighborhood and 70% come from the
suburb. The school board is considering a proposal to change the starting
and ending times of the school day and they want to know how parents (of
the students at this school) feel about this proposal. They believe that
parents from the lower-class neighborhood and parents from the suburb may
have different opinions on this proposal. This situation calls for a sample
of parents that closely agrees with the makeup of the student body (30%
lower-class and 70% suburb). A random sample of size 100 would likely contain
a split close to this 30-70 level, but splits like 40-60 or even 60-40
are possible (though not very likely) if we simply select 100 sets of parents
at random. In this case, the school board can stratify the population of
parents (break it into groups) and select a fixed number of parents
from each stratum. The strata are the groups you break the population
into; here, our strata are lower-class and suburb.
If we wanted a sample of 100 parents, we would take 30 parents from the
lower-class stratum and 70 parents from the suburb stratum, thus forcing
our sample to closely represent the split in the student body.
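The difference between the two approaches can be seen in a small simulation. The sketch below assumes a hypothetical population of 1,000 sets of parents with exactly the 30-70 split described above; the population size, the seed, and the number of repetitions are illustrative choices, not figures from the activity.

```python
import random

# Hypothetical population: 1,000 sets of parents with a 30-70 split.
population = ["lower"] * 300 + ["suburb"] * 700
rng = random.Random(30)  # arbitrary seed so the run is repeatable

# Simple random samples of 100: the lower-class count changes
# from one sample to the next.
srs_counts = [rng.sample(population, 100).count("lower")
              for _ in range(1000)]
far_off = sum(1 for c in srs_counts if c >= 40 or c <= 20)
print("lower-class counts ranged from",
      min(srs_counts), "to", max(srs_counts))
print("samples at least 10 away from 30:", far_off, "of 1000")

# Stratified alternative from the text: fix the counts in advance.
allocation = {"lower": round(0.30 * 100), "suburb": round(0.70 * 100)}
print(allocation)  # {'lower': 30, 'suburb': 70}
```

Most simple random samples land near 30 lower-class households, but a few drift well away from it. The stratified design removes that source of variation entirely.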
2. Suppose that the school has 1200 students, 362 from the lower-class neighborhood
and 838 from the suburb (roughly a 30-70 split). The school board can afford
to interview 35 sets of parents, so the board wants to select a stratified
random sample of 35 sets of parents.
(a) How many sets of parents should be chosen from the lower-class neighborhood,
so that about 30% of the parents come from this neighborhood?
(b) How many sets of parents should be selected from the suburb?
(c) Assume that the parents of the 362 lower-class students have been labeled
with numbers from 1 to 362 and the parents of the suburban students have
been labeled with numbers from 1 to 838. We can use the randInt
command (twice) to select our stratified sample. However, we should first seed
our calculator's random number generator with a specific seed value and
record that value (so that I can grade your answer and everyone will obtain
the same answer). Since this is problem number 2, seed your calculator
with the value 2 (issue the command 2 [STO>] rand [ENTER]).
Now use the randInt command (twice, with different parameters
each time) to select the stratified random sample of parents (if duplicates
appear, fix them) and record the results below.
Lower-class parent numbers in sample:
Suburban parent numbers in sample:
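For comparison, the same kind of stratified selection can be sketched in Python. This will not reproduce the calculator's output: Python's generator differs from the TI-83's even when both are seeded with 2. The draw counts of 3 and 7 are placeholders only (the counts for the real sample are the answers to the first two parts of this problem), and the function name stratified_sample is made up for this illustration.

```python
import random

def stratified_sample(strata, rng):
    """strata maps a stratum name to (stratum_size, number_to_draw).
    Labels in each stratum run from 1 to stratum_size."""
    result = {}
    for name, (size, k) in strata.items():
        # random.sample draws without replacement, so no label repeats
        # and no duplicates need to be fixed afterwards.
        result[name] = sorted(rng.sample(range(1, size + 1), k))
    return result

rng = random.Random(2)  # seed 2, mirroring the worksheet's seed

# Stratum sizes come from the problem; the draw counts 3 and 7 are
# placeholders, not the counts the problem asks you to work out.
strata = {"lower-class": (362, 3), "suburban": (838, 7)}
picked = stratified_sample(strata, rng)
for name, labels in picked.items():
    print(name, labels)
```

Drawing without replacement is the main design difference from randInt, which can repeat a label and so needs the manual duplicate fix described earlier.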