The goal of this activity is to explain the reasoning behind significance
tests (also called hypothesis tests), their p-values and the calculation
of type I and II errors related to these tests.
A trusted friend offers you an investment where you give them $1000 and, at the end of each month, your friend pays you your earnings. After each 3-month period, you can continue the investment for another 3 months, or withdraw your $1000. The monthly earnings are claimed to vary in a normal fashion with a mean return of $15.00 and a standard deviation of $4.00. Recognizing that your expected annual return would be $180 (an 18% profit), you consider this a good deal and decide to invest $1000.
Your first three monthly returns, $13.39, $13.29 and $13.64 (so
x-bar = $13.44), are below average and you start to doubt the claim that the
average monthly return is actually $15.00, thinking it is likely a smaller amount.
However, the friend is a good one, so you'd like to be pretty sure that
your 3-month average return wasn't just due to bad luck or the typical
variation in the average return over a three-month period, before pulling
your money out of the investment. In accordance with the scientific method
outlined above, we have our original hypothesis, namely that X is N(15,
4), and a single value of x-bar, from a sample of size n = 3, namely
x-bar = 13.44. The question we're interested in answering is: how likely is
it that x-bar = 13.44, if we assume that X is normal with a mean of 15 and a
standard deviation of 4? The problem is that since X (and hence x-bar)
is continuous, the probability of a single point is zero. We need an interval
for our probability calculation, but we only have a single point from which
to determine this interval. There are only two reasonable intervals that
we could consider, namely x-bar < 13.44 and x-bar > 13.44. These two intervals
have probabilities that sum to 1 (so knowing one gives the other) and since
we are now leaning toward the alternative assumption that μ < 15, let's
calculate P(x-bar < 13.44), assuming X is N(15, 4). The average of n = 3
returns has standard deviation 4 / sqrt(3), so standardizing our single
x-bar value yields z = (13.44 - 15) / (4 / sqrt(3)) = -0.6755,
and the table of standard normal probabilities indicates that P(Z <
-0.6755) is about equal to 0.25. So 25% of the time, just by the typical
variation in x-bar, we should expect a value of x-bar
that is 13.44 or less. Let's summarize our results and define some terminology
to help describe this situation. If we assume that our friend's hypothesis,
that monthly returns are distributed in a normal fashion with μ
= $15 and σ = $4, is true, then 3-month returns
averaging $13.44 (or less) should happen in about one out of every four
3-month periods. We are testing the original assumption (μ
= 15) against the alternative that μ
< 15. Because of our alternative, we calculated the probability that
x-bar < 13.44, which is called a left-tailed test (we calculate the
area under the density curve to the left of our observed value of
x-bar). We call the probability we calculated, namely P(x-bar
< 13.44), the p-value of our test. If our original assumption
is true, then values of x-bar
like ours (or lower) will occur roughly once in every four trials (although
this probability would be larger if μ were closer
to 13.44). This means that we have very little or no evidence that our
original assumption is false (our value is quite typical if we assume that
μ = 15). Based on this, we decide to leave our $1000 invested for another
3-month period.
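The p-value above can also be reproduced numerically instead of with a printed table. The following is a minimal Python sketch, assuming the friend's claim that monthly returns are N(15, 4). The function name left_tailed_p_value is introduced here for illustration only, and statistics.NormalDist requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist, mean


def left_tailed_p_value(x_bar, mu0, sigma, n):
    """Return (z, p) for a left-tailed test of mu = mu0 against mu < mu0.

    Assumes individual returns are normal with known standard deviation
    sigma, so the mean of n returns has standard deviation sigma / sqrt(n).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p = NormalDist().cdf(z)  # area under the standard normal curve left of z
    return z, p


returns = [13.39, 13.29, 13.64]  # the first three monthly returns
x_bar = mean(returns)            # 13.44
z, p = left_tailed_p_value(x_bar, mu0=15, sigma=4, n=len(returns))
print(f"x-bar = {x_bar:.2f}, z = {z:.4f}, p-value = {p:.4f}")
```

The computed p-value should come out close to 0.25, in agreement with the table lookup in the text.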
After 6 months in the investment scheme, our monthly returns have averaged $12.91 (note, n = 6 now). Again, our returns are well below the stated value of $15 per month, but a 6-month average of $12.91 might be just due to the usual variation based on chance.
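One way to get a feel for "the usual variation based on chance" is to simulate it. The sketch below is an illustration added here, not part of the original activity. It assumes the claim N(15, 4) is true, draws many 6-month periods with Python's random.gauss, and counts how often the 6-month average comes out at $12.91 or less. The function name, the seed, and the number of trials are arbitrary choices.

```python
import random


def simulate_fraction_at_or_below(observed_mean, mu, sigma, n, trials, seed=0):
    """Estimate P(sample mean <= observed_mean) by simulation.

    Each trial draws n independent monthly returns from N(mu, sigma)
    and averages them. The seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean <= observed_mean:
            hits += 1
    return hits / trials


estimate = simulate_fraction_at_or_below(
    12.91, mu=15, sigma=4, n=6, trials=50_000
)
print(f"Estimated P(x-bar <= 12.91 | mu = 15) = {estimate:.3f}")
```

Under these assumptions the estimate should land near 0.10, which is the left-tail area obtained by standardizing: (12.91 - 15) / (4 / sqrt(6)) is about -1.28.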
Errors of type I and II
Your test can now be restated completely in terms of x-bar values.
If x-bar < C, then you pull your money out, rejecting the assumption that μ
= 15 (and you are 95% confident that your decision is the correct one),
and if x-bar > C, you leave your money in, because you are less than 95%
confident that μ < 15, and your friend is a good one. In
this form, there are two types of errors that our test may commit. First
off, it could be that μ = 15, yet we get an x-bar
value less than C. In this case, we reach the conclusion that μ
< 15 (based on our test) and withdraw our money, but actually μ
= 15. Put simply, the original assumption is true, but our test said it
was false. This is called a type I error. The probability of making
a type I error is easy to calculate: it is P(x-bar
< C), if we assume that X is N(15, 4). Since C was chosen to give 95%
confidence, this probability is 0.05.
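The cutoff C is not restated in this passage, so the sketch below reconstructs it under stated assumptions: a left-tailed test at 95% confidence (type I error probability 0.05), a standard deviation of 4, and n = 6, the sample size used in the type II calculation in this section. The function name critical_value is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(mu0, sigma, n, alpha):
    """Cutoff C such that P(x-bar < C) = alpha when the true mean is mu0."""
    z_alpha = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    return mu0 + z_alpha * sigma / sqrt(n)


mu0, sigma, n, alpha = 15, 4, 6, 0.05
C = critical_value(mu0, sigma, n, alpha)

# Type I error: the chance that x-bar falls below C although mu really is 15.
x_bar_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))
type_i = x_bar_dist.cdf(C)
print(f"C = {C:.2f}, P(type I error) = {type_i:.3f}")
```

With these inputs C comes out near $12.31, and the type I error probability equals the chosen 0.05 by construction.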
Another type of error occurs when our original assumption is false (so
μ < 15), but our test says that it is true. This is called a type II
error, and probabilities associated with type II errors vary, depending
on just how false our original assumption is. Since our test says that
the original assumption is true, we must have x-bar
> C. However, to calculate P(x-bar
> C), we need to know what the true value of μ
is. Here is an example. Suppose that μ = 11 and
hence X is N(11, 4). We can calculate P(x-bar
> C) by standardizing: (C - 11) / (4 / sqrt(6)) = 0.80. From the table
of normal probabilities, P(Z > 0.80) = 0.2119, so when μ
= 11, the probability of a type II error is 0.2119. This means that when
monthly returns average $11, our test will decide that a six-month average
return is reasonable (i.e. the test says that μ
= 15 is reasonable) about 21% of the time.
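To see how the type II error probability depends on the true mean, the calculation can be repeated for several values of the mean. This is a minimal Python sketch; the cutoff C is an assumption reconstructed as the 5% left-tailed cutoff for n = 6, and the function name type_ii_probability and the list of true means are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(true_mu, C, sigma, n):
    """P(x-bar > C) when monthly returns are really N(true_mu, sigma)."""
    x_bar_dist = NormalDist(mu=true_mu, sigma=sigma / sqrt(n))
    return 1 - x_bar_dist.cdf(C)


sigma, n = 4, 6
# Assumed cutoff: 5% left-tailed test of mu = 15 (not restated in the text).
C = 15 + NormalDist().inv_cdf(0.05) * sigma / sqrt(n)

for true_mu in (14, 13, 12, 11):
    beta = type_ii_probability(true_mu, C, sigma, n)
    print(f"mu = {true_mu}: P(type II error) = {beta:.3f}")
```

For a true mean of 11 the result is about 0.21, matching the table value of 0.2119 up to the rounding of z to 0.80. The probability grows as the true mean moves closer to 15, because a small shortfall is harder to detect.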
OK, the typical variation in 6-month average returns (assuming that
μ = 15 is true) is enough to account for our below-average value of x-bar.
We decide to leave our $1000 invested for two more 3-month periods (a total
of 12 months in all now). However, after 12 months, our average monthly
return is again low. Like before, we want to be 95% sure that μ
< 15 before we withdraw our investment.
We decide to leave our $1000 invested if x-bar
> C, and withdraw our $1000 if x-bar
< C.
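The decision rule can be written as a small function. The sketch below is a hedged illustration: it assumes the cutoff C is the 95% left-tailed cutoff under N(15, 4) for n = 12, and because the text does not give the 12-month average, the two x-bar values used are hypothetical and chosen only to show both outcomes. The function name decide is also introduced here.

```python
from math import sqrt
from statistics import NormalDist


def decide(x_bar, n, mu0=15, sigma=4, confidence=0.95):
    """Return (C, action) for the left-tailed rule: withdraw if x-bar < C."""
    z = NormalDist().inv_cdf(1 - confidence)  # about -1.645 at 95%
    C = mu0 + z * sigma / sqrt(n)
    action = "withdraw" if x_bar < C else "stay invested"
    return C, action


# Hypothetical 12-month averages, not taken from the original activity.
for x_bar in (13.50, 12.80):
    C, action = decide(x_bar, n=12)
    print(f"x-bar = {x_bar:.2f}, C = {C:.2f}: {action}")
```

With n = 12 the cutoff is about $13.10 under these assumptions. It sits closer to $15 than a cutoff based on fewer months would, because the average of more months varies less.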