NAMES:
BTI Significance or Hypothesis Tests and Errors
by Tom Linton
Work in groups of 1, 2 or 3 and turn in one paper per group.

The goal of this activity is to explain the reasoning behind significance tests (also called hypothesis tests), their p-values and the calculation of type I and II errors related to these tests.
 

The Reasoning Behind Significance Tests

One version of the scientific method can be summarized as follows: In statistics, we often implement this procedure as follows: To help explain this reasoning, consider the following scenario:

A trusted friend offers you an investment where you give them $1000 and, at the end of each month, your friend pays you your earnings. After each 3-month period, you can continue the investment for another 3 months or withdraw your $1000. The monthly earnings are claimed to vary in a normal fashion with a mean return of $15.00 and a standard deviation of $4.00. Recognizing that your average annual return would be $180 (an 18% profit), you consider this a good deal and decide to invest $1000.

Your first three monthly returns, $13.39, $13.29 and $13.64 (so x-bar = $13.44), are below average and you start to doubt the claim that the average monthly return is actually $15.00, thinking it is likely a smaller amount. However, the friend is a good one, so before pulling your money out of the investment you'd like to be pretty sure that your 3-month average return wasn't just due to bad luck or the typical variation in the average return over a three-month period. In accordance with the scientific method outlined above, we have our original hypothesis, namely that X is N(15, 4), and a single value of x-bar, from a sample of size n = 3, namely x-bar = 13.44. The question we're interested in answering is: how likely is it that x-bar = 13.44, if we assume that X is normal with a mean of 15 and a standard deviation of 4? The problem is that since X (and hence x-bar) is continuous, the probability of a single point is zero. We need an interval for our probability calculation, but we only have a single point from which to determine this interval. There are only two reasonable intervals that we could consider, namely x-bar < 13.44 and x-bar > 13.44. These two intervals have probabilities that sum to 1 (so they are essentially equivalent), and since we are now leaning toward the alternative assumption that the true mean m is below 15 (m < 15), let's calculate P(x-bar < 13.44), assuming X is N(15, 4). Standardizing our single x-bar value yields

Z = (13.44 - 15) / (4 / sqrt(3) ) = -0.6755,

and the table of standard normal probabilities indicates that P(Z < -0.6755) is about equal to 0.25. That is, 25% of the time, just by the typical variation in x-bar, we should expect a value of x-bar that is 13.44 or less. Let's summarize our results and define some terminology to help describe this situation. If we assume that our friend's hypothesis, that monthly returns are distributed in a normal fashion with mean m = $15 and standard deviation s = $4, is true, then 3-month returns averaging $13.44 (or less) should happen in about one out of every four 3-month periods. We are testing the original assumption (m = 15) against the alternative that m < 15. Because of our alternative, we calculated the probability that x-bar < 13.44, which is called a left-tailed test (we calculate the area under the density curve to the left of our observed value of x-bar). We call the probability we calculated, namely P(x-bar < 13.44), the p-value of our test. If our original assumption is true, then values of x-bar like ours (or lower) will occur roughly once in every four trials (although this probability would be larger if m were closer to 13.44). This means that we have very little or no evidence that our original assumption is false (our value is quite typical if we assume that m = 15). Based on this, we decide to leave our $1000 invested for another 3-month period.
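The left-tailed calculation above can also be checked numerically. The following is a minimal sketch using Python's standard library in place of the printed normal table; the function name left_tail_p_value is hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def left_tail_p_value(x_bar, mu0, sigma, n):
    """Return (z, p) where p = P(x-bar < observed value), assuming X is N(mu0, sigma)."""
    # Standardize the sample mean: its standard deviation is sigma / sqrt(n).
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Area under the standard normal curve to the left of z.
    return z, NormalDist().cdf(z)

# Values from the 3-month scenario: x-bar = 13.44, m = 15, s = 4, n = 3.
z, p = left_tail_p_value(13.44, 15, 4, 3)
print(f"z = {z:.4f}, p-value = {p:.2f}")
```

The printed p-value should agree with the table lookup of about 0.25. The same function can be reused for other sample sizes by changing x_bar and n.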

After 6 months in the investment scheme, our monthly returns have averaged $12.91 (note, n = 6 now). Again, our returns are well below the stated value of $15 per month, but a 6-month average of $12.91 might be just due to the usual variation based on chance.

  1. If X is normally distributed with a mean of 15 and a standard deviation of 4, how likely is it that the average x-bar of 6 values of X is less than 12.91?

  2.  

     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

  3. Your friend is a good one, and you'd like to be quite sure that m < 15 before you pull your money out of the investment. The p-value you just calculated isn't that small. It says that there is some evidence against the assumption that m = 15, but not a whole lot (x-bar values like ours, or worse, don't happen all that often, but they are not that rare either). Suppose you wanted to be 95% confident that m < 15 before you pulled your money out of the investment. This translates to saying that you will pull your money out if your x-bar value is in the lowest 5% of all x-bar values (equivalently, if your p-value is less than 0.05). You want to find the value of x-bar that has 5% of all values to its left (or 95% to the right). That is, you are looking for the number C so that P(x-bar < C) = 0.05. Since all normal distributions (and t-distributions as well) are symmetric, the point you are looking for is the left endpoint of a 90% confidence interval for x-bar (5% in each tail) centered at 15. The value -Z.05 = -1.645 is the key, since it has 5% of all Z-values to its left. Find this value of C by solving (C - 15) / (s / sqrt(n)) = -1.645 for C, with s = 4 and n = 6.

  4.  

     
     
     
     
     
     
     
     
     
     

    Errors of type I and II
    Your test can now be restated completely in terms of x-bar values. If x-bar < C, then you pull your money out, rejecting the assumption that m = 15 (and you are 95% confident that your decision is the correct one), and if x-bar > C, you leave your money in, because you are less than 95% confident that m < 15, and your friend is a good one. In this form, there are two types of errors that our test may commit. First off, it could be that m = 15, yet we get an x-bar value less than C. In this case, we reach the conclusion that m < 15 (based on our test) and withdraw our money, but actually m = 15. Put simply, the original assumption is true, but our test said it was false. This is called a type I error. The probability of making a type I error is easy to calculate: it is P(x-bar < C), if we assume that X is N(15, 4).
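One way to see what a type I error means in practice is to simulate many samples for which the original assumption really is true and count how often the test rejects it anyway. The sketch below is only an illustration: it uses n = 3 (the first scenario's sample size, not the 6-month case in the questions), and the function names, the seed and the number of trials are all assumptions made for this example.

```python
import random
from math import sqrt
from statistics import NormalDist

def cutoff(mu0, sigma, n, alpha=0.05):
    """Value C with P(x-bar < C) = alpha when X is N(mu0, sigma)."""
    z_alpha = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    return mu0 + z_alpha * sigma / sqrt(n)

def simulated_type_i_rate(mu0, sigma, n, alpha=0.05, trials=50_000, seed=1):
    """Fraction of simulated samples with x-bar < C when m = mu0 is actually true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    c = cutoff(mu0, sigma, n, alpha)
    rejections = 0
    for _ in range(trials):
        # Draw n monthly returns from N(mu0, sigma) and average them.
        x_bar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if x_bar < c:
            rejections += 1  # test wrongly rejects m = mu0
    return rejections / trials

c3 = cutoff(15, 4, 3)
rate = simulated_type_i_rate(15, 4, 3)
print(f"C for n = 3: {c3:.2f}")
print(f"simulated type I error rate: {rate:.3f}")
```

The simulated rate should land close to the chosen tail area, since C was constructed to cut off exactly that fraction of x-bar values when m = 15.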
     

  5. Find the probability of making a type I error (look in the previous paragraph that asks you to calculate C).

  6.  

     
     
     
     
     
     
     
     
     
     
     

    Another type of error occurs when our original assumption is false (so m < 15), but our test says that it is true. This is called a type II error, and probabilities associated with type II errors vary, depending on just how false our original assumption is. Since our test says that the original assumption is true, we must have x-bar > C. However, to calculate P(x-bar > C), we need to know what the true value of m is. Here is an example. Suppose that m = 11 and hence X is N(11, 4). We can calculate P(x-bar > C) by standardizing: Z = (C - 11) / (4 / sqrt(6)) = 0.80. From the table of normal probabilities, P(Z > 0.80) = 0.2119, so when m = 11 the probability of a type II error is 0.2119. This means that when monthly returns average $11, our test will decide that a six-month average return is reasonable (i.e. the test says that m = 15 is reasonable) about 21% of the time.
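The m = 11 example can be reproduced numerically. This is a minimal sketch, assuming the cutoff C is defined by P(x-bar < C) = 0.05 under m = 15 as described earlier; the function name type_ii_error is hypothetical. Because the code keeps the unrounded z-value instead of rounding to 0.80 for a table lookup, its result comes out slightly below the table value of 0.2119.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_true, mu0, sigma, n, alpha=0.05):
    """Return (z, beta): beta = P(x-bar > C) when the true mean is mu_true."""
    se = sigma / sqrt(n)                        # standard deviation of x-bar
    c = mu0 + NormalDist().inv_cdf(alpha) * se  # cutoff built under m = mu0
    z = (c - mu_true) / se                      # standardize C under the true mean
    beta = 1 - NormalDist().cdf(z)              # area to the right of C
    return z, beta

# Six-month test from the text: m = 15 assumed, true mean 11, s = 4, n = 6.
z, beta = type_ii_error(11, 15, 4, 6)
print(f"z = {z:.2f}, P(type II error) = {beta:.2f}")
```

Calling the function with a different mu_true shows how the type II error probability changes as the true mean moves further from 15.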
     

  7. What is the probability of a type II error if m = $10?

  8.  

     
     
     
     
     
     
     
     
     
     

    OK, the typical variation in 6-month average returns (assuming that m = 15 is true) is enough to account for our below-average value of x-bar. We decide to leave our $1000 invested for two more 3-month periods (a total of 12 months in all now). However, after 12 months, our average monthly return is again low. As before, we want to be 95% sure that m < 15 before we withdraw our investment.
     

  9. Find the cut-off value C (for a 12-month average return) so that P(x-bar < C) = 0.05 (assuming that m = 15).

  10.  

     
     
     
     
     
     
     
     

    We decide to leave our $1000 invested if x-bar > C, and withdraw our $1000 if x-bar < C.
     

  11. What is the probability of a type I error in this case?

  12.  

     
     
     
     
     
     

  13. If m = 11, what is the probability of a type II error?