Name(s)                                                               :
The Statistics of Proportions
Introduction to Statistics, Fall 2006, Tom Linton

CLASS DATA LINK

In this activity, we look at some of the basic properties related to the statistics of proportions. We will gather several samples from the population of “all M&Ms”, calculate sample proportions, and compare our sampling results to the theoretical predictions about proportions and their probabilities. You should work in groups of two, and hand in one activity per group.

  1. Get a cup of M&Ms and, without paying attention to the colors of the M&Ms, “randomly” divide your M&Ms into piles of size n = 40. You may eat your leftover M&Ms (those that do not make it into piles of size 40) at this time.

  2. In each pile of 40 M&Ms, let X1 denote the number of BLUE M&Ms and X2 denote the number of ORANGE M&Ms. Since we are interested in proportions, let p1-hat = X1/40 and p2-hat = X2/40 (rounded to 2 decimal places). Record these values in the table below and then add each of your “p-hat” values to the stem plots on the board and to the Excel worksheet. You may now eat all of your M&Ms (even those in your samples of size 40).


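                          Sample 1    Sample 2    Sample 3
    BLUE COUNTS X1        ________    ________    ________
    BLUE PROPORTIONS      ________    ________    ________
    ORANGE COUNTS X2      ________    ________    ________
    ORANGE PROPORTIONS    ________    ________    ________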


  3. Once all groups have added their p-hat values to the stem plots on the board, answer the following questions:
    1. How many p-hat values are there for blue M&Ms?




    2. What is the general shape of the collection of the p-hat values from blue M&Ms? What appears to be the center of this collection of proportions?







    3. How many p-hat values are there for orange M&Ms?




    4. What is the general shape of the collection of the p-hat values from orange M&Ms? What appears to be the center of this collection of proportions?









  4. We are in a special situation, having many samples to estimate both the proportion of blue and the proportion of orange M&Ms (normally you have a single sample upon which you must base your estimate). With all of this “extra information” at your disposal, would you guess the proportion of blue M&Ms is larger than the proportion of orange, equal to the proportion of orange, or less than the proportion of orange M&Ms?








  5. Enter all of the p-hat values for the blue M&Ms into L1 and calculate their mean and standard deviation (as usual, use Sx, not σx). Record the values below.
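For groups that want to check the calculator result on a computer, here is a minimal Python sketch of the same calculation. The p-hat values in the list are made up purely for illustration (they are not class data), and the variable names are assumptions. The point is the difference between the two standard deviations: `stdev` divides by n - 1 and corresponds to Sx, while `pstdev` divides by n and corresponds to σx.

```python
from statistics import mean, stdev, pstdev

# Hypothetical p-hat values, invented for illustration only (not class data).
blue_p_hats = [0.20, 0.25, 0.30, 0.15, 0.28, 0.23]

m = mean(blue_p_hats)          # sample mean of the p-hat values
s = stdev(blue_p_hats)         # sample standard deviation (n - 1 divisor), like Sx
sigma = pstdev(blue_p_hats)    # population standard deviation (n divisor), like σx

print(f"mean = {m:.4f}, Sx = {s:.4f}, sigma_x = {sigma:.4f}")
```

Replace the list with the class's blue p-hat values to reproduce the numbers asked for in this step; Sx will always come out slightly larger than σx.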






  6. Statistical theory states that the mean of the sampling distribution of “all p-hat values” (for blue M&Ms with samples of large enough size) should be p, the population proportion of blue M&Ms. We don’t have the collection of “all p-hat values”, but our stem plot of many samples should do a good job of approximating this distribution. Mars Candy (the manufacturer of M&Ms) claims that p = 0.24. Does your mean from part (5) seem to agree with statistical theory (and the claim of Mars Candy)?














  7. Statistical theory also states that the standard deviation of the sampling distribution of “all p-hat values” (for samples of size 40, for blue M&Ms) should be sqrt(p(1 - p)/n), which in our case is sqrt((0.24)(0.76)/40) = 0.06753, or about 7%. How does your standard deviation from part (5) compare to this theoretical result?
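The arithmetic behind the theoretical standard deviation can be checked with a short Python sketch. It uses only the values stated in this activity (p = 0.24 from Mars Candy's claim and n = 40 for the pile size); the variable names are assumptions chosen for readability.

```python
from math import sqrt

p = 0.24   # claimed population proportion of blue M&Ms
n = 40     # number of M&Ms in each pile (sample size)

# Standard deviation of the sampling distribution of p-hat: sqrt(p(1 - p)/n)
sd_p_hat = sqrt(p * (1 - p) / n)

print(round(sd_p_hat, 5))
```

Changing n shows how the spread of the p-hat values shrinks as the piles get larger, since n sits in the denominator under the square root.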








  8. Another important claim of statistical theory is that the sampling distribution of “all p-hat values” (for samples of size 40, for blue M&Ms) should be approximately Normal (with mean p and standard deviation sqrt(p(1 - p)/n)). Let's investigate this claim with some "counting" calculations. Because our supply of M&Ms may not be a truly representative sample from the collection of all M&Ms (although it should be), we’ll use the mean and standard deviation from part (5) for these calculations (instead of the theoretical predictions of mean = 0.24 and standard deviation = 0.06753). I’ll refer to the mean calculated in part (5) as m and the standard deviation from part (5) as s (because m and s are easier to type than μ and σ).
    1. Using an invNorm command, find the value (call it p1) in the N(m, s) distribution that has 30% of the data to its left. Calculate the percentage (just count and divide by the total number of class p-hat values) of the class's p-hat values from blue M&Ms that are less than p1. Is this percentage close to 30%?








    2. Now calculate (by counting and dividing) the percentage of the class's p-hat values (for blue M&Ms) that are between 0.16 and 0.30. Compare this to the result of the command normalcdf(0.16,0.30,m,s). Are the results similar?
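The two "counting" comparisons in step 8 can also be sketched in Python, as a rough analogue of the calculator commands rather than a replacement for them. Here `NormalDist.inv_cdf` plays the role of invNorm and a difference of two `cdf` values plays the role of normalcdf. The p-hat values below are invented for illustration only (they are not class data), and all variable names are assumptions.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical class p-hat values for blue M&Ms, invented for illustration only.
blue_p_hats = [0.13, 0.18, 0.20, 0.20, 0.23, 0.25, 0.25, 0.28, 0.30, 0.33]

m = mean(blue_p_hats)     # mean from part (5)
s = stdev(blue_p_hats)    # sample standard deviation (Sx) from part (5)
model = NormalDist(m, s)  # the N(m, s) distribution

# Part 1: value with 30% of the distribution to its left, like invNorm(0.30, m, s).
p1 = model.inv_cdf(0.30)
observed_below = sum(x < p1 for x in blue_p_hats) / len(blue_p_hats)

# Part 2: area between 0.16 and 0.30, like normalcdf(0.16, 0.30, m, s).
predicted_between = model.cdf(0.30) - model.cdf(0.16)
observed_between = sum(0.16 <= x <= 0.30 for x in blue_p_hats) / len(blue_p_hats)

print(f"p1 = {p1:.4f}; observed below p1: {observed_below:.0%} (theory: 30%)")
print(f"between 0.16 and 0.30: observed {observed_between:.0%}, "
      f"Normal model {predicted_between:.0%}")
```

With the real class data in place of the made-up list, the observed and predicted percentages should be close if the Normal approximation is reasonable, though with a modest number of samples some disagreement is expected.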