NAMES:
Association and Correlation Activity
Introductory Statistics, Fall 2000 Tom Linton
Work in groups of 2 or 3 and turn in one paper per group.
The goal of this activity is to explore notions related to correlation and learn how to calculate correlations with the TI-83.
Recall that a positive association exists when values of the response variable (y) tend to increase as values of the explanatory variable (x) increase. On a scatterplot, the points tend to run from the lower left corner of the plot to the upper right corner.
  1. In your own words, describe what it means for data pairs (x, y) to have a negative association.


We want to describe these linear associations with a number called the correlation. Some scatterplots are much closer to a straight line than others, and we want the correlation to indicate quickly how well a straight line describes the data: is the scatterplot almost a straight line, close to a straight line, or far from a straight line? This numerical description of association should also indicate whether the data are positively or negatively associated. The correlation, labeled r on the TI-83, is always between -1 and 1, inclusive. Negative correlation values indicate negative association, and positive correlation values indicate positive association. Correlations near 1 indicate almost perfectly linear data with a positive slope, and correlations near -1 indicate almost perfectly linear data with a negative slope. Correlations near zero indicate scatterplots that show little or no straight-line pattern. You cannot determine the correlation precisely from a scatterplot alone, but you can make statements like "the correlation is slightly positive" or "the correlation is definitely negative, but not too close to -1".
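For readers curious about where r comes from, the following Python sketch computes the Pearson correlation coefficient (the r reported by a linear regression) from two lists of paired values. It uses the standard formula r = Sxy / sqrt(Sxx * Syy), where Sxx and Syy are the sums of squared deviations of x and y from their means and Sxy is the sum of the products of those deviations. The function name `correlation` is our own choice for illustration, not a TI-83 command.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r of paired data xs[i], ys[i] (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations from the means, and of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    return sxy / math.sqrt(sxx * syy)


# Points exactly on a line: positive slope gives r = 1, negative slope gives r = -1.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Because Sxy is divided by sqrt(Sxx * Syy), r has no units: rescaling x (say, from inches to centimeters) does not change it.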
Each scatterplot below is a typical example for the correlation range listed on its left, which also gives the plot's actual correlation r. These plots will serve as a guide to what scatterplots with given correlations look like. For the moment, simply look at the pictures; the next question will ask you to describe in words what you see in each plot.
     
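    Close to 1 (0.8 to 1): [example scatterplot, r = 0.98]
    Medium positive (0.3 to 0.7): [example scatterplot, r = 0.68]
    Slightly positive (0.1 to 0.3): [example scatterplot, r = 0.25]
    Near zero (-0.1 to 0.1): [example scatterplot, r = -0.02]
    Slightly negative (-0.3 to -0.1): [example scatterplot, r = -0.3]
    Medium negative (-0.7 to -0.3): [example scatterplot, r = -0.7]
    Near -1 (-1 to -0.8): [example scatterplot, r = -0.98]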
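To get a feel for how the correlation ranges above translate into point clouds, one can generate artificial data whose population correlation equals a chosen value rho: take independent standard normal values x and z and set y = rho*x + sqrt(1 - rho^2)*z. The Python sketch below does this with the standard library's `random` module. The function names `simulate_pairs` and `sample_r`, the seed, and the sample size are illustrative choices of ours, and the sample r only comes close to rho rather than matching it exactly.

```python
import math
import random


def simulate_pairs(rho, n, seed=0):
    """Return n (x, y) pairs whose population correlation is rho (-1 <= rho <= 1)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)  # independent noise
        y = rho * x + math.sqrt(1 - rho ** 2) * z
        pairs.append((x, y))
    return pairs


def sample_r(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Targets taken from the example plots; each sample r should land near its target.
for target in (0.98, 0.68, 0.25, -0.02, -0.3, -0.7, -0.98):
    print(target, round(sample_r(simulate_pairs(target, 5000)), 2))
```

Plotting a few of these simulated data sets with any graphing tool gives extra practice in matching a point cloud to a correlation range.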
  2. For each correlation range above (close to 1, medium positive, etc.), look at the corresponding scatterplot and describe it in your own words. Write down whatever you need so that later, when viewing a different scatterplot, you will be able to decide whether its correlation is in that range. Your description will serve as your definition of what a scatterplot with "correlation near 1," "correlation between 0.3 and 0.7," and so on looks like. Write your descriptions on the left side of the images above.


  3. Below are brief descriptions of pairs of explanatory and response variables. For each pair, guess whether the association would be positive, negative, or near zero, and then explain your guess. If you cannot decide, guess near zero.

    Explanatory variable                Response variable                    Guessed association and explanation
    Length of hair (inches)             Cost of last haircut (dollars)
    Number of hours spent training      Errors made by employees
    Years of service at ABC             Width of office at ABC (feet)
    Gross weight of vehicle (pounds)    Time to travel 1/4 mile (seconds)
    Cost of last haircut                Height (inches)
    Cholesterol level                   Daily calcium intake
    Total cost to make a movie          Total income from a movie
  4. Shown below are scatterplots of the data pairs above. On each plot, draw the straight line that you think best describes the scatterplot. Pay careful attention to whether your line increases or decreases. If you don't know where to draw the line, draw it "down the middle" of the points.


  5. By looking back at the earlier examples and your descriptions of the correlation ranges, guess a value for the correlation of each scatterplot above and record it in the table below (for now, ignore the third column of the table).
Plot description                                              Guessed correlation   Actual correlation
Length of hair in inches vs Cost of last haircut in dollars
Number of hours spent training vs Errors made by employees
Years of service at ABC vs Width of ABC office in feet
Gross weight of vehicle vs Time to travel 1/4 mile
Cost of last haircut vs Height in inches
Cholesterol level vs Daily calcium intake
Total cost to make a movie vs Total income from a movie
  6. To calculate correlations on the TI-83, a one-time setup is required. From your home screen (press [2nd][QUIT] to get there if you are not already there), press [2nd][CATALOG] (CATALOG is the second function of the [0] key) and then [D] (the letters are printed in green above and to the right of the keys). Scroll down with the down-arrow key and press [ENTER] when you reach the DiagnosticOn line. This pastes the DiagnosticOn command to your home screen. Press [ENTER] to execute that command. From this point on, your calculator will display the correlation r and r² (the square of the correlation) whenever you ask it to perform a linear regression (the next step explains how).

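  7. To calculate the correlation for paired data, first enter the x values in list L1 and the y values in list L2 (press [STAT] and choose 1:Edit). Then press [STAT], arrow over to the CALC menu, choose 8:LinReg(a+bx), and press [ENTER] to run the command on the home screen. When no lists are named, the calculator uses L1 and L2, and it displays a, b, r², and r.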
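Behind the scenes, LinReg(a+bx) fits the least-squares line y = a + bx and, once diagnostics are turned on, also reports r² and r. The Python sketch below reproduces that arithmetic for two lists standing in for L1 and L2, using the hours-of-training and errors data from the tables below. The function name `lin_reg_a_bx` is hypothetical, and the calculator's displayed digits may differ slightly from these because of rounding.

```python
import math


def lin_reg_a_bx(l1, l2):
    """Least-squares line y = a + b*x for lists l1 (x) and l2 (y).

    Returns (a, b, r_squared, r). Hypothetical helper that mimics the
    quantities the TI-83's LinReg(a+bx) displays with diagnostics on.
    """
    n = len(l1)
    if n != len(l2) or n < 2:
        raise ValueError("L1 and L2 must have the same length (at least 2)")
    x_bar = sum(l1) / n
    y_bar = sum(l2) / n
    sxx = sum((x - x_bar) ** 2 for x in l1)
    syy = sum((y - y_bar) ** 2 for y in l2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(l1, l2))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r * r, r


# Hours of training (x) and errors (y), copied from the data tables
hours = [1, 4, 6, 8, 2, 3, 1]
errors = [6, 3, 2, 1, 5, 4, 7]
a, b, r2, r = lin_reg_a_bx(hours, errors)
print(f"a = {a:.3f}, b = {b:.3f}, r^2 = {r2:.3f}, r = {r:.3f}")
# prints: a = 6.825, b = -0.791, r^2 = 0.932, r = -0.966
```

The slope b and the correlation r always have the same sign, since both share the numerator Sxy.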
  8. Shown below are the actual data values for the scatterplots above. Working as a class (each group can do one or two data sets, and the groups can combine their answers), calculate the correlation for each plot and record it in the third column of the table above, next to your guessed correlation. At this point you do not need the equations of the regression lines, so ignore them.

    Length of hair (inches)   Cost of haircut (dollars)
    4.0                       7
    4.0                       10
    8.0                       23
    8.0                       25
    10.0                      13
    10.0                      15
    12.0                      0
    12.0                      20
    12.0                      13
    14.0                      22
    14.0                      14
    14.0                      40
    18.0                      15

    Hours of training   Errors
    1                   6
    4                   3
    6                   2
    8                   1
    2                   5
    3                   4
    1                   7

    Haircut cost (dollars)   Height (inches)
    13                       62
    0                        71
    15                       64
    10                       74
    7                        67
    9                        78
    13                       66
    15                       71
    25                       68
    10                       67
    7                        73
    8                        70

    Years of service   Width of office (feet)
    3                  4
    16                 40
    7                  16
    4                  9
    15                 38
    7                  16
    8                  17
    5                  10

    Cholesterol (mmol/L)   Calcium (mg/day)
    5.99                   814
    6.46                   323
    6.3                    273
    6.77                   519
    5.5                    379
    5.68                   547
    6.25                   400
    6.15                   386
    6.38                   189
    6.15                   680
    5.71                   365
    6.54                   926
    6.51                   252

    Gross weight (pounds)   1/4 mile time (seconds)
    2193                    18
    2551                    16
    2511                    16
    2720                    16
    3093                    18
    3220                    11
    3407                    16
    3713                    10
    3955                    14
    4158                    10

    Movie cost (millions)   Movie income (millions)
    55                      150.5
    42                      123
    17                      68
    30                      93
    43                      16
    26                      5
    19                      10
    35                      35
    22                      20
    13                      15
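To double-check the class's calculator results, the correlations for all seven data sets can also be computed in Python. In this sketch the dictionary labels are our own, each list holds (x, y) pairs copied from the data tables with x the explanatory variable, and the movie figures are read as (cost, income) pairs in the order listed.

```python
import math


def correlation(pairs):
    """Pearson correlation r for a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# (x, y) pairs copied from the data tables; x is the explanatory variable.
DATA = {
    "Length of hair vs cost of haircut": [
        (4.0, 7), (4.0, 10), (8.0, 23), (8.0, 25), (10.0, 13), (10.0, 15), (12.0, 0),
        (12.0, 20), (12.0, 13), (14.0, 22), (14.0, 14), (14.0, 40), (18.0, 15)],
    "Hours of training vs errors": [
        (1, 6), (4, 3), (6, 2), (8, 1), (2, 5), (3, 4), (1, 7)],
    "Years of service vs office width": [
        (3, 4), (16, 40), (7, 16), (4, 9), (15, 38), (7, 16), (8, 17), (5, 10)],
    "Gross weight vs 1/4 mile time": [
        (2193, 18), (2551, 16), (2511, 16), (2720, 16), (3093, 18),
        (3220, 11), (3407, 16), (3713, 10), (3955, 14), (4158, 10)],
    "Haircut cost vs height": [
        (13, 62), (0, 71), (15, 64), (10, 74), (7, 67), (9, 78),
        (13, 66), (15, 71), (25, 68), (10, 67), (7, 73), (8, 70)],
    "Cholesterol vs calcium intake": [
        (5.99, 814), (6.46, 323), (6.3, 273), (6.77, 519), (5.5, 379),
        (5.68, 547), (6.25, 400), (6.15, 386), (6.38, 189), (6.15, 680),
        (5.71, 365), (6.54, 926), (6.51, 252)],
    "Movie cost vs movie income": [
        (55, 150.5), (42, 123), (17, 68), (30, 93), (43, 16),
        (26, 5), (19, 10), (35, 35), (22, 20), (13, 15)],
}

for label, pairs in DATA.items():
    print(f"{label}: r = {correlation(pairs):.2f}")
```

Rounding to two decimals matches the precision needed for comparing against the guessed correlations.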







     

  9. Let X be your guessed correlation values in the table above, and let Y be the actual correlation values reported by the calculator.
    1. Would you guess that the association here is positive or negative? Explain.


    2. Do you think your guesses and the actual correlations will show a strong or a weak relationship? Explain.


    3. Sketch a scatterplot of the (X, Y) pairs and the actual best-fit line as follows. Place the X values in L1 and the Y values in L2, and execute (from your home screen) the command LinReg(a+bx) Y1. Record the correlation below. Next, leave the formula for Y1 turned on in the [Y=] screen (if you do nothing, it will be turned on) and set up Stat Plot 1 to display a scatterplot of L1 and L2, using boxes for the data points. Then press [ZOOM][9:ZoomStat].

       Correlation:

      Displayed Plot:


 
 
 
 
 
 

The last problem should produce a strongly linear scatterplot with the best-fit line superimposed on the data. The closer the points come to lying on that line, the better you understand the values of correlation!
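The regression of guessed (X) against actual (Y) correlations in the last problem can be sketched the same way in Python. The lists below are hypothetical placeholders, not results from this activity; replace them with the values from your own table. The helper name `lin_reg_a_bx` is also our own.

```python
import math


def lin_reg_a_bx(xs, ys):
    """Least-squares line y = a + b*x plus the correlation r (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b, sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL numbers for illustration only: guessed (X) and actual (Y) correlations
guessed = [0.2, -0.8, 0.9, -0.4, 0.0, -0.3, 0.5]
actual = [0.3, -0.9, 0.95, -0.5, 0.1, -0.2, 0.6]

a, b, r = lin_reg_a_bx(guessed, actual)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")

# Perfect guesses put every point on Y = X: a = 0, b = 1, r = 1 (up to rounding).
print(lin_reg_a_bx(actual, actual))
```

A high r here says the guesses track the actual values closely; a slope near 1 and an intercept near 0 additionally say the guesses are neither systematically too high nor too low.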