NAMES:
Association Activity, Introductory Statistics, Fall 2006, Tom Linton
The goal of this activity is to explore relationships between two variables measured on the same individuals. We will look at scatterplots and association (both its direction and its strength), and we will introduce the notions of explanatory and response variables.

Many interesting statistical relationships exist between pairs of variables. Did you know that shorter women are much more likely to have heart attacks than taller women? Doesn't it seem reasonable to assume that the age at which parents were married is related to the age at which their children wed? These are examples of relationships between two variables. In both cases, one of the variables seems to explain, or predict, something about the other variable, and it is the response of this other variable that we're interested in. In the first example, a woman's height is used to explain the number of heart attacks she has. The number of heart attacks seems to respond to changes in height, and the frequency of heart attacks is what we're interested in learning about. In the second example, the age at which parents were married is used to explain differences in the ages at which their children wed. The "wedding age" of the children is the variable whose responses we're most interested in.

A response variable measures an outcome of interest in a study or experiment. An explanatory variable explains, or influences, changes in a response variable. For the examples above, the frequency of heart attacks and the age at which children get married are the response variables, while height and the age at which parents wed are the explanatory variables. Most of the time it is straightforward to decide which variable is the response variable and which is the explanatory variable. Frequently we are trying to predict values of the response variable based on knowledge of the explanatory variable.
  1. Each situation below involves two variables. Decide which is the response variable and which is the explanatory variable. If neither variable stands out as explaining or influencing the other, and neither is obviously of more interest than the other, then either variable could play either role. In this case, simply write "both could be either".
    1. The fuel efficiency of a vehicle (in gallons per mile) and its speed (in mph).



    2. The number of hours of studying and the score a student receives on a statistics exam.



    3. The age of the husband and the age of the wife on their wedding day.



    4. The width (in feet) of an executive's office and the number of years that executive has worked for the company.


We can display a relationship between two variables by making a scatterplot of the data. For each individual, we plot the point (x, y), where x = the explanatory variable and y = the response variable. We then look for the overall pattern and for any striking deviations from that pattern.
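To make the plotting rule concrete, here is a minimal Python sketch (not part of the original activity) that places one mark per individual at its (x, y) position on a character grid, with the explanatory variable running left to right and the response variable running bottom to top. The function name `text_scatterplot`, the grid size, and the example pairs are hypothetical choices made only for illustration.

```python
def text_scatterplot(points, width=40, height=12, mark="#"):
    """Draw (x, y) pairs on a character grid and return the rows, top row first.

    x (explanatory variable) increases from left to right;
    y (response variable) increases from bottom to top.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Use a span of 1 if every x (or every y) is identical, to avoid dividing by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first (at the top), so flip the vertical index.
        grid[height - 1 - row][col] = mark
    return ["".join(cells) for cells in grid]


if __name__ == "__main__":
    # Hypothetical pairs, for illustration only: (explanatory x, response y).
    demo = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7)]
    for line in text_scatterplot(demo, width=21, height=8):
        print("|" + line)
    print("+" + "-" * 21)
```

The calculator or any graphing tool does the same job with real axes and tick marks; the sketch only shows the core idea that each individual becomes one mark whose horizontal position comes from x and whose vertical position comes from y.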


To do this on the TI-83, you simply:
  1. Use the statistical editor to enter the values of the explanatory variable (x) into L1 and the values of the response variable (y) into L2. Make sure that each individual's y-value is in the same row as their x-value.
  2. On the [STATPLOT] menu, select plot 1; turn it on and select the first icon in the top row in the [TYPE] field (the scatterplot icon, see below).
  3. Set Xlist to L1 and Ylist to L2.
  4. Select a type of Mark (boxes work well).
  5. Press [ZOOM] [9: ZoomStat].
    [Figure: The stat plot window]
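The calculator steps above can be mirrored in a few lines of Python, as a rough sketch rather than a model of the TI-83 itself: two parallel lists play the roles of L1 and L2, a check confirms that every x-value has a y-value in the same row, and a helper picks a viewing window that contains every point, which is the job [ZOOM] [9: ZoomStat] does on the calculator. The function names `pair_lists` and `fit_window` and the 10% margin are assumptions for illustration; the calculator's exact ZoomStat rule may differ.

```python
def pair_lists(l1, l2):
    """Pair x-values (like list L1) with y-values (like list L2), row by row."""
    if len(l1) != len(l2):
        raise ValueError(
            f"L1 has {len(l1)} values but L2 has {len(l2)}; "
            "each individual's y-value must be in the same row as its x-value"
        )
    return list(zip(l1, l2))


def fit_window(points, margin=0.1):
    """Return (xmin, xmax, ymin, ymax) so that every point is visible.

    The 10% margin on each side is an assumed value, not the TI-83's rule.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_pad = (max(xs) - min(xs)) * margin
    y_pad = (max(ys) - min(ys)) * margin
    return (min(xs) - x_pad, max(xs) + x_pad, min(ys) - y_pad, max(ys) + y_pad)


if __name__ == "__main__":
    # Hypothetical values, for illustration only; names mirror the calculator lists.
    L1 = [0, 4, 10]   # explanatory variable (x)
    L2 = [0, 12, 20]  # response variable (y)
    points = pair_lists(L1, L2)
    print(points)              # [(0, 0), (4, 12), (10, 20)]
    print(fit_window(points))  # (-1.0, 11.0, -2.0, 22.0)
```

The length check matters because a single missing or extra entry shifts every later y-value into the wrong row, silently pairing each x with another individual's y.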
  2. Here is some data relating x = the cost (in millions of dollars) to make a movie and y = the total income (in millions of dollars) of that movie. Make a scatterplot of this data and copy it below, being sure to label the axes and to give a clear indication of the scale on each axis. You may need to use the [WINDOW] menu to do this.

Cost to produce movie (millions of dollars)   55   42   17   30   43   19   22   13   26   35
Income of movie (millions of dollars)        150  123   68   93   16   10   20   15    5   35














You should notice that, in general, as the value of the explanatory variable (x = cost) gets bigger, so does the value of the response variable (y = income).
  3. Is it always the case that if a movie cost more to produce, then it also had a higher income? If not, find two data points (x1, y1) and (x2, y2) where it cost more to make movie 1 than it did to make movie 2 (so x1 > x2), but movie 2 had a higher income than movie 1 (so y2 > y1).





    The concept of association between variables is an example of a statistical tendency. Not every movie that costs more to produce ends up with a higher income, but movies that are expensive to make tend to have higher incomes. We say that a positive association exists when values of the response variable (y) tend to increase as values of the explanatory variable (x) increase. In general this means that large values of x are paired with large values of y, and small values of x are paired with small values of y. On a scatterplot, the data will have a tendency to flow from the lower left corner of the plot to the upper right corner.
    We say that a negative association exists when values of the response variable (y) tend to decrease as values of the explanatory variable (x) increase. In general this means that large values of x are paired with small values of y, and small values of x are paired with large values of y. On a scatterplot, the data will have a tendency to flow from the upper left corner of the plot to the lower right corner.
    Sometimes there is no association, or a zero association, meaning that small, medium, and large values of the explanatory variable all tend to occur with small, medium, and large values of the response variable. On a scatterplot there is no general pattern; the data is simply "all over the place".
    Most of the time, common sense can be used to guess whether an association is positive, negative, or doesn't exist.
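One way to make "tend to" concrete, sketched here in Python as an illustration rather than a method from this activity, is to compare every pair of individuals: a pair is concordant when the individual with the larger x also has the larger y, and discordant when it has the smaller y. This pair-counting idea is the one behind Kendall's tau. The function names and the rule that the more common kind of pair sets the direction are simplifying assumptions; with real data, a small surplus of one kind of pair would still count as "near zero".

```python
from itertools import combinations

# Movie data from the cost and income table above: (cost, income), millions of dollars.
MOVIES = [(55, 150), (42, 123), (17, 68), (30, 93), (43, 16),
          (19, 10), (22, 20), (13, 15), (26, 5), (35, 35)]


def count_pairs(points):
    """Compare every pair of individuals.

    concordant: the one with the larger x also has the larger y
    discordant: the one with the larger x has the smaller y
    ties: equal x or equal y, counted separately and ignored for direction
    """
    concordant = discordant = ties = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            ties += 1
    return concordant, discordant, ties


def guess_direction(points):
    """Crude rule (an assumption): the more common kind of pair sets the direction."""
    c, d, _ = count_pairs(points)
    if c > d:
        return "positive"
    if d > c:
        return "negative"
    return "near zero"


if __name__ == "__main__":
    print(count_pairs(MOVIES), guess_direction(MOVIES))
```

For the movie data, the sketch finds 31 concordant and 14 discordant pairs out of 45, so it reports "positive": most pairs follow the pattern, but a sizable minority do not, which is exactly what "tendency" means here.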
  4. Below are brief descriptions of an explanatory variable and a response variable. For each pair, using only the verbal description given, guess whether the association would be positive, negative, or near zero, and then explain your guess. If you cannot decide, guess near zero.

    Explanatory variable                    | Response variable                    | Guessed association and explanation
    Length of hair (inches)                 | Cost of last haircut (dollars)       |
    Number of hours spent training          | Errors made by employees             |
    Size of a fast-food sandwich (ounces)   | Calories in the fast-food sandwich   |
    Cost of a brand of wristwatch (dollars) | Number of complaints for watch brand |
    Cholesterol level                       | Daily calcium intake                 |
    Airfare from Des Moines to a city       | Distance from Des Moines to the city |
  5. Associations come in a variety of strengths. Strong associations are ones where the pattern is followed by nearly all of the data points (you can predict the value of y quite accurately, knowing only the value of x). Weak associations have many pairs that don't fit the pattern (each x-value could be paired with a wide range of y-values). Below are six scatterplots; for each one, determine the direction (positive or negative) of the association and its strength (strong, moderate, or weak). Use each scatterplot once to fill in the table below.




Association | Weak       | Moderate   | Strong
Positive    |            |            |
Negative    |            |            |