Name(s):
Poisson Distributions Activity
Math 341, Central College, Fall 1999

Background

Poisson is the French word for fish! Random variables with a Poisson distribution usually have nothing to do with fish (though ours will today); rather, these distributions are named after the famous French mathematician Siméon Denis Poisson. In 1837 Poisson published an important work on probability in which random variables with Poisson distributions were first introduced. Intuitively, Poisson random variables correspond to discrete counts of rare events (like accidents, earthquakes, etc.) over fairly long intervals of time or space. They can also be described as limits of binomial random variables in which p (the probability of a success) is small and n (the number of trials) is large enough that the expected value, n*p, is moderate (say from 1 to 7). We will introduce the Poisson distribution using this last approach.

Problem Description

Babe Winkleman is a superb fisherman. On one of Babe's favorite stretches of productive trout water, he catches an average of 3 fish per hour. We start by attempting to model the number of fish that Babe catches in an hour with a binomial random variable, say Y. We need to choose the parameters n (the number of trials) and p (the probability of a success), and to say what a trial is and what counts as a success, in such a way that the model captures the fact that Babe averages 3 fish per hour (i.e., so that E[Y] = 3). We will let a trial be a fixed amount of time, say t minutes, that Babe spends fishing on his good stretch of river. The number of trials, n, is then the number for which n*t = 60, i.e., the number of trials needed to cover one hour of fishing. A success corresponds to Babe catching a fish. We need p to lie between zero and one (so that it is a probability), and we must select p so that Babe averages 3 fish per hour. We will look at a variety of trial lengths, find the corresponding values of p, and calculate the probabilities of Babe catching certain numbers of fish in an hour (0 fish, 1 fish, 2 fish, etc.).

Basic Facts

If Y is a binomial random variable with parameters n and p, what is E[Y]?
 
 

The answer to the last question means that we must select p so that n*p = 3. We also need 0 < p < 1, which means our trials cannot be too long. The borderline case is p = 1 and n = 3: with 3 trials in an hour, each trial covers 20 minutes of fishing. This model is unrealistic, since it states that Babe catches exactly one fish every 20 minutes. In all likelihood, some 20-minute periods will produce no fish, while others might produce more than one. For this reason, trials must be shorter than 20 minutes. It is also a problem if the length of a trial (the amount of time Babe fishes) is not a divisor of 60 minutes. The number of trials must be an integer, so we cannot take a trial to be 17 minutes of fishing; an hour would then correspond to n = 60 / 17 = 3.5294 (or so) trials. With these things in mind, let's start by defining a trial to be 10 minutes of Babe fishing. This gives n = 6 trials, since six 10-minute periods make up one hour of fishing.
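The bookkeeping above (n*t = 60, n*p = 3, 0 < p < 1, n a whole number) can be checked mechanically. The Python sketch below is only an illustration: the function name binomial_parameters is hypothetical, and exact fractions are used so that a trial of 1/4 minute (15 seconds) is handled without rounding.

```python
from fractions import Fraction

MEAN_FISH_PER_HOUR = 3  # Babe's average catch rate from the problem description


def binomial_parameters(trial_minutes):
    """Return (n, p) for trials of the given length in minutes.

    Raises ValueError if the trial length does not divide 60 minutes evenly
    or if the resulting p is not strictly between 0 and 1.
    """
    n = Fraction(60) / Fraction(trial_minutes)  # trials per hour, from n*t = 60
    if n.denominator != 1:
        raise ValueError(f"{trial_minutes} minutes does not divide 60 minutes")
    n = int(n)
    p = Fraction(MEAN_FISH_PER_HOUR, n)  # chosen so that n*p = 3
    if not 0 < p < 1:
        raise ValueError(f"p = {p} is not strictly between 0 and 1")
    return n, p


# 10-minute, 3-minute, 1-minute and 15-second trials
for minutes in (10, 3, 1, Fraction(1, 4)):
    n, p = binomial_parameters(minutes)
    print(f"trial = {minutes} min: n = {n}, p = {p}")
```

Calling binomial_parameters(20) or binomial_parameters(17) raises ValueError, matching the two problems described above (p = 1, and a non-integer number of trials).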

If n = 6 and n*p = 3, what value must p have?
 
 
 

Let Y denote the number of fish that Babe catches in one hour of fishing. Our first model assumes that Y is a binomial random variable with parameters n = 6 and p equal to the value you just calculated. Fill in the value of p and the probabilities asked for in the table below.
 

Model 1, n = 6, p = ______
y        0        1        2        3        4
p(y)     ______   ______   ______   ______   ______

Rare events correspond to very small values of p (much smaller than p = 0.5). Since n*p is fixed at 3, decreasing p forces us to increase n, which in turn means shortening our trials. Let model 2 correspond to trials of 3 minutes of fishing. Calculate the values of n and p for this model and then fill in their values and the probabilities in the table below.
 

Model 2, n = ______, p = ______
y        0        1        2        3        4        5
p(y)     ______   ______   ______   ______   ______   ______

Let model 3 correspond to trials which are 1 minute in length. Calculate the values of n and p for this model and then fill in their values and the probabilities in the table below.
 

Model 3, n = ______, p = ______
y        0        1        2        3        4        5
p(y)     ______   ______   ______   ______   ______   ______

Finally, since it takes at least 15 seconds to reel in a hooked fish, let model 4 correspond to trials that are 15 seconds long. Then each trial realistically results in either one fish or no fish (a success or a failure), which makes a binomial random variable a quite realistic description of the number of fish caught in an hour. Calculate the values of n and p for this model and then fill in their values and the probabilities in the table below.
 

Model 4, n = ______, p = ______
y        0        1        2        3        4        5
p(y)     ______   ______   ______   ______   ______   ______
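For checking the four tables, here is a minimal Python sketch of the binomial probability p(y) = (n nCr y)*p^y*(1 - p)^(n - y). The (n, p) pairs hard-coded in MODELS follow from n*t = 60 and n*p = 3 for trials of 10 minutes, 3 minutes, 1 minute and 15 seconds. Ordinary double-precision floats are used; at n = 240 they are far more accurate than the four decimal places printed.

```python
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for a binomial random variable Y with parameters n and p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# (n, p) for models 1-4: trials of 10 min, 3 min, 1 min and 15 s, with n*p = 3
MODELS = {1: (6, 0.5), 2: (20, 0.15), 3: (60, 0.05), 4: (240, 0.0125)}

for model, (n, p) in MODELS.items():
    row = "  ".join(f"{binomial_pmf(y, n, p):.4f}" for y in range(6))
    print(f"Model {model} (n = {n}, p = {p}): {row}")
```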

Calculating these last probabilities is difficult, even with a calculator: raising numbers to powers close to 240 invites roundoff error. Nonetheless, your calculator should be fairly accurate for these computations. Just for fun, calculate the values of the (much easier to evaluate) function f(y) = e^(-3)*3^y / y! and fill them in the table below.
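The function f(y) is easy to tabulate directly. A minimal Python sketch (the parameter name mean is our own; its default of 3 matches this activity):

```python
from math import exp, factorial


def f(y, mean=3):
    """f(y) = e^(-mean) * mean^y / y!; the activity uses mean = 3."""
    return exp(-mean) * mean**y / factorial(y)


print("  ".join(f"{f(y):.4f}" for y in range(6)))
```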
 
 

Another Function
y        0        1        2        3        4        5
f(y)     ______   ______   ______   ______   ______   ______

Are the values in the last table close to the probabilities of model 4?
 
 
 

The values of f(y) and p(y) should be hauntingly close, too close to be a coincidence. Notice first that the expected value, 3, appears twice in f(y): each value of f(y) is the constant e^(-3) times 3^y / y!. Recall that the sum of 3^k / k! from k = 0 to infinity is e^3. This guarantees that the values of f(y), summed from y = 0 to infinity, add up to one. Coupled with the fact that 0 < f(y) < 1 for all y >= 0, this shows that f(y) is a probability (density) function for a discrete random variable. Now we will compare f(y) with the probability density function of a binomial random variable Y with N large and N*p = 3, to see why they are so similar. As we worked through the models for Babe's fishing, we kept increasing the number of trials (now called N) and selecting p so that N*p = 3. This is like letting N go to infinity and setting p = 3 / N. The binomial density function with parameters N (think of N as huge) and p = 3 / N is
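Both claims in this paragraph can be checked numerically. The Python sketch below (helper names are illustrative) sums f(y) over y = 0 to 39, beyond which the remaining terms are negligible, and measures the largest gap between the binomial probabilities with p = 3 / N and f(y) over y = 0 to 5 as N grows through the values used in models 1 to 4 and beyond.

```python
from math import comb, exp, factorial


def f(y):
    """f(y) = e^(-3) * 3^y / y!"""
    return exp(-3) * 3**y / factorial(y)


def binomial_pmf(y, N):
    """Binomial probability of y successes in N trials with p = 3 / N."""
    p = 3 / N
    return comb(N, y) * p**y * (1 - p) ** (N - y)


# The terms 3^y / y! shrink very fast, so 40 terms already sum to 1 up to rounding
total = sum(f(y) for y in range(40))
print(f"sum of f(y), y = 0..39: {total:.12f}")

# Largest difference |p(y) - f(y)| over y = 0..5 as N grows
gaps = {
    N: max(abs(binomial_pmf(y, N) - f(y)) for y in range(6))
    for N in (6, 20, 60, 240, 10_000)
}
for N, gap in gaps.items():
    print(f"N = {N:>6}: largest gap = {gap:.6f}")
```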

p(y) = (N nCr y)*(3 / N)^y*[1 - 3 / N]^(N - y).

For y = 2, show that this density can be written as [3^2 / 2!]*[N*(N - 1)*(N - 3)^(N - 2) / N^N].
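One way to double-check the algebra (not a substitute for the derivation) is to compare both sides using exact rational arithmetic. A minimal Python sketch using the standard fractions module; the function names are illustrative:

```python
from fractions import Fraction
from math import comb, factorial


def binomial_pmf_exact(y, N):
    """(N nCr y) * (3/N)^y * (1 - 3/N)^(N - y), as an exact fraction."""
    p = Fraction(3, N)
    return comb(N, y) * p**y * (1 - p) ** (N - y)


def factored_form_y2(N):
    """[3^2 / 2!] * [N*(N - 1)*(N - 3)^(N - 2) / N^N], as an exact fraction."""
    return Fraction(3**2, factorial(2)) * Fraction(N * (N - 1) * (N - 3) ** (N - 2), N**N)


for N in (10, 50, 200):
    print(N, binomial_pmf_exact(2, N) == factored_form_y2(N))
```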
 
 
 
 
 
 
 
 
 
 

The first factor, 3^2 / 2!, appears in f(2). The second bracketed factor, N*(N - 1)*(N - 3)^(N - 2) / N^N, does not. Let N = 500 and compare the value of this second factor to e^(-3).
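For the comparison at N = 500, Python integers are exact and dividing one int by another returns an accurate float even when both are enormous, so the second bracketed factor, N*(N - 1)*(N - 3)^(N - 2) / N^N, can be evaluated as written. A sketch (second_factor is an illustrative name):

```python
from math import exp


def second_factor(N):
    """N*(N - 1)*(N - 3)^(N - 2) / N^N, exact integers until the final division."""
    return N * (N - 1) * (N - 3) ** (N - 2) / N**N


value = second_factor(500)
print(f"second factor at N = 500: {value:.6f}")
print(f"e^(-3)                  : {exp(-3):.6f}")
```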
 
 
 
 

Assuming N is huge and y = 2, why is the second bracketed factor about the same as [(N - 3) / N]^N?
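A numerical hint: dividing the second bracketed factor, N*(N - 1)*(N - 3)^(N - 2) / N^N, by [(N - 3) / N]^N leaves N*(N - 1) / (N - 3)^2, which tends to 1. The Python sketch below confirms that simplification exactly with fractions for a few values of N and prints the ratio; the helper names are illustrative.

```python
from fractions import Fraction


def second_factor(N):
    """N*(N - 1)*(N - 3)^(N - 2) / N^N as an exact fraction."""
    return Fraction(N * (N - 1) * (N - 3) ** (N - 2), N**N)


def power_form(N):
    """[(N - 3) / N]^N as an exact fraction."""
    return Fraction(N - 3, N) ** N


for N in (10, 100, 500):
    ratio = second_factor(N) / power_form(N)
    # The ratio simplifies exactly to N*(N - 1) / (N - 3)^2
    assert ratio == Fraction(N * (N - 1), (N - 3) ** 2)
    print(f"N = {N:>3}: ratio = {float(ratio):.4f}")
```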
 
 
 
 
 
 
 

If we set y = 4 instead of y = 2, we can rewrite p(4) as [3^4 / 4!]*[leftover part], where the leftover part depends only on N. What is a formula for the leftover part?
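To test a candidate formula for the leftover part, one can compute p(4) / (3^4 / 4!) exactly and compare. In the Python sketch below, leftover4_candidate encodes one possible answer, N*(N - 1)*(N - 2)*(N - 3)^(N - 3) / N^N; it is a hypothetical helper, and any algebraically equivalent form should pass the same check.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_pmf_exact(y, N):
    """(N nCr y) * (3/N)^y * (1 - 3/N)^(N - y), as an exact fraction."""
    p = Fraction(3, N)
    return comb(N, y) * p**y * (1 - p) ** (N - y)


def leftover(y, N):
    """p(y) divided by 3^y / y!: the part of p(y) that does not appear in f(y)."""
    return binomial_pmf_exact(y, N) / Fraction(3**y, factorial(y))


def leftover4_candidate(N):
    """Candidate closed form for y = 4: N(N-1)(N-2)(N-3)^(N-3) / N^N."""
    return Fraction(N * (N - 1) * (N - 2) * (N - 3) ** (N - 3), N**N)


for N in (10, 100, 600):
    print(N, leftover(4, N) == leftover4_candidate(N))
```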
 
 
 
 
 
 

Explain why, for huge N, the leftover part is again close in value to [(N - 3) / N]^N.
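The same idea works for any fixed y. Dividing the leftover part for y by [(N - 3) / N]^N leaves N*(N - 1)*...*(N - y + 1) / (N - 3)^y, a ratio of two degree-y polynomials in N with leading coefficient 1, so it tends to 1 as N grows. A short Python sketch (math.perm(N, y) is the product N*(N - 1)*...*(N - y + 1), available since Python 3.8; leftover_ratio is an illustrative name):

```python
from math import perm


def leftover_ratio(y, N):
    """Leftover part for y divided by [(N - 3)/N]^N, i.e. perm(N, y) / (N - 3)^y."""
    return perm(N, y) / (N - 3) ** y


for N in (100, 1_000, 100_000):
    print(N, [round(leftover_ratio(y, N), 4) for y in range(6)])
```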
 
 
 
 
 
 
 

The quantity [(N - 3) / N]^N can be rewritten as [1 - 3 / N]^N. Calculate this value for N = 600, 700 and 1000 and compare it to e^(-3).
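A quick way to produce this comparison, sketched in Python with ordinary floating-point arithmetic (the variable names are illustrative):

```python
from math import exp

# [1 - 3/N]^N for the three requested values of N
values = {N: (1 - 3 / N) ** N for N in (600, 700, 1000)}

for N, value in values.items():
    print(f"N = {N:>4}: [1 - 3/N]^N = {value:.6f}, difference from e^(-3) = {value - exp(-3):+.6f}")
```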
 
 
 
 
 
 

What do you suppose is the limit as N goes to infinity of [1 - 3 / N]^N?
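One standard argument, sketched here, takes logarithms and uses the series ln(1 - x) = -x - x^2/2 - x^3/3 - ..., valid for |x| < 1 (so it applies here once N > 3):

```latex
\ln\left[\left(1 - \frac{3}{N}\right)^{N}\right]
  = N \ln\left(1 - \frac{3}{N}\right)
  = N\left(-\frac{3}{N} - \frac{9}{2N^{2}} - \frac{9}{N^{3}} - \cdots\right)
  = -3 - \frac{9}{2N} - \frac{9}{N^{2}} - \cdots
  \;\longrightarrow\; -3 \quad (N \to \infty),
\qquad\text{so}\qquad
\lim_{N \to \infty} \left(1 - \frac{3}{N}\right)^{N} = e^{-3}.
```

The last step uses the continuity of the exponential function.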