ST 352

Simple Linear Regression

The Animal Gestation problem

In this example, we’ll walk through a complete simple linear regression analysis, including a second pass through the analysis after a transformation.

The following data lists the average gestation period (in days) and longevity (average life expectancy in years) for a sample of animals, as reported in The 1993 World Almanac and Book of Facts.

animal        gestation  longevity      animal        gestation  longevity
baboon              187         20      guinea pig           68          4
black bear          219         18      hippopotamus        238         25
grizzly bear        225         25      horse               330         20
polar bear          240         20      kangaroo             42          7
beaver              122          5      leopard              98         12
buffalo             278         15      lion                100         15
camel               406         12      monkey              164         15
chimpanzee          231         12      moose               240         12
cat                  63         20      mouse                21          3
chipmunk             31          6      opossum              15          1
cow                 284         15      pig                 112         10
deer                201          8      puma                 90         12
dog                  61         12      rabbit               31          5
donkey              365         12      rhinoceros          450         15
elephant            645         40      sea lion            350         12
elk                 250         15      sheep               154         12
fox                  52          7      squirrel             44         10
giraffe             425         10      tiger               105         16
goat                151          8      wolf                 63          5
gorilla             257         20      zebra               365         15

1.         What is the response and what is the explanatory variable?

Step 1:  Determine if a linear relationship exists between longevity and gestation and identify any possible outliers:

2.         Is the relationship between longevity and gestation linear?

3.         Are there any outliers?  If so, identify the animal.

Step 2:  Determine if the assumptions of the model are met.

4.         Which assumption(s) seem to be violated?  Why?

5.         What should be done?

Steps 3 and 4:   A log (natural log) transformation of the response, gestation, was done.  After the transformation, we go back to Step 1:  determine if a linear relationship exists between longevity and log(gestation).  (Note: we do not have to go back and consider outliers again; we already did that!)
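The transformation and the refit can be sketched in a few lines of Python. This is only a rough sketch, not the STATGRAPHICS procedure used for this handout: it assumes "log" means the natural log (which matches the e-based back-transformation in the notes), and the names `data` and `least_squares` are made up for illustration. If the table above matches the data behind the output, the slope and intercept should come out close to the STATGRAPHICS estimates.

```python
import math

# (animal, gestation in days, longevity in years), copied from the table above
data = [
    ("baboon", 187, 20), ("black bear", 219, 18),
    ("grizzly bear", 225, 25), ("polar bear", 240, 20),
    ("beaver", 122, 5), ("buffalo", 278, 15),
    ("camel", 406, 12), ("chimpanzee", 231, 12),
    ("cat", 63, 20), ("chipmunk", 31, 6),
    ("cow", 284, 15), ("deer", 201, 8),
    ("dog", 61, 12), ("donkey", 365, 12),
    ("elephant", 645, 40), ("elk", 250, 15),
    ("fox", 52, 7), ("giraffe", 425, 10),
    ("goat", 151, 8), ("gorilla", 257, 20),
    ("guinea pig", 68, 4), ("hippopotamus", 238, 25),
    ("horse", 330, 20), ("kangaroo", 42, 7),
    ("leopard", 98, 12), ("lion", 100, 15),
    ("monkey", 164, 15), ("moose", 240, 12),
    ("mouse", 21, 3), ("opossum", 15, 1),
    ("pig", 112, 10), ("puma", 90, 12),
    ("rabbit", 31, 5), ("rhinoceros", 450, 15),
    ("sea lion", 350, 12), ("sheep", 154, 12),
    ("squirrel", 44, 10), ("tiger", 105, 16),
    ("wolf", 63, 5), ("zebra", 365, 15),
]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Explanatory variable: longevity. Response: natural log of gestation (assumed).
longevity = [row[2] for row in data]
log_gestation = [math.log(row[1]) for row in data]

intercept, slope = least_squares(longevity, log_gestation)
print(f"log(gestation) = {intercept:.4f} + {slope:.4f} * longevity")
```

A scatterplot of `log_gestation` against `longevity` is still needed to judge linearity; the fitted line by itself does not answer that question.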

6.         Does the relationship between longevity and log(gestation) appear to be fairly linear?  How is(are) the outlier(s) influencing the linearity of this relationship?

Step 2:  Determine if the assumptions of the model are met:

7.         Do the assumptions of the model appear to be met?  Are they better met than on the original scale?

Steps 5 & 6:      With the best-fitting model, determine if the explanatory variable helps to predict the response.

Here is the Plot of the Fitted Model (scatterplot with the least-squares regression line drawn on it)

Use the STATGRAPHICS output below to answer the following questions:

Simple Regression - log(gestation) vs. longevity

Regression Analysis - Linear model: Y = a + b*X

-----------------------------------------------------------------------------
Dependent variable: log(gestation)
Independent variable: longevity
-----------------------------------------------------------------------------
                               Standard          T
Parameter       Estimate         Error       Statistic        P-Value
-----------------------------------------------------------------------------
Intercept         3.8096       0.227308        16.7596         0.0000
Slope          0.0855573      0.0151843
-----------------------------------------------------------------------------

                           Analysis of Variance
-----------------------------------------------------------------------------
Source             Sum of Squares     Df  Mean Square    F-Ratio      P-Value
-----------------------------------------------------------------------------
Model                     14.9849      1      14.9849      31.75       0.0000
Residual                  17.9354     38     0.471985
-----------------------------------------------------------------------------
Total (Corr.)             32.9203     39

Correlation Coefficient = 0.674675

R-squared =

R-squared (adjusted for d.f.) = 44.0849 percent

Standard Error of Est. = 0.687012

Mean absolute error = 0.560136

Durbin-Watson statistic = 1.93505 (P=0.4144)

Lag 1 residual autocorrelation = 0.0119836

8.         Does longevity (life expectancy) of animals help explain gestation period for these animals?  Write the null and alternative hypotheses, calculate the appropriate test statistic (with degrees of freedom), find the p-value, and write a sentence answering the question.

9.         Write the regression equation in the context of this problem.

10.        Predict gestation for an animal with a life expectancy of 17 years.

11.        What percent of the variation in log(gestation) is explained by the regression line?
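One way to check the arithmetic for questions 8 through 11 is to work directly from the numbers printed in the STATGRAPHICS output. The sketch below is only a calculation aid under stated assumptions: the variable names are illustrative, the log is taken to be the natural log, and the back-transformed prediction is read as a predicted median gestation period (as in the notes). It does not compute a p-value, since that needs a t table or statistical software.

```python
import math

# Values copied from the STATGRAPHICS output above
intercept = 3.8096
slope = 0.0855573
se_slope = 0.0151843
ss_model = 14.9849
ss_total = 32.9203
df_residual = 38          # n - 2 with n = 40 animals

# Test statistic for H0: slope = 0 (estimate divided by its standard error)
t_stat = slope / se_slope

# In simple linear regression, t squared equals the ANOVA F-ratio
f_from_t = t_stat ** 2

# Proportion of variation in log(gestation) explained by the line
r_squared = ss_model / ss_total

# Prediction at longevity = 17 years: first on the log scale,
# then back-transformed to days (assumes natural log)
log_pred = intercept + slope * 17
pred_days = math.exp(log_pred)

print(f"t = {t_stat:.2f} on {df_residual} df (t^2 = {f_from_t:.2f})")
print(f"R-squared = {100 * r_squared:.2f} percent")
print(f"predicted log(gestation) = {log_pred:.3f}, about {pred_days:.0f} days")
```

Comparing `f_from_t` with the F-ratio in the ANOVA table is a quick consistency check on the test statistic.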

Notes:

1)         When doing a log transformation of the response variable, the interpretation of the slope (and y-intercept) becomes a bit more difficult.  For this problem, the interpretation is as follows:  a one-year increase in life expectancy of an animal is associated with a multiplicative change in the median gestation period of e^0.0856 (or about 1.09).  In other words, the median gestation period for a life expectancy of 17 years is about 1.09 times as long as for a life expectancy of 16 years.

2)         When doing a log transformation, a confidence interval for the slope can be started in the same way as if there was no transformation done, but it is finished in a slightly different way:

95% confidence interval for the slope:  0.0856 ± (2.042)(0.0152) = (0.0546, 0.1166)

to finish: (e^0.0546, e^0.1166) = (1.06, 1.12)

The interpretation:  we are 95% confident that the median gestation period is between 1.06 and 1.12 times as long for every one-year increase in life expectancy.

3)         The standard error of the estimate (the estimated standard deviation of the residuals) is also on the log scale.  We won’t try to put it back on the original scale, since the assumption of constant variation was violated on the original scale (original data).
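The back-transformations in notes 1 and 2 can be reproduced with a short calculation. This is a minimal sketch using the rounded slope, standard error, and multiplier quoted in the notes; the variable names are illustrative, and the natural log is assumed.

```python
import math

slope = 0.0856       # rounded slope estimate, as used in the notes
se_slope = 0.0152    # rounded standard error of the slope
t_mult = 2.042       # t multiplier used in the notes

# Note 1: multiplicative change in median gestation per extra year of longevity
ratio = math.exp(slope)

# Note 2: build the interval on the log scale, then exponentiate both endpoints
margin = t_mult * se_slope
log_ci = (slope - margin, slope + margin)
ratio_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))

print(f"multiplicative change per year: {ratio:.2f}")
print(f"log-scale interval: ({log_ci[0]:.4f}, {log_ci[1]:.4f})")
print(f"back-transformed interval: ({ratio_ci[0]:.2f}, {ratio_ci[1]:.2f})")
```

Exponentiating the endpoints works because e^x is an increasing function, so the order of the endpoints is preserved.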