(Data stories and Excel data files)

Chapter 2. Two-Sample Problems

Speed Limits and Traffic Fatalities.

Chapter 3. A Closer Look at Assumptions

Chapter 4. Alternatives to the t-tools

Therapeutic marijuana

Chapter 7. Simple Linear Regression

Brain activity in violin and string players

Chapter 8. A Closer Look at Assumptions for Simple Linear Regression

Respiratory Rates for Children

Butterfly ballots in Palm Beach County, Florida

Chapter 9. Multiple Regression

Winning speeds at the Kentucky Derby

Chapter 10. Inferential Tools for Multiple Regression

Chapter 11. Model Checking and Refinement

Natal dispersal distances of mammals

Chapter 12. Strategies for Variable Selection

Chapter 13. The Analysis of Variance for Two-Way Classifications

Gender differences in performance on mathematics achievement test scores

Chapter 14. Multifactor Studies Without Replication

Tennessee corn yield trials

Chapter 15. Adjustment for Serial Correlation

S & P 500

El Nino and Hurricanes

Trends in firearm and motor vehicle deaths in the U.S.

Chapter 16. Repeated Measures

Chapter 18. Comparisons of Proportions or Odds

Chapter 19. More Tools for Tables of Counts

Tire-related fatal accidents and Ford sports utility vehicles

Chapter 20. Logistic Regression for Binary Response Variables

Fatal car accidents involving tire failure on Ford Explorers

Chapter 21. Logistic Regression for Binomial Counts

Spock conspiracy trial

Effect of Stress During Conception on Odds of a Male Birth

HIV and circumcision

Meta-analysis of breast cancer and lactation studies

Chapter 22. Log-Linear Regression for Poisson Counts

El Nino and Hurricanes

Body size and reproductive success in a population of male bullfrogs

2.23. **Speed Limits and Traffic Fatalities**.
The National Highway System Designation Act was signed into law in the United
States on November 28, 1995. Among other things, the act abolished the federal
mandate of 55-mile-per-hour maximum speed limits on roads in the U.S. and
permitted states to establish their own limits. Of the 50 states (plus the
District of Columbia), 32 increased their speed limits either at the beginning
of 1996 or sometime during 1996. Shown below are the percentage changes in
interstate highway traffic fatalities from 1995 to 1996. What evidence is there
that the percentage change was greater in states that increased their speed
limits? How much of a difference is there? Write a brief statistical report
detailing the answers to these questions. (Data from “Report to Congress: The
Effect of Increased Speed Limits in the Post-NMSL Era,” National Highway
Traffic Safety Administration, February, 1998; available in the reports library
at http://www-fars.nhtsa.dot.gov/.)

Data: ex0223.xls
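One way to begin the comparison is a pooled two-sample t-statistic for the difference in mean percentage change. The following is a minimal sketch, assuming equal population variances and using made-up numbers; the function name `pooled_two_sample_t` and both lists are hypothetical and are not taken from ex0223.xls.

```python
import math
import statistics

def pooled_two_sample_t(group_a, group_b):
    """Return (difference in means, pooled standard error, t-statistic, df)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff, se, diff / se, df

# Hypothetical percentage changes, NOT the values in ex0223.xls.
increased = [12.0, -3.0, 20.0, 8.0, 15.0]
retained = [-5.0, 2.0, -10.0, 4.0]
diff, se, t_stat, df = pooled_two_sample_t(increased, retained)
print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t_stat:.2f} on {df} df")
```

The t-statistic would then be compared with a t-distribution on the returned degrees of freedom. Whether the equal-variance assumption is reasonable for the real data is part of the exercise.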

3.22. **Umpire life lengths**. When
an umpire collapsed and died soon after the beginning of the 1996 U.S. major
league baseball season, there was speculation that the stress associated with
that job poses a health risk. Researchers subsequently collected historical and
current data on umpires to investigate their life expectancies (Cohen, et al.,
2000, “Life expectancy of major league baseball umpires,” The Physician and
Sportsmedicine, 28, 5, 83-89). From an original list of 441 umpires, data were
found for 227 who had died or had retired and were still living. Of these,
dates of birth and death were available for 195. Shown below are several rows
of a generated data set based on the study.

a) Use a t-test and confidence interval (possibly after transformation) to investigate whether umpires had smaller observed life lengths than expected, using only those with known life lengths (i.e. for whom Censored = 0).

b) What are the potential consequences of ignoring those 214 of the 441 umpires on the original list for whom data were unavailable?

c) What are the potential consequences of ignoring those 32 umpires in the data set who had not yet died at the time of the study? (Note: procedures are available, and are appropriate, for answering the question of interest using both the censored and uncensored lifetimes. See, for example, the survival analysis techniques in Anderson, S. et al. 1980, Statistical Methods for Comparative Studies, Wiley.)

Data: ex0322.xls

4.32. **Therapeutic marijuana**.
Nausea and vomiting are frequent side effects of cancer chemotherapy, which can
contribute to the decreased ability of patients to undergo long-term
chemotherapy schedules. To investigate the capacity of marijuana to reduce
these side effects, researchers performed a double-blind, randomized,
cross-over trial. Fifteen cancer patients on chemotherapy schedules were
randomly assigned to receive either a marijuana treatment or a placebo
treatment after their first three chemotherapy sessions, and then “crossed
over” to the opposite treatment after their next three sessions. The
treatments, which involved both cigarettes and pills, were made to appear the
same whether in active or placebo form. Shown below is the number of vomiting
and retching episodes for the 15 subjects. Does marijuana treatment reduce the
frequency of episodes? By how much? Analyze the data and write a statistical
summary of conclusions. (Data from Chang, A. E., et al.,
“Delta-9-Tetrahydrocannabinol as an Antiemetic in Cancer Patients Receiving
High-Dose Methotrexate,” The Science of Medical Marijuana, Dec. 1979. The order
of the treatments is unavailable.)

Data: ex0432.xls

7.26. **Decline in Male Births**. Display 7.16 shows the
proportion of male births in Denmark, The Netherlands, Canada, and the United
States for a number of years. (Data read from graphs in Davis, et al., 1998,
“Reduced ratio of male to female births in several industrial countries,”
Journal of the American Medical Association, 279, 1018-1023.) Notice that the
proportions for Canada and the United States are only provided for the years
1970 to 1990, while Denmark and The Netherlands have data listed for 1950 to
1994. Display 7.17 shows the results of least squares fitting to the simple
linear regression of proportion of males on year, separately for each country,
with standard errors of estimated coefficients in parentheses.

a) With a statistical computer package obtain the least squares fits to the four simple regressions, individually, to confirm the estimates and standard errors presented in Display 7.17.

b) Obtain the t-statistic for the test that the slopes of the regressions are zero, for each of the four countries. Is there evidence that the proportion of male births is truly declining?

c) Explain why the United States can have the largest of the four t-statistics (in absolute value) even though its slope is only the third largest (in absolute value).

d) Explain why the standard error of the estimated slope is smaller for the United States than for Canada, even though the sample size is the same.

e) Can you think of any reason why the standard deviations about the regression line might be different for the four countries? (Hint: the proportion of males is a kind of average, i.e. the average number of births that are male.)

Data: ex0726.xls
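For readers without a statistical package at hand, the quantities in parts (a) and (b) can be computed directly from the textbook formulas for simple linear regression. The sketch below is illustrative only: the function name `slope_t_statistic` is hypothetical and the data are invented, not the values in ex0726.xls.

```python
import math

def slope_t_statistic(x, y):
    """Least squares fit of y on x; return (intercept, slope, se_slope, t)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))      # residual SD on n - 2 df
    se_slope = sigma_hat / math.sqrt(sxx)     # SE of the estimated slope
    return intercept, slope, se_slope, slope / se_slope

# Hypothetical proportions of male births, NOT the values in ex0726.xls.
years = [1970, 1975, 1980, 1985, 1990]
prop_male = [0.5134, 0.5131, 0.5127, 0.5126, 0.5121]
intercept, slope, se_slope, t_stat = slope_t_statistic(years, prop_male)
print(f"slope = {slope:.3e}, SE = {se_slope:.3e}, t = {t_stat:.2f}")
```

The standard error line shows which two quantities enter the slope's standard error, which may be a useful starting point for parts (c) and (d).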

7.29. **Male Displays**. Black wheatears, *Oenanthe leucura*,
are small birds of Spain and Morocco. Males of the species demonstrate an
exaggerated sexual display by carrying many heavy stones to nesting cavities.
This 35-gram bird transports, on average, 3.1 kg of stones per nesting season!
Different males carry somewhat different sized stones, prompting a study of
whether larger stones may be a signal of higher health status. M. Soler, et al.
[“Weight lifting and health status in the black wheatear,” 1999, Behavioral
Ecology 10(3):281-6] calculated the average stone mass (g) carried by each of
21 male black wheatears, along with T-cell response measurements reflecting
their immune systems’ strengths. The data in Display 7.16 were taken from their
Figure 1. Analyze the data and write a statistical report summarizing the
evidence supporting whether health, as measured by T-cell response, is
associated with stone mass; and quantifying the association.

Data: ex0729.xls

7.30. **Brain activity in violin and string players**. Studies over the
past two decades have shown that activity can effect the reorganization of the
human central nervous system. For example, it is known that the part of the
brain associated with activity of a finger or limb is taken over for other
purposes in individuals whose limb or finger has been lost. In one study,
psychologists used magnetic source imaging (MSI) to measure neuronal activity
in the brains of 9 string players (6 violinists, 2 cellists, and 1 guitarist)
and 6 controls who had never played a musical instrument, when the thumb and
fifth finger of the left hand were exposed to mild stimulation. The researchers
felt that stringed instrument players, who use the fingers of their left hand
extensively, might show different behavior in the brain—as a result of this
extensive physical activity—than individuals who did not play stringed
instruments. Shown below is a neuron activity index from the MSI and the years
that the individual had been playing a stringed instrument (zero for the
controls). (Data based on a graph in Elbert, T., et al., 1995, “Increased
cortical representation of the fingers of the left hand in string players,”
Science, 270, 13 October, 305-307.) Is the neuron activity different in the
stringed musicians and the controls? Is the amount of activity associated with
the number of years the individual has been playing the instrument?

Data: ex0730.xls

8.23. **Respiratory Rates for Children**. A high
respiratory rate is a potential diagnostic indicator of respiratory infection
in children. To judge whether a respiratory rate is truly “high,” however, a
physician must have a clear picture of the distribution of normal respiratory
rates. To this end, Italian researchers measured the respiratory rates of 618
children between the ages of 15 days and 3 years. The display below shows a few
rows of the data set. Analyze the data and provide a statistical summary.
Include a useful plot or chart that a physician could use to assess a normal
range of respiratory rate for children of any age between 0 and 3. (Data read
from a graph in Rusconi, et al., 1994, “Reference values for respiratory rate
in the first 3 years of life,” Pediatrics, 94, 350-355.)

Data: ex0823.xls

8.24. **Butterfly ballots in Palm Beach County, Florida**. The
U.S. presidential election of November 7, 2000 was one of the closest in
history. As returns were counted on election night it became clear that the
outcome in the state of Florida would determine the next president. At one
point in the evening, television networks projected that the state was carried
by the Democratic nominee, Al Gore, but a retraction of the projection followed
a few hours later. Then, early in the morning of November 8, the networks
projected that the Republican nominee, George W. Bush, had carried Florida and
won the presidency. Gore called Bush to concede. While en route to his
concession speech, though, the Florida count changed rapidly in his favor. The
networks once again reversed their projection, and Gore called Bush to retract
his concession. When the roughly six million Florida votes had been counted,
Bush was shown to be leading by only 1,738, and the narrow margin triggered an
automatic recount. The recount, completed in the evening of November 9, showed
Bush’s lead to be less than 500.

Meanwhile, angry Democratic voters in Palm Beach County complained that a confusing “butterfly” ballot layout caused them to accidentally vote for the Reform Party candidate Pat Buchanan instead of Gore. The ballot, as illustrated in Display 8.22, listed presidential candidates on both a left-hand and a right-hand page. Voters were to register their vote by punching the circle corresponding to their choice, from the column of circles between the pages. It was suspected that since Bush’s name was listed first on the left-hand page, Bush voters likely selected the first circle. Since Gore’s name was listed second on the left-hand side, many voters, who already knew whom they wished to vote for, did not bother examining the right-hand side and consequently selected the second circle in the column, the one actually corresponding to Buchanan. Two pieces of evidence supported this claim: Buchanan had an unusually high percentage of the vote in that county, and an unusually large number of ballots (19,000) were discarded because voters had marked two circles (possibly by inadvertently voting for Buchanan and then trying to correct the mistake by voting for Gore).

Display 8.23 shows the first few rows of a data set containing the numbers of
votes for Buchanan and Bush in all 67 counties in Florida. What evidence is
there in the scatterplot of
Display 8.24 that Buchanan received more votes than expected in Palm Beach
County? Analyze the data without Palm Beach County results to obtain an
equation for predicting Buchanan votes from Bush votes. Obtain a 95% prediction
interval for the number of Buchanan votes in Palm Beach from this
result—assuming the relationship is the same in this county as in the others.
If it is assumed that Buchanan’s actual count contains a number of votes
intended for Gore, what can be said about the likely size of this number from
the prediction interval? (Consider transformation.)

Data: ex0824.xls
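The mechanics of a prediction interval computed on a transformed scale can be sketched as follows. This is one possible approach, assuming a log transformation of both counts and a straight-line model on that scale; the function name, the county counts, and the choice of transformation are all assumptions made for illustration, not part of the exercise or of ex0824.xls. The t multiplier is supplied by the caller because the Python standard library has no t-distribution quantile function.

```python
import math

def log_log_prediction_interval(x, y, x_new, t_mult):
    """Fit log(y) on log(x); return (low, high) prediction limits for y at x_new.

    t_mult is the t-distribution multiplier for n - 2 degrees of freedom,
    supplied by the caller (for example 2.306 for a 95% interval with 8 df).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    x_bar = sum(lx) / n
    y_bar = sum(ly) / n
    sxx = sum((v - x_bar) ** 2 for v in lx)
    slope = sum((u - x_bar) * (v - y_bar) for u, v in zip(lx, ly)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((v - intercept - slope * u) ** 2 for u, v in zip(lx, ly))
    sigma_hat = math.sqrt(rss / (n - 2))
    x0 = math.log(x_new)
    pred = intercept + slope * x0
    # Standard error of prediction for a single new observation.
    se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    # Back-transform the limits from the log scale to the count scale.
    return math.exp(pred - t_mult * se_pred), math.exp(pred + t_mult * se_pred)

# Hypothetical vote counts for 10 counties, NOT the values in ex0824.xls.
bush = [2000, 5000, 12000, 30000, 45000, 60000, 80000, 110000, 150000, 200000]
buchanan = [20, 45, 90, 150, 260, 300, 420, 500, 700, 850]
low, high = log_log_prediction_interval(bush, buchanan, 150000, t_mult=2.306)
print(f"95% prediction interval: {low:.0f} to {high:.0f} votes")
```

Comparing an observed count with the upper limit of such an interval gives a rough lower bound on the number of votes in excess of what the fitted relationship predicts.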

9.18. **Speed of Evolution**. How fast can evolution occur in
nature? Are evolutionary trajectories predictable or idiosyncratic? To answer
these questions, R.B. Huey et al. (“Rapid evolution of a geographic cline in
size in an introduced fly”, Science 287:308-9, 2000) studied the development of
a fly — Drosophila subobscura — that had accidentally been introduced from the
Old World into North America (NA) around 1980. In Europe (EU), characteristics
of the flies’ wings follow a “cline” — a steady change with latitude. One
decade after introduction, the NA population had spread throughout the
continent, but no such cline could be found. After two decades, Huey and his
team collected flies from 11 locations in western NA and native flies from 10
locations in EU at latitudes ranging from 35-55 degrees N. They maintained all
samples in uniform conditions through several generations to isolate genetic
differences from environmental differences. Then they measured about 20 adults
from each group. Display 9.19 shows average wing size in millimeters, on a
logarithmic scale, and average ratios of basal lengths to wing size.

a) Construct a scatter plot of average wing size against latitude, in which the
four groups defined by continent and sex are coded differently. Do these
suggest that the wing sizes of the NA flies have evolved toward the same cline
as in EU?

b) Construct a multiple linear regression model with wing size as the response,
with latitude as a linear explanatory variable, and with indicator variables to
distinguish the sexes and continents. Construct the model in such a way that
one parameter measures the difference between the slopes of the wing size v.
latitude regressions of NA and EU for females, one measures the same difference
for males, one measures the difference between the intercepts of the
regressions of NA and EU for females, and one measures the same difference for
males.

Data: ex0918.xls

9.20. **Winning speeds at the Kentucky Derby**. The Kentucky Derby is a 1.25-mile horse race held annually at
the Churchill Downs racetrack in Louisville, Kentucky. Shown below are some
sample rows of a data set containing the year of the race, the winning horse,
the condition of the track, and the average speed (in feet per second) of the
winner, for years 1896-2000. The track conditions have been grouped into three
categories: fast, good (which includes the official designations “good” and
“dusty”), and slow (which includes the designations “slow”, “heavy”, “muddy”,
and “sloppy”). Use a statistical computer program to fit a model for the mean
winning speed as a function of year and the track condition factor. The data
are from www.kentuckyderby.com.

Data: ex0920.xls

10.23. **Speed of Evolution**.
Refer back to Exercise 9.18. The authors of that study concluded that although
the wing size of North American flies was converging rapidly to the same cline
as exhibited by the European flies, the means by which the cline is achieved is
different in the North American population.

a) As evidence that the means of convergence is different, they concluded that there was a marked difference between the NA and the EU patterns of the basal length-to-wing size ratios versus latitude (in females). Fit a multiple linear regression, which allows for different slopes and different intercepts. In a single F-test, evaluate the evidence against there being a single straight line that describes the cline on both continents. If you conclude there is a difference, is the difference one of slope alone? of intercept alone? or of both?

b) Return to the basic question of whether the wing sizes in NA flies have established a cline similar to their EU ancestors. Using the model developed in Exercise 9.18, answer these questions: (i) Is there a non-zero slope to the cline of NA females? (ii) Is there a non-zero slope to the cline of NA males? (iii) Is there a difference between the clines of NA and EU females, and if so, what is its nature? and (iv) repeat (iii) for males?

10.24. **Speed of Evolution**.
(Refer again to Exercise 9.18 and also to Exercise 10.23.) Many software
systems allow the user to perform weighted regression, in which different
squared residuals from regression receive different weights in deciding which
set of parameter estimates provide the smallest sum of squared residuals. If
each individual response has an independent estimate of its likely error, the
weight given to each residual is usually taken to be the reciprocal of the
square of that likely error. The st.err. of wing sizes are standard errors of
the averages of around 20 individual (log) wing sizes. If your software allows
for weights, construct a weight variable as the inverse square of the standard
errors. Then repeat both parts of Exercise 10.23 using weighted regression. Do
the results differ? Why is this preferable to using each fly as a separate
case?
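The weighting rule described in Exercise 10.24 can be sketched for a single straight line. This is a minimal illustration, assuming one explanatory variable rather than the full multiple regression the exercise calls for; `weighted_line_fit` is a hypothetical helper and the numbers are invented, not taken from ex0918.xls.

```python
def weighted_line_fit(x, y, se):
    """Weighted least squares line with weights 1 / se**2.

    Returns (intercept, slope) minimizing sum(w * (y - b0 - b1 * x)**2).
    """
    w = [1.0 / s ** 2 for s in se]            # inverse square of each standard error
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw    # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical latitudes, average log wing sizes, and their standard errors.
latitude = [36.0, 40.0, 44.0, 48.0, 52.0]
log_wing = [6.70, 6.72, 6.75, 6.76, 6.80]
std_err = [0.010, 0.006, 0.012, 0.005, 0.008]
intercept, slope = weighted_line_fit(latitude, log_wing, std_err)
print(f"intercept = {intercept:.4f}, slope = {slope:.5f}")
```

Averages with small standard errors pull the fitted line toward themselves more strongly than averages with large ones. With equal standard errors the result reduces to ordinary least squares.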

10.25. **Potato Yields**. Nitrogen and water are important factors
influencing potato production. One study of their roles was conducted at sites
in the St. John River Valley of New Brunswick. (Belanger, G., et al. 2000.
“Yield response of two potato cultivars to supplemental irrigation and N
fertilization in New Brunswick.” Amer. J. of Potato Res. 77:11-21.) Nitrogen
fertilizer was applied at six different levels in combination with two water
conditions: irrigated or non-irrigated. This design was repeated at four
different sites in 1996, with the resulting yields depicted in Display 10.21.
Notice that the patterns of responses against nitrogen level are fit reasonably
well by quadratic curves.

Each quadratic requires 3 parameters, so a model that would allow for separate quadratic curves for each site-by-irrigation combination would have 24 parameters.

a) Using indicator functions for sites and for irrigation, construct a multiple linear regression model with 23 variables that will allow for completely different quadratic curves. Interpret the parameters in this model, if possible.

b) Describe how you would answer the following questions: (i) Is there evidence that the manner in which the quadratic terms differ by water condition changes from site to site (or is the difference the same at all four sites)? (ii) If the quadratic term differences are the same at all sites, is there strong evidence of a difference by water condition? (iii) If there is no difference between quadratic terms by water or by site, is there evidence of any quadratic term at all? (iv), (v), and (vi) repeat (i), (ii), and (iii) for the linear terms, if there is no evidence of any quadratic terms.

c) Why are the questions in (b) ordered as they are?

10.28. **El Nino and Hurricanes**. Shown below are the first
few rows of a data set with the numbers of Atlantic Basin tropical storms and
hurricanes for each year from 1950 to 1997. The variable storm index is an
index of overall intensity of the hurricane season. (It is the average of
number of tropical storms, number of hurricanes, the number of days of tropical
storms, the number of days of hurricanes, the total number of intense
hurricanes, and the number of days they last—when each of these is expressed as
a percentage of the average value for that variable. A storm index score of
100, therefore, represents, essentially, an average hurricane year.) Also
listed are whether the year was a cold, warm, or neutral El Nino year; a
constructed numerical variable temperature that takes on the values -1, 0, and
1 according to whether the El Nino temperature is cold, neutral, or warm; and a
variable indicating whether West Africa was wet or dry that year. It is thought
that the warm phase of El Nino suppresses hurricanes while a cold phase
encourages them. It is also thought that wet years in West Africa often bring
more hurricanes. Analyze the data to describe the effect of El Nino on (a) the
number of tropical storms, (b) the number of hurricanes and (c) the storm index, after
accounting for the effects of West African wetness and for any time trends, if
appropriate. (These data were gathered by William Gray of Colorado State
University, and reported on the USA Today weather page: www.usatoday.com/weather/whurnum.htm)

Data: ex1028.xls

10.29. **Wage and race**.
Shown below are the first few rows of a data set from the 1988 March U.S.
Current Population Survey. The set contains weekly wages in 1987 (in 1992
dollars) for a sample of 25,632 males between the ages of 18 and 70 who worked
full-time, along with their years of education, years of experience, an
indicator variable for whether they were black, an indicator variable for whether
they worked in a standard metropolitan statistical area (i.e. in or near a
city), and a code for the region in the US where they worked (northeast,
midwest, south, and west). Analyze the data and write a brief statistical
report to see whether and to what extent black males were paid less than
non-black males in the same region and with the same levels of education and
experience. Realize that the extent to which blacks were paid differently than
non-blacks may depend on region. (Suggestion: refrain from looking at
interactive effects, except for the one implied by the previous sentence.)
(These data were discussed in the paper, Bierens, H. J. and D. K. Ginther
(2000) “Integrated Conditional Moment Testing of Quantile Regression Models,”
to appear in a special issue of Empirical Economics on Economic Applications of
Quantile Regression; and made available at the web site http://econ.la.psu.edu/~hbierens/MEDIAN.HTM
associated with the software EasyReg.)

Data: ex1029.xls. WARNING: large data set.

11.24. **Natal dispersal distances of mammals**. Natal dispersal
distances are the distances that juvenile animals travel from their birthplace
to their adult home. An assessment of the factors affecting dispersal distances
is important for understanding population spread, recolonization, and gene
flow—which are central issues for conservation of many vertebrate species. For
example, an understanding of dispersal distances will help to identify which
species in a community are vulnerable to the loss of connectedness of habitat.
To further the understanding of determinants of natal dispersal distances,
researchers gathered data on body weight, diet type, and maximum natal
dispersal distance for various animals. Shown below are the first 6 of 64 rows
of data on mammals. (Data from Sutherland, G.D., et al., 2000, “Scaling of natal
dispersal distances in terrestrial birds and mammals,” Conservation Ecology
4(1): 16.) Analyze the data to describe the distribution of maximum dispersal
distance as a function of body mass and diet type. Write a summary of
statistical findings.

Data: ex1124.xls

11.xx. Acorn. The acorn data set in DASL is very nice as a problem with issues of influential observations. Students may need some guidance though.

12.22. **Bush-Gore ballot controversy**.
Review the Palm Beach County ballot controversy description in Exercise 8.24.
To estimate how much of Pat Buchanan’s vote count might have been intended for
Al Gore in Palm Beach County, Florida, that exercise required the fitting of a
model for predicting Buchanan’s count from Bush’s count from all other counties
in Florida (excluding Palm Beach), followed by the comparison of Buchanan’s
actual count in Palm Beach to a prediction interval. One might suspect that the
prediction interval can be narrowed and the validity of the procedure
strengthened by incorporating other relevant predictor variables. Display 12.19
shows the first few rows of a data set containing the vote counts by county in
Florida for Buchanan and for four other presidential candidates in 2000, along
with the total vote counts in 2000, the presidential vote counts for three
presidential candidates in 1996, the vote count for Buchanan in his only other
campaign in Florida—the 1996 Republican primary, the registration in Buchanan’s
Reform Party, and the total registration in the county. Analyze the data and
write a statistical summary predicting the number of Buchanan votes that were
not intended for him. It would be appropriate to describe any unverifiable
assumptions used in applying the prediction equation for this purpose.
(Suggestion: find a model for predicting Buchanan’s 2000 vote from other
variables, excluding Palm Beach County, which is listed last in the data set.
Consider a transformation of all counts.)

Data: ex1222.xls

13.18. **El Nino and Hurricanes**.
Reconsider the El Nino and Hurricane data set from exercise 10.28 above. (a)
Regress the log of the storm index on West African wetness (treated as a
categorical factor with 2 levels) and El Nino temperature (treated as a
categorical factor with 3 levels); retain the sum of squared residuals and the
residual degrees of freedom. (b) Regress the log of the storm index on West
African wetness (treated as categorical with 2 levels), El Nino temperature
(treated as numerical), and the square of El Nino temperature. Retain the sum
of squared residuals and the residual degrees of freedom. (c) Explain why the
answers to (a) and (b) are the same. (d) Explain why a test that the coefficient of
the temperature-squared term is zero can be used to help decide whether to treat
temperature as numerical or categorical.
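The comparison in parts (a) and (b) can be checked numerically on made-up data before turning to the real file. The sketch below is an assumption-laden illustration: the helper names `solve` and `least_squares_rss` are hypothetical, the ten observations are invented (not from ex1028.xls), and the fit uses the normal equations, which is adequate for a tiny example but not how statistical packages usually compute regressions.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares_rss(rows, y):
    """Fit y on the columns of `rows` via the normal equations; return the RSS."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

# Hypothetical data: temperature in {-1, 0, 1}, wetness indicator, log storm index.
temps = [-1, -1, 0, 0, 0, 1, 1, 1, -1, 0]
wet = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
log_index = [5.1, 4.8, 4.7, 4.4, 4.9, 4.0, 4.5, 4.2, 4.9, 4.6]

# (a) temperature as a 3-level factor: indicators for the neutral and warm levels.
design_factor = [[1, w, int(t == 0), int(t == 1)] for t, w in zip(temps, wet)]
# (b) temperature as numerical, plus its square.
design_numeric = [[1, w, t, t * t] for t, w in zip(temps, wet)]

rss_categorical = least_squares_rss(design_factor, log_index)
rss_numeric = least_squares_rss(design_numeric, log_index)
print(rss_categorical, rss_numeric)
```

Both designs use four columns, so the residual degrees of freedom also agree. The explanation asked for in part (c) is left to the reader.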

13.22. **Gender differences in performance
on mathematics achievement test scores**. Display 13.25 shows the first
few rows of a data set on 861 ACT Assessment Mathematics Usage Test scores from
1987. The test was given to a sample of high school seniors who met one of
three profiles of high school mathematics course work: a: Algebra I only, b:
two Algebra courses and Geometry, and c: two Algebra courses, Geometry,
Trigonometry, Advanced Mathematics, and Beginning Calculus. Analyze the data
and write a brief statistical report to determine whether male scores are
distributed differently than female scores, after accounting for coursework
profile, and whether the difference is the same for all profiles. (These data were
generated from summary statistics for one particular form of the test, as
reported in Doolittle, A. E., 1987, Gender differences in performance on
mathematics achievement items, ACT Research Report Series, 87-16.)

Data: ex1322.xls

14.17. **Tennessee corn yield trials**. Corn yield trials were
performed at four locations in Tennessee in 1999. Shown in Display 14.22
are the average yields, in bushels per acre, for 6 hybrids at each of the four
locations. Notice that at the Ames Plantation there were two trials, one
unirrigated and one irrigated. Do any of the hybrids have mean yields
that are higher than the others? Do yellow corn hybrids have means that
differ from the white corn hybrids? (Data from the University of Tennessee
Agricultural Experiment Station web site, http://web.utk.edu/~taescomm/research/corn1999.html).

Data: ex1417.xls

15.11. **El Nino and Hurricanes**.
Reconsider the El Nino and Hurricane data set from exercise 10.28. Regress the
log of the storm index on temperature and the indicator variable for West
African wetness and retain the residuals. (a) Construct a lag plot of the
residuals, as in Display 15.5. (b) Construct a partial autocorrelation function
plot of the residuals. (c) Is there any evidence of autocorrelation? How many
lags?
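The pairs plotted in a lag plot and the sample autocorrelations that summarize them can be computed as in the following sketch. The function names and the residual values are hypothetical, and the code covers only the ordinary autocorrelation; the partial autocorrelation coincides with it at lag 1 but requires further computation at higher lags.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((r - mean) ** 2 for r in residuals)
    ck = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
             for t in range(lag, n))
    return ck / c0

def lag_pairs(residuals, lag=1):
    """Pairs (earlier residual, later residual) to plot in a lag plot."""
    return list(zip(residuals[:-lag], residuals[lag:]))

# Hypothetical regression residuals in time order, NOT from ex1028.xls.
resid = [0.21, 0.05, -0.12, -0.30, -0.08, 0.15, 0.26, 0.10, -0.17, -0.10]
for k in (1, 2, 3):
    print(f"lag {k}: r = {autocorrelation(resid, k):.3f}")
print(lag_pairs(resid)[:3])
```

A lag plot with points scattered along a rising line, together with a lag-1 autocorrelation well away from zero, would suggest positive serial correlation.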

15.14. **Trends in firearm and motor vehicle deaths in the U.S.** Display 15.16 shows
the number of deaths due to firearms and the number due to motor vehicle
accidents in the United States between 1968 and 1993. Is there evidence of an
increasing or decreasing trend in firearm deaths over this period? What is the
rate of increase or decrease? Is there evidence of an increasing or decreasing
trend in motor vehicle deaths over this period? What is the rate of increase or
decrease? (The data were read from a Centers for Disease Control and Prevention
graph reported in The Oregonian, June 17, 1997.)

Data: ex1514.xls

15.18. **S & P 500**. The
Standard and Poor’s 500 stock index (S&P 500) is a benchmark of stock market
performance, based on the values of 400 industrial firms, 40 financial stocks,
40 utilities and 20 transportation stocks. Display 15.15 shows the value of a
$1 investment in 1871 at the end of each year from 1870 to 1999, according to
the S&P 500, assuming all dividends are reinvested. Describe the
distribution of the S&P value as a function of year.

Data: ex1518.xls

16.15. **Trends in SAT scores**. Display 16.17 shows a partial listing
of a data set with ratios of average Math to Verbal SAT scores in the 50 U.S.
states plus the District of Columbia for 1989 and 1996-1999. Is the mean of the
ratios different in 1999 than in 1989? Is there an increasing trend in the
ratios over the period from 1996 to 1999? Analyze the data and write a brief
statistical report of the findings.

Data: ex1617.xls

18.18. **Hale-Bopp and handedness**. It is known that left-handed people tend to recall
orientations of human heads or figures differently than right-handed people. To
investigate whether there is a similar systematic difference in recollection of
inanimate object orientation, researchers quizzed University of Oxford
undergraduates on the orientation of the tail of the Hale-Bopp comet. The
students were shown eight photographic pictures with the comet in different
orientations (head of the comet facing left down, left level, left up, center
up, right up, right level, right down, or center down) six months after the
comet was visible in 1997. The students were asked to select the correct orientation.
(The comet faced to the left and downward.) Shown below are the responses
categorized as correct or not, shown separately for left- and right-handed
students. Is there evidence that left- or right-handedness is associated with
correct recollection of the orientation? If so, quantify the association. Write
a brief statistical report of the findings. (Data from Martin and Jones, 1999,
Hale-Bopp and handedness: individual difference in memory for orientation,
American Psychological Society.)

19.19. **Tire-related fatal accidents and Ford sports utility vehicles**. The table in
Display 19.13 shows the numbers of compact sports utility vehicles involved in
fatal accidents in the U.S. between 1995 and 1999, categorized according to
travel speed, make of car (Ford or other), and cause of accident (tire-related
or other). From this table, test whether the odds of a tire-related fatal
accident depend on whether the sports utility vehicle is a Ford, after
accounting for travel speed. For this subset of fatal accidents, estimate the
excess number of Ford tire-related accidents. (This is a subset of the data
described more fully in exercise 20.18.)

Data: ex1919.xls
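
A stratified comparison of this kind is commonly handled with Mantel-Haenszel tools, which combine one two-by-two table per stratum (here, per travel-speed category). The sketch below assumes that approach. The function name, the table layout, and the counts are hypothetical, not the values in Display 19.13.

```python
import math

def mantel_haenszel(tables):
    """Mantel-Haenszel summary over strata.

    tables: one (a, b, c, d) tuple per stratum, laid out as
        a = Ford, tire-related      b = Ford, other cause
        c = other make, tire-related  d = other make, other cause
    Returns the excess (observed minus expected count in cell a, summed
    over strata), its z-statistic, and the common odds ratio estimate.
    """
    excess = variance = numerator = denominator = 0.0
    for a, b, c, d in tables:
        total = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Expected count and variance of cell a under no association.
        excess += a - row1 * col1 / total
        variance += row1 * row2 * col1 * col2 / (total ** 2 * (total - 1))
        numerator += a * d / total
        denominator += b * c / total
    return {"excess": excess,
            "z": excess / math.sqrt(variance),
            "odds_ratio": numerator / denominator}

# Hypothetical counts for two speed strata, for illustration only.
print(mantel_haenszel([(5, 40, 3, 120), (12, 60, 6, 150)]))
```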

20.18. **Fatal car accidents involving tire failure on Ford Explorers**. The
Ford Explorer is a popular sports utility vehicle made in the U.S. and sold
throughout the world. Early in its production, concern arose over a potential
accident risk associated with tires of the prescribed size when the
vehicle was carrying heavy loads, but the risk was thought to be acceptable if
a low tire pressure was recommended. The problem was apparently
exacerbated by a particular type of Firestone tire that was overly prone to
tread separation, especially in warm temperatures. This type of tire was
a common one used on Explorers in model years 1995 and later. By the end of
1999 more than 30 lawsuits had been filed over accidents that were thought to
be associated with this problem. U.S. federal data on fatal car accidents were
analyzed at that time, showing that the odds of a fatal accident being
associated with tire failure were three times as great for Explorers as for
other sports utility vehicles. Additional data from 1999 and additional
variables may be used to further explore the odds ratio. Display 20.19
lists data on 1995 and later model compact sports utility vehicles involved in
fatal accidents in the U.S. between 1995 and 1999, excluding those that were
struck by another car and excluding accidents that, according to police
reports, involved alcohol. It is of interest to see whether the odds that
a fatal accident is tire-related depend on whether the vehicle is a Ford, after
accounting for age of the car and number of passengers. Since the Ford
tire problem may be due to the load carried, there is some interest in seeing
whether the odds associated with a Ford depend on the number of
passengers. (Suggestions: (i) Presumably, older tires are more likely to
fail than newer ones. Although tire age is not available, vehicle age is
an approximate substitute for it. Since many car owners replace their
tires after the car is 3 to 5 years old, however, we may expect the odds of
tire failure to increase with age up to some number of years, and then to
perhaps decrease after that. (ii) If there is an interactive effect of
Ford and the number of passengers, it may be worthwhile to present an odds
ratio separately for 0, 1, 2, 3, and 4 passengers.) The data are from the
National Highway Traffic Safety Administration, Fatality Analysis Reporting
System (http://www-fars.nhtsa.dot.gov/).

Data: ex2018.xls

21.14. **Spock conspiracy trial**.
Reconsider the proportion of women on venires in the Boston U.S. district
courts (case study 5.2 in book). Analyze the data by treating the number of
women out of 30 people on a venire as a binomial response. (a) Do the odds of a
female on a venire differ for the different judges? Answer this with a
drop-in-deviance chi-square test, comparing the full model with judge as a
factor to the reduced model with only an intercept. (b) Do judges A-F differ in
their probabilities of selecting females on the venire? Answer this with a
drop-in-deviance chi-square test by comparing the full model with judge as a
factor to the reduced model which has an intercept and an indicator variable
for Spock’s judge. (c) How different are the odds of a woman on Spock’s judge’s
venires from the odds on the other judges? Answer this by interpreting the
coefficients in the binomial logistic regression model with an intercept and an
indicator variable for Spock’s judge.
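
When the only explanatory variable is a factor, the fitted probabilities of a binomial logistic regression are the pooled proportions within each level, so the drop in deviance for parts (a) and (b) can be computed directly. The sketch below relies on that fact. The function names and the venire counts are hypothetical, and the reduced grouping must be a coarsening of the full grouping.

```python
import math
from collections import defaultdict

def max_loglik(groups, successes, trials):
    """Maximized binomial log-likelihood with one probability per group.

    The binomial-coefficient term is dropped because it cancels in
    comparisons. Returns (log-likelihood, number of groups).
    """
    y_sum, n_sum = defaultdict(int), defaultdict(int)
    for g, y, n in zip(groups, successes, trials):
        y_sum[g] += y
        n_sum[g] += n
    loglik = 0.0
    for g in y_sum:
        y, n = y_sum[g], n_sum[g]
        p = y / n  # fitted probability for every venire in this group
        if y > 0:
            loglik += y * math.log(p)
        if n - y > 0:
            loglik += (n - y) * math.log(1 - p)
    return loglik, len(y_sum)

def drop_in_deviance(full_groups, reduced_groups, successes, trials):
    """Drop-in-deviance statistic and its degrees of freedom."""
    ll_full, k_full = max_loglik(full_groups, successes, trials)
    ll_red, k_red = max_loglik(reduced_groups, successes, trials)
    return 2 * (ll_full - ll_red), k_full - k_red

# Hypothetical venires of 30 people each, for illustration only.
judges = ["A", "A", "B", "B", "Spock", "Spock"]
women = [10, 12, 9, 11, 4, 5]
size = [30] * 6
intercept_only = ["all"] * 6
spock_indicator = ["Spock" if j == "Spock" else "other" for j in judges]
print(drop_in_deviance(judges, intercept_only, women, size))   # part (a)
print(drop_in_deviance(judges, spock_indicator, women, size))  # part (b)
```

Each statistic is compared with a chi-square distribution on the returned degrees of freedom, for example with `scipy.stats.chi2.sf(statistic, df)` if SciPy is available.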

21.17. **Effect of Stress During Conception on Odds of a Male Birth**.
The probability of a male birth in humans is about .51. It has previously been
noticed that lower proportions of male births are observed when offspring are
conceived at times of exposure to smog, floods, or earthquakes. Danish
researchers hypothesized that sources of stress associated with severe life
events may also have some bearing on the sex ratio. To investigate this theory
they obtained the sexes of all 3,072 children who were born in Denmark between
1 January 1980 and 31 December 1992 to women who experienced the following
kinds of severe life events in the year of the birth or the year prior to the
birth: death or admission to hospital for cancer or heart attack of their partner
or of their other children. They also obtained sexes on a sample of 20,337
births for mothers who did not experience these life stress episodes. Shown in
the table below are the percentages of boys among the births, grouped according
to when the severe life event took place. Notice that for one group the
exposure is listed as taking place during the first trimester of pregnancy. The
rationale for this is that the stress associated with the cancer or heart
attack of a family member may well have started before the recorded time of
death or hospital admission. Analyze the data to investigate the researchers’
hypothesis. Write a summary of statistical findings. (Source: Hansen, et al.,
1999, “Severe periconceptional life events and the sex ratio in offspring:
follow up study based on five national registers,” British Medical Journal,
319: 548-549.)

Data: ex2117.xls

21.18. **HIV and circumcision**. Researchers in Kenya identified a cohort of over
1000 prostitutes who were known to be a major reservoir of sexually transmitted
diseases in 1985. It was determined that over 85% of them were infected with
human immunodeficiency virus (HIV) in February, 1986. The researchers then
identified men who acquired a sexually transmitted disease from this group of
women after the men sought treatment at a free clinic. The table below shows
the subset of those men who did not test positive for the HIV virus on their
first visit and who agreed to participate in the study. The men are categorized
according to whether they later tested positive for HIV during the study
period, whether they had one or multiple sexual contacts with the prostitutes,
and whether they were circumcised. Describe how the odds of testing positive
are associated with number of contacts and with whether the male was
circumcised. (Data from Cameron, et al., 1989, Female to male transmission of
human immunodeficiency virus type 1: risk factors for seroconversion in men,
The Lancet.)

Data: ex2118.xls

21.19. **Meta-analysis of breast cancer and lactation studies**. Meta-analysis refers to the
analysis of analyses. When the main results of studies can be cast into a
two-by-two table of counts, it is natural to combine individual odds ratios
with a logistic regression model that includes a factor to account for
different odds from the different studies. In addition, the odds ratio itself
might differ slightly among studies because of different effects on different
populations or different research techniques. One approach for dealing with
this is to suppose an underlying common odds ratio and to model between-study
variability as extra-binomial variation. The table below shows the results of
ten separate case-control studies on the association of breast cancer and
whether a woman had breast fed children. How much greater are the odds of
breast cancer for those who did not breast feed than for those who did breast
feed? (Data gathered from various sources by Karolyn Kolassa as part of a
Master’s project, Oregon State University.)

Data: ex2119.xls

22.25. **El Nino and Hurricanes**. Reconsider the El Nino and
Hurricane data set from exercise 10.28. Use Poisson log-linear regression to
describe the distribution of (a) number of storms and (b) number of hurricanes
as a function of El Nino temperature and West African wetness.

22.29. **Body size and reproductive success in a population of male bullfrogs**. As an
example of field observation in evidence of theories of sexual selection,
Arnold and Wade (1984, “On the measurement of natural and sexual selection:
applications,” Evolution, 38, p. 720-734) presented the following data set on
size and number of mates observed in 38 male bullfrogs. Is there evidence that
the distribution of number of mates in this population is related to body size?
If so, supply a quantitative description of that relationship, along with an
appropriate measure of uncertainty. Write a brief summary of statistical
findings.

Data: ex2229.xls
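
A Poisson log-linear regression with a single explanatory variable models log(mean count) = b0 + b1 * x. The sketch below fits that model by Newton-Raphson using only the standard library. The function name and the data are hypothetical (not the Arnold and Wade measurements), and in practice a statistical package would normally do the fitting.

```python
import math

def fit_poisson_loglinear(x, y, max_iter=50, tol=1e-10):
    """Fit log(mean of y) = b0 + b1 * x by Newton-Raphson.

    Requires a non-constant x and a positive total count.
    Returns (b0, b1, standard error of b1).
    """
    n = len(x)
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]   # centering improves numerical stability
    b0 = math.log(sum(y) / n)      # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in xc]
        # Information matrix entries for (intercept, slope).
        i00 = sum(mu)
        i01 = sum(m * xi for m, xi in zip(mu, xc))
        i11 = sum(m * xi * xi for m, xi in zip(mu, xc))
        det = i00 * i11 - i01 * i01
        # Score vector.
        s0 = sum(yi - m for yi, m in zip(y, mu))
        s1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, xc))
        # Newton step: inverse information times score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        if abs(d0) + abs(d1) < tol:
            break
        b0 += d0
        b1 += d1
    se_b1 = math.sqrt(i00 / det)   # slope variance from the inverse information
    return b0 - b1 * xbar, b1, se_b1  # intercept converted back to uncentered x

# Hypothetical body sizes (mm) and mate counts, for illustration only.
body_size = [105, 120, 133, 141, 150, 158]
mates = [0, 0, 1, 1, 2, 3]
b0, b1, se = fit_poisson_loglinear(body_size, mates)
print(f"slope {b1:.4f} (SE {se:.4f}); mean multiplies by {math.exp(b1):.3f} per mm")
```

Because the model is linear on the log scale, exp(b1) is the multiplicative change in the mean count per unit increase in x, and b1 plus or minus about two standard errors gives a rough interval on the log scale.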