Writing About Quantitative Data

If you have ever been on a guided tour of a museum, a mine, a theater, or some other tourist attraction, you know that the guide can make or break your experience. The excessively detail-oriented guide will put you to sleep, the ill-equipped comic guide will anger or embarrass you, and the overly casual guide will leave you puzzled and frustrated. But a guide that manages to walk you through the attraction, highlighting the most important and interesting features, and weaving a coherent story that links together the parts of the attraction, is the guide that will give you the best possible tour. In the same way, the results section of your research paper is the attraction that readers have come to see and your task as guide is to walk them through your analysis in a coherent, deft, and efficient manner. This chapter will alert you to some of the issues involved in achieving this level of sophistication in writing about quantitative data.

The Goal

After you have spent many hours constructing and analyzing a data set, you want to forcefully communicate to your readers what the data reveal as a result of your analysis. Notice that this goal follows from the goals of your literature review (to establish what needs to be studied and why it is important) and the goals of your data and methods section (to describe the data set and how it was constructed). Meanwhile, communicating what the analyzed data reveal sets you up for the goal of your discussion and conclusion sections (to sum up the big theoretical points raised and resolved by your analysis).

Writing about quantitative results is unlike the writing that we are usually trained to do in school. As opposed to writing an essay with thesis statements and supporting points, you here find yourself alerting the reader to things that you now consider to be obvious, and that you have sought to make obvious in your tables and figures, and yet you must still point these things out to the readers. Meanwhile, you must allow readers to verify your claims without beating them over the head with endless details that they could get for themselves just by reading your tables and figures. So there is a delicate balance between pointing out the important points and respecting your readers’ ability to consider the facts for themselves. This is difficult to do well.

An Extended Example: Redundancy and Voice

Let's follow an example derived from research on men's incomes in 1996. In Table 1 below, we find the descriptive statistics for income, for older and younger men.

The author wants to comment on the difference between the two groups of men.


Table 1: Income for American Men (1996)

                          N     Mean   Median   25th Percentile   75th Percentile
Young Men (18-40)
   Income            10,767   36,531   30,000            19,700            43,000
Older Men (41-65)
   Income            10,887   48,082   37,000            24,000            55,000

Here is the author's first draft describing the data:

"My analysis shows that the mean earnings for the younger men is $36,531 and for older men it is $48,082. This is a difference of $11,551. There were 10,767 younger men and 10,887 older men in the sub-samples."
 

Redundancy

Boring! This example illustrates the first thing NOT to do. Writing about the analysis does not mean that you repeat what is in the tables. The reader can easily see in the table what the author has written in this example.

Typically we do not write about the sample size (N) in the results section. Only rarely is it important for understanding the results. If sample sizes are small (fewer than 100 cases as a rule of thumb), it may be important to mention to the reader that the statistics you are using are more vulnerable to the influence of individual cases (outliers). In this example, however, this is not the case, and the comment about the sample sizes is redundant and wasteful.

Meanwhile, the author obviously wants to draw attention to differences between the groups. So it may be better to simply point out that there is a difference of "x" dollars between the two groups’ average incomes without repeating the raw values from which this difference was computed. The reader can verify the math if she or he wants to, while the author efficiently points out a difference that she thinks is important. Another way to do this would be to report the percentage difference between the two groups, noting that the older group makes about 30% more than the younger group.
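Both ways of summarizing the gap come from simple arithmetic on the two means in Table 1. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative, and the choice of the younger group as the base for the percentage is an assumption rather than something the original analysis specifies.

```python
# Mean incomes taken from Table 1 (1996 dollars).
young_mean = 36_531
older_mean = 48_082

# Absolute gap between the two group means.
dollar_gap = older_mean - young_mean

# Relative gap, using the younger group as the base of comparison
# (an assumption; a different base gives a different percentage).
percent_gap = 100 * dollar_gap / young_mean

print(f"Difference in mean income: ${dollar_gap:,}")
print(f"Older men's mean income is {percent_gap:.1f}% higher")
```

The percentage works out to a little under 32%, which the text rounds to "about 30%". If the older group were used as the base instead, the same dollar gap would be about 24% of the older men's mean, so it is worth being clear in the prose about which group is the reference.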

This student's first draft shown above is equivalent to the museum guide pointing to the Mona Lisa and saying, "Notice that she has long dark hair and is smiling." Boring, redundant, and almost insulting, huh?
 


Voice: Visible or Invisible Authors, Analyses and Audiences

This first draft also raises the issue of "voice". That is, how visible should the author and the author's analysis be in the presentation of the results? This first draft brings the author onto the stage when the author says "my". The author also highlights that the "analysis" is showing something, as opposed to the data revealing something. These are sometimes issues of taste and editorial license, but they are important because at times the author and her analysis can get in the way of the results and findings. We will address these issues more in subsequent examples.

Here is a revision of the author's first draft:

"Table 1 demonstrates that mean income is dramatically lower for younger men than for older men. Older men have income about 30% higher than young men's. This is consistent with Oppenheimer's argument regarding the "life-cycle squeeze" (1982) when younger men obtain lower annual earnings at the start of their careers. Meanwhile, the mean is higher than the median for both variables among both groups of men, suggesting that significant outliers are inflating mean earnings and incomes. Even in the upper end of the income brackets the differences are pronounced. The upper quartile of older men make over $55,000, but for younger men, the upper quartile begins at $43,000."

This revision has moved the author and the analysis off the stage by side-stepping ownership of the analysis ("my"). Now the table of analyzed data is the source of authority and information rather than "my analysis". The author simply makes interpretive comments about the relative size of the numbers in the table and begins to offer some explanation for why they appear as they do. This revision gives the reader the freedom to read the table for himself but also to consider what the author wants him to begin to conclude. The author has pointed out what she believes to be the most significant features of this part of the analysis and has begun to link it to the theoretical concerns raised earlier in the paper.

When the museum guide says "It is believed that Mona Lisa is smiling because her lover just sent her flowers," the guide is offering interpretation while identifying, in passing, the fact that she is smiling. A much more interesting approach, no?

In the next table, the author wants to show that educational level of men has an important effect on their incomes (see Table 2).


Table 2: Income by Age and Education for American Men (1996)

                             N     Mean   Median
Young Men (18-40)
   Less than college     7,786   30,040   26,000
   College graduates     2,981   53,483   42,000
Older Men (41-65)
   Less than college     7,246   37,507   32,000
   College graduates     3,641   69,129   51,421
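The income gap between college graduates and other men within each age group can be read off Table 2 by subtraction. Here is a minimal Python sketch of that calculation; the dictionary layout and the helper name college_gap are illustrative assumptions and not part of the original analysis.

```python
# Mean incomes from Table 2 (1996 dollars), keyed by age group.
table2_means = {
    "Young Men (18-40)": {"less_than_college": 30_040, "college_graduates": 53_483},
    "Older Men (41-65)": {"less_than_college": 37_507, "college_graduates": 69_129},
}


def college_gap(means):
    """Return the dollar gap in mean income between graduates and non-graduates."""
    return means["college_graduates"] - means["less_than_college"]


gaps = {group: college_gap(means) for group, means in table2_means.items()}

for group, gap in gaps.items():
    # Round to the nearest thousand, the precision a sentence would usually report.
    print(f"{group}: ${gap:,} (about ${round(gap, -3):,})")
```

Rounded to the nearest thousand, the gaps come to about $23,000 for young men and about $32,000 for older men. Reporting the rounded figures in the text, and leaving the exact means in the table, keeps the prose readable while still letting readers check the arithmetic.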

 
 

Here is a first draft of some text about Table 2:

"From Table 2 you can conclude that educational level is extremely important for increasing the income of all men, regardless of age. Computer analysis of the data also reveals that the effects of college graduation grow over a person's life. While among young men there is a $23,000 difference between college graduates and less educated men, this gap grows to $32,000 for older men."

This text raises two more issues of voice and the visibility of author and reader. The familiar "you" finds no place in formal academic writing. This is because the meaning of "you" is ambiguous. Is the author implying that the reader needed her permission to make this conclusion? Is this a veiled invitation to make this conclusion? A command to do so? Is this conclusion optional, such that some could make such a conclusion and others could not? Hence, the ambiguity. Here is a related case:

"In looking at the average income of men within different educational categories, we can say education has an important impact on…"

When the writer says "we can say", there is an assumption that the reader will want to say it too. The reader is once again visible but now is being asked to join the writer in saying something. Perhaps the following approach would work better:

"The observation that income varies so widely with education supports other researchers' claims that education is one of the most important influences on incomes."

This revised text puts the responsibility on the author to assert the meaning of the data, does not ask the reader to say it too, but allows the reader to accept or reject the interpretation offered.

Consider again the earlier draft about Table 2, this time focusing on its second sentence:

"Computer analysis of the data also reveals that the effects of college graduation grow over a person's life."

This text also errs in bringing the computer onto the stage. Generally, this is unwise. The reader does not really care if the author computed these statistics on a computer, an adding machine, an abacus, or on the back of an envelope. Similarly, references to computer software are generally not required (e.g. "…analysis of the data with SPSS…"), although you may occasionally see published research where the authors believed that the software's unique abilities needed to be highlighted (or perhaps they wanted their own statistical prowess to be highlighted!). In general, however, it is best to let the computer be invisible.

Because most computer programs cannot handle names of variables such as "Men’s Earnings", they use truncated names like "MENSINC". Do not use these computer-generated code names in tables or in writing about tables. Readers should not have to learn a new vocabulary to read the results section. Even if the computer prints out attractive tables with "MENSINC" as the heading of a row or column, change this back to its real meaning, and discuss it as such in the text.

A side-point: As mentioned in the discussion of methods sections, researchers and students who have struggled to complete their analysis are tempted to communicate to the readers how hard they worked to produce it. For example, the author might want to say "Painstaking and time-consuming efforts to compute the differences in earnings demonstrate that indeed…" Unfortunately, the readers of academic writing are not interested in the difficulties of research. Indeed, the author’s task is to make the results seem so self-evident that the reader will believe that these results effortlessly presented themselves to the author. This stands in contrast to the kinds of information that a tour guide would provide, where we actually find it interesting that the painter completed the portrait under difficult conditions.

So, who should be visible and invisible in writing about results? For sure, the computer and the readers should be invisible. The data or "the analysis" can be visible, although the author should beware of putting excessive focus on the analytic process and keep attention on the results. And the author? This remains a point of disagreement among academic writers. In the revision for Table 1 suggested above, the author remains off-stage and simply makes statements about the results, letting them be the source of authority and information:

"Table 1 demonstrates that mean income is dramatically lower for younger men than for older men. Older men have income about 30% higher than young men's. This is consistent with Oppenheimer's argument regarding the "life-cycle squeeze" (1982) when younger men obtain lower annual earnings at the start of their careers. Meanwhile, the mean is higher than the median for both variables among both groups of men, suggesting that significant outliers are inflating mean earnings and incomes. Even in the upper end of the income brackets the differences are pronounced. The upper quartile of older men make over $55,000, but for younger men, the upper quartile begins at $43,000."

While the author remains off-stage here, some writers stand on the stage with their analysis, introducing each stage of the analysis, almost like magicians who say: "Next, I pull a rabbit out of a hat." For example, the author above could introduce Table 1 by saying: "I first compute the mean and median earnings for both groups of men. Table 1 demonstrates that…" Thus the author takes a more central role in the presentation of results. However, notice that the table is still the source of authority and information. In large part, the choice of whether or not the author appears in the text, usually as "I", is an editorial choice that will meet with approval by some and disapproval by other readers.

Tense

In all of the weak and strong examples provided so far, the author writes in the present tense. For example, "Table 1 demonstrates…" or "Computer analysis reveals…" This may feel somewhat awkward to the author, since the results were actually created over time through a laborious process of data construction and analysis. Many first-time researchers are inclined to write something like this:

"Evaluation of the data revealed that the gap in earnings between the two groups of men was very large."

Most social science journals publish quantitative analyses written in the present tense. This is true even when the authors are writing about aggregated data covering several decades! The rationale is that if the analysis revealed something last week or last year, it reveals the same thing today. So Table 1 did not just say something on the day that the statistical analysis was completed; the results continue to say the same thing. The reader can recall that the data were collected during a certain time (this information is given in the data and methods section), and the date on the paper indicates when the author is making the current claim.

The benefit of writing in the present tense is that it makes the quantitative results more compelling. Writing about results in the past tense makes them feel far away and clinical. However, some social science journals do publish articles that are written in the past tense.

It should be noted that social science research based on participant observation or face-to-face interviews may best be communicated by writing in the past tense. This makes particular sense if the research process is integral to understanding the results. For example, if the researcher wants the readers to know that the setting in which the data were collected may have influenced the findings, then it makes sense to say so.

"Ronald indicated that he was not being paid enough for his hard work, although when his boss entered the room he quickly changed the subject."

or

"I pressed the manager for more detail when he evaded my question about the earnings of his workers down on the shop floor."

In these instances, the data and the acquisition of the data require that the author write in the past tense. However, quantitative data are generally treated (perhaps naively so) as timeless and context-independent, and thus academic writers talk about them in the present tense.

Directing Attention to Tables and Graphs

Imagine that you are on a tour of a museum and the guide repeatedly says: "Look at this painting – it is called __________"

At some point you would begin to wish that the guide would quit saying "Look here, look there" and instead simply point and start talking about the different paintings: "Here we see the Mona Lisa…(etc.) but over here the portrait of her sister looks quite different."

In the same way, it is challenging to point out tables and figures without being heavy-handed. Here are a couple of examples from students’ writing about tables and figures:

"Looking at Table 1 for men’s earnings and focusing on the mean and median and comparing…, it shows that the mean and the median are…"

"Consider Table 1 which shows that…"

Both of these examples contain an implicit command to "look" or "consider". However, the author can assume that the reader will look and consider after she makes her claims about what Table 1 says. One or two implicit commands may not be bothersome to the reader, but many of them will make the reader feel like he is being bossed around. The goal is to focus on the findings by either stating what a certain table or figure reveals, or by using the parenthetical maps (e.g. Table 1, Figure 1) to point people in the right direction for confirmation of the claim.

Earth-shaking, Surprising, Considerable, and Negligible Results

The results section of the paper is the first place where the author can begin to provide some interpretation of how surprising or expected the results are. Choosing adjectives carefully here is important because it sets the tone for the rest of the paper. After many weeks of painstaking work, the temptation is to claim that the results are remarkable or awe-inspiring when in fact they are much more modest. On the other hand, many authors are excessively humble and fail to assert the importance of their findings. This is where reviewers are helpful for determining how big or little, important or trivial, memorable or forgettable the results of the research are. Without review from others, the author might claim:

"The average total income (using the mean) was much higher than the median."

or

"There is a real discrepancy between the average income of higher and less educated men."

Phrases like "much higher" and "real" are open to argument. Beware the apparently neutral phrase such as "much higher". There is definitely a place for being persuasive and honest about findings, and if the difference is "huge", "noteworthy", "much bigger", etc., then say so. But keep in mind the cynical reader who might wonder why you think a $3,000 per year difference between the mean and the median is so huge.

In the second example above, the author has indicated the difference between the two groups is "real" (an apparently reasonable and testable assertion of statistical significance). Words like "big", "real", and "important" have their place in a results section, but be prepared to defend them and consider how they might either be misunderstood or might raise red flags for the reader.
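One way to prepare a defense of a phrase like "much higher" is to put the gap on a relative scale before choosing the adjective. The sketch below, in Python, expresses the gap between mean and median income in Table 1 as a percentage of the median. The function name gap_relative_to_median and the choice of the median as the base are illustrative assumptions; the sketch offers no threshold for when a gap becomes "large", which remains a judgment for the author and reviewers.

```python
def gap_relative_to_median(mean, median):
    """Return (dollar gap, gap as a percentage of the median)."""
    gap = mean - median
    return gap, 100 * gap / median


# Mean and median income from Table 1 (1996 dollars).
table1 = {
    "Young Men (18-40)": (36_531, 30_000),
    "Older Men (41-65)": (48_082, 37_000),
}

for group, (mean, median) in table1.items():
    gap, pct = gap_relative_to_median(mean, median)
    print(f"{group}: mean exceeds median by ${gap:,} ({pct:.1f}% of the median)")
```

For Table 1 the mean exceeds the median by roughly 22% among young men and roughly 30% among older men. A figure like this gives the cynical reader something concrete to weigh against the adjective.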
 

Conclusion

Writing about data is one of the least common experiences for most social science majors, and it is one of the hardest things to accomplish. You have clinical-looking numbers and tables that tell an important sociological story. Overcoming the dullness of numbers and tables to reveal the compelling story behind them is difficult. Meanwhile, because the results represent hours of hard work, it is difficult to remain understated and casual enough to keep yourself, your computer, and your painful research experiences off center stage so that the data can tell the story. And yet, the data do not really tell the story on their own. You are the tour guide who helps the reader see the story in the data.
 

This document is a draft to be included in the Department of Sociology's Writing Handbook.  Comments to the author, Mark Edwards, are welcome:
    medwards@orst.edu