The Cross Section Illusion

The sidebar of this blog declares that the Stage is a space to discuss “the intersections of governance, ecology, demographics, and culture.”[1] This casts a wide net, at times too wide a net. It would be much easier to maintain a blog devoted solely to exploring the ‘dynamics of human civilization,’ chronicling the decline of America’s republican institutions, describing the intricacies of the Chinese strategic tradition and Chinese military history, analyzing contemporary Asian geopolitics, or any of the other recurring sub-themes long-term readers of the Stage are familiar with than to mash them all together on one site. I keep the purview of the Stage open-ended for a reason: none of these topics can be truly understood in isolation. It is difficult to approach the most exciting questions and the most intriguing theories of our day without crossing disciplinary boundaries. The ecologist and the economist, the psychologist and the strategist, and the archaeologist and the political analyst all have a great deal to teach each other. Reality is not a one-trick pony.

It is for such reasons that we spend a great deal of time here discussing odd correlations of all sorts, trying to discern the connections between variables as different as genetics, climate, family structure, language, and other factors easily overlooked and forgotten. This kind of analysis is both fun and necessary, but it must be done with care. It is too easy to get caught up in sloppy thinking when playing this game. I call a particularly common lapse the “Cross Section Illusion.”

An excellent introduction to the Cross Section Illusion and its attendant problems is a recent study by Roland Sturm and Ruopeng An published in CA: A Cancer Journal for Clinicians. The authors both come from RAND Corp and, as might be expected, write in the statistics-crunching style RAND is famous for. Their chosen topic is America’s worsening “obesity epidemic.” Sturm and An suggest that both policy makers and the general public hold faulty ideas about the epidemic’s origins, ideas that could be avoided if researchers relied less on cross sectional studies and more on time series data.

To get a sense of the difference between the two, consider the following:

Cross Section:

Obesity rates by race and sex among U.S. adults, 2011/2012.

Taken from Cynthia L. Ogden, Margaret D. Carroll, Brian K. Kit, and Katherine M. Flegal, “Prevalence of Obesity Among Adults: United States, 2011–2012,” NCHS Data Brief, no. 131 (Washington DC: Centers for Disease Control and Prevention, October 2013).


Time Series:

Increase in Body Mass Index of U.S. adult women, by race or ethnic background, 1986-2012.

Taken from Figure 1, Roland Sturm and Ruopeng An, “Obesity and Economic Environments,” CA: A Cancer Journal for Clinicians (22 May 2014, accessed June 2014).




Cross Section:

U.S. Adult Obesity Rates by Age and Education Level, 2008.

Taken from College Board, “Trends in Higher Education: Adult Obesity Rates by Age and Education Level, 2008,” Trends in Higher Education: Figures and Tables (accessed June 2014). Original data from the CDC’s National Center for Health Statistics (2008).


Time Series:

Increase in Body Mass Index of U.S. adults, by education level, 1986-2012.

Taken from Figure 1, Roland Sturm and Ruopeng An, “Obesity and Economic Environments,” CA: A Cancer Journal for Clinicians (22 May 2014, accessed June 2014).



Cross Section:

Prevalence of Self Reported Obesity Among U.S. Adults, 2012.

Taken from “Obesity Facts,” Centers for Disease Control and Prevention (28 March 2014; accessed June 2014).

Time Series:

Prevalence of Obesity in U.S. Adults from California, Colorado, and Mississippi, 1990-2012.

Taken from Figure 2, Roland Sturm and Ruopeng An, “Obesity and Economic Environments,” CA: A Cancer Journal for Clinicians (22 May 2014, accessed June 2014).


If you are concerned with American obesity rates and turn to the cross sectional data to figure out what is going on, it is easy to reach a flawed conclusion. The correlation between education and obesity, for example, seems quite clear. The poorer and less educated an American is, the more likely he or she is to be obese. Looking at this data, it seems reasonable to suggest that something about poverty is making people more obese: perhaps cruddy processed food is the only thing America’s poor and less educated can afford to buy, or maybe the poor live in urban areas where people do not exercise. These hypotheses are plausible… until you look at the time series. It then becomes apparent that the rich and educated are gaining weight at the same rate as the poor. Poverty cannot explain this.
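For readers who like to see the arithmetic, here is a minimal Python sketch of the distinction. The numbers are invented for illustration and are not taken from Sturm and An or from any survey; the group names and the helper functions `cross_section_gap` and `change_per_decade` are likewise my own assumptions. The point is only that a gap between groups in one year and the trend within each group are two separate measurements.

```python
# Hypothetical average BMI values, invented purely for illustration.
# They are NOT data from Sturm and An or from any real survey.
bmi = {
    "less_educated": {1990: 26.0, 2000: 27.0, 2010: 28.0},
    "more_educated": {1990: 24.5, 2000: 25.5, 2010: 26.5},
}


def cross_section_gap(data, year):
    """Cross-sectional view: difference between the two groups in one year."""
    return data["less_educated"][year] - data["more_educated"][year]


def change_per_decade(series):
    """Time series view: average change per decade from first to last year."""
    years = sorted(series)
    first, last = years[0], years[-1]
    decades = (last - first) / 10
    return (series[last] - series[first]) / decades


gap_2010 = cross_section_gap(bmi, 2010)
trends = {group: change_per_decade(series) for group, series in bmi.items()}

print("Gap between groups in 2010:", gap_2010)
for group, trend in trends.items():
    print(f"{group}: change per decade = {trend}")
```

In this toy example the snapshot shows a steady gap between the groups, while the trend is identical for both. A story built on the gap alone would say nothing about why both lines are rising.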

Sturm and An make a similar point about geographic explanations of the obesity epidemic, poking fun at the Colorado Diet in particular:

What about geographic differences? There is a famous set of maps by the Centers for Disease Control and Prevention that illustrate the changing obesity prevalence by state since 1985. However, some interpretations of these maps seem to confuse cross-sectional differences with changes over time. A new diet book about the “Colorado diet” includes the following description by the publisher: “Americans are getting fatter. A third of them are now obese–not just a few pounds overweight, but heavy enough to put their health in jeopardy. But, one state bucks the trend. Colorado is the leanest state in the nation, but not because of something in the air or the water.”
Figure 2 shows the prevalence of BMIs of over 25 kg/m² (ie, overweight or obese) over time for Colorado (the state with the lowest average BMI or overweight or obesity rates), California, and Mississippi (the state with usually the highest rates). The overweight/obesity rates in Colorado do lag behind those in Mississippi, but we see no evidence of any “bucking the trend.” … To understand the obesity epidemic, rather than asking a question such as “Why are people in Colorado thinner than people in Mississippi?” we need to ask why are people in Colorado gaining weight at the same rates as people in Mississippi? (emphasis added) [2]

The authors continue with the myth-busting for the rest of the paper, ending up with a simple and sensible explanation for rising obesity rates. I encourage those interested in this issue to read the entire thing. But I bring it up here because of the larger issue it illustrates. It is very difficult to make meaningful claims about causation (or even correlation!) on the basis of cross section data alone. Often, seemingly perfect, statistically significant correlations disappear when the same variables are viewed over a longer stretch of time. In other cases, as in this one, time series data reveals that the real story isn’t about variance between two groups at all, but about the rate at which each group is changing. It is all too easy to be fooled by the Cross Section Illusion.


This is worth keeping in mind the next time someone uses a few cross sectional studies to try and convince you that a correlation between wealth, violence, political systems, genes, geography, or whatever else may be the flavor of the day should be taken more seriously. They may be right–but before you cede the point, be sure to check if the time series version of the data supports their claim.

 —————————————

[1] This is how it reads at the time of this writing, 7 June 2014.

[2] Roland Sturm and Ruopeng An, “Obesity and Economic Environments,” CA: A Cancer Journal for Clinicians (22 May 2014, accessed June 2014).

4 Comments

Well, I just had a blogpost making an argument about obesity based entirely on time-serial patterns :

http://pseudoerasmus.com/2014/06/04/the-falling-price-of-fat/

(Don't know if you saw the addendum on Japanese incidence.)

Your overall point is correct, but the problem with your particular illustrations is that the time series confirm one aspect of the cross-sectional data : the differences between groups very much hold over time. So the fact that there is a global (i.e., across-the-board) secular change in the incidence of obesity does not at all invalidate the argument that group differences in obesity can have different causes !

It's funny, the tenor of your argument exactly inverts the tenor of the commenter "Paul" at my blog. He thinks I understressed the group differences with the secular time series data.

Of course there's another source of data bias : parochialism. International comparisons also tell you a lot more than national data alone.

Some have inferred from the Census that most people in Florida are born Hispanic and die Jewish.

@Pseudoerasmus-

I did not see what you wrote about the Japanese case. It was an interesting read; you essentially make the same point that the RAND folks do about Colorado.

"Your overall point is correct, but the problem with your particular illustrations is that the time series confirm one aspect of the cross-sectional data : the differences between groups very much hold over time. So the fact that there is a global (i.e., across-the-board) secular change in the incidence of obesity does not at all invalidate the argument that group differences in obesity can have different causes !"

This is true. Another reader said the same thing to me via personal communication. As I told him, the main purpose of this post was not to explain away group differences, but to point out that group differences by themselves make a poor explanation for the entire trend. The gap between Mississippi and Colorado in 2012 means little when placed next to the gap between Colorado in 2012 and Colorado in 1980!