Simpson's paradox (or the Yule-Simpson effect) is a statistical paradox wherein the successes of groups seem reversed when the groups are combined. This result is often encountered in social and medical science statistics,[1] and occurs when frequency data are hastily given causal interpretation;[2] the paradox disappears when causal relations are derived systematically, through formal analysis.
Batting averages
A common example of the paradox involves batting averages in baseball: it is possible for one player to hit for a higher batting average than another player during a given year, and to do so again during the next year, but to have a lower batting average when the two years are combined. This phenomenon, which occurs when there are large differences in the number of at-bats between years, is well known among sabermetricians such as Bill James.
A real-life example is provided by Ken Ross[12] and involves the batting average of baseball players Derek Jeter and David Justice during the years 1995 and 1996:[13]
| 1995 | 1996 | Combined |
Derek Jeter | 12/48 (.250) | 183/582 (.314) | 195/630 (.310) |
David Justice | 104/411 (.253) | 45/140 (.321) | 149/551 (.270) |
In both 1995 and 1996, Justice had a higher batting average than Jeter; however, when the two years are combined, Jeter shows a higher batting average than Justice. According to Ross, this phenomenon would be observed about once per year among the interesting baseball players. In this particular case, the paradox can still be observed if the year 1997 is also taken into account:
| 1995 | 1996 | 1997 | Combined |
Derek Jeter | 12/48 (.250) | 183/582 (.314) | 190/654 (.291) | 385/1284 (.300) |
David Justice | 104/411 (.253) | 45/140 (.321) | 163/495 (.329) | 312/1046 (.298) |
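The reversal in the two tables above can be checked by recomputing each average from the raw hits and at-bats. The following is a minimal sketch in Python; the names `seasons`, `average`, and `combined` are illustrative choices, not part of Ross's presentation. It uses exact fractions so that rounding cannot affect the comparisons.

```python
from fractions import Fraction

# (hits, at-bats) per season, copied from the tables above.
seasons = {
    "Derek Jeter":   {1995: (12, 48),   1996: (183, 582), 1997: (190, 654)},
    "David Justice": {1995: (104, 411), 1996: (45, 140),  1997: (163, 495)},
}


def average(records):
    """Batting average over (hits, at_bats) pairs: total hits / total at-bats."""
    hits = sum(h for h, _ in records)
    at_bats = sum(ab for _, ab in records)
    return Fraction(hits, at_bats)


def combined(player, years):
    """Batting average of one player over the given list of years."""
    return average([seasons[player][year] for year in years])


# Compare the two players season by season and over the combined spans.
for years in ([1995], [1996], [1997], [1995, 1996], [1995, 1996, 1997]):
    jeter = combined("Derek Jeter", years)
    justice = combined("David Justice", years)
    leader = "Jeter" if jeter > justice else "Justice"
    print(years, f"Jeter {float(jeter):.3f}", f"Justice {float(justice):.3f}", "->", leader)
```

The combined average is a ratio of sums, not the mean of the yearly averages, so each year counts in proportion to its at-bats. Jeter's combined figure is dominated by his stronger, high-volume seasons, while Justice's is dominated by his 411 at-bats in 1995.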
Kidney stone treatment
This is a real-life example from a medical study[14] comparing the success rates of two treatments for kidney stones.[15]
The first table shows the overall success rate and the number of cases treated for each treatment (where Treatment A includes all open procedures and Treatment B is percutaneous nephrolithotomy):
Treatment A | Treatment B |
78% (273/350) | 83% (289/350) |
This seems to show treatment B is more effective. If we include data about kidney stone size, however, the same set of treatments reveals a different answer:
| Treatment A | Treatment B |
Small Stones | Group 1 93% (81/87) | Group 2 87% (234/270) |
Large Stones | Group 3 73% (192/263) | Group 4 69% (55/80) |
Both | 78% (273/350) | 83% (289/350) |
The information about stone size has reversed our conclusion about the effectiveness of each treatment. Now treatment A is seen to be more effective in both cases. In this example the lurking variable (or confounding variable) of stone size was not known to be important until its effects were included.
Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together:
- The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give the severe cases (large stones) the better treatment (A), and the milder cases (small stones) the inferior treatment (B). Therefore, the totals are dominated by groups 3 and 2, and not by the two much smaller groups 1 and 4.
- The lurking variable has a large effect on the ratios, i.e. the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones using treatment A (group 3) does worse than the group with small stones, even if the latter used the inferior treatment B (group 2).
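Both effects can be seen by writing each overall success rate as a weighted average of the per-group rates, with each weight equal to the share of that treatment's patients in the group. The sketch below is one possible way to do this in Python, using only the counts from the table above; the function names `overall_rate` and `weighted_rate` are hypothetical.

```python
# (successes, patients) per treatment and stone size, from the table above.
data = {
    "A": {"small": (81, 87),   "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def overall_rate(groups):
    """Pooled success rate: total successes / total patients."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients


def weighted_rate(groups):
    """The same quantity, written as sum(weight * group rate)."""
    total = sum(n for _, n in groups.values())
    return sum((n / total) * (s / n) for s, n in groups.values())


for treatment, groups in data.items():
    total = sum(n for _, n in groups.values())
    for size, (s, n) in groups.items():
        # weight = share of this treatment's patients that fall in this group
        print(f"{treatment} {size}: rate {s / n:.2f}, weight {n / total:.2f}")
    print(f"{treatment} overall: {overall_rate(groups):.2f}")
```

Treatment A has the higher rate in each row, but most of its weight falls on large stones (group 3), where success is less likely for either treatment, whereas most of treatment B's weight falls on small stones (group 2).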
Berkeley sex bias case
One of the best-known real-life examples of Simpson's paradox occurred when the University of California, Berkeley was sued for bias against women applying to graduate school. The admission figures for fall 1973 showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance.[16][3]
| Applicants | % admitted |
Men | 8442 | 44% |
Women | 4321 | 35% |
However, when examining the individual departments, it was found that no department was significantly biased against women; in fact, most departments had a small bias against men.
Major | Men: applicants | Men: % admitted | Women: applicants | Women: % admitted |
A | 825 | 62% | 108 | 82% |
B | 560 | 63% | 25 | 68% |
C | 325 | 37% | 593 | 34% |
D | 417 | 33% | 375 | 35% |
E | 191 | 28% | 393 | 24% |
F | 272 | 6% | 341 | 7% |
The explanation turned out to be that women tended to apply to competitive departments with low rates of admission even among qualified applicants (such as English), while men tended to apply to less-competitive departments with high rates of admission among qualified applicants (such as engineering). The conditions under which department-specific frequency data constitute a proper defense against charges of discrimination are formulated in Pearl (2000).
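The same pattern can be reproduced approximately from the departmental table alone. The sketch below, in Python, pools the six listed departments by weighting each admission percentage by its number of applicants. Two caveats apply: the percentages in the table are rounded, and these six departments are only part of the applicant pool behind the 44% and 35% figures, so the pooled values are illustrative rather than a reconstruction of the original totals. The names `pooled_rate` and `share_in` are hypothetical.

```python
# (applicants, percent admitted) for the six departments listed above.
men = {"A": (825, 62), "B": (560, 63), "C": (325, 37),
       "D": (417, 33), "E": (191, 28), "F": (272, 6)}
women = {"A": (108, 82), "B": (25, 68), "C": (593, 34),
         "D": (375, 35), "E": (393, 24), "F": (341, 7)}


def pooled_rate(table):
    """Approximate pooled admission percentage (applicant-weighted mean)."""
    applicants = sum(n for n, _ in table.values())
    admitted = sum(n * pct / 100 for n, pct in table.values())
    return 100 * admitted / applicants


def share_in(table, departments):
    """Percentage of a group's applicants who applied to the given departments."""
    total = sum(n for n, _ in table.values())
    return 100 * sum(table[d][0] for d in departments) / total


# Departments in which women were admitted at a higher rate than men.
favoring_women = [d for d in men if women[d][1] > men[d][1]]

print("departments with higher rate for women:", favoring_women)
print(f"pooled rate, men: {pooled_rate(men):.1f}%  women: {pooled_rate(women):.1f}%")
print(f"share applying to A or B, men: {share_in(men, ['A', 'B']):.1f}%  "
      f"women: {share_in(women, ['A', 'B']):.1f}%")
```

Comparing the share of each group applying to departments A and B, which have the highest admission rates in the table, shows how differently the two groups' applications were distributed.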
2006 US school study
In July 2006, the United States Department of Education released a study[17] documenting student performance in reading and math in different school settings.[18] It reported that while the math and reading levels for students at grades 4 and 8 were uniformly higher in private/parochial schools than in public schools, repeating the comparisons on demographic subgroups showed much smaller differences, which were nearly equally divided in direction.
Low birth weight paradox
The low birth weight paradox is an apparently paradoxical observation relating to the birth weights and mortality of children born to tobacco smoking mothers. Traditionally, babies weighing less than a certain amount (which varies between countries) have been classified as having low birth weight. In a given population, low birth weight babies have a significantly higher mortality rate than others. However, it has been observed that low birth weight children born to smoking mothers have a lower mortality rate than the low birth weight children of non-smokers.[19]
Vector interpretation
Simpson's paradox can also be illustrated using the 2-dimensional vector space.[21] A success rate of $p/q$ can be represented by a vector $\vec{A} = (q, p)$, with a slope of $p/q$. If two rates $p_1/q_1$ and $p_2/q_2$ are combined, as in the examples given above, the result can be represented by the sum of the vectors $(q_1, p_1)$ and $(q_2, p_2)$, which according to the parallelogram rule is the vector $(q_1 + q_2, p_1 + p_2)$, with slope $\frac{p_1 + p_2}{q_1 + q_2}$.
Simpson's paradox says that even if a vector $\vec{b}_1$ (in blue in the figure) has a smaller slope than another vector $\vec{r}_1$ (in red), and $\vec{b}_2$ has a smaller slope than $\vec{r}_2$, the sum of the two vectors $\vec{b}_1 + \vec{b}_2$ (indicated by "+" in the figure) can still have a larger slope than the sum of the two vectors $\vec{r}_1 + \vec{r}_2$, as shown in the example.
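As a concrete instance of the vector picture, the kidney stone counts can be written as vectors of the form (patients, successes). The sketch below, in Python, is an illustration under stated assumptions: the labels `b1`, `b2`, `r1`, `r2` and the assignment of treatment B to the "blue" vectors and treatment A to the "red" ones are choices made here, not taken from the figure. Slopes are compared by cross-multiplication, which avoids division and is valid because every first component is positive.

```python
def add(u, v):
    """Component-wise sum of two 2-D vectors (parallelogram rule)."""
    return (u[0] + v[0], u[1] + v[1])


def slope_less(u, v):
    """True if u = (q, p) has a smaller slope p/q than v; assumes q > 0 for both."""
    # p_u / q_u < p_v / q_v  is equivalent to  p_u * q_v < p_v * q_u  when q_u, q_v > 0
    return u[1] * v[0] < v[1] * u[0]


# Vectors are (patients, successes), taken from the kidney stone table.
b1, b2 = (270, 234), (80, 55)    # treatment B: small stones, large stones
r1, r2 = (87, 81), (263, 192)    # treatment A: small stones, large stones

b_sum = add(b1, b2)
r_sum = add(r1, r2)

print("b1 flatter than r1:", slope_less(b1, r1))
print("b2 flatter than r2:", slope_less(b2, r2))
print("b1 + b2 =", b_sum, " r1 + r2 =", r_sum)
print("sum of b steeper than sum of r:", slope_less(r_sum, b_sum))
```

Each "blue" vector is flatter than its "red" counterpart, yet the long vector `b1` is steep and the long vector `r2` is shallow, so the two sums order the other way.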