Talk:Simpson's paradox

Simpson's paradox received a peer review by Wikipedia editors, which is now archived. It may contain ideas you can use to improve this article.

Mathematics Mid‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mid	This article has been rated as Mid-priority on the project's priority scale.

Statistics Mid‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mid	This article has been rated as Mid-importance on the importance scale.

Changes to first few paragraphs

I'd like to change the first few paragraphs of this article to make it friendlier to folks afraid of math, and was wondering what other people thought. Here's a possibility:

Simpson's paradox is a statistical paradox described by E. H. Simpson in 1951, in which the accomplishments of several groups seem to be reversed when the groups are combined. This seemingly impossible result is encountered surprisingly often in social science and medical statistics.

As an example, suppose two people, Ann and Bob, are let loose on Wikipedia. In the first test, Ann improves 60 percent of the articles she edits while Bob improves 90 percent of the articles he edits. In the second test, Ann improves just 10 percent of the articles she edits while Bob improves 30 percent.

Both times, Bob improved a much higher percentage of articles than Ann - yet when the two tests are combined, Ann has improved a much higher percentage than Bob!

The result comes about this way: In the first test, Ann edits 100 articles, improving 60 of them, while Bob edits just 10 articles, improving 9 of them. In the second test, Ann edits only 10 articles, improving 1 of them, while Bob edits 100 articles, improving 30 of them. When the two tests are added together, both edited 110 articles, yet Ann improved 69 of them (63 percent) while Bob improved only 40 of them (36 percent)!

Seems reasonable enough to me, although I wouldn't say "accomplishments" for "successes". "Success" in statistical jargon is not necessarily a positive thing! How about "ratings" instead?

I presume you are intending to leave the remaining paragraphs unchanged? -- Securiger

That was my thought, yes. So I'll go ahead and do this, then. DavidWBrooks 13:13, 17 Feb 2004 (UTC)

(However, looking it over again, I'll do my arithmetic correctly before I post it! Oops ... DavidWBrooks)
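For anyone who wants to check the arithmetic in the proposed Ann and Bob example, here is a minimal Python sketch. The per-test counts are taken from the proposal above; the names `ann`, `bob`, `rate` and `combined` are illustrative assumptions. Computed this way, the combined totals come out as 61 of 110 for Ann and 39 of 110 for Bob, so the reversal still holds even though these figures differ slightly from the ones quoted in the draft.

```python
from fractions import Fraction

# (improved, edited) per test, taken from the proposed example
ann = [(60, 100), (1, 10)]
bob = [(9, 10), (30, 100)]

def rate(improved, edited):
    """Share of edited articles that were improved."""
    return Fraction(improved, edited)

def combined(tests):
    """Add up improved and edited counts over all tests."""
    improved = sum(i for i, _ in tests)
    edited = sum(e for _, e in tests)
    return improved, edited

# Bob has the higher rate in each individual test
for test_no, (a, b) in enumerate(zip(ann, bob), start=1):
    print(f"Test {test_no}: Ann {float(rate(*a)):.0%}, Bob {float(rate(*b)):.0%}")

# Ann has the higher rate once the tests are pooled
ann_total = combined(ann)
bob_total = combined(bob)
print(f"Combined: Ann {ann_total[0]}/{ann_total[1]}, Bob {bob_total[0]}/{bob_total[1]}")
```

The reversal depends on the unequal test sizes: Ann's total is dominated by her larger first test, Bob's by his larger second test.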

Is it a problem that the example explicitly refers to Wikipedia? (I'm thinking WP:SELF.) Avram 21:24, 10 March 2006 (UTC)

I'm just a newcomer here, but maybe these changes would be better suited to the simple.wikipedia.com version? Th3et3rnalz (talk) 15:33, 22 March 2020 (UTC)

Those changes were discussed 14 years ago. The newest comments in a talk page are on the bottom - DavidWBrooks (talk) 16:41, 22 March 2020 (UTC)

Order

I am not a frequent editor but shouldn't description come before the examples and not the other way around? —Preceding unsigned comment added by 88.234.7.51 (talk) 12:08, 28 November 2010 (UTC)[reply]

Nice work

I have recently been browsing the logic & game theory articles. This is the best I have seen so far. Congratulations to all concerned.

John Moore 309 12:36, 24 April 2006 (UTC)

I just read this article too, having come from Texture filtering and I am very impressed! This article is brilliant! --137.205.76.219 15:48, 27 January 2007 (UTC)

The same paradox?

I wonder if this is the same paradox and if it could be used as an example. I find it very easy to understand — and from real life.

Assume a population that is 50% men and 50% women, with competence distributed in the same way in both groups. Imagine a situation where women are required to have more competence to get a promotion to management. You will then notice that women on the management level are more competent than male managers and that women in sub-management are more competent than men on the same level. This seems paradoxical at first considering that, on the whole, women and men are equally competent. Samulili

It's a nice example. In order to convince myself (and perhaps others) that it's the same paradox, I'll now assume that on average, the women are slightly less competent than the men (no offence, just to sharpen the paradox and make it clearer that Simpson is involved), and I'll add some numbers:

Suppose we have 100 men and 100 women. 18 of the men are highly competent, and 14 of them are in the management. Of the 82 less competent men, 6 are in the management. 17 of the women are highly competent, but only 8 of them are in the management. Of the 83 less competent women, 2 are in the management. Then, of the women in the management, 8/10=80% are highly competent, and of the sub-management women, 9/90=10% are highly competent. Of the men, only 14/20=70% of those in the management group are highly competent, and only 4/80=5% in the sub-management group are highly competent. So, in both groups, more of the women than of the men are highly competent, but combined, only 17/100=17% of the women are highly competent, while 18/100=18% of the men are.

Conclusion: This is indeed a Simpson paradox, and the only change compared to that suggested above is that I made it a little sharper by making the women less competent over all instead of just equally competent. However, I like the original better, and I think someone should go ahead and add it to the article. I'm afraid it takes skills beyond mine to write it in a simple way that makes it clear that it is a Simpson's paradox.--Niels Ø 20:04, 2 May 2006 (UTC)[reply]
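The numbers in the management example can be verified with a short Python sketch. The counts are the ones given above; the dictionary layout and the function names `share` and `overall` are assumptions made for illustration only.

```python
from fractions import Fraction

# (highly competent, total) per level, from the figures given above
men = {"management": (14, 20), "sub-management": (4, 80)}
women = {"management": (8, 10), "sub-management": (9, 90)}

def share(group):
    """Fraction of a group that is highly competent."""
    competent, total = group
    return Fraction(competent, total)

def overall(levels):
    """Fraction highly competent when both levels are pooled."""
    competent = sum(c for c, _ in levels.values())
    total = sum(t for _, t in levels.values())
    return Fraction(competent, total)

# Within each level, the women's share is higher
for level in men:
    print(level, "women", float(share(women[level])), "men", float(share(men[level])))

# Pooled over both levels, the men's share is higher
print("overall", "women", float(overall(women)), "men", float(overall(men)))
```

The women lead at both levels (80% vs 70% and 10% vs 5%), yet trail overall (17% vs 18%), because a much smaller fraction of the women is in the management group.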

How is this a paradox?

For Ann, the time that she royally screwed up barely counts, while the time that she did poorly counts the most. For Bob, the time that he royally screwed up hugely affected his total, while the time that he did amazing barely counts at all. I don't quite see why the results are surprising. Anyone care to enlighten me?

It all makes sense in the end, but it's still initially surprising for most people who are not aware of the explanation or suspect it. If you only know the partial percentages, then the total percentages would come as a surprise to most people. Obviously, once the weights are introduced, the initial surprise is exchanged for comprehension, but then a paradox is only a seemingly self-contradictory statement anyway, so I see nothing wrong with calling this a paradox. -Kvaks 01:09, 2 September 2005 (UTC)[reply]

It's not strictly a paradox, since there is a straightforward solution. But it's widely known by that name, so we ought to keep it. --best, kevin ···Kzollman | Talk··· 04:19, September 2, 2005 (UTC)

I do not support the idea that the phenomenon is not "really" a paradox. Many good paradoxes are based on representing a situation in such a way that a false conclusion seems obvious.--Niels Ø 08:18, 6 October 2006 (UTC)

Agreed with Niels. The key point to remember is that in the baseball batting average example, there are large differences in the number of at-bats between years. Rock8591 (talk) 06:18, 9 August 2009 (UTC)

One of the finer Wiki entries

The storytelling conceit, complete with sly reference to those other Simpsons, "Bart" and "Lisa," works well for me. This kind of explanation helps me in explaining a concept to others, even as I work to fully grasp it myself. The inclusion of the Wikipedia within the definition does not seem overly self-referential, as one observer has worried. Entries like this are the reason I seek out Wikipedia's take on things before looking to other, traditional sources. Thanks for an entertaining and elucidating entry! Matthew Treder 18:42, 2 May 2006 (UTC)[reply]

Agreed. The examples are clear, well written, and logical. And the references to Bart & Lisa Simpson are not only clever and fun, they also make it EXTREMELY easy for many people to remember this phenomenon as well as its associated name. If we name them Dick & Jane it would be far less memorable. How great it is when practicality and humor intersect! Jon Miller

Indeed! --WikiSlasher (talk) 13:01, 11 December 2007 (UTC)

The first (graphical) example could be made a whole lot clearer if the symbols x and y and relationships between them were explicitly defined. I would be pleased to contribute to this cause, but -- well, I am still bewildered by it. The other (real life) examples work quite well and make the fictitious, original, self-referential narrative unnecessary. Finally, if a vote gets taken, please cast mine in favor of "fallacy" -- not to exclude "paradox" but to strengthen the importance of this entry. 24.130.61.77 (talk) 19:46, 2 January 2011 (UTC)

Why did I not get this earlier... mattbuck (talk) 14:22, 11 December 2007 (UTC)

I'm new to commenting here, so I apologize if I'm doing this wrong.

The question was raised as to whether or not it's appropriate for this article to reference Wikipedia [WP:Self]. I believe it may be, but should certainly be discussed. The point of avoiding self references, as I read that guideline, is to not use phrases such as "elsewhere on this site" or "in another Wikipedia article". The point is NOT to pretend that Wikipedia doesn't exist.

The article could reference bowling or mowing lawns or a great host of other activities where the characters' performance can be quantified. I suspect the Wikipedia reference was used simply because the author assumes that those reading it will be familiar with the process.

However, I don't believe that the act of editing Wikipedia articles is a good example of much anything, because most people I know who read Wikipedia have never edited anything. I've been reading for years and only today even created an account to post anything. So the example took a little more effort for me to understand than many other possible analogies could have.

And, continuing that thought and going back to the self reference guideline, the plan as I have understood it is to eventually do a printed Wikipedia. Regardless of the form, any time this article appears outside the wikipedia.org website the chances of the reader understanding the example become greatly diminished.

In other words, I like the example used here, but a different example may be more comprehensible and practical.

Ha! That's funny! Thanks for putting Bart and Lisa in Simpson's paradox. --69.67.229.185 03:02, 26 August 2006 (UTC)

A word

The Lisa-Bart example ends in this sentence: But it is possible to retell the story so that it appears obvious that Bart is more diligent. Would it not be more natural to say "tell" instead of "retell", since it is the original statement of the situation that appears to have this conclusion?--Niels Ø 08:18, 6 October 2006 (UTC)[reply]

Good point. Thanks. I've changed that line to something that I think is even better: But it is possible to have told the story in a way which would make it appear obvious that Bart is more diligent. --Keeves 12:13, 6 October 2006 (UTC)

The kidney case

I expanded the text on the two factors at the end of the section to relate more specifically to the medical example. Reading what I've written, it seems natural to ask: Why did doctors give the inferior treatment B to the milder cases, when A is better in those cases too? I have not consulted the references on this case story, but perhaps someone who has (or will) can answer my question. I imagine one of two answers: (i) Before this particular investigation, they did not know that B was inferior even in the milder cases. (ii) Treatment A is more expensive, and is therefore primarily given to those patients who need it the most. In fact, if there are no other confounding variables involved, and if A is more expensive than B, then, within a given budget, the largest number of cures is obtained by treating as many as possible from the large-stone-group with A.--Niels Ø 13:29, 13 October 2006 (UTC)[reply]

Thanks for your changes, it reads more clearly. I don't have access to the original study, but from the review and title it appears to compare surgery, ultrasound and/or using catheters. Unsurprisingly the open surgery (treatment A) is the most effective, and is probably the most expensive, with the greatest post-treatment complications. TobyK 13:36, 31 October 2006 (UTC)

Suggested addition to aid paradoxical comprehension

existing section under 'Explanation by example' subtitle

[Who is more accomplished? Lisa and Bart's mutual friends think Lisa is better—her overall success rate is higher. But it is possible to have told the story in a way which would make it appear obvious that Bart is more diligent.]

append with the addition of

+ [However, some will note that the use of statistical analysis to present a biased view is not uncommon, for example in politics. On close inspection, one may find that Bart's edits are of a higher quality, elucidating complex subjects poorly understood by the general populace. Although Lisa and Bart's mutual friends think Lisa is better, history may judge Bart's legacy to humanity to be more significant.]

This may help answer those who fail to comprehend the paradoxical nature

Teeteetee 09:51, 2 March 2007 (UTC)

How so? The quality of the edits is unrelated to the paradox we're dealing with here; it's entirely about the number of edits.--Niels Ø (noe) 09:56, 2 March 2007 (UTC)

Extracted from the article's sub-section. . . .

" worth of work/Success/managed/achieved successful/worse/we feel/disappointed/accomplished/mutual friends think/better/diligent "

Are these "entirely about the number of edits" ? Teeteetee 19:34, 4 March 2007 (UTC)

OK, I didn't put that as clearly as I should have. The point is, we need not distinguish very good edits from minor improvements; that's not what the example is about. Whether they elucidate complex subjects is utterly irrelevant. However, the words accomplished and diligent that you quote may be misleading for the same reason: They seem to suggest that some edits not merely improve articles but display particular diligence, which (though of course true) is, as I said, utterly irrelevant.--Niels Ø (noe) 20:25, 4 March 2007 (UTC)

I do not understand your meaning.

I have tried several times to understand.

If you could avoid criticising existing aspects of the article I might better understand.

....

Do you agree with the following statement ?

"If Bart only edited one article (and that one edit brought about world peace), Lisa's lifetime of editing thousands of articles may statistically appear better (to friends, family, politicians, religious leaders, and others viewing the statistical view), but may be judged by history to be worth less than Bart's one edit."

Teeteetee 11:52, 8 March 2007 (UTC)

Sure, but it's got nothing to do with Simpson's paradox. The Bart-and-Lisa example is solely about the number of edits that were improvements, and the number that were not. It does not distinguish between large improvements and small improvements.--Niels Ø (noe) 16:24, 8 March 2007 (UTC)

By using "it"(in the sentence above "It does not distinguish..."), I assume you mean Simpson's Paradox.

If so, you appear to be writing "Simpson's Paradox does not distinguish between large improvements and small improvements"

....

or, put alternatively,

When Simpson's Paradox occurs improvements can be difficult to distinguish.

Teeteetee 17:29, 12 March 2007 (UTC)

If you are seriously suggesting changes to the article, I think you should either be bold and make those changes, or explain clearly at this talk page what you'd like to change, and why. I've no idea what your point is.--Niels Ø (noe) 22:01, 12 March 2007 (UTC)

Thank you for the advice, but I was bold on 1 March 2007. Also, I hoped I had clearly explained my suggestion above (at 09:51, 2 March 2007).

My original article edit can be found here> [4] at the end of the 'Explanation by example' section. Teeteetee 12:31, 13 March 2007 (UTC)

Well, I believe I have made my concerns clear, whereas I do not understand what your point is. Do you think your contribution is related to Simpson's paradox, or does it merely offer an alternative angle on the Lisa-and-Bart example, an angle unrelated to Simpson's paradox? Do you actually understand Simpson's paradox, or are you trying to understand it?--Niels Ø (noe) 12:57, 13 March 2007 (UTC)

I believe I understand Simpson's paradox.

I also believe context aids understanding.

I was attempting to provide others with some context. Teeteetee 13:50, 3 April 2007 (UTC)

Then I am at a loss. I am certain I understand Simpson's paradox, and I am certain it (in the Bart-Lisa example) has nothing to do with distinguishing between large and small improvements. The context is clear (Wikipedia editing, some edits being improvements, others not). Adding more context - irrelevant to the paradox - will confuse matters by having readers trying to understand how it is relevant. Please explain, what is the point?--Niels Ø (noe) 14:49, 3 April 2007 (UTC)

How is the Electoral College an example of Simpson's paradox?

In both the Lisa/Bart example and the kidney stones example, there is a 3x2 table with 6 entries. How can the Electoral College data be presented in this way? There are the 2 parties, so that's the "2" dimension. But what is the "3" dimension?


Example	the "2" dimension	the "3" dimension
Lisa / Bart	Lisa / Bart	Week 1 / Week 2 / Total
kidney stones	Treatment A/B	small stones / large stones / together
Electoral College	Rep / Dem	??? / ??? / total number of Electoral College votes

--Occultations 21:46, 15 May 2007 (UTC)

I suspect the analogy is that one can "win" the nationwide popular vote but, under certain circumstances, still lose in the College. The College cannot reproduce the paradox exactly, since the outcome in each state depends on the difference in votes only through its sign, not its magnitude: one could not lose the College if every state was won. Baccyak4H (Yak!) 03:07, 16 May 2007 (UTC)

I've removed the Electoral College example, it's not an example of Simpson's paradox. Unless, that is, someone can show how it fits the 3x2 table pattern. --Occultations 12:53, 28 May 2007 (UTC)

Do we need the fake example?

We have four different real-world examples now, some with statistics. Do we need the "bart/lisa" fake example to explain it any more? At the very least, I'd like to move the real examples up above the pretend one - I think lots of people stop reading when the article lurches into "explaining" mode. - DavidWBrooks 23:41, 22 May 2007 (UTC)[reply]

I was about to make an almost identical heading. It's a pretty asinine self-reference in addition to being original research. Milto LOL pia 04:32, 23 May 2007 (UTC)

I agree with the removal of fake examples (as I've just done with the baseball example). This section should be moved below the examples, and then transformed into a general discussion of what may cause the paradox to appear (talking about weighted averages, confounding variables, etc). Schutz 07:12, 23 May 2007 (UTC)[reply]

Then I'll do the move, and we can do the transformation later. - DavidWBrooks 10:00, 23 May 2007 (UTC) .. oops, never mind: somebody already did.

You're still welcome to do the transformation now that I have done the move :-) Schutz 13:44, 23 May 2007 (UTC)

But that will require thought and skill - I hoped I could get away with a nice, mindless move. - DavidWBrooks 14:00, 23 May 2007 (UTC)

Too late :-) I'll think about the transformation, but, as you say, it requires quite a bit of thinking first. Before that, I'll add a few more references and reformat the examples, and hopefully (if I can get around to doing it), add 2 images. Schutz 21:27, 23 May 2007 (UTC)[reply]

I have re-added the example after User:Miltopia removed it, since the consensus above was, for now, to move the example rather than delete it. We all agree that we have enough real examples and do not need fake examples on top of that; however, this section is the only one that goes beyond giving an example and also discusses the question of weighted averages. I don't think it is very good, or that it covers everything it should, but at the moment it is better than nothing. If nothing happens with it in the near future, then it can be removed. Schutz 07:44, 24 May 2007 (UTC)

The Bart-Lisa example is pointless and misleading. The whole point of Simpson's paradox is that differences in underlying groups may be causing changes that lead to misleading results when the groups are not taken into account - the underlying groups are important in themselves and must be investigated for a proper analysis. But in the Bart-Lisa case, the underlying groups are 'week 1' and 'week 2'. Why are the success rates of editing being divided into weeks? The only reason for doing so would be that the success rates are changing consistently across weeks for both Bart and Lisa. But I can see no obvious reason why 'week' would be an appropriate grouping factor. This example gives the impression that you should divide your data into different groups for no reason and assess across those meaningless groups - perhaps doing so until you get the answer you want (e.g. Bart should be better than Lisa. We don't see it across both weeks, so we divide into weeks and aha! there we see it. If we hadn't seen it within weeks, maybe we should divide into days...) 124.197.3.68 (talk) 14:39, 18 February 2010 (UTC)

I don't understand this comment, or maybe I misunderstand the definition of Simpson's Paradox. I thought that the term applied to any case where the two group results agreed with each other but disagreed with the aggregate result--regardless of the underlying cause. If so, then the Bart-Lisa example fits the definition perfectly.

And, it was by far the easiest one for me to understand because one didn't need to understand or even know anything beyond the simple data that were presented. After I thought about the Bart-Lisa case for a long time, it suddenly hit me how it is possible--even easy!--for the seemingly paradoxical situation to occur. The underlying cause in the Bart-Lisa case is not systematic (a lurking variable). The cause is random variability. The case is very plausible, since the sample sizes are so low. But does it matter what the source of the paradox is? (Random variability vs. lurking variable?). It seems to me that the paradox is the paradox regardless of the underlying causality.

Mcamp@cinci.rr.com (talk) 23:46, 18 August 2014 (UTC)

Correlation/Causation

Would it be an idea to add Correlation does not imply causation into the 'See also' section? Apologies if this has already been covered, I don't find any references to it. Flex Flint 08:57, 17 July 2007 (UTC)

I'd also suggest that Milo Schield's fine paper "Simpson's Paradox and Cornfield's Conditions" (http://web.augsburg.edu/~schield/MiloPapers/99ASA.pdf) be added to the references, and mention Cornfield's conditions somewhere in the main sections. Haruhiko Okumura (talk) 08:42, 14 August 2008 (UTC)

The correlation/causation issue is important in its own right, but has little to do with "Simpson's Paradox." I would suggest removing this part of the text in the extant introduction. Scrooge62 (talk) 18:28, 2 December 2009 (UTC)

Yes, I think Correlation does not imply causation should have a link somewhere from this article - if it's not appropriate at any other point, it should be in "See also".

I think the lead is fine as it stands. Correlation/causation is a much wider topic than Simpson's paradox, but it seems to me the ONLY relevance of Simpson's paradox is that it is ONE of the counterexamples that can be used to reject the intuition saying that correlation DOES imply causation.--Noe (talk) 08:15, 3 December 2009 (UTC)[reply]

I would emphatically stress that Correlation does not imply causation is very strongly connected to Simpson's Paradox. Correlation is based on the unconditional (or marginal) relationship between two variables. But causation would be based on their conditional relationship controlling for confounding factors. The fact that a conditional relationship can have the opposite sign of an unconditional relationship is precisely Simpson's Paradox and is also precisely the reason why correlation cannot be taken to imply causation. No two concepts could be more strongly related! --Geomon (talk) 06:10, 18 January 2010 (UTC)

Vector vs. Line

I reverted a diff [5] changing vector to line in one instance. First, the section it's in is called "Vector Interpretation", so referring to vectors is the expected language of that section. Second, the word change was made in only one instance, making the whole paragraph internally inconsistent as it switched from line in the first instance to vector in all other. qitaana (talk) 22:17, 26 February 2008 (UTC)[reply]

Low birth weight paradox

How is this an example of Simpson's paradox? From the information given, I see only a medical "paradox", not a statistical one. 72.75.98.88 (talk) 22:23, 15 May 2009 (UTC)

I agree. It looks like the example states that, given that a child is low birth weight, it has a lower infant mortality rate if born to a smoking mother. It would only be an example of Simpson's paradox if, given the child is born to a smoking mother, it has a lower infant mortality rate if it were low birth weight. JokeySmurf (talk) 05:36, 16 May 2009 (UTC)

I don't see how that would be Simpson's paradox either. If low birth weight meant lower mortality in both smokers and non-smokers, but higher mortality in the population as a whole, that would be an example of Simpson's paradox. 72.75.98.88 (talk) 13:52, 16 May 2009 (UTC)

It's poorly stated, but the paradox is that normal birth weight infants of smokers have about the same mortality rate as normal birth weight infants of non-smokers, and low birth weight infants of smokers have a much lower mortality rate than low birth weight infants of non-smokers, but infants of smokers overall have a much higher mortality rate than infants of non-smokers. This is (of course) because many more infants of smokers are low birth weight, and low birth weight babies have a much higher mortality rate than normal birth weight babies. The reference does explicitly state that it is an example of Simpson's paradox. 129.22.208.134 (talk) 20:18, 8 July 2009 (UTC)

Page updated accordingly. 124.74.76.114 (talk) 07:52, 12 February 2014 (UTC)

Health care disparities

The newly added section Health care disparities sounds interesting. However, as it stands, I don't think it belongs. EITHER it should be expanded to make it an illuminating example of the paradox, OR it should be removed or boiled down to at most one sentence and a reference.--Noe (talk) 08:00, 24 September 2009 (UTC)

Stigler's law

To where it reads, Since Edward Simpson did not actually discover this statistical paradox, I propose to add [note 1: See Stigler's law]. To see how this would affect the over-all appearance of this article, view the proposed revision in my sandbox. --Pawyilee (talk) 02:31, 22 February 2010 (UTC)

There being no objection, I moved it into the article. --Pawyilee (talk) 14:34, 23 February 2010 (UTC)

Kidney stones

user:DavidWBrooks recently removed the first table in the kidney stone example, which showed only the results when no distinction is made for kidney stone sizes. As the section now stands, I don't find it satisfactory. I think it needs to be made clearer that false conclusions may be drawn when the lurking variable is not identified. One way to clarify this would be to put back the table (reverting half the edit in question), and I'm inclined to do that - but I'll wait and see...--Nø (talk) 15:37, 7 April 2010 (UTC)[reply]

I removed it because it seemed redundant, unnecessary - the current table (it seems to me) shows everything that the first table showed; in fact, it contains that entire table. Listing two different tables made it seem, I thought, as if something changed between them, but the second table was merely an expansion. However, if others disagree, then I certainly will bow to the majority. - DavidWBrooks (talk) 17:19, 7 April 2010 (UTC)

Yes, the second table contains all the info, but the way the section reads now fails to make an important point clear. The easiest way to fix that is to revert your edit, but I'm sure there are other ways (and probably better ways) to fix it. Feel free.--Nø (talk) 07:11, 8 April 2010 (UTC)

It seems to me that these sentences following the table make the point clear: "The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B is more effective when considering both sizes at the same time. In this example the "lurking" variable (or confounding variable) of the stone size was not previously known to be important until its effects were included." But perhaps not; perhaps the matter needs to be expanded or clarified. - DavidWBrooks (talk) 13:18, 8 April 2010 (UTC)[reply]

What made the fallacy clearer was that the "combined case" and the "obvious" conclusion were stated before the extra information was added and the refined conclusion reached. I think this was a more paedagogical presentation.--Nø (talk) 18:47, 8 April 2010 (UTC)

We have clarified our disagreement: It struck me as redundant, even a bit confusing. Anybody else have an opinion? - DavidWBrooks (talk) 19:28, 8 April 2010 (UTC)
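To make the two presentations under discussion concrete, here is a small Python sketch that first prints the combined success rates and then the rates split by stone size. The counts are the ones commonly quoted for this kidney stone study and are an assumption here, since the table itself is not reproduced on this page; the names `results`, `rate` and `pooled` are illustrative only.

```python
from fractions import Fraction

# (successes, patients); counts commonly quoted for this study, assumed here
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Success rate within one treatment and stone-size group."""
    return Fraction(successes, patients)

def pooled(treatment):
    """Success rate for a treatment when stone size is ignored."""
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(p for _, p in results[treatment].values())
    return Fraction(successes, patients)

# Step 1: the combined table alone suggests B is better
print("pooled:", {t: f"{float(pooled(t)):.0%}" for t in results})

# Step 2: splitting by stone size reverses the ranking in both groups
for size in ("small", "large"):
    print(size, {t: f"{float(rate(*results[t][size])):.0%}" for t in results})
```

Printing the pooled rates first mirrors the order Nø describes (the "obvious" conclusion, then the refinement); whether the article needs a separate table for that step is the open question above.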

Fallacy

Although this situation is called Simpson's paradox, this article is very useful in illustrating a fallacy in statistics that can be corrected. Of course, Simpson's paradox goes away when one properly accounts for external variables. For example in the Male/Female admissions lawsuit, the statistics can be shown with a common weighting of departments (apples-to-apples-comparison). If this is done, there is no paradox. —Preceding unsigned comment added by Fulldecent (talk • contribs) 06:38, 14 November 2010 (UTC)[reply]

True - like most paradoxes, it's only paradoxical when described in a misleading way. You can see a paradox as a challenge to find the right way of describing the situation. - Do you suggest a change to the article?--Nø (talk) 11:04, 14 November 2010 (UTC)
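The "common weighting of departments" idea can be sketched in Python as follows. The counts below are hypothetical and are not taken from the admissions lawsuit; the department names and the functions `raw_rate` and `standardized_rate` are likewise assumptions for illustration. The sketch weights each department by its total number of applicants, which is only one of several possible choices of common weights.

```python
from fractions import Fraction

# Hypothetical (admitted, applicants) counts; NOT the lawsuit data
data = {
    "easy dept": {"men": (240, 400), "women": (65, 100)},
    "hard dept": {"men": (20, 100), "women": (100, 400)},
}

def raw_rate(sex):
    """Admission rate pooled over departments, each sex with its own mix."""
    admitted = sum(d[sex][0] for d in data.values())
    applicants = sum(d[sex][1] for d in data.values())
    return Fraction(admitted, applicants)

def standardized_rate(sex):
    """Admission rate using the same department weights for both sexes."""
    total = sum(d["men"][1] + d["women"][1] for d in data.values())
    result = Fraction(0)
    for d in data.values():
        # Weight = department's share of all applicants, men and women together
        weight = Fraction(d["men"][1] + d["women"][1], total)
        result += weight * Fraction(*d[sex])
    return result

for sex in ("men", "women"):
    print(sex, "raw", float(raw_rate(sex)), "standardized", float(standardized_rate(sex)))
```

With these made-up counts the raw rates favour men (52% vs 33%), while the standardized rates favour women (40% vs 45%), in line with the within-department rates, so the apparent reversal goes away once both sexes are compared on the same department mix.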

on my removal of "how likely"

I've just removed the section on "how likely" Simpson's paradox is. The reason for this is that in order to make sense of the statement you need to assume a probability distribution for the entries of a 2x2x2 table (presumably what the section's statement about "assuming certain conditions" was a reference to). My basic argument here is that without a statement of those "certain conditions" the statement is essentially meaningless, so we have to go to the paper to find out what it means.

Omitting explicit reference to a probability distribution in choosing an object "at random" is commonly done in elementary expositions of statistical concepts when the situation is simple enough that the distribution can be inferred from the surrounding context, or there is in some other sense enough "intuition" to suggest a natural choice. Examples like the "Bertrand paradox" show that this is not unproblematic. I would argue that here, there is not enough context or "intuition" to give those without any precise understanding of statistics/probability any sense of what it means to fill in a 2x2x2 table "at random" according to the distribution assumed in the Perlman paper, and it is potentially misleading to present the context-free assertion as if it has enough context to determine an intuitive meaning. (I should point out that the paper itself makes no claims that this distribution is "the only one" worth considering--- nor does it argue, for example, that actual statistical practice in filling in 2x2x2 tables is at all comparable to the model they assume when they calculate the .0166 figure cited here. It just computes various probabilities in a model.) 173.30.19.136 (talk) 06:28, 30 January 2011 (UTC)

Definition needed for "quality modifier"?[edit]

In the Bart/Lisa example, which I found the most helpful example, especially the graph, do we need to define "quality modifier" or at least provide a link to another article? Frankly, I am not sure what is meant by this phrase. How can numbers be qualitatively different? Clearly, this is a use of "quality" that is different from the common use, where quality is usually contrasted with quantity. Here's the sentence: "Also when the two tests are combined using a weighted average, overall, Lisa has improved a much higher percentage than Bart because the quality modifier had a significantly higher percentage. Therefore, like other paradoxes, it only appears to be a paradox because of incorrect assumptions, incomplete or misguided information, or a lack of understanding a particular concept."--Bruce Hall (talk) 04:19, 8 March 2011 (UTC)[reply]

Presentation of tables[edit]

Except in the sex bias case (where there are more than two departments), many of the examples presented seem to have exactly the same structure and could be presented in the same format. Presenting the various examples in different formats seems confusing to me. If the point of doing so is that different readers will "get it" from different ways of presenting it, it would be clearer to present THE SAME example in different formats. My suggestion would be to present the various examples in more or less the same way. Personally, I prefer the presentation of the kidney stone example with the "group 1, 2, 3, 4" and the two effects listed. What do you think?--Nø (talk) 15:46, 8 March 2011 (UTC)[reply]

Another example[edit]

DavidWBrooks has reverted my newly added example of Simpson's Paradox, which read as follows:

The National Assessment of Educational Progress average test score in mathematics for American 9-year-old children rose, from 1978 to 2004, by 10.0%. But the average score of white 9-year-olds rose by 10.3%, that of Hispanics by 13.3%, that of blacks by 16.7%, and that of all others by 12.8%.^[1] Thus while no racial/ethnic group experienced a gain of less than 10.3%, the children as a whole experienced a gain of only 10.0%, a result that is due to the shift over time in the percentages of the various groups in the total.

His reason for reverting was that (1) it's a weak example since the composite went up, not down, and (2) we really don't need another example. As for (1), I see his point, but I propose adding the following to illustrate the importance of the example:

Jack Jennings of the Center on Education Policy uses these data in asserting^[2] that when the composite data are used, "one important trend tends to be overlooked -- namely, the notable gains made by African American and Latino students in reading and math achievement".

As for point (2), I definitely disagree with it -- just five examples are not enough in my opinion. The more examples we have, the more likely a reader is to find one that resonates with him, one that he can latch onto as, for him, a memorable example. Different people will be prone to latch onto different examples, but I'd bet that more people will latch onto the test scores example than, say, the kidney stone research example.

Comments, anyone? Duoduoduo (talk) 21:08, 9 May 2011 (UTC)[reply]

Since I removed it, my thinking is obvious! I think (just my opinion) Duoduoduo might be interested in this example at least as much for its intriguing policy implications, rather than as a good example of the paradox. The point of examples in a wikipedia article is to illustrate the concept, not to make an argument for it or to cover all possible bases - and five examples is way more than enough to do that. IMHO, of course! - DavidWBrooks (talk) 22:27, 9 May 2011 (UTC)[reply]

I think it's a good real world example that seems to differ from the existing examples because it is not the actions that are changing over time but the composition of the actors. Mathematically it's the same, but I find it intuitively quite different. If space is an issue, I prefer it to the made up Bart/Lisa, Wikipedia-emphasizing example (though I like the thoughtful math put into that example). -- Michael Scott Cuthbert (talk) 15:32, 13 April 2012 (UTC)[reply]

How many examples do we need?[edit]

The article has six examples of the paradox appearing in real-world situations, which is IMHO excessive. I'd like to remove this one, because it's the least detailed and informative; in fact, it's kind of confusing:

Health care disparities

An examination of racial differences in the management of localized prostate cancer in Pennsylvania simultaneously revealed that whites were more likely to receive prostate surgery than blacks, that whites and blacks were equally likely to get surgery, and that blacks were more likely to get surgery than whites. This example statistical analysis used hypothetical data. All of the above conclusions were correct, but they reflected answers to subtly different questions that relied on different parsings of the same aggregate data.[15]

Any objections? - DavidWBrooks (talk) 17:38, 23 September 2011 (UTC)[reply]

Agree to deleting this example. For one thing, the above text is a travesty of the referenced material. For another, the data in the reference do not in fact provide an example of Simpson's paradox as there is no reversal. Qwfp (talk) 18:01, 23 September 2011 (UTC)[reply]

Let's do it, then. *POOF!!!!*- DavidWBrooks (talk) 18:10, 23 September 2011 (UTC)[reply]

Civil Rights Act of 1964[edit]

"This arose because regional affiliation is a very strong indicator of how a congressman or senator voted, but party affiliation is a weak indicator." This statement is obvious from the chart, and can even be made more formal. Let's say I pick a Senator or Congressman that voted on the Civil Rights Act of 1964 in a uniformly random way, and you have to guess whether they voted for or against the Civil Rights Act. You get to ask one of two questions: "Do they represent a formerly Confederate State?", or "Are they a Republican or Democrat?". Which question do you ask?

Which question you should ask is obvious: asking about the party affiliation is absolutely no use in making your guess; the best you can do after asking this question is to simply guess "yes", which will be correct 70% of the time. On the other hand, if you ask what region of the country they represent, you can be correct 91% of the time: you'll be right 90% of the time by guessing "yes" if they come from the north (which happens 75% of the time), and you'll be right 94% of the time by guessing "no" if they come from the south (which happens 25% of the time). Obscuranym (talk) 15:56, 14 June 2012 (UTC)[reply]
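The 91% figure above is a region-weighted average of the two guessing accuracies. A minimal Python sketch of that arithmetic, using only the percentages quoted in the comment (the variable names are illustrative, and nothing here is recomputed from the actual roll call):

```python
# Percentages as quoted in the comment above; variable names are illustrative.
p_north, p_south = 0.75, 0.25   # share of legislators from each region
acc_north_yes = 0.90            # accuracy of guessing "yes" for a northern legislator
acc_south_no = 0.94             # accuracy of guessing "no" for a southern legislator

# Law of total probability: overall accuracy is the region-weighted average.
acc_region = p_north * acc_north_yes + p_south * acc_south_no

# After asking about party, the best rule quoted above is to always guess "yes".
acc_party = 0.70

print(f"ask about region: {acc_region:.2f}")
print(f"ask about party:  {acc_party:.2f}")
```

Under these quoted figures, the region question gives the better guess.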

I have removed the Civil Rights example, partly because we're getting too many real-world examples - we still have four, all of which are pretty well known - and partly because it's not sourced. The data and analysis may be accurate, but they're not reported elsewhere, as the other examples are. Since we don't really need this many examples - the situation is quite clear that it crops up in reality in many different circumstances - it's no drawback to just get rid of it. - DavidWBrooks (talk) 21:33, 14 June 2012 (UTC)[reply]

I think the Civil Rights Act example should be restored. A quick Google search confirms that many math websites use this example when discussing Simpson's Paradox and specifically cite this Wikipedia article. This suggests that math educators consider it at least as relevant an example as the others. In fact, the Wikipedia article on the Civil Rights Act itself links back to this article. This example should not have been removed without updating that page as well. The example is not original research since the data is taken directly from the CRA Wikipedia article. The text in the example could be improved but the example should be restored. I will update the text and restore the example unless there is a reasonable objection. - Ricklethickets (talk) 11:11, 21 July 2012 (UTC)[reply]

If you must, but add a good reference and make sure it's clear, because the previous wordy example wasn't. The websites that "cite" the article seem to be mostly wikipedia scrapers - a very circular argument, at best. To be honest, I think it's a lame example of Simpson's paradox because the issue it examines is not very clear, a mix of geography and politics that requires a knowledge of that period of American history to seem surprising. The other examples are much more straightforward. - DavidWBrooks (talk) 13:38, 21 July 2012 (UTC)[reply]

Suggesting an initial simpler example[edit]

I suggest putting an even simpler and clearer example at the beginning, like this:

Boys and girls applied for a physics or math scholarship

10 boys

10 girls

2 boys applied for math => 1 awarded, this is 50%

8 boys applied for physics => 2 awarded, this is 25%

9 girls applied for math => 4 awarded, this is 44.4%

1 girl applied for physics => 0 awarded, this is 0%

in total:

3 boys out of 10 had scholarship this is 30%

4 girls out of 10 had scholarship this is 40%
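
The counts above can be checked mechanically. The following is a small sketch in Python using exact fractions; the dictionary layout and function names are illustrative, and only the counts come from the list above:

```python
from fractions import Fraction

# (awarded, applied) counts copied from the list above.
data = {
    "boys": {"math": (1, 2), "physics": (2, 8)},
    "girls": {"math": (4, 9), "physics": (0, 1)},
}

def subject_rate(group, subject):
    """Award rate for one group within one subject."""
    awarded, applied = data[group][subject]
    return Fraction(awarded, applied)

def overall_rate(group):
    """Award rate for one group with both subjects pooled."""
    awarded = sum(a for a, _ in data[group].values())
    applied = sum(n for _, n in data[group].values())
    return Fraction(awarded, applied)

# Boys have the higher rate within each subject...
for subject in ("math", "physics"):
    assert subject_rate("boys", subject) > subject_rate("girls", subject)

# ...yet girls have the higher rate once the subjects are pooled.
assert overall_rate("girls") > overall_rate("boys")

print("boys overall:", overall_rate("boys"))    # 3/10
print("girls overall:", overall_rate("girls"))  # 2/5
```

The reversal comes from the mix: most boys applied to the subject with the lower award rate, while most girls applied to the one with the higher rate.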

--Wisamzaqoot (talk) 22:17, 31 July 2012 (UTC)[reply]

This is clearly sexist. 71.220.59.235 (talk) 07:27, 20 October 2013 (UTC)[reply]

Psychologists section[edit]

If psychologists really say this to their subjects, then it is they who are confused:

"Psychological interest in Simpson's paradox seeks to explain why people deem sign reversal to be impossible at first, offended by the idea that a treatment could benefit both males and females and harm the population as a whole. "

I think I understand what is trying to be said, but no sane person could describe a situation where a treatment benefits males and females but not the population as a whole. Every member of the population is M or F. Therefore every member will benefit. End of story.

Now, you could certainly have a Simpson's paradox in such a case, but you would have to say something like "a treatment could benefit both males and females, yet a group receiving the treatment did worse on average than a group not receiving it"

When you say it like this though, it's not very counterintuitive, if you've read the earlier part of the article. Hence I think it's not needed and I just removed the illogical clause. — Preceding unsigned comment added by Wstrong (talk • contribs) 19:10, 16 February 2013 (UTC)[reply]

Graph of Bart & Lisa Example[edit]

I believe that the 'Bart' graph (identified as the lower one) in this example is seriously misleading. The percentages of articles improved by Bart (14.2% in the 1st week and 100% in the 2nd week) do not fit at all with the graph, which appears to show the opposite: a much greater percent improvement in the first week.

Less seriously, but still confusingly, the heights of the bars appear to be only crude estimates. Bart's high contribution of 100% appears to be quite the same height as Lisa's high contribution of 71.4%, while Lisa's low contribution of 0% appears to be the same height as Bart's low contribution of 14.2%.

All in all, a reader has a better chance of understanding the paradox if he/she ignores this graph entirely.Stoddj (talk) 21:51, 8 July 2013 (UTC)[reply]

Introductory Graph[edit]

The graph next to the introduction seems to be misleading, as in that case the groups are distinguished by the variable in which the trend appears as opposed to the examples in which the groups are distinguished by some other variable.Yehoshua2 (talk) 06:15, 9 September 2013 (UTC)[reply]

I agree. The examples and the graphic are quite different forms of Simpson's paradox and I don't see how one is explained by the other. So it seems odd to have a linear-trend-reversal as the most prominent graphic but then only give examples of ratio-reversal. I suggest adding (or replacing) another example that explains that graphic in more detail.

Alternatively, why not use the introductory graphic of the Gerrymandering article as the lead since it is the prime example of ratio-reversal? Georg.anegg (talk) 11:32, 8 November 2017 (UTC)[reply]

Perhaps a change, but don't use the gerrymandering graphic - that's not an obvious illustration of Simpson's paradox at all. It will confuse readers who know nothing about the topic. - DavidWBrooks (talk) 11:59, 8 November 2017 (UTC)[reply]

Joke[edit]

This paradox is related to the old joke about the man who left Scotland for England and thereby raised the average IQ of both countries. It is particularly clear in the joke that the overall average cannot change, because the two situations (before and after) are simply different partitionings of the same set. Yet the sub-averages can both move in the same direction. (They both go up if the Scotland average is higher than the England average and the man is in between.)

You could modify the joke so that the overall average moves in the opposite direction, e.g. if two men leave Scotland and only one goes to England. But that would not only reduce the elegance of the joke; in fact, the constancy of the overall average in this case brings out the true nature of the paradox: not only can the overall average and the sub-averages move in strictly opposite directions, but more generally the overall average and the sub-averages are decoupled in a surprising sense. — Preceding unsigned comment added by 163.1.246.64 (talk) 14:26, 16 January 2014 (UTC)[reply]

Note that the initial populations of Scotland and England need not have any particular ratio for this to work. I haven't thought for long enough yet about the precise relationship between this and the other examples.
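
The mechanism in the joke can be illustrated with made-up numbers. In the sketch below every score is hypothetical, chosen only so that the mover sits between the two national averages; it is not data about any real population:

```python
from statistics import mean

# Hypothetical scores, invented purely for illustration.
scotland = [120, 110, 100]
england = [90, 80]
mover = 100  # lies between England's mean (85) and Scotland's mean (110)

before = (mean(scotland), mean(england), mean(scotland + england))

# The mover leaves Scotland and joins England.
scotland_after = scotland.copy()
scotland_after.remove(mover)
england_after = england + [mover]

after = (
    mean(scotland_after),
    mean(england_after),
    mean(scotland_after + england_after),
)

print("Scotland:", before[0], "->", after[0])
print("England: ", before[1], "->", after[1])
print("Overall: ", before[2], "->", after[2])
```

With these numbers both group means go up while the overall mean is unchanged, because the same five people are merely partitioned differently.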

Does Simpson's Paradox always disappear when causal relations are brought into consideration?[edit]

Does Simpson's Paradox always disappear when causal relations are brought into consideration, as the text currently implies?

In the example of Lisa and Bart, I do not see any causal relationships being brought into consideration.

Mcamp@cinci.rr.com (talk) 00:58, 16 August 2014 (UTC)[reply]

"UC Berkeley gender bias" departments[edit]

What are the respective departments? I don't seem to be able to find them in the given citation or the actual research paper. 92.4.96.96 (talk) 20:13, 23 May 2016 (UTC)[reply]

Refs[edit]

^ [1] for the racial breakdown, and [2] for the combined results.
^ [3]

Let's remove the whole Bart and Lisa section[edit]

Somebody placed a "tone" hatnote on this article but didn't give any details about what bothered them. I hate it when editors do that and often remove such hatnotes, under the assumption that if they can't be bothered to explain their reasoning then the rest of us shouldn't be hassled by their concerns, but in this case he/she/it has a point.

I suspect the problem is the long and clumsy section titled "Description," which gives an imaginary example of the paradox involving Bart and Lisa (because no wikipedia article is allowed to exist without a Simpsons reference) and which is, indeed, written in an unnecessarily loose tone.

I do not believe the section is necessary at all, because we have several real-world examples that provide just as much illustration. I would like to kill that section altogether. What do others think? - DavidWBrooks (talk) 18:28, 25 August 2016 (UTC)[reply]

Well, I don't know if "no wikipedia article is allowed to exist without a Simpsons reference", but given the title of this page, I see why this particular one would be a good candidate for a Simpsons reference ;) Joking aside, I've never been a big fan of this section -- "long and clumsy" sums it well indeed. In my mind, it can go. Schutz (talk) 18:11, 30 August 2016 (UTC)[reply]

It's been a week. I want to be cautious about making such a big change, deleting something that's been in the article for so long, so I'll ask again: What do others think of deleting the section? - DavidWBrooks (talk) 23:47, 1 September 2016 (UTC)[reply]

Actually, now that I have reread the section in more details, I see that it has changed since I last read it. In the past, the numbers used as example were in the order of 100, making the Bart and Lisa example similar to other, real-life, examples. I see that the example now uses a total of 5 events in each case, making it very easy to understand what is happening. In addition, one editor had the good idea of adapting two of my figures (the weighing scales and the vector interpretation) to this actual example. I think the result is quite nice, and actually adds to the whole article. However, everything from "Here are some notations:" still seems long and clumsy to me -- not to say that it adds little to what is in the tables. I would at least remove this part of the section, but the rest of the example is useful. (ideally, we should find a real examples using such low values, and use that one instead...). Schutz (talk) 08:54, 2 September 2016 (UTC)[reply]

It still seems unnecessary to me, but you're right that the "notation" section is really unnecessary. I have removed that portion, and the "tone" hatnote. - DavidWBrooks (talk) 13:05, 4 September 2016 (UTC)[reply]

Trump section[edit]

"This aggregate phenomenon of poor voters being seemingly less likely than the rich to vote for Trump is driven by the following facts: (1) the majority of voters are white; (2) the white are more likely to be rich; (3) the white are more likely to vote for Trump. Once we control for race, we find that poor voters are in fact more likely than the rich to vote for Trump."

I don't like the choice of words here. Is controlling for race necessary? One could also say, once we control for cherrypicking, we find that poor voters are less likely to vote for Trump. Daß Wölf 00:56, 14 November 2016 (UTC)[reply]

P.S. Also, we could easily imagine many ways to partition the 10,000 rich and 1,000 poor voters in the example (e.g. college graduates vs. others, baseball fans vs. others...) where the poor voters end up also less likely to vote for Trump. I can think of no obvious way to decide which partition is the meritorious one. Daß Wölf 01:00, 14 November 2016 (UTC)[reply]

I have removed it - it is original research and speculation. Plus, we have enough real-world examples. - DavidWBrooks (talk) 01:02, 14 November 2016 (UTC)[reply]

The number of examples[edit]

Do we really need six examples (counting Bart and Lisa) for the paradox? They take up more space than all the other sections combined. I would keep only two or maybe three at most, that should be illustrative enough for anyone IMO. Daß Wölf 18:38, 3 June 2017 (UTC)[reply]

It is borderline excessive. We could certainly toss the low birth-weight paradox, referencing to it in "see also" - and frankly, I've never been a fan of the Bart and Lisa item, as above discussion shows; I don't think we need a fake example when we have such detailed description of real life examples. - DavidWBrooks (talk) 20:52, 3 June 2017 (UTC)[reply]

So I have removed the low birthweight paradox. - DavidWBrooks (talk) 01:36, 28 June 2017 (UTC)[reply]

Cogent visual argument up front[edit]

We should start off with a far more cogent visual argument?

Like some sketch with all of four groups misleadingly reversing the true overall trend, something akin to the image held back from us here at my Pat LaVarre [@PELaVarre] (July 29, 2017). "I see this trend here, there, everywhere. And it reverses when I zoom out. ~~ Jabawack MethodsManMD Aug/2016 ..." (Tweet) – via Twitter.

Ouch to see this sketch you would have to click out into Twitter, because I cannot upload that sketch directly here, because indeed I cannot "attest that I own the copyright on" this image. I only know I remixed it from Twitter's video transcode of their Gif of someone's two slides, it is copyright unknown to me.

It tumbled across my desk as F. Perry Wilson [@MethodsManMD] (August 10, 2016). "A study suggesting pasta consumption can reduce BMI is a great example of Simpson's Paradox..." (Tweet) – via Twitter.

It reached me today as a retweet from Emilio Ferrara [Jabawack] (29 July 2017).

I had forgotten which paradox was this paradox of 1899 Pearson et al., 1903 Yule, 1951 Simpson, but googling your work reminded me in a moment, thank you.

Pelavarre (talk) 19:06, 29 July 2017 (UTC)[reply]

A popular variant of Simpson's paradox?[edit]

Given that Men are more likely to be affected than women by some disease, and that young people are more likely to be affected, it is reasonable to conclude that young Men are the most affected group. But it's not true, of course.
— Preceding unsigned comment added by Georg.anegg (talk • contribs) 11:15, 8 November 2017 (UTC)[reply]

Suppose we have the following data of people affected by a disease, say:

	under 30	over 30
Men	9/10	0/3
Women	0/5	1/1

Thus Men (total) have a higher rate (9/13) than Women (1/6),

and under 30's (total) have a higher rate (9/15) than over 30's (1/4).

Yet it is not true that Men-under-30 have a higher rate (9/10) than Women-over-30 (1/1). Do you consider this a variant of Simpson's Paradox? In some sense, it's a double application of the fallacious subset principle:
If Men have a higher rate than Women, then Men-under-30 have a higher rate than Women-under-30.
If under 30's have a higher rate than over 30's, then Women-under-30 have a higher rate than Women-over-30.
Hence Men-under-30 have a higher rate than Women-under-30 who have a higher rate than Women-over-30.
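
The table and the three-step chain above can be checked directly. A minimal Python sketch, assuming only the four cell counts from the table (the key names and helper functions are illustrative):

```python
from fractions import Fraction

# (affected, total) per cell, copied from the table above.
cells = {
    ("men", "under30"): (9, 10),
    ("men", "over30"): (0, 3),
    ("women", "under30"): (0, 5),
    ("women", "over30"): (1, 1),
}

def pooled_rate(index, value):
    """Rate over all cells whose key has `value` at position `index`."""
    affected = sum(a for key, (a, n) in cells.items() if key[index] == value)
    total = sum(n for key, (a, n) in cells.items() if key[index] == value)
    return Fraction(affected, total)

def cell_rate(sex, age):
    """Rate within a single cell of the table."""
    affected, total = cells[(sex, age)]
    return Fraction(affected, total)

# Both marginal comparisons hold: 9/13 > 1/6 and 9/15 > 1/4.
assert pooled_rate(0, "men") > pooled_rate(0, "women")
assert pooled_rate(1, "under30") > pooled_rate(1, "over30")

# The first step of the chain happens to hold here (9/10 > 0/5)...
assert cell_rate("men", "under30") > cell_rate("women", "under30")
# ...but the second step fails (0/5 < 1/1), so the conclusion fails too.
assert cell_rate("women", "under30") < cell_rate("women", "over30")
assert cell_rate("men", "under30") < cell_rate("women", "over30")
```

The marginal rates say nothing reliable about individual cells, which is why the chained inference breaks.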

I feel like this is a fallacy that's committed extremely commonly (e.g. by newspapers) and as such deserves mention. (I have looked in other places too but couldn't find it. Please let me know if I missed it.) Georg.anegg (talk) 12:08, 3 November 2017 (UTC)[reply]

It doesn't strike me as much different than the examples we already have - not really worth adding as yet another example to the article, in my opinion. - DavidWBrooks (talk) 12:30, 3 November 2017 (UTC)[reply]

Implications for decision making needs sources and is probably wrong[edit]

This section suggests that obviously Treatment A should always be preferred. I think that's wrong. Think about it. If you saw a product on Amazon with 4 stars and 1000 ratings (small stones-treatment B) vs a product with 4.5 stars and 10 ratings (small stones-treatment A), you'd probably trust the 4 stars of the first product more than the 4.5 of the second. So it's entirely possible that you'd similarly prefer treatment B to A on small stones. I can't prove it on the actual numbers, since that takes math I don't know, but logically it seems plausible, at least enough to demand further evidence and sourcing for the assertion to the contrary. 173.68.75.170 (talk) 05:36, 26 February 2018 (UTC)[reply]

SONG[edit]

Harper, M. https://www.causeweb.org/cause/resources/fun/songs/no-one-counted-simpsons-paradox

POEM[edit]

Lesser, L. (Winter 2010). Confounded. The Mathematical Intelligencer, 32(4), 53.  http://link.springer.com/article/10.1007%2Fs00283-009-9127-x  — Preceding unsigned comment added by 129.108.11.141 (talk) 21:09, 4 June 2018 (UTC)[reply]

Let's remove the Bart and Lisa section (redux)[edit]

Two years ago I suggested dumping the whole Bart and Lisa section - the made-up example. It was improved and the only other editor who responded at the time didn't mind it, so it stayed.

Reading through this article again, I'd still like to dump it and replace it with one of the real-world examples. I don't see how it explains anything better than several of the real examples, which carry more weight because they're not hand-waving.

Any thoughts? - DavidWBrooks (talk) 22:17, 10 November 2018 (UTC)[reply]

Nobody seems to be terribly excited about this. I might just kill it, then. - DavidWBrooks (talk) 14:01, 13 November 2018 (UTC)[reply]

OK, so I did it - although I kept the Vector Interpretation section, moving it down below all the real-world examples. - DavidWBrooks (talk) 21:25, 30 November 2018 (UTC)[reply]

The caption to the vector interpretation image should be changed so it doesn't reference an example that can no longer be found in the article. — Preceding unsigned comment added by 157.182.151.169 (talk) 23:28, 8 January 2019 (UTC)[reply]

Hi @DavidWBrooks: I've just seen your messages today and think the Bart/Lisa example is easier for a layperson to understand than the real-world examples as the numbers are smaller, also making the vector graph easier to read. I'd prefer restoring it, but await a second opinion. Cheers, cmɢʟee⎆τaʟκ 18:02, 26 October 2019 (UTC)[reply]

P.S. The discussions above imply that at least some readers find the Bart-Lisa example useful as the numbers are much smaller and can be easily visualised:

#One_of_the_finer_Wiki_entries
#Do_we_need_the_fake_example?
#Definition_needed_for_"quality_modifier"?
#Let's_remove_the_whole_Bart_and_Lisa_section

If it is felt that there are too many examples, I'd rather replace the batting average one with this. Baseball statistics are not universally known about. If there is no objection, I'll remove it and restore the Bart-Lisa example.

Thanks,
cmɢʟee⎆τaʟκ 10:31, 3 November 2019 (UTC)[reply]

The batting average is very well known - probably the best-known example in the United States. It should not be removed. My objection to the made-up example is that it's made up, yet we have several very clear real-world examples which show that this is not a theoretical issue but an actual problem.

Wikipedia is not a how-to manual or a textbook (https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#Wikipedia_is_not_a_manual,_guidebook,_textbook,_or_scientific_journal) , and editors making up examples that they think are clearer to layfolk is classic textbook material. Finally, I don't think it IS clearer, it's just another example. - DavidWBrooks (talk) 19:48, 3 November 2019 (UTC)[reply]

Implications for decision making[edit]

The section Implications for decision making is good and useful, but it needs sourcing. Right now it's a textbook example of original research / synthesis, which wikipedia frowns on. - DavidWBrooks (talk) 14:15, 25 August 2019 (UTC)[reply]

Unfortunately, the implied interpretation of the kidney study is misleading. The aggregate success rates depend entirely upon the mismatched ratio of treatment assignment to large and small stones, and so the unweighted aggregates have no valid use in decision-making outside of the specific treatment proportions in the study. If we wanted to estimate the population-level effects of using treatment A over treatment B we would need to weight our samples to adjust them to population proportions of small and large stones (and for other potentially confounding factors). The population-level estimates would then agree with the indication-specific estimates.
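
The reweighting described above can be sketched as direct standardization. The counts below are the kidney stone figures as usually quoted for this example (Charig et al., 1986). The choice of target mix is an assumption made here for illustration: the pooled stone-size mix of the study stands in for population proportions, which the comment does not specify.

```python
# (successes, patients) per treatment and stone size, as usually quoted
# for the kidney stone example.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}
sizes = ("small", "large")

# Assumption for illustration: use the pooled study mix of stone sizes
# as a stand-in for the (unknown) population proportions.
n_by_size = {s: sum(counts[t][s][1] for t in counts) for s in sizes}
n_total = sum(n_by_size.values())
weights = {s: n_by_size[s] / n_total for s in sizes}

def crude_rate(treatment):
    """Unweighted aggregate success rate, as in the paradoxical totals."""
    successes = sum(counts[treatment][s][0] for s in sizes)
    patients = sum(counts[treatment][s][1] for s in sizes)
    return successes / patients

def standardized_rate(treatment):
    """Stratum-specific rates averaged with the common target weights."""
    return sum(
        weights[s] * counts[treatment][s][0] / counts[treatment][s][1]
        for s in sizes
    )

for t in counts:
    print(t, f"crude={crude_rate(t):.3f}", f"standardized={standardized_rate(t):.3f}")
```

Because both treatments are averaged over the same stone-size mix, the standardized comparison cannot reverse the stratum-level ordering, whatever weights are chosen.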

In fact, the important question is how the data fit into our causal model, and that is only discussed in a very abstract sense in the third paragraph. If we could find good sources, I'd recommend expanding the causal model discussion with concrete examples and removing the preceding paragraphs entirely. For example, the Kidney example fits a causal model where A is indeed better in all cases, but biased assignment of treatments led to misleading overall estimates. The UC Berkeley example fits a model where it does indeed have a problem with rejecting women applicants more often than men, but the fault lies in the under-availability of positions in highly-demanded departments rather than in the admissions process.72.234.243.59 (talk) 07:19, 7 August 2020 (UTC)[reply]

Systemic vs. overt discrimination[edit]

The discussion of the paradox provides an entree to thinking about systemic vs. overt discrimination. For instance, Berkeley really did admit a lower proportion of women than men. The paradox explains that the outcome is not overt sexism, but rather because admission rates are higher to the programs men favour and lower to the programs women favour. But thinking systemically, one should ask why Berkeley enables programs that men favour to admit more students? Similarly, looking at the death penalty sentencing example, one could ask why the death penalty is given more often when the victim is white vs. black. In other words, the paradox provides an explanation but adding a new variable isn't necessarily the end of the investigation. Perhaps there could be a new section after implications for decision making? Crowston (talk) 22:25, 10 June 2020 (UTC)[reply]

I propose to add the following after the UCB case:

This paradoxical result illustrates the difference between overt and systemic discrimination. Berkeley really did admit a lower proportion of women than men. The paradox explains that the outcome is not overt sexism, since women are if anything favored in admission in most departments, but rather because admission rates are higher to the programs men favor and lower to the programs women favor. The example illustrates that discrimination can arise from the allocation of resources, instead of (or as well as) from individual admission decisions. Thinking systemically, one should ask why Berkeley allows programs that men favor (e.g., engineering) to admit a higher proportion of students than the programs women favor (e.g., English)? Crowston (talk) 22:55, 12 May 2021 (UTC)[reply]

Hi, and welcome to Wikipedia. Something about what you're describing might possibly be able to be added to the article, but there are problems with this edit. Firstly, it's not really written in the kind of encyclopedic tone that Wikipedia articles are meant to be. Have a quick read of WP:Encyclopedic style – we shouldn't be, for instance, posing rhetorical questions to the reader. Secondly, Wikipedia is based off verifiability, which means we just try to summarise and reflect what is published in reliable sources, rather than introducing our own original research into articles.

If you find a source for the material you're talking about and clean up the prose a bit, it could maybe be okay, but I'm not entirely sure this is really on topic for an article about statistics. ‑‑Volteer1 (talk) 18:24, 23 May 2021 (UTC)[reply]

Death sentence[edit]

The section on the racial disparity in death sentences should, I think, be removed because it ends: "Radelet found that none of the aforementioned correlations were statistically significant" - That makes it a very poor example of the paradox and we have three other real-world examples.

What do people think? - DavidWBrooks (talk) 13:22, 6 January 2021 (UTC)[reply]

Well, no immediate response so let's remove it. - DavidWBrooks (talk) 21:58, 7 January 2021 (UTC)[reply]

Psychology[edit]

I don't understand the psychology section at all. I would rewrite it, but I have no idea what it means. Could someone who does fix it please? ElectricRay (talk) 07:34, 19 September 2021 (UTC)[reply]

Good point. It's so jargon-laden that it's hard to tell if it's gibberish. - DavidWBrooks (talk) 12:23, 20 September 2021 (UTC)[reply]

Section "Efficacy of Covid-19 vaccines"[edit]

Since someone previously deleted my added section, I would like to open a discussion about it. I think the example shows the enormous practical relevance of the paradox as it leads to an underestimation of the vaccine efficacy and hence probably to a lower vaccination rate which has deadly consequences. If there are too many examples, I suggest deleting the "Batting averages" example which has no practical relevance. It was criticised that the example is just speculation. While there is only one source indeed, other scientists supported Prof. Morris' argument. Moreover, the data used is official data from the Israeli health authorities. — Preceding unsigned comment added by Deopax (talk • contribs) 16:14, 25 November 2021 (UTC)[reply]

@Deopax: I think that's an excellent idea. While a WP:RECENTISM counterargument could be made, it's very important that the examples be relatable in order to teach readers, and the current pandemic is probably the most widely relatable topic in statistics right now. I've dug up a few papers on the topic of COVID-19 and Simpson paradox that might be worth checking out: [6][7][8][9], see also: [10][11]. Daß Wölf 17:50, 28 November 2021 (UTC)[reply]

No question that current events like COVID-19 make for lively examples, but even with a ready consensus there is a pedantic effect of injecting a hot topic into an article that should be nothing but dreary mathematics. That's why trivia like baseball statistics are worthy. I thought about adding Nassim Taleb's short talk ( https://www.youtube.com/watch?v=XVRfBhy5vGI ) that uses a correlation of COVID-19 vaccines to higher death rates to demonstrate Simpson's Paradox, but even though he's a reliable source and he's correct (as far as I can tell), it's still a radioactive topic and personality that would invite an edit controversy. Richard J Kinch (talk) 03:15, 2 December 2021 (UTC)

Added some info in COVID-19 vaccine misinformation and hesitancy#Claims of inefficacy. Feel free to edit it further. Zach (Talk) 13:51, 23 February 2022 (UTC)
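To make the underestimation argument in this thread concrete, here is a minimal Python sketch. The counts are invented for illustration and are not the Israeli figures; the group names and the helper functions `efficacy` and `pooled` are likewise assumptions of this sketch. It computes efficacy as one minus the ratio of case rates, once per age group and once with the age groups pooled.

```python
# Hypothetical counts, invented for illustration (NOT the Israeli data).
# Each entry: (unvaccinated population, unvaccinated severe cases,
#              vaccinated population, vaccinated severe cases)
GROUPS = {
    "under 50": (1_000_000, 40, 3_000_000, 12),
    "50 and over": (200_000, 200, 2_000_000, 300),
}


def efficacy(unvax_pop, unvax_cases, vax_pop, vax_cases):
    """Efficacy as 1 minus the ratio of vaccinated to unvaccinated case rates."""
    return 1 - (vax_cases / vax_pop) / (unvax_cases / unvax_pop)


def pooled(groups):
    """Sum each of the four counts across all groups, ignoring age."""
    return tuple(sum(column) for column in zip(*groups.values()))


if __name__ == "__main__":
    for name, counts in GROUPS.items():
        print(f"{name}: {efficacy(*counts):.1%}")
    print(f"all ages pooled: {efficacy(*pooled(GROUPS)):.1%}")
```

With these made-up numbers the pooled efficacy comes out lower than the efficacy in either age group, because the older group is both more heavily vaccinated and at much higher baseline risk. Whether the real data behave this way has to be checked against the cited sources.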

The section "Criticism"

I haven't checked the sources, and those criticisms may exist, but they all seem ill-informed to me. Do we need them here?

One criticism is that the paradox is not really a paradox at all, but rather a failure to properly account for confounding variables or to consider causal relationships between variables.

Most paradoxes are failures to do something properly. Is the paradox of the heap, say, "really a paradox"? Or any of Zeno's paradoxes?

Another criticism of the apparent Simpson's paradox is that it may be a result of the specific way that data is stratified or grouped. The phenomenon may disappear or even reverse if the data is stratified differently or if different confounding variables are considered. Simpson's example actually highlighted a phenomenon called noncollapsibility, which occurs when subgroups with high proportions do not make simple averages when combined together. This suggests that the paradox may not be a universal phenomenon, but rather a specific instance of a more general statistical issue.

What is the criticism here? It seems to me to reiterate what is actually the point of the paradox as a cautionary tale.

Critics of the apparent Simpson's paradox also argue that the focus on the paradox may distract from more important statistical issues, such as the need for careful consideration of confounding variables and causal relationships when interpreting data.

Again, isn't the point of the paradox to promote proper consideration of confounding variables? Nø (talk) 13:45, 13 May 2023 (UTC)
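The dependence on stratification discussed in this thread can be checked mechanically. The following is a minimal Python sketch, not taken from the article's sources: the counts are the kidney-stone figures commonly used to illustrate the paradox and should be treated as illustrative here, and the function names are assumptions of this sketch. It tests whether one treatment has the higher success rate in every stratum and yet the lower rate once the strata are pooled.

```python
# Illustrative (successes, patients) counts per treatment and stone size.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Success rate within a single stratum."""
    return successes / patients


def pooled_rate(strata):
    """Success rate after summing successes and patients over all strata."""
    successes = sum(s for s, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return successes / patients


def better_in_every_stratum(data, first, second):
    """True if `first` has the higher rate in each stratum taken separately."""
    return all(
        rate(*data[first][key]) > rate(*data[second][key])
        for key in data[first]
    )


def shows_reversal(data, first, second):
    """True if `first` wins in every stratum but loses once strata are pooled."""
    return (
        better_in_every_stratum(data, first, second)
        and pooled_rate(data[first]) < pooled_rate(data[second])
    )


if __name__ == "__main__":
    print(shows_reversal(DATA, "A", "B"))
```

Regrouping the same patients by a different variable would change the strata passed to `shows_reversal`, and the result could change with it, which is the sense in which the reversal depends on how the data are stratified.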
