Simpson's Paradox

…a trend appears in several different groups of data but disappears or reverses when these groups are combined. This result is often encountered in social-science and medical-science statistics and is particularly problematic when frequency data is unduly given causal interpretations. The paradox can be resolved when causal relations are appropriately addressed in the statistical modeling. (From Wikipedia)

Are subgroup discovery and Simpson's paradox related? I believe they are, though I haven't fully developed the idea yet. My thinking is that a single global regression line can be misleading when the data contain strong local patterns that are not aligned with each other. I doubt this is new, but it would be interesting to explore the idea a bit further.
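
To make the intuition concrete, here is a minimal sketch on synthetic data. It is only an illustration, not a subgroup discovery method: the group memberships are given rather than discovered, and every name and parameter (`make_groups`, `fit_slope`, the group centers, the noise level) is made up for this example. Each group has a positive local slope, but the group centers drift down and to the right, so the line fitted to the pooled data should come out with a negative slope.

```python
import numpy as np


def fit_slope(x, y):
    """Least-squares slope of y on x (degree-1 polynomial fit)."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope


def make_groups(n_groups=3, n_points=20, within_slope=1.0,
                center_step=(4.0, -6.0), noise=0.3, seed=0):
    """Synthetic groups (hypothetical setup): same local trend, shifted centers."""
    rng = np.random.default_rng(seed)
    groups = []
    for g in range(n_groups):
        # Each successive group sits further right and further down.
        cx, cy = g * center_step[0], g * center_step[1]
        x = cx + np.linspace(-1.0, 1.0, n_points)
        y = cy + within_slope * (x - cx) + rng.normal(0.0, noise, n_points)
        groups.append((x, y))
    return groups


groups = make_groups()

# Local view: one regression line per group.
local_slopes = [fit_slope(x, y) for x, y in groups]

# Global view: one regression line over the pooled data.
x_all = np.concatenate([x for x, _ in groups])
y_all = np.concatenate([y for _, y in groups])
global_slope = fit_slope(x_all, y_all)

print("local slopes:", np.round(local_slopes, 2))
print("global slope:", round(float(global_slope), 2))
```

In this toy setup the local slopes agree with each other and disagree only with the pooled fit. The case I have in mind is broader (local patterns that also disagree among themselves), but the mechanism is the same: the global line mostly reflects where the groups sit, not what happens inside them.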

Recommended

Pearl, J. (2016). Simpson’s Paradox: The riddle that would not die. Blog post.
Bookbinder, H. (2025, April 3). Simpson’s Paradox Explains the World. Scriptorium Philosophia.