engineering a better experience
Monday, November 14, 2011

Anscombe's Quartet

by The Metaist


Anscombe's quartet was created by the statistician Francis Anscombe to demonstrate the importance of graphing data and the effect of outliers. All four datasets have nearly identical means, variances, correlations, and linear regression lines, but look completely different when graphed.

Graphs of Anscombe's quartet

Lies, damned lies, and statistics!
(Image: Wikimedia)



Anscombe's quartet is a neat example of how even large differences between datasets can be hidden by outliers. Because the sets share so many statistical properties, it is easy to get a false sense that they are comparable in some way.
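To make the "same statistics" claim concrete, here is a minimal Python sketch that computes the shared properties by hand. The data values are the ones Anscombe published; the names `QUARTET`, `X_SHARED`, and `summarize` are my own and purely illustrative.

```python
from statistics import mean

# Anscombe's published values; sets I-III share the same x column.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics the four sets (nearly) share."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # sum of squared deviations in x
    syy = sum((y - my) ** 2 for y in ys)  # sum of squared deviations in y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} "
          f"r={s['r']:.3f} fit: y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

For every set, the mean of x is 9, the variance of x is 11, the mean of y is about 7.50, the correlation is about 0.816, and the fitted line is roughly y = 3.00 + 0.50x. The numbers give no hint that the four sets have different shapes.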

Luckily, a simple graph of the sets reveals the effect. However, graphs can deceive in several other ways:

  • Captions - A graph's caption can create an expectation that will distort our perception of the graph (the same applies to photographs).

  • Playing with Scales - There are several ways this can be accomplished:

    1. A compressed or expanded scale can make changes seem larger or smaller.
    2. Gaps in the scale can make it harder to judge how large a value is.
    3. Different scale units or sizes can make direct comparison between two datasets seem possible, even when it doesn't make any mathematical sense.
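The first point is easy to put numbers on. The sketch below is a rough illustration with made-up values: `apparent_ratio` is a hypothetical helper that reports how much taller one bar is drawn than another when the value axis starts somewhere other than zero.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Ratio of drawn bar heights when the value axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below both values")
    return (larger - axis_start) / (smaller - axis_start)


# Illustrative values only: two measurements that differ by 4%.
a, b = 50.0, 52.0
print(apparent_ratio(a, b))        # axis starts at 0
print(apparent_ratio(a, b, 49.0))  # axis starts at 49
```

With the axis starting at zero, the second bar is drawn 4% taller than the first. Start the axis at 49 and the same two values produce a bar three times as tall, although the data has not changed.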

While I'm interested in the best techniques for presenting data, there is more to the experience of reading a graph than the information it presents: our choice of presentation can divert the reader away from the underlying truth about the data.
