Statistics: There’s a 90.75826 percent chance you are wrong


Whether the topic falls within politics, sports, science, or last-minute college essays, most people are keen to cite statistics as evidence that their team is winning and their point of view is correct. Of course, you don’t need a statistics class to know that the majority of Americans haven’t taken a statistics class, and those who have probably didn’t ace it.

In Naked Statistics (2013), Charles Wheelan, who also wrote the enlightening and straightforward Naked Economics, separates fact from the manipulative presentation of data and spells out the common mistakes people make when applying or analyzing stats. The author gives you a glimpse of his intentions in the acknowledgements section, where he admits to drawing inspiration from Darrell Huff’s How to Lie with Statistics and explains that Naked Statistics is neither a textbook nor a thorough examination of the subject, but rather an attempt to help people become a bit more numerically literate.

If you’ve been listening to the debates, you know politicians are fond of spouting off percentages, and we’re often willing to take them at their word when we hear something along the lines of, “According to the last such-and-such news report, more than 80 percent of American students are graduating from high school …” (For a moment, pretend that a political candidate would say something positive rather than fear-mongering.)

Wheelan, in one of his earliest chapters, dispels the myth that a true statistic needs no further explanation. Take this example from page 40:

“Even when we agree on a single measure of success, say, student scores, there is plenty of statistical wiggle room. See if you can reconcile the following hypothetical statements, both of which could be true:
Politician A (the challenger): “Our schools are getting worse! Sixty percent of our schools had lower test scores this year than last year.”
Politician B (the incumbent): “Our schools are getting better! Eighty percent of our students had higher test scores this year than last year.”

Here’s a hint: The schools do not all necessarily have the same number of students. If you take another look at the seemingly contradictory statements, what you’ll see is that one politician is using schools as his unit of analysis (“Sixty percent of our schools …”), and the other is using students as the unit of analysis (“Eighty percent of our students …”). The unit of analysis is the entity being compared or described by the statistics — school performance by one of them and student performance by the other. It’s entirely possible for most of the students to be improving and most of the schools to be getting worse — if the students showing improvement happen to be in very big schools.”
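To see how both statements can hold at once, here’s a minimal sketch in Python, using enrollment numbers I made up (nothing from the book): a few huge, improving schools can carry most of the students even while most of the schools slip.

```python
# Made-up data: (students enrolled, students whose scores rose, school's average change)
schools = [
    (50, 10, -2.0),     # six small schools, each a bit worse on average
    (50, 10, -1.5),
    (50, 15, -1.0),
    (50, 15, -0.5),
    (50, 20, -1.0),
    (50, 20, -0.5),
    (1000, 900, +3.0),  # four huge schools, each better on average
    (1000, 900, +2.5),
    (1000, 950, +4.0),
    (1000, 950, +3.5),
]

schools_declining = sum(1 for _, _, change in schools if change < 0)
students_total = sum(enrolled for enrolled, _, _ in schools)
students_improving = sum(rose for _, rose, _ in schools)

print(f"Schools with lower scores:   {schools_declining / len(schools):.0%}")      # 60%
print(f"Students with higher scores: {students_improving / students_total:.0%}")   # about 88%
```

Both politicians get their sound bite; they’re just counting different things.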

When it comes to studies, methodology and sample size are everything. While statistics can be one of the most useful tools available, it can also be adapted and used as a way to distort simpler truths.

For a real-life example, take the last Democratic debate. Hillary Clinton claimed to have received a mere three percent of her campaign contributions from “people in the finance and investment world,” which is technically true but doesn’t give an accurate picture of her total contributions, which she neglected to reveal. According to FactCheck.org, the figure doubles to six percent once donations from other related groups are counted. While that’s one of the least demonizing partial-lies she (or many of the politicians on both sides) has told, it’s a fairly clear-cut example of how one simply neglects to include the larger sample. It also means that roughly twice the share she admitted to has come from the Wall Street and commercial banking industries. How … democratic?

I admit to throwing statistics around without thinking, even in the last year, in these many sometimes-poignant, sometimes-flippant interchanges with my keyboard. (I love you, Dino Toshiba.) Using nationwide polls, I’ve probably drawn some ignoramus conclusions from percentages of random samples, distributed in no strategic fashion and with no consideration of the underlying population they represent. That’s because: (1) I can be really lazy and don’t always fact-check more than two sources, and (2) we’re taught that a large sample size means the data is somewhat safe and reliable.

From there, I’ll move on to another myth we’re taught to believe: the myth of predictability from probability.

While probability is a scientifically grounded tool, it does not carry the level of certainty so often attached to the term. Just because something is deemed probable, Wheelan explains, does not mean we can skip the scrutiny we would normally bring to judging the outcome.

“The less likely it is that an outcome has been observed by chance, the more confident we can be in surmising that some other factor is in play,” he writes.

“Of course I’m going to win. The rest of the candidates are losers.” (YouTube)

He gives examples of various incidents where the “probability” of someone winning was used to determine whether they cheated, then throws in the counterexample of a man who won the lottery twice. Probable? Not at all. Fraudulent? Also negative.
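Here’s a back-of-the-envelope sketch of that lottery point, with odds, drawing schedules, and player counts that are entirely my own assumptions rather than figures from the book: an event can be astronomically unlikely for any one player yet still be expected to happen to somebody.

```python
p_win = 1 / 2_000_000           # assumed odds of winning a single drawing with one ticket
drawings = 104 * 20             # assumed twice-weekly drawings over twenty years
regular_players = 10_000_000    # assumed number of people who play every drawing

# Chance that one particular regular player wins at least twice
p_zero = (1 - p_win) ** drawings
p_one = drawings * p_win * (1 - p_win) ** (drawings - 1)
p_double = 1 - p_zero - p_one
print(f"A specific player winning twice: about 1 in {1 / p_double:,.0f}")

# Expected number of double winners somewhere among all the regular players
print(f"Double winners we'd expect to see: about {regular_players * p_double:.1f}")
```

Improbable for you is not the same as improbable for everyone, which is why a double winner isn’t automatically a cheat.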

Analyzing data has always posed problems for those of us with limited (or no) experience with numerical abstraction, but common sense and a few extra minutes of processing can save you a world of misunderstanding.

Ten pages of Wheelan’s book (pages 100-109) focus squarely on the most common mistakes people make while analyzing data, summarized in dumb-speak below.

(1) We assume events are independent when they are not. In other words, we lump separate events together, read significance into the combination, and draw incorrect conclusions. Wheelan gives the example of SIDS deaths, writing that the statistical improbability of two or more babies in one family dying of SIDS was at one point used as sufficient evidence to lock up one or both parents for murder. The two explanations (SIDS as a tragic cause of death, and murder as a criminal one) were conflated through the mishandling of, and reliance upon, misused statistics. (A sketch of the flawed multiplication appears after this list.)

(2) The opposite of #1: we assume events are dependent when they are not. The most common example of this is the gambler’s fallacy. Your number hasn’t come up yet, so it’s more likely to come up next, right? No, your odds are exactly what they were when you stepped into a place designed to take your money. (See the simulation after this list.)

(3) People believe in clusters like they believe in being “naturally thin.” Clusters are usually a combination of factors, or a product of pure chance. Very rarely can we trace clusters of cancer or other seemingly repeated events to a single variable. (The little simulation after this list shows how easily chance alone produces a “hot spot.”)

(4) The prosecutor’s fallacy is in full play. Here, Wheelan addresses the pesky issue of examining the full context by giving this example: “Suppose you hear testimony in court to the following effect: (1) a DNA sample found at the scene of a crime matches a sample taken from the defendant; and (2) there is only one chance in a million that the sample recovered at the scene of the crime would match anyone’s besides the defendant. (For the sake of this example, you can assume that the prosecution’s probabilities are correct.) On the basis of that evidence, would you vote to convict?” Well, given that the evidence is still circumstantial, that a database of random DNA samples may very well contain several close matches, and that the presence of the DNA doesn’t prove its carrier was a murderer, the author hopes you would not push for conviction. (A rough version of the arithmetic appears after this list.)

(5) We regress (or revert) to the mean. From athletes to CEOs to students who “expertly” toss a majority of “tails” in a coin-flip exercise Wheelan readily conducts in classroom visits, it’s easy to misjudge someone’s true performance when they are first touted as extraordinary on the strength of a limited run and a few wins. One great season does not a continuously record-breaking team make. (As a Seattle native, recalling the Seahawks’ back-to-back Super Bowl runs, I am all too aware of this one.) Time doesn’t just heal most wounds; it evens most odds. (A quick simulation of the effect appears after this list.)

(6) We discriminate … statistically, of course. From racial profiling to sexist insurance policies, we look at the probability of “risk” attached to male drivers, or we justify Patriot Act-level violations at the airport based on a glance at historical statistics about young Middle Eastern men and terrorism. Never mind the innocent, they claim, because the probability runs higher among the guilty few. This is yet another case where a potentially useful tool is only as good as its employer.
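A few of those are easier to feel in code than in prose. For item (1), here’s a sketch with purely illustrative numbers (not from the book, and not from any actual court case) of how assuming independence distorts the math when the events share a cause:

```python
p_sids = 1 / 8_000              # assumed baseline risk of a SIDS death (illustrative only)

# The flawed courtroom logic: treat the two deaths as independent and just multiply.
p_two_independent = p_sids * p_sids
print(f"Naive 'multiply them' estimate: 1 in {1 / p_two_independent:,.0f}")

# Families share genetics and environment, so a second death is far more likely
# given a first one. This conditional risk is also an assumption, for illustration.
p_second_given_first = 1 / 100
p_two_dependent = p_sids * p_second_given_first
print(f"With shared risk factors:       1 in {1 / p_two_dependent:,.0f}")
```

The multiplied-out number looks damning precisely because the independence assumption was wrong.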
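For item (2), a quick simulation (my own, with roulette-style odds as the assumption) shows that a losing streak does nothing to your chances on the next spin:

```python
import random

random.seed(42)
p_win = 18 / 38   # assumed win probability for a red/black bet on an American wheel
spins = [random.random() < p_win for _ in range(1_000_000)]

streaks = 0
wins_after_streak = 0
for i in range(5, len(spins)):
    if not any(spins[i - 5:i]):           # previous five spins were all losses
        streaks += 1
        wins_after_streak += spins[i]

print(f"Win rate overall:                 {sum(spins) / len(spins):.3f}")
print(f"Win rate after 5 straight losses: {wins_after_streak / streaks:.3f}")  # essentially the same
```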
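For item (3), a tiny simulation (again, made-up numbers) scatters cases completely at random and still produces a neighborhood that looks like a hot spot:

```python
import random

random.seed(7)
cases = 200
counts = [[0] * 10 for _ in range(10)]    # a 10 x 10 grid of "neighborhoods"
for _ in range(cases):
    x, y = random.randrange(10), random.randrange(10)
    counts[y][x] += 1

print(f"Expected cases per neighborhood: {cases / 100:.0f}")
print(f"Largest purely random cluster:   {max(max(row) for row in counts)}")  # usually several times the average
```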
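For item (4), here’s the rough arithmetic behind the DNA example. The one-in-a-million figure comes from the quoted passage; the database size is my assumption:

```python
p_coincidental_match = 1 / 1_000_000   # chance of a coincidental match, from the example
database_size = 5_000_000              # assumed number of profiles searched

# "One in a million" still implies several innocent matches in a big database.
expected_random_matches = p_coincidental_match * database_size
print(f"Expected coincidental matches: {expected_random_matches:.0f}")

# If the true source is in the database too, a given match is the true source only
# about one time in (expected_random_matches + 1), before any other evidence.
p_match_is_source = 1 / (expected_random_matches + 1)
print(f"Chance a given match is the true source: {p_match_is_source:.0%}")
```

One chance in a million of a coincidence is not the same as a one-in-a-million chance the defendant is innocent; that swap is the fallacy.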
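And for item (5), a generic simulation (not Wheelan’s) of performance as skill plus luck shows this season’s stars drifting back toward the pack next season:

```python
import random
from statistics import mean

random.seed(1)
skills = [random.gauss(0, 1) for _ in range(1_000)]          # underlying, stable skill
season1 = [skill + random.gauss(0, 1) for skill in skills]   # skill plus this year's luck
season2 = [skill + random.gauss(0, 1) for skill in skills]   # same skill, fresh luck

# Take the top 5% of season-one performers and compare their two seasons.
cutoff = sorted(season1, reverse=True)[50]
stars = [i for i, score in enumerate(season1) if score > cutoff]

print(f"Stars' average, season 1: {mean(season1[i] for i in stars):.2f}")
print(f"Stars' average, season 2: {mean(season2[i] for i in stars):.2f}")  # noticeably lower
```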

The primary lesson to be learned from Wheelan’s book, as well as from our arguably statistics-crazed yet scientifically illiterate nation, is this: If one is to employ statistics, one must also employ reason, context, a fair amount of good judgment, a bit of research, and possibly some restraint where restraint is due.

With that said, I hope you know that One Hundred Percent of what you read on the internet is true, in that it is written by people who truly exist, or produce spam that truly exists. That includes this site, which is part of the internet.*

Thanks for reading, and please check out Naked Statistics or How to Lie with Statistics for actual enlightenment on this subject. Both are excellent reads.

 •••• •••• ••••• •••• ••••

Editor’s Note: *We resemble that remark.