Posts Tagged ‘truth’

June 17 2009

Statistical vindication

by Hang

A few days ago, I wrote about a seemingly fascinating graph which I felt was used inappropriately. I was rightfully castigated in the comments for being too harsh, but to me the graph gave the impression of a pattern when there really was none. In reply to some of the comments, I made the observation that:

The only reason I wrote about it was because I was surprised that even I, as a reasonably trained statistics guy, was momentarily caught off guard by it. Clearly, you meant nothing malicious by it, but it’s a technique that could be used for malicious purposes, so I wrote about it.

Now, in the wake of the Iranian elections, it seems like my speculation has been somewhat vindicated. Andrew Sullivan posted what he claimed was the red flag that proved the Iranian elections were a fraud, and it seemed eminently convincing. Luckily, Nate Silver produced a null hypothesis graph based on the US elections and demonstrated that the “red flag” was just a case of the exact same statistical fallacy I wrote about a week earlier.

Anything you think is either unoriginal, wrong or both

by Hang

I first discovered this obviously wrong truth when I was doing my honors thesis. Time and again, I would come up with a novel idea or a neat algorithmic trick. Some of them, I would discover, had already been invented 3, 5, sometimes 10 years before I came up with them. But the ones I was absolutely sure nobody had published before, because I had scoured the literature and covered every approach? Well, all of those original ideas turned out to have some hidden, unforeseen flaw that rendered them either trivial or actively stupid. This led me to formulate the belief that “anything you think is either unoriginal, wrong or both”. Like all obviously wrong truths, it has the paradoxical property of being obviously wrong and also true.

The premise for the statement comes from the simple observation that good ideas survive and bad ideas die. This means there exists an entire class of awful ideas that people come up with time and again only to eventually discover their wrongness and then abandon them. Every person who discovers them believes themselves to be wholly original since nothing of the sort exists in the world and each of them is met with disappointment, sometimes after many years of sweat and toil. But because failures are almost invisible, they leave no warning signs to future generations that this is an awful idea that should be avoided*.

“Anything you think is either unoriginal, wrong or both” is an acknowledgment of your own stupidity. Your first instinct, when you come up with a new idea, should be to try and find out if anyone else has done it before. Your second instinct should be to try, again, to find out if anyone’s done it before. Your third, fourth and fifth instincts are to ask: how come everyone else figured out this was a dumb idea and I haven’t? If you’ve gotten this far and you still haven’t discovered anything useful, you should start feeling a little bit uneasy: it probably means you weren’t smart enough to discover how wrong you are.

If you have discovered the prior art or the fatal flaw, then breathe a small sigh of relief. Unoriginal ideas are GOOD, wrong ideas are GOOD. An unoriginal but right idea is still valuable to all the other people who’ve never heard of it, and chances are, if you’ve never heard of it, there will be a significant fraction of the population to which bringing this idea contributes value. Wrong ideas teach you more about the world than right ideas because they teach you about some discrepancy between your expectations and the world. The corrective force of wrong ideas is what allows you to deftly cut to the core of any issue and tease out just where assumptions are weak and likely to fail.

But if you’re lucky, over the course of your life, you’re going to stumble across many ideas which are both original and right, in which case it’s still better to treat them as unoriginal and wrong. Believing an idea is unoriginal and wrong makes that idea do more work. You attack it more fiercely and from more angles. You keep on asking people if the idea sounds familiar and you’re eager to seek feedback because you’re so damn curious to discover why it could be so wrong yet elude you for so long. In doing so, you disassociate the idea from your ego so that you can take criticism about it calmly and dispassionately. Eventually, that drive of curiosity will force you to action, just to finally prove how this idea is flawed. Treating an idea as unoriginal and wrong means that the only standard you’re willing to accept is success. This brings a clarity of purpose that cuts through the confusion when executing upon that idea. Other people may be willing to make excuses or caveats that salve their ego but, as far as you’re concerned, if an idea is not successful, it’s not right**.

“Anything you think is either unoriginal, wrong or both” is an idea that also applies to itself. I’ve been slowly chewing over this idea for almost four years now and it’s been frustrating to me that, so far, I haven’t been able to find someone else who has expressed a similar sentiment, which by its own logic makes it wrong. I’m putting this out there to invite the embarrassment of someone pointing out the obvious source or the obvious flaw that I’ve managed to miss for so long. Please, tell me how I’m stupid; it would be a welcome relief.

*Some people, when first discovering this problem, come up with elaborate schemes of recording all of these common awful ideas so that future generations can avoid them. This, unfortunately, is a common awful idea.

** “Not right” and “wrong” are different concepts, in the same way that not being a millionaire is different from being homeless.

June 10 2009

Another way to mislead with statistics

by Hang

I ran into a great blog post this morning on Using Mechanical Turk to evaluate search engine quality and came across this seemingly fascinating graph:

Something about that graph just invites reflection. What do marlboro schools, fidelity and ford have to do with each other? Is Bing better at boring queries and Google better at sexy ones? It wasn’t until 5 minutes in that I thought “hang on, shouldn’t the null hypothesis generate a binomial distribution anyway?”

So I decided to run my own simulated Google vs Bing test in which people guessed at random which search engine they liked and got this:

Null Hypothesis for Google vs Bing
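For anyone who wants to play along at home, here is a minimal sketch of that kind of null simulation. The query count and the number of raters per query are assumptions of mine, not figures from the original post:

```python
# Hypothetical null simulation: raters have no real preference, so every
# "vote" is a fair coin flip. Query count and raters per query are made up.
import random

NUM_QUERIES = 30       # assumed number of queries
RATERS_PER_QUERY = 20  # assumed number of raters per query

random.seed(0)
for q in range(1, NUM_QUERIES + 1):
    google_votes = sum(random.random() < 0.5 for _ in range(RATERS_PER_QUERY))
    bing_votes = RATERS_PER_QUERY - google_votes
    print(f"Query {q:2d}: Google {google_votes:2d} vs Bing {bing_votes:2d}")
```

Even though no query is genuinely “Google friendly” or “Bing friendly”, a handful of them come out lopsided purely by chance, which is exactly what the simulated graph shows.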

As you can see from the simulated graph, asking why marlboro public schools did so much better on Google and tax forms did so much better on Bing is essentially as useful as asking why Query 37 is so much more Google-friendly than Query 22.

The blog entry claims that there was a minor but significant (p < 0.04) difference in overall quality, but it’s obvious from the null graph that no individual query is statistically different in quality (I’d unfortunately have to dig out my stats textbook to figure out what test I would need to run to verify this, but I’m pretty confident in my eyeball estimate).
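For the curious, one reasonable way to check that eyeball estimate would be a two-sided binomial test on each query, with a Bonferroni correction for the number of queries tested. A sketch, reusing the made-up vote counts from the simulation above:

```python
from scipy.stats import binomtest

def query_is_significant(google_votes, total_votes, num_queries, alpha=0.05):
    """Two-sided binomial test against a fair coin, Bonferroni-corrected
    for the number of queries being tested at once."""
    p = binomtest(google_votes, total_votes, 0.5).pvalue
    return p < alpha / num_queries

# Even a query where 15 of 20 raters preferred Google is not individually
# significant once you account for having tested 30 queries:
print(query_is_significant(15, 20, 30))  # False
```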

I understand the urge, when you have an interesting dataset, to throw up any and all cool visualisations you have; I’ve been guilty of doing it myself many times. But responsible presentation of data requires a certain discipline. Each graph should tell you at least one interesting piece of true information and strive to minimize the amount of false information presented. Unfortunately, the aforementioned graph cannot possibly communicate any true information because there is no true information to present, and the false information is amplified precisely because it is such a fascinating graph. The worst of both worlds.

If I were the poster of the original piece, I would have deliberately not included that graph, but I would have included the following sentence:

Given our small sample size, we could not find any particular type of query at which either Google or Bing significantly excelled. It may be that Bing is better at product searches or that Google excels at medical queries, but no evidence of this was found in our study.

Even this is problematic but at least it includes several pieces of true information.

Like I said in a previous post on lying through the use of “not statistically significant”:

Sometimes, I swear, the more statistically savvy a person thinks they are, the easier they are to manipulate. Give me a person who mindlessly parrots “Correlation does not imply causation” and I can make him believe any damn thing I want.

March 27 2009

Not statistically significant and other statistical tricks.

by Hang

Not statistically significant…

Most people have no idea what “Not statistically significant” means and I don’t see the media being too eager to fix this.

Say you read the following piece in a newspaper:

A study done at the University of Washington showed that, after controlling for race and socioeconomic class, there was no statistically significant difference in athletic performance between those who stretched for 5 minutes before running and those who did no stretching at all.

What do you conclude from that? Stretching is useless? WRONG.

Here’s what the hypothetical study actually was: I picked four random guys on campus and asked two of them to stretch and two of them not to. The ones who stretched ran 10% faster.

Why is this then not statistically significant? Because the sample size was too small to infer anything useful and the study was designed poorly.

All “not statistically significant” tells you is that you can’t infer anything from the study, but word the study carefully enough and you can have people believe the opposite is true.
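To make that concrete, here is a toy version of the arithmetic with made-up times for the four hypothetical runners, assuming a plain two-sample t-test (the imaginary study never specified one):

```python
from scipy import stats

# Made-up times in seconds; the stretchers are about 10% faster on average.
stretched = [48.3, 52.1]      # mean ~50.2 s
not_stretched = [55.0, 57.4]  # mean ~56.2 s

t_stat, p_value = stats.ttest_ind(stretched, not_stretched)
print(p_value)  # roughly 0.12 -- "not statistically significant"
```

With only two runners per group, a real 10% difference cannot be distinguished from noise, and that is all “not statistically significant” is telling you.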

Have you ever heard the claim “There’s no statistically significant difference between going to an elite Ivy League school and an equally good state school?” Perhaps from here, here or even here?

Well, from this paper (via a comment in an Overcoming Bias post):

For instance, Dale and Krueger (1999) attempted to estimate the return to attending specific colleges in the College and Beyond data. They assigned individual students to a “cell” based on the colleges to which they are admitted. Within a cell, they compared those who attend a more selective college (the treatment group) to those who attended a less selective college (the control group). If this procedure had gone as planned, all students within a cell would have had the same menu of colleges and would have been arguably equal in aptitude. The procedure did not work in practice because the number of students who reported more than one college in their menu was very small. Moreover, among the students who reported more than one college, there was a very strong tendency to report the college they attended plus one less selective college. Thus, there was almost no variation within cells if the cells were based on actual colleges. Dale and Krueger were forced to merge colleges into crude “group colleges” to form the cells. However, the crude cells made it implausible that all students within a cell were equal in aptitude, and this implausibility eliminated the usefulness of their procedure. Because the procedure works best when students have large menus and most student[s] do not have such menus, the procedure essentially throws away much of the data. A procedure is not good if it throws away much of the data and still does not deliver “treatment” and “control” groups that are plausibly equal in aptitude. Put another way, it is not useful to discard good variation in data without a more than commensurate reduction in the problematic variation in the data. In the end, Dale and Krueger predictably generate statistically insignificant results, which have been unfortunately misinterpreted by commentators who do not [have] sufficient econometric knowledge to understand the study’s methods.

In other words, the study says no such thing. It simply says that the study itself was not sufficient to prove that Ivy League educations make you more money, because the data wasn’t good enough. And yet the media has twisted this into a positive assertion that state schools do indeed make you as much money as Ivy Leagues.

I’m generously inclined to believe that most cases of this error that I see are caused by incompetence, but it’s pretty trivial to see how this could be used for malice. Want the public to believe that Internet usage doesn’t cause social maladjustment? Just design a shitty study and claim “We found no statistical difference in social competence between heavy internet users, light internet users and non-users”. Bam, half the PR work has already been done for you.

Controlling for…

Here’s another statistical gem I see all the time:

An analysis done at the University of Washington showed that there was zero correlation between race and financial attainment after controlling for IQ, education levels, socioeconomic status and gender.

Heartwarming, right? It means that if we put blacks and whites in the same situation, they should earn the same amount of money. WRONG.

The key here is to notice that we’re measuring financial attainment while controlling for socioeconomic status. Those two things mean the same damn thing. Basically, all this study told us was that being rich causes you to be rich.
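A toy simulation makes the circularity easy to see. This is my own illustration, not the hypothetical study’s method: give one group a genuine $10,000 income advantage, define “socioeconomic status” as essentially a noisy copy of income, and watch the group effect vanish once you control for it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                                # 0/1 group label
income = 40_000 + 10_000 * group + rng.normal(0, 5_000, n)   # real $10k gap
ses = income + rng.normal(0, 100, n)                         # "SES" ~= income

# Ordinary least squares: income on group alone, then with the "control".
X1 = np.column_stack([np.ones(n), group])
X2 = np.column_stack([np.ones(n), group, ses])
b1 = np.linalg.lstsq(X1, income, rcond=None)[0]
b2 = np.linalg.lstsq(X2, income, rcond=None)[0]
print(b1[1])  # ~10,000: the real gap
print(b2[1])  # ~0: "controlling for" a near-copy of the outcome erases it
```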

Most people view the “controlling for” section of statistical reporting as a sort of benign safeguard. Controlling for things is like… due diligence, right? The more the better… It’s easy to numb people into a hypnotic lull with a list of all the things you control for.

But controlling for factors means you get to hide the true cause of things under benign labels. That’s why I’m always so wary of studies that control for socioeconomic status or education levels, especially when they don’t have to. Sure, socioeconomic status might cause obesity, but what causes socioeconomic status?

Conclusion

When people do bother to talk about statistical manipulation, they usually focus on issues of statistical fact: Aggressive pruning of outliers, shotgun hypothesis testing and overly loose regressions. But why bother with having to sneak poorly designed studies past peer review when you can just publish a factually accurate study which implies a conclusion completely at odds with the data? That way, you sneak past the defenses of anyone who actually does know something about statistics.

Sometimes, I swear, the more statistically savvy a person thinks they are, the easier they are to manipulate. Give me a person who mindlessly parrots “Correlation does not imply causation” and I can make him believe any damn thing I want.

March 20 2009

The fallacy of the facebook redesign

by Hang

Wow, I’ve been talking about facebook a lot recently. I’ve been hearing this argument a lot about the redesign:

  1. Here is some evidence that users hate the new design
  2. But users have hated every redesign of facebook when it was first rolled out
  3. And eventually users learned to love it

There’s an implicit logical fallacy in this: just because change-phobic users hate good redesigns, it does not follow that users hating a redesign is evidence that the redesign is good. Unfortunately, facebook as a culture has learned to ignore user feedback until it becomes overwhelming for precisely this reason. Mark my words, this rebellion is not going to be quelled like the last few were, because the complaints really are pointing to systematic failures in the design.
