Monday, July 11, 2011

Lack of understanding statistics.

There's an article on Yahoo that illustrates well the sort of flawed conclusions people draw when they look at statistics and don't know how to interpret them. As I like to say, numbers don't lie, but they can mislead. In this article, the author writes about the "home run derby curse," the idea that players who participate in the Home Run Derby (HRD) "mess up their swings" and have poor second halves. This legend has been very popular ever since Bobby Abreu put on a legendary performance at the HRD and then had a poor second half. The author compares the slugging percentages of HRD participants before and after the event, sees a noticeable drop (-.130), and concludes that this is evidence that the HRD messes players up.

But wait.

The problem is regression toward the mean. Just because a player hit a lot of homers in his first half doesn't necessarily mean we should expect him to do the same in the second half. It is far more likely that his second-half performance will fall more in line with his career rate. In fact, being invited to the HRD at all indicates that a player has had a great first half, one that is more than likely above his regular rate of performance. This is a form of sampling bias, usually called selection bias: the group was chosen because of the very numbers we are now comparing against.

To be fair, the author indicates this somewhat in his piece, and to control for it he compares the participants to sluggers who did not participate in the HRD. He finds that the sluggers who did not participate had an increase in their slugging in the second half, which he believes lends credence to the hypothesis that the HRD messes up swings. I disagree. The fact that these sluggers were not invited to the HRD indicates that they were having a poorer first half than the players who WERE invited, so they are much more likely to have a "better" second half by the same laws of regression. By better I just mean compared to their own first half, not compared to the other pool of players.
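Here is a minimal sketch of that argument as a simulation. Everything in it is an assumption made for illustration: the function names, the league of 200 hitters, the 8 invitations, and the talent and noise spreads are all made up, and half-season slugging is modeled simply as true talent plus random noise. There is no "curse" anywhere in the model; the second half is generated exactly like the first.

```python
import random
import statistics


def simulate_season(rng, n_players=200, n_invited=8, n_sluggers=30,
                    talent_mean=0.420, talent_sd=0.040, noise_sd=0.050):
    """Simulate one season; all default numbers are illustrative assumptions."""
    # Each player's true slugging talent, fixed for the whole season.
    talent = [rng.gauss(talent_mean, talent_sd) for _ in range(n_players)]
    # Each half is talent plus independent luck. The HRD changes nothing.
    first = [t + rng.gauss(0, noise_sd) for t in talent]
    second = [t + rng.gauss(0, noise_sd) for t in talent]

    # HRD invitations go to the best first-half performers.
    by_first = sorted(range(n_players), key=lambda i: first[i], reverse=True)
    invited = set(by_first[:n_invited])

    # Comparison group: established sluggers (top true talent) not invited.
    by_talent = sorted(range(n_players), key=lambda i: talent[i], reverse=True)
    control = [i for i in by_talent[:n_sluggers] if i not in invited]

    invited_change = statistics.mean(second[i] - first[i] for i in invited)
    control_change = statistics.mean(second[i] - first[i] for i in control)
    return invited_change, control_change


def average_changes(n_seasons=500, seed=2011):
    """Average the second-half minus first-half change over many seasons."""
    rng = random.Random(seed)
    results = [simulate_season(rng) for _ in range(n_seasons)]
    invited_avg = statistics.mean(r[0] for r in results)
    control_avg = statistics.mean(r[1] for r in results)
    return invited_avg, control_avg


if __name__ == "__main__":
    invited_avg, control_avg = average_changes()
    print(f"HRD participants, average slugging change: {invited_avg:+.3f}")
    print(f"Passed-over sluggers, average change:      {control_avg:+.3f}")
```

Under these assumptions the invited group's slugging drops in the second half and the passed-over sluggers' slugging rises, even though the model contains no HRD effect at all. The sizes of the changes depend entirely on the made-up spreads, so only the direction is worth reading into.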

If I am a very bad player who fluked his way into hitting 25 bombs in the first half of the season, and then I stub my toe during the all-star break and only hit 4 homers the rest of the season, it isn't very logical to conclude that my poorer second half was caused by my stubbed toe. It is much more likely caused by the fact that I am a very bad player who had a fluke run.

In a way, this is exactly what happened with Bobby Abreu. Abreu is, of course, not a very bad player, but he has never been considered a home run hitter. He didn't hit a lot of home runs in his career before that HRD, and his performance there was a huge surprise. So the fact that he hasn't hit many homers since the HRD shouldn't come as a huge surprise.

Does all of this mean that the HRD has no effect? Of course not, but we have to analyze it far more carefully. What we need to do is compare the expected rate of hitting for players participating in the HRD to what they actually do. How many home runs should we expect out of Curtis Granderson in the second half, based on his career to date? And how does that projection compare to what actually happens? If players consistently underperform these projections after the HRD, then I can start to buy into it as a "curse."
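A rough sketch of what that comparison could look like is below. The helper names are hypothetical, and the projection is the crudest one possible (career home runs per plate appearance times expected second-half plate appearances); a serious projection would also account for things like age and recent seasons. The numbers in the usage example are invented to show the mechanics and are not real player data.

```python
import math
import statistics


def project_home_runs(career_hr, career_pa, expected_pa):
    """Naive projection: career HR rate times expected plate appearances."""
    return career_hr / career_pa * expected_pa


def shortfall_summary(projected, actual):
    """Return the mean of (actual - projected) and a rough t-statistic.

    A clearly negative mean with a large negative t-statistic would suggest
    players fall short of projection more than luck alone would explain.
    """
    if len(projected) != len(actual) or len(projected) < 2:
        raise ValueError("need two equal-length lists with at least 2 players")
    diffs = [a - p for p, a in zip(projected, actual)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean_diff / std_err if std_err > 0 else float("nan")
    return mean_diff, t_stat


if __name__ == "__main__":
    # Invented numbers for three imaginary HRD participants.
    projected = [
        project_home_runs(career_hr=100, career_pa=2500, expected_pa=300),
        project_home_runs(career_hr=150, career_pa=3000, expected_pa=280),
        project_home_runs(career_hr=60, career_pa=2000, expected_pa=310),
    ]
    actual = [11, 15, 8]
    mean_diff, t_stat = shortfall_summary(projected, actual)
    print(f"mean (actual - projected): {mean_diff:+.2f} HR, t = {t_stat:+.2f}")
```

The point of comparing against a projection, and not against the first half, is that the projection already has the regression built in. Whatever shortfall is left over is what a "curse" would have to explain.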
