Using statistics in basketball: the bar is higher
David Leonhardt in his Sunday New York Times "Keeping Score" column has been a pioneer in describing the ways in which statistical analysis has affected sports. (He writes about economics during the week, so, of course, everything he writes is gospel.) This week David writes that baseball "has found itself in the equivalent of a theological dispute about whether [it] is a game of mystery or of data, of statistics and analysis or of intuition and human instinct."
David points out that while teams using statistical analysis, such as Oakland and Boston, have achieved a great deal, there is no denying the success of "traditionalist" teams, such as Atlanta and St. Louis. The article is fairly even-handed, but the following passage appears to betray the author's leanings.
"Academic research, however, is pretty much on the side of statistics. Whether diagnosing patients or evaluating job candidates, human beings vastly overestimate their ability to make judgments, research shows. Numbers and analysis almost always make people better.
'There have been hundreds of papers on subjects from picking students for a school to predicting the survival of cancer patients,' said Richard Thaler, a University of Chicago economist who uses sports examples in his class on decision-making. When a computer model is given the same information as an expert, the model almost always comes out on top, Thaler said."
This last sentence begs the question, however, because traditionalists would argue that the "computer model" never has the "same information" as the scout or coach. And they would be right. The real question is whether the benefits of more data (often collected and analyzed in a more objective manner) outweigh the costs of a simplified model that necessarily ignores some aspects of reality. Thaler argues above that in most circumstances the answer appears to be yes. And in baseball, I believe the answer in most cases is yes.
But in basketball, I am not so sure. Recently, I received an e-mail from a friend who argues that "basketball stats are a really interesting challenge."
"There's a sense in which [basketball stats are] much more related to economics than baseball stats are, which I always found a bit boring although incredibly accurate and powerful as a game predictor. Baseball is mostly about a small number of repetitive hand/eye coordination tasks, while basketball involves constant maximizing interaction between optimizing actors on the court."
Tabulating statistics may very well be the best way to form predictions about the "repetitive hand/eye coordination tasks" of baseball, but applying those same techniques to basketball, which "involves constant maximizing interaction between optimizing actors," may not prove as useful. The costs of a simplified model may be too high.
But do not take me to be saying that statistical analysis has no place in basketball. My point, rather, is that basketball people are right to be skeptical of statistical analysis, because analyses based upon an overly simple model of the game of basketball often can be more misleading than useful.
A good example of this is the "possession usage vs. offensive efficiency" debate over at APBRmetrics. Dean Oliver, author of Basketball on Paper and consultant for the Seattle Supersonics, makes the following argument.
"Implying that all these high percentage, low usage shooters can ramp up their usage without penalty implies that the people running the NBA are not just a little wrong. It implies also that the fundamental nature of basketball is poorly understood. It implies that any sort of linear weights rating is wrong. . . .
It implies that pretty much every rating method is wrong, because the context in which players are being used is incorrect. [Dan Rosenbaum's] method, which is totally different from others here, has to be wrong because it is flawed by the decision to not let Fred Hoiberg shoot 25 [times] per game. This is not just a matter of a tiny little assumption that has to be proven. This is a principle that really underlies the game of basketball. It very much distinguishes it from baseball, where players take turns being on offense."
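The tradeoff Dean describes can be sketched with a toy model. Suppose each player's efficiency declines linearly as his share of possessions rises; the slope of that decline is exactly what is in dispute. All names, numbers, and the linear functional form here are hypothetical, chosen only to illustrate the argument:

```python
# Toy model of the usage-efficiency tradeoff (all numbers hypothetical).
# The disputed quantity is the slope: how fast efficiency falls as a
# player's share of his team's possessions rises.

LEAGUE_AVG = 1.00  # roughly one point per possession

def efficiency(base_eff, base_usage, usage, slope):
    """Points per possession at a given usage rate, assuming efficiency
    declines linearly as the player's usage rises above his usual role."""
    return base_eff - slope * (usage - base_usage)

# A high-efficiency, low-usage shooter: 1.20 pts/poss on 12% of possessions.
base_eff, base_usage = 1.20, 0.12

for slope in (0.0, 3.0):  # slope 0.0 = the "no penalty" assumption Dean disputes
    eff = efficiency(base_eff, base_usage, 0.28, slope)
    side = "above" if eff > LEAGUE_AVG else "below"
    print(f"slope {slope}: at 28% usage he scores {eff:.2f} pts/poss ({side} league average)")
```

Under the no-penalty assumption the shooter stays elite at 28% usage; with a steep tradeoff he falls below league average, meaning those extra possessions would have been better used by teammates. Any rating that ignores the slope implicitly takes a side in this debate.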
This argument by Dean highlights how important a solid understanding of the game of basketball is to good statistical analysis in basketball. But a solid understanding of statistics - perhaps more so than is necessary in baseball - is also critical to making the right judgments when using basketball statistics.
I have heard reports of a Western Conference general manager who is relying heavily on basic unadjusted plus/minus data in his evaluation of free agent acquisitions. I probably understand the nuances of working with plus/minus data about as well as anyone, and I am one of the biggest advocates for plus/minus data. But I shudder when I hear about this general manager.
It is easy to misinterpret what can be learned from plus/minus data, and I see mistaken analyses using these data more often than not. Teams do not play their players randomly. Match-ups matter. Roles matter. And trying to isolate the contribution of one or two players when ten players are on the floor at a time is a tough statistical feat. Hearing that a general manager without extensive experience in statistical analysis is making heavy use of these data sounds to me like a recipe for disaster. Without a strong understanding of statistics, as well as a strong understanding of basketball, it is just too easy for statistics to be more misleading than useful.
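The core problem can be shown with invented stint data. Raw plus/minus just sums the team's scoring margin while a player is on the floor, so a bench player who only shares the court with the team's star inherits the star's margin. All names and numbers below are hypothetical:

```python
# Hypothetical stint data showing how raw (unadjusted) plus/minus misleads:
# the bench player's number is inflated because he only ever plays
# alongside the team's star. All names and margins are invented.

stints = [
    # (players of interest on the floor, team point margin during the stint)
    ({"Star", "Bench"}, +8),
    ({"Star", "Bench"}, +6),
    ({"Star"},          +2),
    ({"Scrub"},         -9),
    ({"Scrub"},         -7),
]

def raw_plus_minus(player):
    """Sum the team's margin over every stint the player was on the floor,
    with no adjustment for teammates, opponents, or role."""
    return sum(margin for lineup, margin in stints if player in lineup)

for p in ("Star", "Bench", "Scrub"):
    print(p, raw_plus_minus(p))
```

Here the bench player (+14) looks nearly as valuable as the star (+16), even though he has never been tested without him. Adjusted plus/minus methods try to untangle exactly this confounding, which is why the raw numbers alone are so dangerous in a front office.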
Another example is Dallas, which for several years has made use of adjusted plus/minus ratings in its coaching/front office decisions. And the consultants who do this work for the Mavericks - Wayne Winston and Jeff Sagarin - are unquestionably skilled data analysts. But they have never interacted much with the wider basketball statistics community, and I think this has made it more difficult for them to place their work in the proper perspective. (I cannot begin to describe how influential the APBRmetrics community has been in my thinking.)
In addition, my understanding is that these adjusted plus/minus ratings are largely treated as "raw data," with the coaches/front office pretty much left to their own devices in interpreting and analyzing them. This, in my opinion, is a huge mistake, and it could very well result in very useful data produced by skilled analysts being more misleading than helpful for the Dallas coaches/front office.
Given all of this, I think it is very much an open question how useful statistical analysis can be in basketball decision-making. Done poorly, I think it can hurt teams. Done well, I think it can be a valuable asset. My sentiments are summed up pretty well in this passage by NickS at APBRmetrics.
"The reason to use stats in any field is because humans are poor at evaluating probability. We tend to see patterns where there aren't any, overestimate the probability of low frequency events and, most importantly, have a tendency towards confirmation bias -- looking for evidence that confirms our preexisting beliefs.
One of the things that's said in defense of stats in baseball is that you can't tell the difference between a .260 hitter and a .280 hitter by watching one game or one series. The difference amounts to one extra hit every 2 weeks. Similarly is there any way to tell just by watching whether Eddy Curry is more or less prone to turnovers than Yao Ming?
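NickS's arithmetic checks out under typical assumptions. With roughly 4 at-bats per game and about 13 games in a two-week stretch of an MLB schedule (both round figures I am supplying, not from the quote), the gap between a .260 and a .280 hitter really is about one hit:

```python
# Quick check of the claim that a .280 hitter gets about one more hit
# than a .260 hitter every two weeks. The at-bat and game counts are
# rough, typical figures assumed for illustration.

at_bats_per_game = 4
games_in_two_weeks = 13  # MLB teams play roughly 6-7 games a week

extra_hits = (0.280 - 0.260) * at_bats_per_game * games_in_two_weeks
print(f"extra hits in two weeks: {extra_hits:.2f}")
```

About one extra hit in fifty-odd at-bats is invisible to the eye over a game or a series, which is precisely the point of the quote.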
Similarly I think that one of the best uses of stats is to provoke questions and try to map out ways in which questions can be answered. How can we tell if a team is shooting 'too many' or 'too few' three-pointers? Do shot-blockers have an 'intimidation' effect? How valuable are 'scoring' point guards compared to 'traditional' point guards? Are specialists more or less valuable than generalists? How valuable is it to have guards who can rebound or big men who can pass? What separates a good shooter from a great shooter? Stats can't answer all of those questions but they can rule out some wrong answers that have intuitive appeal and focus attention on possibilities that are more likely to be correct."
Statistical analysis can play a critical role in basketball decision-making, but it can also be misleading if the complexities of the game of basketball (and the statistical issues generated by those complexities) are not well understood. In other words, the bar is higher for statistical analysis in basketball than it is in baseball. Ultimately this will greatly benefit the teams that incorporate skilled statistical analysts in the right way, because the greater complexities in basketball will mean that it will be harder for other teams to ever catch up with the first teams that get this right. It will be fascinating to see how this all plays out over the next few years.
Last updated: 4:00 AM, August 29, 2005