Monday, August 29, 2005

Using statistics in basketball: the bar is higher

By Dan T. Rosenbaum

David Leonhardt in his Sunday New York Times "Keeping Score" column has been a pioneer in describing the ways in which statistical analysis has affected sports. (He writes about economics during the week, so, of course, everything he writes is gospel.) This week David writes that baseball "has found itself in the equivalent of a theological dispute about whether [it] is a game of mystery or of data, of statistics and analysis or of intuition and human instinct."

David points out that while teams using statistical analysis, such as Oakland and Boston, have achieved a great deal, there is no denying the success of "traditionalist" teams, such as Atlanta and St. Louis. The article is fairly even-handed, but this passage appears to betray the author's feelings.

"Academic research, however, is pretty much on the side of statistics. Whether diagnosing patients or evaluating job candidates, human beings vastly overestimate their ability to make judgments, research shows. Numbers and analysis almost always make people better.

'There have been hundreds of papers on subjects from picking students for a school to predicting the survival of cancer patients,' said Richard Thaler, a University of Chicago economist who uses sports examples in his class on decision-making. When a computer model is given the same information as an expert, the model almost always comes out on top, Thaler said."

This last sentence begs the question, however, because traditionalists would argue that the "computer model" never has the "same information" as the scout or coach. And they would be right. The real question is whether the benefits of more data (often collected and analyzed in a more objective manner) outweigh the costs of a simplified model that necessarily ignores some aspects of reality. Thaler argues above that in most circumstances the answer appears to be yes. And in baseball, I believe the answer in most cases is yes.

But in basketball, I am not so sure. Recently, I received an e-mail from a friend who argues that "basketball stats are a really interesting challenge."

"There's a sense in which [basketball stats are] much more related to economics than baseball stats are, which I always found a bit boring although incredibly accurate and powerful as a game predictor. Baseball is mostly about a small number of repetitive hand/eye coordination tasks, while basketball involves constant maximizing interaction between optimizing actors on the court."

Tabulating statistics may very well be the best way to form predictions about the "repetitive hand/eye coordination tasks" of baseball, but applying those same techniques to basketball, which "involves constant maximizing interaction between optimizing actors," may not prove to be as useful. The costs of a simplified model may be too high.

But do not take me to be saying that statistical analysis has no place in basketball. Instead, the point I am trying to make is that basketball people are right to be skeptical of statistical analysis, because analyses based upon an overly simple model of the game of basketball often can be more misleading than useful.

A good example of this is the "possession usage vs. offensive efficiency" debate over at APBRmetrics. Dean Oliver, author of Basketball on Paper and consultant for the Seattle SuperSonics, makes the following argument.

"Implying that all these high percentage, low usage shooters can ramp up their usage without penalty implies that the people running the NBA are not just a little wrong. It implies also that the fundamental nature of basketball is poorly understood. It implies that any sort of linear weights rating is wrong. . . .

It implies that pretty much every rating method is wrong, because the context in which players are being used is incorrect. [Dan Rosenbaum's] method, which is totally different from others here, has to be wrong because it is flawed by the decision to not let Fred Hoiberg shoot 25 [times] per game. This is not just a matter of a tiny little assumption that has to be proven. This is a principle that really underlies the game of basketball. It very much distinguishes it from baseball, where players take turns being on offense."

This argument by Dean highlights how important a solid understanding of the game of basketball is to good statistical analysis in basketball. But a solid understanding of statistics - perhaps more so than what is necessary in baseball - is also critical in making the right judgments when using basketball statistics.

I have heard reports of a Western Conference general manager who is heavily using basic unadjusted plus/minus data in his evaluation of free agent acquisitions. I probably understand the nuances of working with plus/minus data about as well as anyone, and I am one of the biggest advocates for plus/minus data. But I shudder when I hear about this general manager.

It is easy to misinterpret what can be learned from plus/minus data, and I see mistaken analyses using these data more often than not. Teams do not play their players randomly. Match-ups matter. Roles matter. And trying to isolate the contribution of a player or two when ten players are on the floor at a time is a tough statistical feat. Hearing that a general manager without extensive experience with statistical analysis is making heavy use of these data sounds to me like a recipe for disaster. Without a strong understanding of statistics, as well as a strong understanding of basketball, it is just too easy for statistics to be more misleading than useful.
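To make the point about non-random playing time concrete, here is a minimal toy sketch in Python. Everything in it is invented for illustration: the four players, their "true" values, the two-man lineups, and the stint counts are all assumptions, and the margins are noise-free and ignore opponents. A real adjusted plus/minus model would include all ten players on the floor and noisy data, so treat this only as a sketch of the idea, not as anyone's actual method.

```python
# Toy illustration: raw vs. regression-adjusted plus/minus.
# All players, lineups, and numbers below are hypothetical.

# Assumed "true" contribution of each player, in points per stint.
TRUE_VALUE = {"Star": 6.0, "Sidekick": -2.0, "Solid": 2.0, "Scrub": -4.0}

# Each stint lists the players on the floor (two at a time to keep the
# toy small). Minutes are not assigned randomly: Sidekick plays almost
# exclusively next to Star, while Solid mostly plays next to Scrub.
STINTS = (
    [("Star", "Sidekick")] * 8
    + [("Star", "Solid")] * 2
    + [("Solid", "Scrub")] * 6
    + [("Sidekick", "Scrub")] * 1
    + [("Star", "Scrub")] * 1
)


def stint_margin(lineup):
    """Noise-free point margin of a stint: the sum of the true values."""
    return sum(TRUE_VALUE[p] for p in lineup)


def raw_plus_minus(stints):
    """Average margin over the stints in which each player appeared."""
    totals, counts = {}, {}
    for lineup in stints:
        margin = stint_margin(lineup)
        for p in lineup:
            totals[p] = totals.get(p, 0.0) + margin
            counts[p] = counts.get(p, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}


def adjusted_plus_minus(stints):
    """Least-squares fit of margin on player indicators (normal equations)."""
    players = sorted({p for lineup in stints for p in lineup})
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    # Augmented matrix [X'X | X'y], built one stint at a time.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for lineup in stints:
        margin = stint_margin(lineup)
        for p in lineup:
            for q in lineup:
                a[idx[p]][idx[q]] += 1.0
            a[idx[p]][n] += margin
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return {p: a[idx[p]][n] / a[idx[p]][idx[p]] for p in players}


raw = raw_plus_minus(STINTS)
adjusted = adjusted_plus_minus(STINTS)
for player in sorted(TRUE_VALUE):
    print(f"{player:9s} true {TRUE_VALUE[player]:+.1f}  "
          f"raw {raw[player]:+.2f}  adjusted {adjusted[player]:+.2f}")
```

In this toy, the raw numbers rate Sidekick above Solid simply because Sidekick shares the floor with Star, even though Solid is assumed to be the better player. The regression recovers the assumed values only because the data here are clean and the lineups overlap enough; with real, noisy data the estimates would carry wide uncertainty.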

Another example is Dallas, which has for several years made use of adjusted plus/minus ratings in its coaching/front office decisions. And the consultants who do this work for the Mavericks - Wayne Winston and Jeff Sagarin - are unquestionably skilled data analysts. But they have never interacted much with the wider basketball statistics community, and I think this has made it more difficult for them to place their work in the proper perspective. (I cannot begin to describe how the APBRmetrics community has been influential in my thinking.)

In addition, my understanding is that these adjusted plus/minus ratings are largely treated as "raw data" and the coaches/front office are pretty much left to their own devices in interpreting/analyzing the data. This, in my opinion, is a huge mistake, which very well could result in very useful data produced by skilled analysts being more misleading than helpful for the Dallas coaches/front office.

Given all of this, I think it is very much an open question how useful statistical analysis can be in basketball decision-making. Done poorly, I think it can hurt teams. Done well, I think it can be a valuable asset. My sentiments are summed up pretty well in this passage by NickS at APBRmetrics.

"The reason to use stats in any field is because humans are poor at evaluating probability. We tend to see patterns where there aren't, overestimate the probability of low frequency events and, most importantly, have a tendency towards confirmation bias -- looking for evidence that confirms our preexisting beliefs.

One of the things that's said in defense of stats in baseball is that you can't tell the difference between a .260 hitter and a .280 hitter by watching one game or one series. The difference amounts to one extra hit every 2 weeks. Similarly is there any way to tell just by watching whether Eddy Curry is more or less prone to turnovers than Yao Ming?

Similarly I think that one of the best uses of stats is to provoke questions and try to map out ways in which questions can be answered. How can we tell if a team is shooting 'too many' or 'too few' three-pointers? Do shot-blockers have an 'intimidation' effect? How valuable are 'scoring' point guards compared to 'traditional' point guards? Are specialists more or less valuable than generalists? How valuable is it to have guards who can rebound or big men who can pass? What separates a good shooter from a great shooter? Stats can't answer all of those questions but they can rule out some wrong answers that have intuitive appeal and focus attention on possibilities that are more likely to be correct."
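As a rough check on the .260 versus .280 comparison in the passage above, here is a minimal Python sketch. The workload figures (4 at-bats per game, 6 games per week, a 12 at-bat series) are my own assumptions and not from NickS, and the standard error treats each at-bat as an independent trial, which is a simplification.

```python
import math

# Assumed workload for an everyday player (hypothetical round numbers).
AT_BATS_PER_GAME = 4
GAMES_PER_WEEK = 6


def extra_hits_per_week(avg_high, avg_low,
                        at_bats_per_game=AT_BATS_PER_GAME,
                        games_per_week=GAMES_PER_WEEK):
    """Expected extra hits per week for the better hitter."""
    return (avg_high - avg_low) * at_bats_per_game * games_per_week


def batting_average_std_error(avg, at_bats):
    """Binomial standard error of an observed batting average."""
    return math.sqrt(avg * (1.0 - avg) / at_bats)


extra = extra_hits_per_week(0.280, 0.260)
weeks_per_extra_hit = 1.0 / extra

# Noise in the batting average observed over a short series of 12 at-bats.
series_se = batting_average_std_error(0.270, 12)

print(f"extra hits per week: {extra:.2f}")
print(f"weeks per extra hit: {weeks_per_extra_hit:.1f}")
print(f"std. error over 12 at-bats: {series_se:.3f}")
```

Under these assumptions the gap works out to roughly one extra hit every two weeks, and the sampling noise in a single series is several times larger than the 20-point difference in batting average, which is why watching alone cannot separate the two hitters.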

Statistical analysis can play a critical role in basketball decision-making, but it can also be misleading if the complexities of the game of basketball (and the statistical issues generated by those complexities) are not well understood. In other words, the bar is higher for statistical analysis in basketball than it is in baseball. Ultimately this will greatly benefit the teams that incorporate skilled statistical analysts in the right way, because the greater complexities in basketball will mean that it will be harder for other teams to ever catch up with the first teams that get this right. It will be fascinating to see how this all plays out over the next few years.

Last updated: 4:00 AM, August 29, 2005

4 Comments:

Blogger Dudley said...

Good insights in this installment. It seems obvious to me why statistical analysis is so much simpler in baseball than in basketball; in baseball the players are essentially isolated in their playmaking. While there are a few aspects that require real coordination among team members, mostly it is a game of individual skill tests among two groups aggregated over 9 innings. In basketball, team coordination is a tremendous component.
I have one question, which you probably can't answer: I'm a big Trail Blazers fan, and I really hope the Western Conference GM you described as putting misplaced faith in +/- wasn't John Nash. Was it?
The Trail Blazers last season were a prime example of why +/- can be misleading. Their top getter, with an unreal +25.0, was rarely used scrub Maurice Baker (clearly a small sample size). In the middle of last year, the coach was fired, the primary scorer (Zach Randolph) went out for the season with a knee injury, and veterans Nick Van Exel, Abdur-Rahim and Theo Ratliff took turns being injured. Before that the team was winning at around a .450 clip, but they finished the rest of the year with 5 wins and 22 losses. During this time rookies like Sebastian Telfair, Travis Outlaw and Ha Seung-Jin got their first real action, and Damon Stoudamire was playing around 40 minutes per game. All of their +/- ratings suffered from playing in units that had never played together, had few veterans and very little healthy depth on the bench. For +/- calculations, their aggregate success (or lack of it) was compared with that of the healthy veteran team from the first half of the year.
If the season had been separated into two parts for the purposes of +/- calculations, I think the results would much more accurately reflect the real contributions of those players who participated almost exclusively in one part of the season, namely Van Exel, Telfair, Outlaw and Ha. Sorry for the long post...

8/30/2005 4:48 AM

Anonymous said...

I read a decent article in The Sporting News about the rise of statistical analysis in sports. It didn't go too in-depth, but it was interesting.

8/30/2005 9:20 AM

Blogger davis21wylie said...

What about the PER as an indicator? +/- numbers have some inherent bias (even when adjusted) because you're not isolating the player's actions per se... you're just looking at how the player's team did relative to the quality of opponents while he was on the floor. In a broad sense, this identifies the best players, but the same criticisms (although in a far more rudimentary form) exist of the +/- system in hockey. While adjusted +/- in hoops will not over-emphasize team ability like the NHL's stat does (i.e., players from Colorado or Detroit leading the league in +/- every year), it still cannot explain on a per-possession basis what each player contributes. The PER does that, though, arguably better than any other method (offensively, that is). The PER's weakness is defense, because blocks and steals are worthless indicators. Keeping all of this in mind, why not do "Minus" ratings? Defense is much more of a team concept, the pluses are nothing that PER cannot handle (in terms of evaluating offense), while adjusted minuses may be the best method of evaluating position defense.

9/02/2005 11:53 PM

Anonymous Mike Z. said...

Dan:

I generally don't post on any of the relevant boards, and have not commented on your (or others') work in the past, mostly because the confidentiality requirements of my job basically mean I can't be interacting with the general community on this stuff.

But this post was right on. Any front office personnel (or amateur analysts) who don't understand the game AND how any given stat relates to it will be unable to use statistics optimally, and, in fact, can end up misinterpreting the stats and what they mean. As a result, the task of explaining how the stats work and how they ought to be interpreted (and, more importantly, how they ought NOT be interpreted) is one of the toughest, and yet most important, parts of the basketball analyst's job.

Since the vast majority of those in charge of teams are not trained statisticians (this is probably a good thing, for some of the reasons Dan mentions), those who aspire to do basketball stats for a living would do well to work hard at being at least as good at this communication-based part of the job as they are at thinking up new ways to look at the game.

Remember, Bill James was not a classically-trained statistician, but he was/is a great writer, and always is prepared with a good baseball anecdote to drive any given point home to non-stats folk. And yet it took him 20 years to get teams to listen to him consistently. In a sport like basketball that's so much more complicated than baseball, the communication necessary for effective integration of new stat-based methodologies with classic basketball analysis will be far more important than it was in baseball.

Great post.

-Mike Zarren
Basketball Operations Analyst
Boston Celtics

11/07/2005 10:36 AM  
