One of the few downsides of living in the tropics is that we have periodic power outages. Tonight was such a night. So I spent the night reading a book in the dark with an LED candle (yay progress!?!). Sadly, this prevented me from writing a post. Luckily, I had a great guest post lined up.
Tonight, I will turn this space over to Alex Konkel: Sports Skeptic, grad student by day, writer of great posts (classes permitting) by night. This is Alex's second guest spot in this space, and if he keeps making me jealous with his stuff, it definitely won’t be his last.
Hi everybody! A couple of days ago I put up a post asking why people use adjusted plus/minus (APM), and Henry Abbott at TrueHoop responded almost immediately. Henry said: “A lot of people believe in their work, and while I honestly still haven’t grasped exactly how to use plus/minus, I’m certain it’s one part of a good, broad analysis of a team or a player. (I’m also sure that, limited though it may be, for the time being it’s just about the only way to measure individual NBA defense in any meaningful way, so it simply can’t be ignored.)” I think this is a relatively common view that fits right into my post, so I’ve amended the post a bit for Arturo. Hope you enjoy.
First, a brief description of adjusted plus/minus. APM is an alternative to boxscore measures of player productivity. Instead of counting how many points, rebounds, steals, etc., a player accumulates and then weighting them in some manner to get a single number that measures his production (the method used for Wins Produced, Win Shares, PER, etc.), APM uses regression to directly estimate how a team does when he is on the floor.
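For readers who want to see the regression itself, here is a minimal Python sketch of the usual APM setup, assuming the common formulation from the sources listed below: each row is a stint with no substitutions, each player gets +1 if he is on the floor for the home team, -1 for the away team and 0 otherwise, and the target is the home team's scoring margin per 100 possessions, weighted by possessions. The function names, the toy players and the sum-to-zero constraint are illustrative assumptions, not any particular site's implementation.

```python
import numpy as np


def build_apm_design(stints, players):
    """Turn stints into an APM design matrix.

    Each stint is (home_players, away_players, margin_per_100, possessions).
    One column per player: +1 if on the floor for the home side, -1 for the
    away side, 0 if on the bench. The last column is an intercept that soaks
    up home-court advantage.
    """
    index = {p: i for i, p in enumerate(players)}
    X = np.zeros((len(stints), len(players) + 1))
    X[:, -1] = 1.0
    y = np.empty(len(stints))
    w = np.empty(len(stints))
    for r, (home, away, margin, possessions) in enumerate(stints):
        for p in home:
            X[r, index[p]] = 1.0
        for p in away:
            X[r, index[p]] = -1.0
        y[r] = margin
        w[r] = possessions
    return X, y, w


def fit_apm(X, y, w):
    """Possession-weighted least squares; returns (player ratings, intercept)."""
    sw = np.sqrt(w)
    Xw = X * sw[:, None]
    yw = y * sw
    # With equal-sized lineups every row has as many +1s as -1s, so adding a
    # constant to every rating changes nothing. Pin the level down with an
    # extra row asking the player ratings to sum to zero (0 = average).
    constraint = np.ones((1, X.shape[1]))
    constraint[0, -1] = 0.0
    coef, *_ = np.linalg.lstsq(np.vstack([Xw, constraint]),
                               np.append(yw, 0.0), rcond=None)
    return coef[:-1], coef[-1]


# Toy check with made-up ratings: 6 hypothetical players in 2-on-2 stints.
rng = np.random.default_rng(0)
players = ["A", "B", "C", "D", "E", "F"]
true = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -1.0])  # already sums to zero
stints = []
for _ in range(400):
    perm = rng.permutation(len(players))
    home, away = perm[:2], perm[2:4]
    margin = 2.0 + true[home].sum() - true[away].sum()  # noise-free margin
    stints.append(([players[i] for i in home], [players[i] for i in away],
                   margin, float(rng.integers(5, 30))))

X, y, w = build_apm_design(stints, players)
ratings, home_edge = fit_apm(X, y, w)
print(np.round(ratings, 3), round(float(home_edge), 3))
```

Because the toy margins are noise-free, the fit recovers the made-up ratings and the 2-point home edge exactly. Real stint data is far noisier and far more tangled, which is where the trouble starts.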
APM has been discussed and described in a few places; you can look at posts by Arturo, Aaron at basketballvalue.com, Dan Rosenbaum, Eli Witus, a number of places in the APBR community, Dave Berri, and I’m sure elsewhere. Obviously a lot of credit goes to them; I’m summarizing their work and thoughts here. You should also take a look at some of those posts if you’re unclear on how APM is calculated.
The APM regression is meant to calculate each player’s contribution to the team’s bottom line (outscoring the opponent or being outscored) while accounting for his teammates and his opponents. The main benefit is that APM should capture things that don’t appear in the boxscore: Does player X set good screens? Does he space the floor? Does he close out on shooters? Does he disrupt the opponents’ offense? As Henry noted, defensive value is perhaps the best part of APM, since the only individual boxscore measures of defense are steals and blocks. The second-best part is that the regression is meant to account for the other variables, in this case teammates and opponents. Sure, Kevin Love gets a lot of rebounds, but maybe it’s because his teammates force opponents into bad shots, and that’s where the value is? Maybe he scores so much because defenses key in on Beasley? In theory, APM gives a measure of a player’s value completely separated from the other players in the league, regardless of how they might contribute.
That last sentence also summarizes the downside of APM (‘in theory’ and ‘regardless’). One big problem is theoretical: APM is a black box. The data goes in and the numbers come out, but we can’t say why they turn out the way they do. If Kobe is above average, is it due to his scoring? Is it his clutch ability? APM can be split into offense and defense, so there’s some value there, but if someone is an above-average defender, you can’t say why.
With box score measures, you can point to where a player gets his value and declare that to be why he is productive. The lack of that transparency turns out to be a big issue when APM is used. For example, Kobe is one of the worst players this year according to APM. Even if we assume this is just ‘noise’ and look at the two-year APM numbers, we still see some odd results. Among qualifying players, Monta Ellis and Chauncey Billups are the 12th and 13th worst players in the league, and Joakim Noah is 47th worst. All three are considered at least average by many boxscore metrics; Billups and Noah are more than likely above average. People who look to APM to inform their player analysis would say that APM is telling us those players must be bad at something that isn’t in the boxscore; in fact, they must be terrible at it if it moves them from above average to among the worst in the league. However, we have no idea what that something might be. A common conclusion is defense, but isn’t Noah supposed to be good at defense? No matter; he must be bad at something.
The other issue is a practical one: players tend to play with the same guys over and over. Starters are a good example; they are often on the court at the same time. An extreme example comes from hockey, where the same technique has been applied: Daniel Sedin appears to share over 90% of his ice time with his brother Henrik (extremely impressive given that the number comes from 3 seasons of data). This means those players (which are variables in the regression) are highly collinear: their values follow each other very closely across observations. Players who play together a lot make virtually identical contributions to the model (both are on or both are off the floor most of the time), so the model cannot tell them apart. Mathematically, this leads to two issues: unstable coefficients, meaning that players may be given incorrect APM scores, and large standard errors, meaning that we can’t be very certain how good a player actually is. The common practical solution is to add more data: if you include previous seasons to add more data points and gain some leverage from players being separated by injuries and trades, the estimates improve.
Kobe Bryant is a good example. His APM this year is -5.23 with an error of 6.86. Taken at face value, Kobe is a very bad player. But we can’t be sure because the error is so big; we can only be somewhat sure that he’s somewhere between awful and slightly above average. If you add in last year as well, though, he has a score of 4.06 with an error of 3.59. Over last season and this season so far, Kobe is a positive contributor, and we can be somewhat sure that he’s above average (although not very sure, and again we don’t know why).
It also turns out, as described in Arturo’s post, that the APM regression does a very bad job of describing what happens on the court. For whatever reason (noisy data or otherwise), the R squared is very low; you would not be terribly wrong if you just declared every player equally good.
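The collinearity problem can be seen without any basketball data at all. The sketch below (Python with NumPy; the two hypothetical players, the 1,000 rows and the 12-point noise level are assumptions for illustration) builds two designs in which player 1's playing time is fixed and player 2 either shares 90% of his minutes with player 1 or only half of them, then computes the textbook OLS standard errors for each.

```python
import numpy as np


def coef_standard_errors(X, sigma):
    """Textbook OLS standard errors, sigma * sqrt(diag((X'X)^-1)), for known noise sigma."""
    return sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))


def two_player_design(n_rows, shared_rows):
    """Intercept plus on/off columns for two hypothetical players.

    Player 1 is on the floor for the first half of the rows. Player 2 is on
    for the same number of rows, `shared_rows` of them alongside player 1.
    """
    half = n_rows // 2
    x1 = np.zeros(n_rows)
    x1[:half] = 1.0
    x2 = np.zeros(n_rows)
    x2[:shared_rows] = 1.0                       # minutes together
    x2[half:half + (half - shared_rows)] = 1.0   # minutes apart
    return np.column_stack([np.ones(n_rows), x1, x2])


sigma = 12.0  # hypothetical noise per observation
se_together = coef_standard_errors(two_player_design(1000, 450), sigma)  # 90% shared
se_apart = coef_standard_errors(two_player_design(1000, 250), sigma)     # 50% shared
print("SE for player 1, 90% shared:", round(se_together[1], 3))
print("SE for player 1, 50% shared:", round(se_apart[1], 3))
print("inflation factor:", round(se_together[1] / se_apart[1], 3))
```

Only the overlap changes between the two designs, yet at 90% overlap the standard error on each player's coefficient is 5/3 (about 1.67) times larger than at 50%. The amount of data is identical; the overlap alone does the damage.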
So the practical issue of noise in the data leads to unreliable, perhaps incorrect, estimates of player productivity.
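The Kobe figures quoted earlier can be turned into explicit ranges. A small sketch, assuming the reported "error" is a standard error and reading "somewhat sure" as plus or minus one standard error (roughly 68% under a normal approximation) and a stricter standard as plus or minus 1.96 (roughly 95%); the `interval` helper is hypothetical.

```python
def interval(estimate, std_error, z=1.0):
    """Symmetric range estimate +/- z standard errors (normal approximation assumed)."""
    return (estimate - z * std_error, estimate + z * std_error)


kobe = {"this season": (-5.23, 6.86), "two seasons": (4.06, 3.59)}
for label, (est, se) in kobe.items():
    lo1, hi1 = interval(est, se)
    lo2, hi2 = interval(est, se, z=1.96)
    print(f"{label}: +/-1 SE [{lo1:.2f}, {hi1:.2f}], +/-1.96 SE [{lo2:.2f}, {hi2:.2f}]")
```

The one-SE ranges are -12.09 to 1.63 for this season and 0.47 to 7.65 for the two seasons combined. At the 1.96 level the two-season range becomes about -2.98 to 11.10, which still includes zero, so even the better estimate cannot rule out an average player.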
A few methods have been suggested for dealing with these issues (beyond adding more seasons). One is statistical plus-minus (SPM), which uses regression to predict APM from box score metrics. The Rosenbaum link above does this as part 2 of his final APM measure, and Evan has done something similar with regularized APM and his model. Since the boxscore tells us why someone is effective (e.g., we can see that he shoots a good percentage or gets a lot of steals), connecting it to APM can be informative. Another option is the regularized APM (RAPM) I just mentioned, which is also called ridge regression. In practice, this moves all players closer to average (0). However, even with multiple years of data, RAPM is not as predictive as you might like. It also seems problematic to take a technique that already has trouble telling players apart from average and then move everyone closer to average. SPM, for its part, relies on a somewhat tenuous relationship between the boxscore and APM values; that regression is much stronger than the APM regression itself, but perhaps not great in an absolute sense. Only about half of the difference in players’ APM scores can be explained by their boxscore stats.
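To make the ridge idea concrete, here is a minimal sketch of the closed-form ridge estimate, b = (X'X + λI)^-1 X'y, applied to a synthetic APM-style design. The ten hypothetical players, their made-up "true" ratings, the 12-point noise level and the λ values are all assumptions chosen for illustration. Real RAPM work also has to pick λ (cross-validation is a common choice) and deal with the intercept and possession weights, which this sketch leaves out.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate: minimizes ||y - X b||^2 + lam * ||b||^2 (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Synthetic APM-style data; every number here is a made-up assumption.
rng = np.random.default_rng(1)
n_players, n_stints = 10, 300
true_ratings = rng.normal(0.0, 3.0, n_players)
X = np.zeros((n_stints, n_players))
for r in range(n_stints):
    perm = rng.permutation(n_players)
    X[r, perm[:3]] = 1.0     # one side's lineup
    X[r, perm[3:6]] = -1.0   # the other side's lineup
y = X @ true_ratings + rng.normal(0.0, 12.0, n_stints)  # noisy margins

for lam in (1.0, 10.0, 100.0, 1000.0):
    b = ridge(X, y, lam)
    print(f"lambda = {lam:>6g}: size of rating vector (L2 norm) = {np.linalg.norm(b):.2f}")
```

As λ grows, the overall size of the rating vector shrinks steadily toward zero, so everyone is pulled toward average. Note also that plain least squares has no unique solution for this design (every row has as many +1s as -1s); a positive λ makes the system solvable, which is part of ridge's appeal and part of the trade-off questioned above.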
In summary, APM is a statistic that has great promise but big practical issues. These issues have not gone unnoticed; beyond Arturo’s and Dave Berri’s posts, some people at the APBR site have been very cautious about its use (including RAPM). But other people are not; it’s used as the basis for various SPM models, and the same approach is used to analyze rebounding. And, obviously, Wayne Winston and Henry Abbott think that it’s worthwhile. This leads me to the bleg portion of the post, aimed mostly at people who do use APM: why keep using it? The one-year results, even for RAPM, are so noisy as to be unusable. It has very little predictive power; the players you think are good this year could be great, terrible, or anywhere in between next year. Has anyone tested whether APM becomes more predictive with more non-overlapping years? For example, if you create a 2-year APM from 07-08 and 08-09 and use it to predict the 2-year APM from 09-10 and 10-11, how well does that turn out? Despite the noise, some people use it to evaluate their own models or build new ones; why rely on something so unreliable to determine your model? As mentioned above, comparing APM and boxscore metrics is a common way to evaluate a player, and my sense is that APM is given the benefit of the doubt. Given its unreliability, why? If you only use multiple-year APM in an attempt to fix the noise issues, how do you know who was good just last year, or this year so far? If I’m a GM at the trade deadline, I need to know how Carmelo is doing now and how he is likely to do in the future, not how well he’s played over the past four years. Weighting seasons is meant to cover that issue, but I bet it does little to improve the errors.
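The non-overlapping-window test proposed above is easy to run once two sets of ratings exist. The sketch below computes the correlation between two windows over the players who appear in both. The simulation after it is purely hypothetical (a talent spread of 3 points, talent that never changes between windows, independent errors borrowed from the two Kobe error sizes) and only shows how estimation error alone limits that correlation to roughly var(talent) / (var(talent) + var(error)).

```python
import numpy as np


def window_correlation(ratings_a, ratings_b):
    """Pearson correlation between two rating windows over players present in both."""
    common = sorted(set(ratings_a) & set(ratings_b))
    a = np.array([ratings_a[p] for p in common])
    b = np.array([ratings_b[p] for p in common])
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical simulation: fixed talent, independent estimation error per window.
rng = np.random.default_rng(2)
talent_sd = 3.0
talent = rng.normal(0.0, talent_sd, 400)
for error_sd in (6.86, 3.59):  # error sizes borrowed from the Kobe example
    window_a = {i: t + rng.normal(0.0, error_sd) for i, t in enumerate(talent)}
    window_b = {i: t + rng.normal(0.0, error_sd) for i, t in enumerate(talent)}
    expected = talent_sd**2 / (talent_sd**2 + error_sd**2)  # r if talent never changes
    observed = window_correlation(window_a, window_b)
    print(f"error SD {error_sd}: observed r = {observed:.2f}, expected about {expected:.2f}")
```

Real players age, change roles and change teams, so actual window-to-window correlations would be expected to fall below this noise-only figure, not above it.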
Essentially, we have a measurement that is known, even by some of its users, to be unreliable (and in some cases outright unusable). Yet it informs their decisions about players and gets the benefit of the doubt when it disagrees with other, more transparent metrics. One of its appeals is that the APM measure is ‘unbiased’; there are no agendas or decisions about how many points various actions on the court might be worth. But we have no idea how it weights anything, which means we can’t evaluate its ratings, make decisions based on them, or even tell whether a player’s rating is accurate. To make an analogy, let’s say I wanted to know whether it’s going to rain tomorrow. I could look to a computer model with known parameters and behaviors, I could look in an almanac compiled from years of weather patterns and experience, or I could flip a coin. Help me out, guys: why use APM? Why flip the coin?