You can lay the blame for inspiring this post squarely on the shoulders of Nate Silver, Dave Berri and Andres Alvarez.
Nate Silver for inspiring me with his work and prodding me into action with his Melo piece.
Dave Berri for also inspiring me with his work and letting me borrow his podium and his audience to have my say.
And Andres Alvarez for setting the data free and always forcing me to up my game.
My muse can be very fickle. Some posts I write in minutes, some take hours or days. Some never get written. It’s all about feeling inspired and having something to say. This particular post has been in my head for months. I’ve written drafts. I’ve done hours and hours of exhaustive research on this. I’ve built models, I’ve bought software to confirm and verify my findings. I’ve run my findings past other people to confirm I am not seeing things.
My point is this: I did the work so now I feel fully justified in breaking out my pimp hand. You may not like it but as always you can go off and confirm what I find. Please remember that I am merely an agent for science.
Let’s talk about Adjusted Plus/Minus.
But let’s talk about the scientific method first. According to the Wikipedia article, the scientific method is:
“The Oxford English Dictionary says that scientific method is: ‘a method of procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses’ ...
In the 20th century, a hypothetico-deductive model for scientific method was formulated (for a more formal discussion, see below):
1. Use your experience: Consider the problem and try to make sense of it. Look for previous explanations. If this is a new problem to you, then move to step 2.
2. Form a conjecture: When nothing else is yet known, try to state an explanation, to someone else, or to your notebook.
3. Deduce a prediction from that explanation: If you assume 2 is true, what consequences follow?
4. Test: Look for the opposite of each consequence in order to disprove 2. It is a logical error to seek 3 directly as proof of 2. This error is called affirming the consequent.”
Nobel Laureate Richard Feynman sums it up here. For him, the key to science is this: you make a guess at a truth or law, you calculate the consequences of that guess, and then you test it against nature. If it disagrees with experiment, your guess is wrong. This is the crux of the matter.
Longtime readers know that this is not my first time taking a model apart. In fact it’s something of a habit. I love testing and deconstructing NBA performance models. Some examples are:
https://arturogalletti.wordpress.com/2010/07/01/win-regression-for-the-nba-2/
https://arturogalletti.wordpress.com/2010/07/05/defense-adjust-wins-produced1st-pass/
https://arturogalletti.wordpress.com/2010/07/20/predictive-stats-bad-metrics-correlation-in-the-nba/
https://arturogalletti.wordpress.com/2010/08/17/introducing-worp-wins-over-replacement-player/
A lot of it has to do with my passion to explore and test what’s out there to see what I can adapt and learn to use in my own quest to build a better model. But lately, I had been short of time. As I said, the genesis and inspiration for this post comes from Nate Silver positing the existence of a Melo effect and Andres putting up a game splits tool. Both made me feel like I had lost a step. I’d stopped advancing towards my desired goal of a better model.
No more.
I decided to start looking for play-by-play data sources (and this is a story for a future time, trust me). One of the things I focused on was the logistics behind Adjusted Plus/Minus.
Now I had my reservations. The main one being this. Prof. Berri has more than once kindly shared some of his work with me. One forthcoming article (Berri, David J. “Measuring Performance in the National Basketball Association.” In The Handbook of Sports Economics, eds. Stephen Shmanske and Leo Kahane; Oxford University Press) has the following neat little table:
This table shows all the most commonly used models of player performance in the NBA (See here for full on explanations) and it shows two key things:
- Explanatory Power to Wins (think experimental correlation to reality)
- Consistency of Player performance year to year (Does the model tell us something of use about the player over time)
You’ll note that the Scoring models are very consistent over time but don’t really measure success. Plus-Minus models have high explanatory power to wins but lower consistency (with APM being the worst). WS and WP do decent jobs at both with Wins Produced having a better consistency at the player level over time.
That lack of consistency for APM was a worrisome little tidbit that I filed away for future reference.
But as always, I set my doubts aside and decided to go off and build it myself.
My first step was to go to the most often quoted source for APM, BasketballValue.com. At this point, I want to make clear that the model I will be discussing is based on, and I quote BasketballValue.com: “The adjusted +/- calculations are in the spirit of the work of Dan Rosenbaum”. There are other regression-based +/- models out there that use different methodologies with varying degrees of success, and I make no claim as to their validity. I may, in fact, have some ideas around this myself (but again, Future Arturo will cover this at some point).
So I read the source material and set about building +/- and giving it my own little twist. I took all the splits longer than 2 minutes and set it up as follows:
Home Margin = b0 + a1*H1 + a2*H2 + ... + b1*X1 + b2*X2 + ... + bK*XK + e, where
the H terms are the different homecourt scenarios (which I’ve talked about before at length) and the X terms are player minutes: +minutes played for home players, -minutes played for road players, and 0 for players who did not play.
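To make the setup concrete, here is a minimal Python sketch of how the rows for a regression like this could be assembled. It is an illustration, not the actual workbook behind this post: the function name `build_rows`, the dictionary keys, and the choice to pass only the non-baseline homecourt scenarios (so the baseline is absorbed by the constant b0) are all assumptions made for the example.

```python
def build_rows(splits, players, scenarios, min_minutes=2.0):
    """Turn lineup splits into (X, y) for: margin = b0 + a.H + b.X + e.

    Each split is a dict with hypothetical keys:
      'minutes', 'margin', 'scenario', 'home' (players), 'road' (players).
    `scenarios` lists the non-baseline homecourt scenarios only; the
    baseline scenario is left out so it is absorbed by the intercept.
    """
    X, y = [], []
    for s in splits:
        if s["minutes"] <= min_minutes:
            continue  # keep only splits longer than 2 minutes
        row = [1.0]  # intercept b0
        # One indicator column per non-baseline homecourt scenario (the H terms).
        row += [1.0 if s["scenario"] == h else 0.0 for h in scenarios]
        # One column per player (the X terms): +minutes, -minutes, or 0.
        for p in players:
            if p in s["home"]:
                row.append(s["minutes"])
            elif p in s["road"]:
                row.append(-s["minutes"])
            else:
                row.append(0.0)
        X.append(row)
        y.append(s["margin"])
    return X, y
```

Each row then has one constant, one indicator per scenario, and one signed-minutes entry per player, which is the shape any ordinary least squares routine expects.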
I regressed everything out for 2009 and I got the following:
And :
Players and homecourt broken down, and that makes for a neat little post, right?
Except I like to double-check my numbers, and two things jumped out. One, the correlation to wins was very low (~10% R^2). Two, the +/- numbers don’t quite add up at the team level, yet somehow they do add up in the final +/- APM numbers.
This officially set off some warning bells. Something was funny.
So I decided to go off and do some more experimenting.
I decided to step thru the work from here: http://www.82games.com/comm30.htm (the Dan Rosenbaum piece which is the basis claimed for APM). The article describes a 3 step process which breaks down as follows:
Step 1: Regress point margin per possession on the players on court versus reference players. Take all players with more than 250 minutes played in 2002-03 and 2003-04. Weight by possessions, year, and game situation (there’s a whole algorithm for this which you can look up in the piece if you want, and which I had to spend a few hours building in Excel). Regress. He calls this True +/-.
I ran this regression multiple times. I ran it with the data set from the article. I went out and found the initial thread on APBR (http://sonicscentral.com/apbrmetrics/viewtopic.php?t=327). I also went and downloaded the data set generated here (by someone who is not me, just to make sure I wasn’t screwing it up): http://www.countthebasket.com/blog/2008/06/01/calculating-adjusted-plus-minus/
Every single regression gave me an R^2 of less than 5%. So I feel confident in the statement that the R^2 of the model in step 1 (as described) is under 5%.
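For readers who want to rerun this kind of check, here is a rough sketch of a possession-weighted least-squares fit and its R^2 in plain Python. It is not the Excel build described above, and it leaves out the year and game-situation weights from the article. The names `solve` and `weighted_fit` are my own, the design matrix is assumed to already carry a leading column of ones, and the weighted R^2 used here (weighted residual sum of squares over weighted total sum of squares) is one common definition rather than necessarily the one the article had in mind.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular (no collinear columns).
    """
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def weighted_fit(X, y, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy.

    Returns (coefficients, weighted R^2). w holds one non-negative
    weight per row, e.g. the number of possessions in that split.
    """
    k = len(X[0])
    XtWX = [[sum(wi * r[a] * r[b] for r, wi in zip(X, w)) for b in range(k)]
            for a in range(k)]
    XtWy = [sum(wi * r[a] * yi for r, yi, wi in zip(X, y, w)) for a in range(k)]
    coef = solve(XtWX, XtWy)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in X]
    y_bar = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
    ss_res = sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, pred, w))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for yi, wi in zip(y, w))
    return coef, 1.0 - ss_res / ss_tot
```

In practice a statistics package would do this for you; the point of the sketch is only that the R^2 in question is a property of this single regression, before anything else is layered on top.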
The question then becomes how we get to 95% overall Correlation to wins? Let’s talk about the other steps.
Step 2:
The model now takes the True +/- values for each player from the first equation and regresses them against those players’ stats to determine a weight for each stat. He reports an R^2 of 44%. It’s important to note that this is the R^2 between the True +/- for the 420 players and the stats, not between the stats and point margin per possession or wins.
Next, the weight of each stat is used to calculate each player’s Statistical +/- (think a version of Win Score or NBA Efficiency with a supremely slim correlation to wins and point margin).
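In code, that last calculation is just a weighted sum of a player's stats. A minimal sketch follows; the function name is mine, and the weights are made-up placeholders for illustration, not the coefficients from the Rosenbaum article.

```python
def statistical_plus_minus(weights, stats, intercept=0.0):
    """Weighted sum of a player's box score stats.

    weights: stat name -> regression weight (placeholder values below).
    stats:   stat name -> the player's rate for that stat.
    Stats missing for a player are treated as 0.
    """
    return intercept + sum(w * stats.get(name, 0.0) for name, w in weights.items())


# Placeholder numbers for illustration only; not taken from the article.
demo_weights = {"points": 0.5, "rebounds": 1.0, "turnovers": -1.5}
demo_player = {"points": 20.0, "rebounds": 8.0, "turnovers": 2.0}
rating = statistical_plus_minus(demo_weights, demo_player)
```

The thing to keep in mind is where the weights come from: they were fit against True +/-, not against point margin or wins.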
Confused yet? Wait, there’s more. Now comes the really sketchy bit.
Step 3: Calculating Adjusted +/-
The final step is to take the Pure regression and the Stats model and add them up by player like so:
APM = x* Pure +/- + (1-x)*Statistical +/-
The next move is to adjust x between 10% and 90% for each player to minimize the error. In essence, he tweaks the rating to get a high R^2.
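Here is a small sketch of what tuning x player by player could look like. The description above does not spell out exactly which error is being minimized, so the `target` argument below is a stand-in of my own, as are the function names; treat this as an illustration of the mechanics and not as the article's actual procedure.

```python
def clamp(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))


def blend(pure, stat, x):
    """APM = x * Pure +/- + (1 - x) * Statistical +/-."""
    return x * pure + (1.0 - x) * stat


def best_weight(pure, stat, target, low=0.10, high=0.90):
    """x in [low, high] minimizing (blend(pure, stat, x) - target) ** 2.

    `target` is a hypothetical per-player value to be matched. The squared
    error is a convex quadratic in x, so the constrained minimum is the
    unconstrained solution clamped to the allowed interval.
    """
    if pure == stat:
        return low  # every x gives the same blend, so any choice is optimal
    x = (target - stat) / (pure - stat)
    return clamp(x, low, high)
```

With pure = 4, stat = 0 and a target of 2, `best_weight` lands on x = 0.5 and the blend matches the target exactly. A free x per player can soak up a lot of error that the two inputs do not explain on their own.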
To summarize, the APM model calculates two variables with a low correlation to wins (R^2 <5%) and adds them up to minimize the error and guarantee a 90%+ Rsq. for the overall model.
Funny that.
What does this mean exactly? Well, the R^2 for the APM model is very much a fabrication. The correlation to point margin and wins of the model shown on BasketballValue is artificially inflated by adding the error back in. To put this in perspective, I would bet that a simple model that uses a team’s minutes-played percentages to assign wins (in place of True +/-) and Wins Produced or Win Shares (in place of Statistical +/-) would be much more consistent with team wins prior to the error correction (and produce more consistent results; I may in fact have another post on this for Future Arturo).
I went into this exercise hoping to find something that would make my life easier. If it worked, I could use it to derive and infer all sorts of cool stuff about opponents and defense. Sadly, the APM model examined does not hold up under scrutiny. It is built to account for all the variability in the process but holds very little actual correlation to the actual process.
In brief, I failed to simplify my life with APM (although I looked at some other variations of this kind of model, as listed in the table, that had some interesting potential), and I probably just exponentially increased the list of people who have less than kind thoughts towards me. At the end of the day, however, it’s all about science, and science is a cruel mistress.
Regardless, let me get you started:
Because we are all about the fanservice 🙂
For Prof. Berri’s great followup piece on this click here
PS. I received a very gracious note from Aaron Barzilai, Ph.D from BasketballValue.com. It follows in full:
Arturo,
A couple people just pointed me to your post, so I wanted to clarify what I’m providing on basketballvalue.com. I think there might be a little confusion.
Certainly, as you quoted from my web page “The adjusted +/- calculations are in the spirit of the work of Dan Rosenbaum”. On basketballvalue.com, I report both unadjusted numbers (e.g. Overall Rating) and adjusted numbers (e.g. 1 Year Adj. +/-). These adjusted numbers are the result of the basic regression as outlined in Rosenbaum’s article that you link to, labeled formula (1). In that paper, he lists the results in Table 1 as “Pure Adjusted Plus/Minus Ratings”.
The rest of your post above gets into the technique that Dan introduces for Statistical Plus/Minus ratings and then Overall Plus/Minus ratings. While that’s interesting work, that’s not what is presented on basketballvalue.com. I tried to make that clear in the various explanations that have been posted, including the 82games articles referenced in the comments(“These ratings have been determined using only the matchup data available at basketballvalue.com. They do not explicitly include box score statistics”). Sorry if that still comes across as a little opaque.
Also, I just thought I should mention that the purpose of the site is really just to be a service to the community. The raw data is available for download so that others can use it in their work, which I find really rewarding. The unadjusted and adjusted +/- results are also updated on an almost daily basis so that people who are interested but might not be able to reproduce them on their own can see them. I don’t include commentary, and I think my position is summed up best in the conclusion of http://www.82games.com/barzilai2.htm :
“the results must be one piece of a broader assessment of the player.”
Thanks,
Aaron
Aaron,
I very much appreciate your thoughtful response as well as the data services provided.
The numbers reported on your site can correctly be said to more closely resemble step 1 of the Rosenbaum paper (with an additional step to center the data). This is the initial step that I worked out to about a 5% R^2.
I do totally agree with the statement “the results must be one piece of a broader assessment of the player,” and I think I stated as much earlier.
You just know I’m coming back to this in the future 🙂
Chicago Tim
03/04/2011
What about regularized adjusted plus minus?
arturogalletti
03/04/2011
CT,
The problem with all the simple +/- models is that they’re misspecified. The trick to building a model for any unknown variable y is to make a guess at what the known variables x’s are that provide information about the unknown. Then you build your model and test it against nature and see if it tells you anything of note.
A player’s presence on the court is only one of the variables that one should consider when building a model. In fact, if I were doing it, my x’s for wins would be statistical events of player performance, location, rest, altitude, the refs, the coaches, the players. Go from there and see what happens.
I’ve played with building some of these in different ways (as have others) and you can build a better model than APM.
My point is that the trick is in the variables that you believe have significance and not the method of regression.
Rex
03/04/2011
Trying to get a handle on just how sketchy Step 3 of APM is. Are you saying that the value for x is done separately for each player, making it possible that every player has a different x value?
If so, your summary of the method (in the next sentence) greatly understates how ad hockey it is.
arturogalletti
03/04/2011
Rex,
Yes to the first part and hell yes to the second one. It’s the kind of statistical manipulation that in the pharmaceutical industry would get you shut down.
Robbie O'Malley
03/04/2011
You had me at Richard Feynman.
Crow
03/04/2011
How much data did you throw away by eliminating the splits <2minutes, why did you do it and doesn't it impact the validity of your test of "the existing model" if you decide to change the model?
If you are interested in science why test just the first weakest model, the 1 year traditional Adjusted +/- instead of the multi-season APM or RAPM or multi-season RAPM when they have been shown / asseted as significantly better? If it is a first step fine, but if you stop there the motives are questionable.
Are you going to review the 4 research posters at the Sloan Conference, mostly by PhDs about ways to improve APM further looking at the importance of pairs, recognizing that all possessions are not the same and other things?
There has been talk of including Bayesian priors in APM at APBRmetrics. I hope you pursue that line too in the name of science and better estimates instead of just knocking down a model, the simplest version of that model.
Crow
03/04/2011
make that “asserted” in above
arturogalletti
03/04/2011
Crow,
My initial take eliminated the splits <2 minutes to clean out noise but that was only my version.
Afterwards, I actually took great pains to replicate the model exactly as listed. Multiple times with multiple data sets. I used the exact weighting mechanism listed in the piece. I also looked for independently done reconstructions and they all gave similar results (<5%).
I was specifically reviewing the most cited model (and the one whose method construction is publicly available ).
As for alternate models, I have been looking at those. A model using a +/- calculated from WP48 or WS48 (which I have constructed previously, or you could regress your own stat model) as my priors and then regressing from there is an interesting idea. However, it's an unoriginal one, as I feel that there are people already doing this kind of work.
I just feel at this point that getting clean sources of Play by play for better defensive drilldowns for a better a priori number is a much higher return on my time investment.
Crow
03/04/2011
Reviewing Rosenbaum’s extra steps for “Overall +/-” is worthwhile / overdue but please note that none of the other published versions of Adjusted +/- use “Statistical +/-” at all. All those published versions are pure Adjusted +/-.
Overall +/- may be worthwhile, even better- if done right- but it is a blended or higher model.
arturogalletti
03/04/2011
Crow,
See my point above. A statistical +/- model (which is in essence what WP48 or WS48 is) is the perfect departing point for a Bayesian construction. My comeback question would be about the additional x’s that you use for regression. Players,Player pairs & groups are all nice but all my work posits that Homecourt,Altitude, rest, the refs, the coaches all have similar or greater effects on the result.
Crow
03/04/2011
Looking at one game, I count 15 minutes of time with lineup stints of 2 minutes or less. So you’ve decided to throw away about 1/3rd of that game. It may not be a neutral-impact decision overall.
arturogalletti
03/04/2011
Crow,
It’s about the sample size actually. My take is that any game sample with less than 5 possessions borders on insignificance (i.e. the sample is so small that I cannot make any inference as to the actual shape of the variable measured). One of the main sources of error for these models seems to be that, combined with the missing test case problem (i.e. there just aren’t enough valid samples for the particular combinations to make accurate conclusions, which is where the discrete event models come in).
Crow
03/04/2011
What if you used a Statistical +/- system to assign 50-75% of the credit for what happens on the court and then ran the Adjusted +/- calculation for the remaining 25-50% with a Bayesian prior that could be based on Statistical +/-, Adjusted +/-, a blend and/or other stuff?
There are things players do (picks, spacing, blockouts, shot intimidations, saved turnovers, jump balls won, help defense, etc.) that are helpful that don’t show up in the traditional boxscore or even one extended to estimated counterpart defense including shot defense. How much impact that amounts to as a % of overall valuable activity is hard to say but to me it is not near zero. Either you say I can’t directly count that stuff and hence will ignore it and assume it is the same for all players or just unknowable, or you try to measure it indirectly by some method akin to Adjusted +/-. Or you use data without and with that attempt to measure beyond the boxscore and see if the stories told are different. If there are paradoxes revealed, you can try to wrestle with and resolve them with the aid of observation and judgment.
arturogalletti
03/04/2011
Crow,
You’re asking the right questions actually. Think about the following: How do you set the % breakdown? The Statistical +/- assigns the value for the events that happen on the court (and we know that there is room for improvement there, i.e. the defense and opponent stats, but let’s assume we have those). The next step is to figure out which variables have real meaning: The refs? The coaches? The quality of your teammates (think Diminishing Returns)? Homecourt? Altitude? Rest? We’ve seen and proven that these all mean something. Throw in the players on the court (by themselves and/or in pairs) and then we can go fix these.
My thinking is that we cannot separate our model from the reality of player events that happen on the court (particularly if those events are consistent over time). As you say, we start from those measurable facts and add in the information that has real meaning.
Crow
03/04/2011
FWIW, I found a .75 correlation (r) between 2 yr and 4 yr RAPM. But they included some of the same data and ideally I guess it might be better to compare 2 yr RAPM to a totally distinct 2 yr RAPM. With this season getting near completion that could be done with basketballvalue records but not with RAPM. It will probably beat the consistency of 1 year APM comparisons and maybe by a wide margin.
Properly constructed by a statistical expert, I think some blended version of a box-score based statistical metric and Adjusted +/- would surely beat Adjusted +/- on its own and might also be able to beat the box-score based statistical metric on its own too. The counted portion of what players do with impact is probably larger than the uncounted with impact (probably between 2 and 4 – 1) but both matter in the game outcome.
arturogalletti
03/04/2011
Crow,
Yes. The trick is in adding in the right variables.
Crow
03/04/2011
Thanks for the replies.
Eliminating stints of less than 2 minutes may have pluses as well as minuses. Eliminating stints of less than 1 minute or 30 seconds might be worth considering / testing as well.
I appreciate your work in revealing the impact of Homecourt,Altitude and rest in terms of Adjusted +/- impact and in the evolution of the model I think other Adjusted +/- practitioners should also consider adding these elements.
“back2newbelf” has added coaches into one version of RAPM. Refs have been studied and their impacts could also be fit into Adjusted +/- based analysis.
The difference / maybe semi-advantage of “Statistical +/-” over straightforward boxscore metrics is that the value of uncounted acts is attached to one or more discrete stat values, so you are capturing 100% of the impacts (and not, say, 60-80% with the boxscore based metric) even if you may be mis-assigning some of them at player level.
In Rosenbaum’s Overall +/- the average % weight that Adjusted +/- got was something like 20%. So compared to a statistical metric they agreed on about 80% of value assignment being based on the box score. 80% isn’t 100% but that comparison is not stark. Whether the 20% of value assignment by Adjusted +/- can be done well enough right now to be worth adding is a fair question and I appreciate your genuine interest in research. My main counterpoint is that Adjusted +/- is at least worth looking at right now alongside other metrics and it is also worth trying to improve, given the power of stats in the hands of capable, motivated investigators.
arturogalletti
03/04/2011
Crow,
I appreciate reasonable questions. I think the key point is that models like Win Shares and Wins Produced are statistical +/- models, and the consistency of player value over time in these models (and the ability to make specific observations like I’ve done on multiple occasions here with things like the Half Baked notion or the Championship equation) argues for the significance of this kind of model. There are factors beyond the normal boxscore to consider (and I’m not repeating them again :-)) and that’s where the work lies for improvement. But the existing models provide a wealth of real information.
The APM model as currently constructed on BasketballValue is not something I can put any credence in at this point given what I now know about its construction. However, models like Wayne Winston’s are interesting as points of reference. I do tend to take closed models with a huge grain of salt now. Call me Doubting Thomas.
Crow
03/04/2011
I differ some with your conclusion but I respect your investigation. I’d value even more investigation but of course you can and will take your talents to the questions and depths of investigation of your choosing.
Alex
03/04/2011
Crow, why would you have any confidence in how statistical +/- attributes credit for ‘uncounted acts’? Not only is it probably crediting the wrong players (since APM barely accounts for which player led to a team outscoring the opponent), but the boxscore statistics don’t do a great job of describing APM. Put another way, you have two steps of noise between a boxscore stat and crediting a player with setting a screen or spreading the floor, and both sources of noise are pretty sizable.
Crow
03/04/2011
I don’t have a fixed answer of how much confidence I have in Adjusted +/-. At times I can appear to be a big advocate. At other times I can be a fairly strong critic of it.
I am not suited to being the technical referee of various pro and con papers measuring the power of Adjusted +/-. Still I think that there are additional longer / perhaps better “neutral” constructed tests that should be run.
The Adjusted +/- values are really point estimates of ranges and I keep that in mind, so I only feel that I can know a player is somewhere between average and above average in overall impact, or above average and elite, or average and below average, etc. And I know the statistical rating so I can deduce how big the outside the boxscore impact is estimated to be. If it seems too extreme or too high or low compared to observation I can and will discount that information or at least use it with even more caution than normal.
That the results of Adjusted +/-, especially in multi-season / RAPM versions “make sense” / “seem close” or “close enough” in at least 2/3rds of the cases is not a very strong argument but it is a somewhat supporting argument for a user who accepts some uncertainty and error. The method is upfront about its limitations and it is up to the user to abide by them and for critics to recognize that the metric recognizes its degree of accuracy. I don’t find myself saying this stuff is crazy and worthless but I am not saying it is perfectly accurately or the best stuff out there either.
The factor level Adjusted +/- rating make sense a lot of the time too and I think does a lot to demystify the overall rating. it may be easier to spot surprises and errors at this level as comparisons to targeted subsets of statistical data are more manageable.
Consistency or inconsistency between individual player Adjusted +/- and lineup Adjusted +/- is a concern but there can be valid reasons for that. I have always felt that pairings can be significant and will try to find out what the Sloan poster focused on that topic does to examine them and what it finds. The abstracts appears to say that formally having player interaction terms in the regression reduce the errors.
Improve Adjusted +/- in 5-10 ways and you may have really gotten somewhere. Some theories that become very important and very highly regarded (within the bounds of what we presently or can know) get built step by step over time. Sometimes by innovators or fan of innovators. Sometimes by critics of the earlier model.
Crow
03/04/2011
A box score based metric compared to 100% pure Adjusted +/- would be a stark comparison. But a stark comparison can be useful. If they agree, it is probably OK to be confident. If they do not, I would not be so confident in either metric being the exclusive “right” one. It raises questions for further review.
Alex
03/04/2011
Arturo, do you have any other home court variables besides altitude, home team rest, and away team rest? I’m trying to figure out why the constant doesn’t fold in HrORrOAltO. Maybe those are entered as continuous variables instead of categorical?
stats_guy
03/04/2011
Wow, thanks for posting this. I was stunned that adjusted +/- was this adhoc.
So in essence, ‘statistical’ APM looks almost like WinScore except that it’s regressing against a value of a fantasy variable predicted by yet another linear regression, which happened to be called ‘pure’ APM. It’s not surprising that this would have less predictive power than WoW wrt team differential.
And then, you’re saying that it’s basically changing the weight between ‘pure’ and ‘stat’ individually for every player to minimize error from the overall team’s point differential !? That’s not even ad-hoc, it’s completely meaningless. I would be stunned if that actually gets published in an academic journal.
I think that somebody should really work on a non-linear, statistical model to predict players performance, e.g. graphical model or maybe even something like a collaborative filtering. I thought that adjusted plus/minus was something a little more sophisticated but good-God it’s yet another linear model (and looks much more ad-hoc than even WoW).
Crow
03/04/2011
stats_guy, if you both desire and are able to work on a non-linear, statistical model to predict players performance, e.g. graphical model or maybe even something like a collaborative filtering, I hope you will and preferably in public view. Looking up the definition of collaborative filtering I see some commonality with a rough idea I shared with a GM awhile back and got some interest on. Perhaps this technique properly applied would be productive. It may be a part of a set of techniques upon which to build the better overall model.
stats_guy
03/04/2011
Collaborative filtering recently gained huge popularity thanks in large part to the Netflix competition.
You know, it would be real interesting if NBA teams held a similar competition (e.g. predict which team wins in a matchup, or predict a player’s performance in a new environment) with a sizable prize money ($1 million, still nothing compared to what they waste on some terrible decisions).
WoW does a good job of mapping individual stats to overall team differential. But I bet that what people are really interested in is – who wins a playoff series between two teams, how a team will perform with new personnel in a given year.
In a way, team differential can be thought of as a baseline predictor (e.g. probability of winning ~ sigmoid(difference of team differential)). But I suspect that there indeed are such things as matchups (with variable strength) and we can certainly do better than simple binomial probability based on differentials.
If I ever get time, I have a couple of ideas to try out.
Crow
03/04/2011
I believe “overall +/-” was intended as an applied management tool, coping with the two sources of data you have and the relative accuracy of each. Blending them together as well as you can to reduce error, recognizing variation in the quality of the data at player level, does not seem deceptive or wrong to me; it sounds appropriate / practical. I don’t believe there ever was any intention to publish or promote it in an academic journal.
Crow
03/04/2011
But maybe appropriate replacement player specifications and ridge regression or other techniques would do better and be more technically defensible.
Crow
03/04/2011
FWIW, Joe Sill was on the 1st runner-up team for the Netflix competition. He has a Ph.D. He thought APM with ridge regression was worth his time and perhaps his reputation too to some degree. He presented his work with RAPM at the Sloan Conference last year. He found higher r2 than previous shown. Now he consults for an NBA team.
For years I had thought and said Adjusted +/- at the 4 Factor level would be helpful to see in public. He made it happen.
stats_guy
03/04/2011
I don’t know what ridge regression is. Is that basically linear regression with regularization ? In this case, if you are fitting the x * pure + (1 – x) * stat to team differential, that would be equivalent to linear regression a * pure + b * stat with constraints that a + b = 1, a > 0, b > 0
stats_guy
03/04/2011
Is this the paper that you are talking about?
Click to access joeSillSloanSportsPaperWithLogo.pdf
It seems to me that the only thing he did was add an L2 penalty term to the objective function (still the same regression against differential). A standard technique in machine learning to prevent overfitting (a gaussian Bayesian prior on W).
Crow
03/04/2011
I put the above reply under the wrong post.
I agree his paper is fairly brief and his treatment probably doesn’t exhaust the adjustments that could and should be made.
Crow
03/04/2011
That is my understanding.
Crow
03/04/2011
Well I guess I am having trouble hitting the right reply button.
There is another fellow who has indicated he might run Adjusted +/- with some other Bayesian prior(s). There are many possibilities and I think further action along that avenue would help, among other initiatives.
stats_guy
03/04/2011
I’m sure that there are ways to improve the model but I’m still not convinced by the ad-hoc nature of mixing pure adjusted plus/minus with the result of the ‘statistical’ plus/minus predictor.
I suppose that the intuition is that we’ll capture whatever we can correlate with official stats (stat APM) and, we also have whatever we can’t capture with official stats (pure APM).
In comparison to WoW, the pure APM is very much like defensive adjustment (in WoW’s case, simple uniform distribution AFAIK).
However, there should be a more principled approach to mixing these two. For instance, maybe a player contributes all that with his stats and MORE (instead of having the weights add up to 1).
Crow
03/04/2011
As you say a player does contribute with his individual stats and MORE. But some of what they get final credit for in the statistical model is built upon the uncounted contributions of others so getting that “full credit” may be “too much” in the end.
Having the Adjusted and Statistical +/- weights add up to 1 isn’t really the best approach, as pure original Adjusted +/- has all contributions, and while it picks up the uncounted it will partly double-count the counted ones again, and the share of credit given to the uncounted might end up too low. Maybe an improvement over not having the uncounted credited / debited at all, though.
Assigning 50-75% of credit to boxscore stats then looking for the remainder may or may not do a better job but it too probably wouldn’t be a perfect job.
Constructing a 4 Factor offensive and defensive Adjusted model with 1 or more additional variables awaiting a label or left without one might be more appropriate and more accurate, without any double-counting. If properly constructed.
Crow
03/05/2011
Clarification: you’d still do the overall Adjusted +/- as best you can, then work to break it down into more than 4*2 factors.
Crow
03/04/2011
I agree that ultimately this or any tool should be used to prepare for and win in the playoffs in general and in view of specific likely later round match-ups.
Crow
03/04/2011
Playoff performance against progressively better teams is not necessarily the same as performance against the league from top to bottom in regular season and you can use the data against top teams to try to help understand this better. The level of competition issue should not be ignored by any metric, even though there are different ways to address it.
Crow
03/04/2011
Arturo:
“I just feel at this point that getting clean sources of Play by play for better defensive drilldowns for a better a priori number is a much higher return on my time investment.”
I am not sure I understand exactly what you would try to get and how you’d use it. Would appreciate any clarification.
arturogalletti
03/04/2011
Crow,
What I’ve always wanted is a clean source of play by play data that lets me assign stats to the actor and the player acted upon. That would give me the ability to build ADJP48 for the player and his opponent, and use that to build a true opponent-adjusted WP48. If the data is good enough, I’d also have the ability to stratify boxscore stats further (think charges, types of rebounds, types of shots, etc.) and rerun all the regression models.
I’ve got some older data sets that approximate this but no current ones. We have a project in place right now to start setting something like this up though.
This gets us to the point where we can see the player’s contribution and the contribution of his opponent based on the discrete events taking place on the court. With that in hand I can start working on environmental effects and non-linearity at the edges.
Crow
03/04/2011
Yes, more reliable and fleshed-out counterpart defense information would be very helpful to many people and to many metric approaches, if NBA scorekeepers can use their customized keyboards to double or more than double the data entry, or if computer-aided video analysis can be trained to deliver the data.
Crow
03/05/2011
The estimated error in Adjusted +/- is estimated error from perfection, in knowing the precise value of 300-400 players who play 5 on 5 in ever-changing combinations. That is a hard thing to be perfect on.
But how often can you get close? In the best 4 season regularized Adjusted +/- model in public the average errors are getting down to about 2.5 points. So for about two-thirds of the NBA population, the estimated RAPM value is believed to be within plus or minus 2.5 points of the true value. It does not meet the standard test of statistical significance for most players, just a small part of the upper and lower tails. But is the value still worth something to managers? I’d say yes. It would be worth a lot more if the error could be cut by another third or more, and I think that does not seem impossible if further work to improve the model is conducted.
The regression values for stats on average that underlie Wins Produced player ratings also vary from the true value of specific actions by specific players in specific game situations for that game’s scoreboard outcome, but estimated errors from “true value” are not given.
What is getting “added back in” to “Overall +/-”, which you labeled “the errors”, is Statistical +/- or a version of a boxscore metric: the estimated value of countable player activity. You might not really like this calibration of boxscore activity but when it is directly tested (not the step 1 model), it performs in the same range as most boxscore metrics in retrodiction and projection. I find it odd to characterize the assessed value of countable player activity as “error” and I don’t believe it is as weak as the step 1 pure Adjusted +/-. Giving Statistical +/- about 80% of total weight in Overall +/- is in my mind a good thing as it is acknowledging that countable player activity deserves about 80% of total credit.
In the Overall +/- model, Adjusted +/- is the tail of the dog, not the whole dog. The original paper made that point directly. And it was clear to me, though I occasionally drift away from it. That no one else produced full sets of Overall +/- values again but rather just showed pure Adjusted +/- has always seemed weird to me and I have said so several times at APBRmetrics. I have sometimes cobbled a version of it together myself as needed for my own player assessments. What the Cavs actually use, pure Adjusted +/- or Overall +/-, is a question I have raised several times, obviously without an answer. But this development can be viewed in different ways. Tragic error (using Pure Adjusted +/- alone) or privatizing the better information (Overall +/-)? Dallas apparently has used pure Adjusted +/- at least partially, for a time. Other teams might use pure Adjusted +/- to some degree. Using it alone is not the best course of action. Using it as one tool among many is safer but at one’s own choice. But it should be seen as an enhancement or another snapshot rather than stand alone.
Crow
03/05/2011
One thing you can use pure regularized multi-season Adjusted +/- for is to see how many rotation players are estimated to be pulling the team down more than a minimum. Role players tend not to have high box-score value but they can help you, be neutral or hurt you. I haven’t done a systematic multi-season study of recent top contenders and champs yet (a good to-do) but looking at this season the Celtics, Lakers, Bulls, Spurs and Magic were the contenders who had rotations where no main rotation player was estimated as worse than -1 on 2 yr RAPM. Is that just better luck from noise than other contenders or good systems and the best roster constructions and player fits into roles where they can succeed? It could be some of each, but it is something you can either find and think about or not know to think about. There are other ways to use Adjusted +/-, rough as it is, to spur more research and more efforts to understand and possibly reconcile disparate ratings and to construct successful lineups in general or in the face of specific match-ups and game situations.
whiffleball
03/06/2011
Arturo, you can’t expect basketballvalue numbers to stay consistent. Why? Because the 1 year APMs have standard errors of 5, and the 2-year APMs have standard errors of about 3.5.
When you see the +16.09 APM for Wade in 2009-2010, it comes with a 4.85 standard error, which corresponds to a 95% confidence interval of about (+6, +26). So of course the numbers won’t remain consistent and the overall numbers won’t correlate to wins.
The usefulness of APM is when you use 4, 5, 6 years of data to get that standard error close to 1.0. That way, we can look back and look at who the most effective players were over a long stretch of time and make our own conclusions. “Players like Chuck Hayes, Andrei Kirilenko, and Ron Artest are near the top at their positions…interesting.” Then, when a similar player like Amir Johnson hits the free agent market, maybe you wonder if he’s worth more to his team than the average GM gives him credit for.
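The arithmetic behind those intervals can be sketched as follows. The normal approximation (z = 1.96) and the 1/sqrt(n) shrinkage of the standard error with n seasons are assumptions for illustration, not basketballvalue's published method; the function names are hypothetical.

```python
import math

def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

def scaled_std_error(one_year_se, n_years):
    """Assumes seasons act like independent samples, so SE shrinks as 1/sqrt(n)."""
    return one_year_se / math.sqrt(n_years)

# Wade's 2009-2010 figures as quoted above.
low, high = confidence_interval(16.09, 4.85)

# A one-year SE of 5 under the 1/sqrt(n) assumption, for a two-year window.
two_year_se = scaled_std_error(5.0, 2)
```

The scaling assumption reproduces the drop from 5 to about 3.5 for two years; it is only a rough guide to how longer windows behave, since real multi-year estimates depend on how much each player's role and lineups change.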
arturogalletti
03/06/2011
Whiffleball,
Forget about consistency, the calculation itself is wrong. Think about it. I can pick an infinite number of combinations for the APM= Pure*a + (1-a)*Stats step. That means that there’s an infinite number of solutions.
The answer tells you absolutely nothing (other than possibly the prejudices of the person picking the a’s) and 5 years of nothing is still nothing.
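A toy illustration of that sensitivity, with made-up numbers. The two players and their ratings are hypothetical, and `overall_rating` is just the blend formula as written above.

```python
def overall_rating(pure, stat, a):
    """Blend as quoted above: Pure*a + (1-a)*Stats."""
    return a * pure + (1.0 - a) * stat

# Hypothetical (pure, stat) ratings for two made-up players.
players = {"Player A": (8.0, 1.0), "Player B": (-2.0, 4.0)}

def ranking(a):
    """Order the players by blended rating, best first, for a given weight a."""
    blended = {name: overall_rating(p, s, a) for name, (p, s) in players.items()}
    return sorted(blended, key=blended.get, reverse=True)

low_weight_order = ranking(0.2)    # mostly box score
high_weight_order = ranking(0.8)   # mostly pure adjusted +/-
```

This only shows that the ordering can flip with a; it says nothing about how a was actually chosen in the original paper.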
whiffleball
03/07/2011
That final step isn’t something I’m aware of, and I’ve calculated 4-year APMs before using Eli Witus’ countthebasket method. I agree that’s an odd step, but I think you’re focusing on it too much. Remove that step from the calculation and instead use the “raw” 5 year APMs to find the useful information.
Will you link to the source where you found step 3?
Crow
03/07/2011
“I can pick an infinite number of combinations for the APM= Pure*a + (1-a)*Stats step.”
That is misidentifying the model outright. You might not think it is significant, but it appears to be the source of your confusion and this debate.
“Pure*a + (1-a)*Stats step.” is equal to “Overall +/-”, which is not simply “Adjusted +/-”. And as I said, that was published once 7-8 years ago and is not Adjusted +/- as commonly discussed in the media.
Crow
03/06/2011
Analyzing two models at once leads to lots of potential difficulty in conversation and in the flow and fairness of the analysis.
Whiffleball was talking about Adjusted +/- and you responded with a critique of Overall +/-. They are related but they are two models.
Crow
03/06/2011
They are related in Overall +/- but not in Adjusted +/-.
Crow
03/06/2011
“They” being the Adjusted +/- model and the Statistical +/- model.
Crow
03/06/2011
To be more precise there are 3 models- APM, SPM and Overall +/-.
Analyzing Overall +/- is something but it is analyzing a model that hasn’t been published in 7-8 years. If you want to talk about Adjusted +/- and you said “Let’s talk about Adjusted Plus/Minus.” near the start of the article then stick to that topic. You changed the topic and want to talk about Rosenbaum’s Statistical +/- and Overall +/- models and then you want to discredit Adjusted +/- by critiquing all three. That is your choice but that is not good research practice in my eyes. Maybe you should have said “let’s talk about Rosenbaum’s 3 old models”.
Crow
03/06/2011
Nothing that gets done in Statistical +/- or Overall +/- gets back to pure Adjusted +/-. Nothing.
Crow
03/06/2011
Perhaps past or current Statistical +/- or past Overall +/- could or should be used as a prior in a current hybrid +/- but that would be creating a new model.
arturogalletti
03/06/2011
Crow,
Hmm.
Let’s check at the source:
http://www.82games.com/barzilai2.htm and I quote:
“Adjusted +/- ratings indicate how many additional points are contributed to a team’s scoring margin by a given player in comparison to the league-average player over the span of a typical game (100 offensive and defensive possessions). These ratings are considered “adjusted” since they start with the simple +/- rating and apply a regression model as outlined by Rosenbaum to adjust for the impact of all other players on the court. All players in the top 75% in minutes during the season have been modeled, and the results have been centered so that the league-average player has an adjusted +/- value of zero. These ratings have been determined using only the matchup data available at basketballvalue.com. They do not explicitly include box score statistics, but they do reflect the value created by amassing such statistics as they contribute to a team’s net point differential. In addition, adjusted +/- ratings capture the valuable effect of myriad aspects of the game that go unrecorded in box scores, such as setting picks, boxing out, and defensive play. ”
Nope, the criticism is totally and completely valid.
EvanZ
03/07/2011
“They do not explicitly include box score statistics, but they do reflect the value created by amassing such statistics as they contribute to a team’s net point differential. ”
What part of “do not explicitly include box score statistics” do you not understand?
arturogalletti
03/07/2011
Evan,
What part of:
“These ratings are considered “adjusted” since they start with the simple +/- rating and apply a regression model as outlined by Rosenbaum to adjust for the impact of all other players on the court.”
&
“and the results have been centered so that the league-average player has an adjusted +/- value of zero. ”
don’t you?
Let me reiterate simply:
I ran the model here (Rosenbaum: http://www.82games.com/comm30.htm) and here (Witus: http://www.countthebasket.com/blog/2008/06/01/calculating-adjusted-plus-minus/)
Both have an extremely low R^2 in the initial regression.
Neither lines up at all at the team level when estimating point margin.
Both produce results that are inconsistent over time.
The first winds up with an R^2 of 95% after adjustment.
Feel free to confirm on your own.
EvanZ
03/07/2011
Arturo, just so I get this straight. You are now saying that basketball-value and Eli W. use (or used) statistical +/- (i.e. box score stats regressed onto unadjusted +/- data) in their calculation of adjusted +/-. Do I have that correct?
Crow
03/06/2011
Nope it is not.
“the simple +/- rating” mentioned is the net scoreboard change, not Statistical +/-.
“They do not explicitly include box score statistics (that would be a metric like Statistical +/-, which was not used), but they do reflect the value created by amassing such statistics as they contribute to a team’s net point differential.” (Yep, because that is what net change in the scoreboard is.)
Crow
03/06/2011
team’s net point differential = net change in the scoreboard for all play stints.
= value of counted and uncounted activity, which Adjusted +/- is set to reflect and then apportion according to its methodology rather than the boxscore or a boxscore based metric.
Crow
03/06/2011
Another quote from your linked source:
“It is important to note that the adjusted +/- rating is not a “holy grail” statistic.”
The argument against Adjusted +/- is an argument against Adjusted +/- in the hands and eyes of some users, but not all.
Crow
03/06/2011
“they do reflect the value created by amassing such statistics as they contribute to a team’s net point differential.”
But the next sentence was “In addition, adjusted +/- ratings capture the valuable effect of myriad aspects of the game that go unrecorded in box scores, such as setting picks, boxing out, and defensive play.”
Yes, Adjusted +/- captures both at player level. That was the full statement made. Your apparent interpretation is quite different from what I think the full passage meant. It was not talking about Statistical +/- being used in Adjusted +/-, just that the value of boxscore activity is also covered in Adjusted +/-, as is the uncounted valuable activity.
Boxscore based metrics can’t capture value from what is not counted, so they assign all value to the counted stats. A practical solution but one that leaves some valuable activity unrecognized separately. That is true for Statistical +/- too and it is a flaw or compromise. Overall +/- is one attempt to weight both but flawed too as previously stated.
arturogalletti
03/06/2011
Crow,
They’re saying “we’re doing it like Rosenbaum”.
Assuming they use the 1st step and regress (<5% R^2) then they adjust in some way (which they don't explain but I'm assuming it's similar to step 3) to make sure the results line up to actual point margin and get that artificially inflated R^2.
One word: "Shenanigans"
whiffleball
03/07/2011
What adjustment are you talking about? I know the basketballvalue method because I’ve emailed them to ask. There’s no statistical +/- used in the model.
The only adjustment I can think of is simply finding the mean APM for all players in the model, and if it’s +3, just subtract 3 from every player. Part of the reason the mean isn’t naturally zero is because any player who doesn’t meet a minutes criterion is categorized as “Other”, so a lineup of James/Wade/Bosh/Arroyo/JuwanHoward for example would have an E(y) where James=1, Wade=1, Bosh=1, and Other=2. As a result, “Other” plays a significant amount of minutes and is usually -4 or so.
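Here is a rough sketch of those two pieces, the “Other” bucket and the centering step. It is my own illustration, not basketballvalue's code: the function names are hypothetical and the centering uses a plain unweighted mean, which is an assumption.

```python
import numpy as np

def lineup_row(lineup, modeled_players):
    """One design-matrix row: each modeled player counts once, everyone else is 'Other'."""
    columns = list(modeled_players) + ["Other"]
    row = np.zeros(len(columns))
    for name in lineup:
        if name in modeled_players:
            row[columns.index(name)] += 1
        else:
            row[-1] += 1                 # below the minutes cutoff: lumped into 'Other'
    return columns, row

def center(ratings):
    """Shift ratings so their (unweighted) mean is zero."""
    ratings = np.asarray(ratings, dtype=float)
    return ratings - ratings.mean()

columns, row = lineup_row(
    ["James", "Wade", "Bosh", "Arroyo", "Howard"],
    modeled_players=["James", "Wade", "Bosh"],
)
centered = center([5.0, 3.0, 1.0])       # mean is +3, so 3 comes off every rating
```

Note that centering only shifts every rating by the same constant, so it cannot change how well the regression fits or how players rank against each other.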
Alex
03/07/2011
I think when basketballvalue says that they’ve set it up “like Rosenbaum”, they just mean in terms of the initial regression model. So their numbers would just be the poorly-fitting ‘pure’ plus/minus, while Rosenbaum’s would have the other questionable steps.
arturogalletti
03/07/2011
Alex,
My issue is the low correlation and what the specifics of the adjustment are. The adjustment to get it from 5% to 95% correlation is so large as to throw into question the results completely.
Crow
03/06/2011
I think you are misunderstanding in fundamental ways but I can’t seem to clarify that enough for you so I will probably stop soon.
You say you want to evaluate a model then you change it for the evaluation. You analyze the oldest, simplest version then say analysis of the newer, better versions isn’t worth your time. You mix critiques of three models at once. You are still apparently convinced that Statistical +/- impacts Adjusted +/- when Adjusted +/- pre-existed Statistical +/- and was used to derive Statistical +/-. I don’t know if I’d use the word “shenanigans”, but I will use the phrase “Doubting Thomas” to describe my view of your analysis of these models concurrently.
Crow
03/06/2011
Correction: I should have used the word “disagreeing” instead of “doubting Thomas” because I am not Thomas and I am not merely “doubting”; I am saying you are misunderstanding and/or misrepresenting what goes into pure Adjusted +/-.
The thing is, you don’t need to do what you are doing. It hurts your case. Critique Adjusted +/- on its own as it is constructed and just as it is constructed. And as it is used by some, beyond what it should be. Critique other models too if you want but critiques of Statistical and Overall +/- do not address or go back to pure Adjusted +/-.
Crow
03/06/2011
In Adjusted +/- we already know the net point margins of the stints. The Adjusted +/- regression fits the player ratings that best explain those point margins across the whole database of observations. There is no loop from step 2 or 3 going back to pure Adjusted +/-. None. They are extensions from Adjusted +/-.
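A toy version of that fit, as a sketch only. It uses four made-up players in 2-on-2 stints with noise-free margins, and it leaves out possession weighting, the home-court term and any reference player, all of which a real stint database would need. The +1 / -1 encoding for the two sides is the usual convention but is an assumption here.

```python
import numpy as np

# Columns are players A, B, C, D. Each row is one stint:
# +1 if the player is on the first side, -1 if on the second side.
X = np.array([
    [1.0,  1.0, -1.0, -1.0],   # A, B vs C, D
    [1.0, -1.0,  1.0, -1.0],   # A, C vs B, D
    [1.0, -1.0, -1.0,  1.0],   # A, D vs B, C
])

# Made-up "true" ratings that sum to zero, used only to generate the stint margins.
true_ratings = np.array([3.0, 1.0, -1.0, -3.0])
margins = X @ true_ratings     # net scoreboard change per stint

# Least squares on the stint margins alone; no box score data enters anywhere.
ratings, residuals, rank, _ = np.linalg.lstsq(X, margins, rcond=None)
```

Adding the same constant to every player leaves each stint's predicted margin unchanged, so the design has one free direction; `lstsq` returns the minimum-norm answer, which is the zero-mean one in this toy case.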
Greyberger
03/06/2011
Forgive me if I’m being obtuse, but why not just contact the proprietors of Basketballvalue.com and ask? As I understand they’re reasonably public and they can better explain whether the statistical +/- component you’re talking about is part of the ratings there or just a confirmation tool.
Aaron
03/07/2011
Arturo,
A couple people just pointed me to your post, so I wanted to clarify what I’m providing on basketballvalue.com. I think there might be a little confusion.
Certainly, as you quoted from my web page “The adjusted +/- calculations are in the spirit of the work of Dan Rosenbaum”. On basketballvalue.com, I report both unadjusted numbers (e.g. Overall Rating) and adjusted numbers (e.g. 1 Year Adj. +/-). These adjusted numbers are the result of the basic regression as outlined in Rosenbaum’s article that you link to, labeled formula (1). In that paper, he lists the results in Table 1 as “Pure Adjusted Plus/Minus Ratings”.
The rest of your post above gets into the technique that Dan introduces for Statistical Plus/Minus ratings and then Overall Plus/Minus ratings. While that’s interesting work, that’s not what is presented on basketballvalue.com. I tried to make that clear in the various explanations that have been posted, including the 82games articles referenced in the comments (“These ratings have been determined using only the matchup data available at basketballvalue.com. They do not explicitly include box score statistics”). Sorry if that still comes across as a little opaque.
Also, I just thought I should mention that the purpose of the site is really just to be a service to the community. The raw data is available for download so that others can use it in their work, which I find really rewarding. The unadjusted and adjusted +/- results are also updated on an almost daily basis so that people who are interested but might not be able to reproduce them on their own can see them. I don’t include commentary, and I think my position is summed up best in the conclusion of http://www.82games.com/barzilai2.htm :
“the results must be one piece of a broader assessment of the player.”
Thanks,
Aaron
arturogalletti
03/07/2011
Aaron,
Thanks for the thoughtful response. This did clarify a lot for me.
I greatly appreciate the data services provided and find them very useful.
The numbers reported on your site can correctly be said to more closely resemble step 1 of the Rosenbaum paper (with an additional step to center the data). This is the initial step that I worked out to about a 5% R^2.
I do totally agree with the statement: “the results must be one piece of a broader assessment of the player.” and I think I stated as much earlier.
Greyberger
03/08/2011
Are you going to fix it?
arturogalletti
03/08/2011
Greyberger,
The piece and the comments stand on their own merit I think. The piece deconstructs the Rosenbaum model in full. I’ll put a note in for the Basketballvalue feedback (and my response).
I will come back specifically to this in the future and expand on it further.
Crow
03/07/2011
Greyberger you made a good suggestion. It doesn’t appear that what I, whiffleball, Alex and Evan have said can convince Arturo or at least they haven’t yet. I’d hope he’d contact Aaron directly but that is ultimately his choice.
EvanZ
03/07/2011
I contacted Aaron and he wrote a response. I assume it will be posted here soon.