Whereof what’s past is prologue; what to come,
In yours and my discharge. –The Tempest
There’s a very good reason that I spent the past few days posting and discussing a model to project rookie performance. I want to use the Wins Produced model to project future player productivity based on past performance. I want to use this model to forecast team success. I want to rank players, identify future stars and identify teams that have what it takes to contend and win.
This sounds like great fun, but before we get there, I need to do some math (think of this as the show your work portion of the program).
So in this post we debut the productivity model for players in the NBA. For newcomers, go here for the Basics. Here we’ll talk about players that are already in the NBA (for rookies go here, here & here). Let’s get to the model building.
Building the Model
This was a complicated build to say the least. I used data for every player who played more than 400 minutes from 1978 on. I am projecting ADJP48 (raw player win production per 48 minutes). I built multiple models and refined them so I would get the highest possible correlation for the last five years. The models are as follows:
- Model #1: Last year’s ADJP48
- Model #2: Model 1 plus Age & Position model based on % change
- Model #3: Model 1 plus Age & Position model based on total change
- Model #4: Weighted average of last 3 year ADJP48 with a special rule that only looks at the last season for players under 25 years of age.
- Model #5: Model 4 plus Age & Position model based on % change
- Model #6: Model 4 plus Age & Position model based on total change
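To make the Model #4 baseline concrete, here is a minimal Python sketch. It is an illustration and not the actual implementation: the function name `project_adjp48` is hypothetical, the weights (.62, .26, .12) are the approximate values given in the comments below, and renormalizing the weights when a player has fewer than three seasons is my own assumption. The Age & Position adjustment used in Models #5 and #6 is not described in the post, so it is left out.

```python
# Rough sketch of the Model #4 idea: a weighted average of the last three
# seasons of ADJP48, using only the most recent season for players under 25.
# Weights are the approximate values quoted in the comments; everything else
# (names, handling of short histories) is an assumption for illustration.

APPROX_WEIGHTS = (0.62, 0.26, 0.12)  # most recent season first


def project_adjp48(history, age, weights=APPROX_WEIGHTS):
    """Project next season's ADJP48.

    history: past ADJP48 values, most recent season first.
    age: player's age in years.
    """
    if not history:
        raise ValueError("need at least one prior season")

    # Special rule from the post: under 25, look only at the last season.
    if age < 25:
        return history[0]

    # Pair each available season with its weight (at most three seasons).
    pairs = list(zip(history, weights))

    # Assumption: if fewer than three seasons exist, renormalize the
    # weights that are actually used so they still sum to 1.
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


if __name__ == "__main__":
    # A 28-year-old with three seasons on record.
    print(project_adjp48([0.30, 0.20, 0.10], age=28))
    # A 22-year-old: only the latest season counts.
    print(project_adjp48([0.30, 0.20, 0.10], age=22))
```

Models #5 and #6 would then apply an age and position adjustment on top of this number, either as a percentage change or as a total change.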
The correlations for these models look as follows:
Model 6 is the clear winner. Now a 70% to 77% correlation doesn’t sound like a lot, but a funny thing happens when I account for minutes played:
The model improves with a larger sample size (minutes). So the prediction model is more accurate for the players that will get the most minutes. So we can have some good confidence in the ability of the model to forecast future success. We will of course put the model to the test in this space. As in soon. As in probably tomorrow.
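For anyone who wants to run this kind of check on their own projections, here is a hedged Python sketch of the minutes test: compute the correlation between projected and actual ADJP48 using only players above a minutes cutoff, then raise the cutoff. The function names, the row layout and the thresholds are assumptions for illustration, not the code behind the numbers in this post.

```python
# Sketch: Pearson correlation between projected and actual ADJP48,
# recomputed for increasing minimum-minutes cutoffs.
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def correlation_by_min_minutes(rows, thresholds, min_players=3):
    """rows: iterable of (minutes, projected_adjp48, actual_adjp48).

    Returns {threshold: correlation} for each cutoff that keeps at
    least `min_players` players; smaller samples are skipped.
    """
    rows = list(rows)
    results = {}
    for cutoff in thresholds:
        kept = [(p, a) for minutes, p, a in rows if minutes >= cutoff]
        if len(kept) < min_players:
            continue
        projected = [p for p, _ in kept]
        actual = [a for _, a in kept]
        results[cutoff] = pearson(projected, actual)
    return results
```

Calling `correlation_by_min_minutes(rows, [400, 1000, 2000])` on one row per player-season gives one correlation per cutoff, which is the comparison described above.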
Devin Dignam
10/12/2010
Ah, so you WERE working on it already…figures.
I think I noticed a typo:
* Model #5: Model 2 plus Age & Position model based on % change
* Model #6: Model 2 plus Age & Position model based on total change
Those are probably supposed to be:
* Model #5: Model 4 plus Age & Position model based on % change
* Model #6: Model 4 plus Age & Position model based on total change
And Model 4 looks suspiciously like the method I came up with to do my projections. What weights are you using, and what are the details of the Age and Position models (or are they to remain secret)?
arturogalletti
10/13/2010
Devin,
Good catch. I’ll fix it. I hopefully covered your last no-prize completely. I’ll work on the left handers at some point.
It’s been brought to my attention that I shouldn’t give everything away to the nice millionaires and billionaires :-).
The weights are about .62,.26,.12. The Age and Position adjustment is the trick to it.
You must also remember that we seem to share a brain.
Devin Dignam
10/13/2010
Yeah, you covered it pretty well last time, and I know you have plenty of stuff you’ve been working on. Just reading through the post, I was confused for a bit and wanted to make sure that people didn’t get lost there (although I’m sure a good deal of people don’t really examine the methodology all that much – they just want the oh-so-juicy results).
I totally agree on keeping some stuff secret – that’s what’s going to get you hired one day, and you should guard it closely. There’s a reason that many “nice millionaires and billionaires” are so wealthy…and I’m afraid it has very little to do with “nice”.
My weights are .56, .34, and .11. Again, funny that our weights are relatively close. How did you come up with yours? I simply pulled a George Bush and went with my gut. Luckily my gut has a better track record than his.
Devin Dignam
10/13/2010
Actually, correction: the weights are .56, .33, and .11 (I swear I know how to round).
arturogalletti
10/13/2010
I actually used statistical software to come up with the initial weights, then played around to get a good correlation. I actually think that if I did this for a living and not as a hobby I could significantly improve it by adding things like injury models and some peak year modeling as well. But as something I do in my free time? I’m very happy with the results.
ilikeflowers
10/13/2010
I imagine that injured players are really throwing off the correlations for the lower minute players. Is there data kept that would allow you to filter out likely injured players? I’m guessing that players who have large gaps where they didn’t log any minutes (for any team) where they previously played large minutes are going to be the injured ones. Then it would be easier to see how many minutes a non-injured player needs to play before the correlations get nice.
Also, it’d be cool to see the correlations for the population of players who changed teams versus the population of players who didn’t.
Double also, could these changes in correlations by minute be used to estimate the wp48 error range?
arturogalletti
10/13/2010
Wait til the next post 🙂
Jimbo (Oz)
10/13/2010
Arturo – I honestly think you should get Mr Berri to talk to Bill Simmons about the work you are doing – I think with a bit of encouragement you could get on as a podcast guest (if you wanted to), and turn this into a well paid full time job with an NBA team (hopefully the Spurs !!).
arturogalletti
10/13/2010
I love Simmons’s work but I’m fairly sure he’s not a WOW fan. See here and here. Nevertheless that would be something to cross off my bucket list.