Readers familiar with this blog know that I’ve been working on a model to predict success in the NBA using the Wins Produced metric (see the basics here). In a sense, it’s the mission statement of this blog. The intent is to shake out the tools, build a model piece by piece, put it through its paces, rinse and repeat, and over time get closer to simulating the truth.
The development version is already up and released to the public for beta testing (see here), and the full pre-season build is coming (with endless refinements as the season goes along). Before I get to that, though, I need to deal with one of my favorite topics: the draft and rookies.
Now, the draft is notoriously hard to model, and a simple answer would be to just use some dummy variables for rookies and carry on. But readers know by now that I never take the easy path. So the question becomes: how do we model rookies?
For this exercise, I went ahead and did a full build merging all the combine data from Draft Express (yes, all of it; I have been working on this for a while) with all the WP48 data for rookies. Then I took the data and started looking for variables that correlate with rookie-year Raw Productivity per 48 minutes (ADJP48). Note that I said rookie year and not first four years; that is a slightly different model (and post). I found the following variables that correlate in a meaningful way:
- Height
- Position
- Age when drafted
- Win Score per 40 minutes
The equation I came up with based on these variables is:
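ADJP48 = K - A*HEIGHT + B*SIMPOS - C*DFTAGE + D*WS40
where K, A, B, C and D are constants, HEIGHT is the player's height, SIMPOS is his position, DFTAGE is his age when drafted, and WS40 is his Win Score per 40 minutes.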
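Plugging a prospect into the equation is a single linear expression. Here is a minimal sketch of that step; the post doesn't publish the values of K, A, B, C and D, so the class below takes them as inputs and any numbers you feed it are placeholders. The units and the numeric coding of position (SIMPOS) are also assumptions: they have to match whatever the coefficients were fitted on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RookieModel:
    """Coefficients K, A, B, C, D from the ADJP48 equation.

    The values are not given in the post; anything passed in is a placeholder.
    """
    k: float
    a: float
    b: float
    c: float
    d: float

    def predict_adjp48(self, height, simpos, dftage, ws40):
        # Signs follow the equation: with positive constants, height and draft
        # age pull the estimate down, while position code and Win Score per 40
        # push it up.
        return self.k - self.a * height + self.b * simpos - self.c * dftage + self.d * ws40
```

Keeping the constants in one frozen object makes it easy to refit them each season and swap the new set in without touching the prediction code.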
The fit has a correlation of 42% across every player who came out of college and logged more than 400 minutes as a rookie (from 1996 to 2010, that’s 373 players). In graph form it looks something like this:
The full table is here. But what does it actually mean? When I look at the error by age and position, I see the following:
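Breaking the error down by age and position amounts to bucketing each player's miss and averaging within each bucket. A minimal sketch, assuming the error is signed (predicted minus actual ADJP48, so positive means the model oversold the player) and that rows arrive as plain tuples; both the sign convention and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def error_by_group(rows):
    """Average signed error (predicted minus actual ADJP48) per (age, position).

    rows: iterable of (draft_age, position, predicted, actual) tuples,
    a hypothetical layout chosen for this sketch.
    """
    buckets = defaultdict(list)
    for age, position, predicted, actual in rows:
        buckets[(age, position)].append(predicted - actual)
    # Sort by (age, position) so the table reads in a stable order.
    return {key: mean(errors) for key, errors in sorted(buckets.items())}
```

A bucket whose average sits well away from zero would point to a systematic bias for that kind of player rather than just noise.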
The model is consistent, and it lets me look at a player and predict, within reason, who he’s going to be. I also only care about one side of the tail: if my model oversells a player (a false positive), it costs me money; if it undersells him (a false negative), it’s money in my pocket. Given that asymmetry, the model is more useful than the straight correlation indicates.
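One way to put numbers on the one-sided argument is to compare a symmetric error measure with one that only charges for oversells. This is a sketch of that idea, not the post's method: mean absolute error counts misses in both directions, while the one-sided version keeps only the cases where the prediction exceeded the actual result.

```python
def symmetric_and_one_sided_error(pairs):
    """Return (mean absolute error, mean oversell) over (predicted, actual) pairs.

    The one-sided measure ignores undersold players, since in the post's
    framing those are bargains rather than costs.
    """
    pairs = list(pairs)
    n = len(pairs)
    symmetric = sum(abs(p - a) for p, a in pairs) / n
    one_sided = sum(max(0.0, p - a) for p, a in pairs) / n
    return symmetric, one_sided
```

The one-sided figure can never exceed the symmetric one, which is the arithmetic behind the claim that a plain correlation understates how useful the model is for drafting.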
Let’s illustrate. Here are the top-ranked rookies who actually played from 1997 through 2006 (the last ten-year period in which the draftees have at least four years of data):
If I count as a hit any pick who turns into at least a career .090 WP48 player, then the model hit 36 of 50 times, or 72%. So if I have multiple picks in a draft, I’m all but assured a decent player, and since the average draft position for the group is 13th, these players will be available later in the draft. As for the last few years, here are the recommended picks:
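The hit-rate arithmetic, and why multiple picks help, can be sketched in a few lines. The multi-pick figure assumes each pick hits or misses independently at the historical 72% rate; that independence assumption is mine, not the post's.

```python
def hit_rate(career_wp48, threshold=0.090):
    """Share of picks whose career WP48 is at least the threshold."""
    hits = sum(1 for wp48 in career_wp48 if wp48 >= threshold)
    return hits / len(career_wp48)


def chance_of_at_least_one_hit(rate, picks):
    # Assumes picks hit or miss independently of each other.
    return 1 - (1 - rate) ** picks


rate = 36 / 50  # 72%, the model's record on the 1997-2006 list
for picks in (1, 2, 3):
    # prints 72%, 92%, 98% for 1, 2, 3 picks
    print(f"{picks} pick(s): {chance_of_at_least_one_hit(rate, picks):.0%}")
```

Under that assumption, two recommended picks leave only about an 8% chance (0.28 squared) of coming away empty-handed.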
You’ll note that Blake Griffin isn’t in this group (he hasn’t played yet), but overall the list is strong. Beasley is the turd in the punch bowl, but I would remind everyone that he’s only played two years in the league (and, by his own admission, this might be the first year he plays clean).
As for the misses?
Missing Lee and Odom hurts, but it’ll have to do until we build a better college model.
So now that we have the model, the next logical step is to project the incoming 2010 rookie class, and I’ll do just that. Tomorrow. In part 2.