Prove me wrong, Rook Part 1

Posted on 10/08/2010 by Arturo Galletti

Readers familiar with this blog know that I’ve been working on a model to predict success in the NBA using the Wins Produced metric (see the Basics here). In a sense, it’s the mission statement of this blog. The intent is to shake out the tools, build a model piece by piece, put it through its paces, rinse and repeat, and over time get closer to simulating the truth.

I'm not quite trying to build a universe here (Image courtesy of xkcd.com)

The development version is already up and released to the public for beta testing (see here), and the full pre-season build is coming (with endless refinements as the season goes along). Before I get to that, though, I need to deal with one of my favorite topics: the draft and rookies.

Sure, there are better rookie images, but Frank Quitely is awesome

Now, the draft is notoriously hard to model. A simple answer would be to just use some dummy variables for rookies and carry on, but readers know by now that I never take the easy path. So the question becomes: how do we model rookies?

For this exercise, I went ahead and did a full build combining all the combine data from Draft Express (yes, all of it; I have been working on this for a while) with all the WP48 data for rookies. Then I took the data and started looking for variables that correlate with rookie-year Raw Productivity per 48 minutes (ADJP48). Please note that I said rookie year and not the first four years; that is a slightly different model (and post :-)). I found the following variables that correlate in a meaningful way:

Height (HEIGHT)
Position (SIMPOS, simple position)
Age when drafted (DFTAGE)
Win Score per 40 minutes (WS40)
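The build described above boils down to two steps: join the combine measurements to each player’s rookie numbers, then check which columns actually move with ADJP48. Here is a minimal Python sketch, assuming each source is a dict keyed by player name and using Pearson correlation as the screening statistic. The helper names, any column names other than ADJP48, and the |r| cutoff are hypothetical, not the values used in this post.

```python
from math import sqrt


def join_by_player(combine, rookies):
    """Inner-join two {player: {column: value}} dicts on player name.

    Players missing from either source are dropped."""
    return [{**combine[name], **rookies[name]} for name in combine if name in rookies]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no signal
    return sxy / sqrt(sxx * syy)


def screen_variables(rows, target="ADJP48", min_abs_r=0.1):
    """Return {column: r} for every column whose |r| with the target meets
    the (hypothetical) cutoff, strongest correlation first."""
    ys = [row[target] for row in rows]
    kept = {}
    for column in rows[0]:
        if column == target:
            continue
        r = pearson([row[column] for row in rows], ys)
        if abs(r) >= min_abs_r:
            kept[column] = r
    return dict(sorted(kept.items(), key=lambda item: -abs(item[1])))
```

A column that "resembles noise" falls below the cutoff and drops out. A one-variable-at-a-time screen like this cannot see variables that only matter in combination, so it is a first pass rather than the final model.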

The equation I came up with based on these variables is:

ADJP48 = K - A*HEIGHT + B*SIMPOS - C*DFTAGE + D*WS40

where K, A, B, C, and D are constants.
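This post doesn’t publish the fitted values of K, A, B, C and D, or say exactly how they were estimated. As a sketch, assuming an ordinary least-squares fit over rookies with more than 400 minutes and a hypothetical MIN column for minutes played, the constants can be recovered from the normal equations and plugged back into the equation. The sign flips at the end only translate raw regression slopes into the "K - A*HEIGHT ... - C*DFTAGE" form, so A and C come out positive when height and age hurt. All function names here are illustrative.

```python
from math import sqrt

FEATURES = ("HEIGHT", "SIMPOS", "DFTAGE", "WS40")


def predict_adjp48(coefs, height, simpos, dftage, ws40):
    """ADJP48 = K - A*HEIGHT + B*SIMPOS - C*DFTAGE + D*WS40."""
    k, a, b, c, d = coefs
    return k - a * height + b * simpos - c * dftage + d * ws40


def solve(matrix, vector):
    """Solve a small dense linear system by Gaussian elimination
    with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are (nearly) collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_rookie_model(players, min_minutes=400):
    """Least-squares fit of ADJP48 on the four features for rookies with
    more than min_minutes. Returns ((K, A, B, C, D), r), where r is the
    correlation between fitted and actual ADJP48."""
    sample = [p for p in players if p["MIN"] > min_minutes]
    X = [[1.0] + [float(p[f]) for f in FEATURES] for p in sample]
    y = [float(p["ADJP48"]) for p in sample]
    m = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    k, h, s, age, ws = solve(xtx, xty)
    # Flip the HEIGHT and DFTAGE slopes into the "- A*HEIGHT" and
    # "- C*DFTAGE" sign convention of the equation.
    coefs = (k, -h, s, -age, ws)
    fitted = [predict_adjp48(coefs, *(p[f] for f in FEATURES)) for p in sample]
    return coefs, pearson(fitted, y)
```

On real data the returned correlation sits well below 1; the figure quoted for the 373-player college sample is 42%.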

The fit has a correlation of 42% across every player who played more than 400 minutes as a rookie coming out of college (from 1996 to 2010, that’s 373 players). In graph form it looks something like this:

The full table is here. But what does it actually mean? When I look at the error by age and position, I see the following:

The model is consistent, and it lets me look at a player and predict, within reason, who he’s going to be. It’s also better than the straight correlation indicates, because I only care about one side of the tail: if my model oversells a player (a false positive), it costs me money; if it undersells him (a false negative), it’s money in my pocket.
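The one-tailed argument is really about precision: among the players the model tells you to draft, how many pan out? Players the model never recommended cost nothing in this framing, so a model with a modest overall correlation can still be valuable if its recommendations rarely bust. A minimal sketch, assuming a player counts as good if his career WP48 clears the .090 bar used for a hit in this post; the Outcome type and its field names are hypothetical.

```python
from typing import Iterable, NamedTuple


class Outcome(NamedTuple):
    recommended: bool   # did the model flag this player as a pick?
    career_wp48: float  # what the player actually produced


def error_counts(outcomes: Iterable[Outcome], hit_bar: float = 0.090) -> dict:
    """Tally true/false positives and negatives against a career WP48 bar."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for o in outcomes:
        good = o.career_wp48 >= hit_bar
        if o.recommended:
            counts["tp" if good else "fp"] += 1  # fp: oversold, costs money
        else:
            counts["fn" if good else "tn"] += 1  # fn: undersold, no cost here
    return counts


def precision(counts: dict) -> float:
    """Share of recommended players who turned out good; only false
    positives pull this number down."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0
```

Recall (how many good players the model caught) is where misses show up, and in this framing it is the less important number.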

Let’s illustrate. Here are the best-ranked rookies who actually played from 1997 through 2006 (the last ten-year period where the draftees have at least four years of data):

If I count as a hit any pick who becomes at least a career .090 WP48 player, then the model hit 36 of 50 times, or 72%. So if I have multiple picks in a draft, I’m assured a decent player, and since the average draft position for the group is 13, these players will be available late. As for the last few years, here are the recommended picks:

You’ll note that Blake Griffin isn’t in this group (he hasn’t played yet), but overall the list is strong. Beasley is the turd in the punch bowl, but I would remind everyone that he’s only played two years in the league (and, by his own admission, this might be the first year he plays clean).
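The multiple-picks claim rests on simple arithmetic on the 36-of-50 hit rate. A minimal sketch, assuming each pick is an independent 72% shot. That is an assumption: picks in one draft share the same model, and the rate was measured on the 1997 to 2006 classes rather than on a new one. The function names are illustrative.

```python
from fractions import Fraction


def hit_rate(hits: int, picks: int) -> Fraction:
    """Exact hit rate, e.g. 36 hits out of 50 recommended picks."""
    return Fraction(hits, picks)


def p_at_least_one_hit(rate: Fraction, n_picks: int) -> Fraction:
    """Chance that at least one of n independent picks is a hit:
    one minus the chance that every pick misses."""
    return 1 - (1 - rate) ** n_picks


rate = hit_rate(36, 50)
for n in (1, 2, 3):
    print(n, float(p_at_least_one_hit(rate, n)))
```

Under the independence assumption, two picks give about a 92% chance of at least one hit and three picks about 98%.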

As for the misses?

Missing Lee and Odom hurts, but it’ll have to do until we build a better college model.

So now that we have the model, the next logical step is to project the incoming 2010 rookie class, and I’ll do just that. Tomorrow. In Part 2.

Part 2 is here


23 Responses to “Prove me wrong, Rook Part 1”

Alex

10/08/2010

Hey Arturo – It looks like you use simple position as a continuous variable here. Do you do better if you make it categorical? I assume the jumps in productivity aren’t the same from point to SG to SF to PF to center.


arturogalletti

10/08/2010

Very probably. Got to leave some improvement for the next version. I’ll play with running a by-position regression equation.


jglanton

10/08/2010

Arturo,
The first thing that came to mind when you used ‘height’ in the formula was to refine it to use ‘reach’. It might help remove some anomalies by separating the pterodactyls from the T-Rexes, as some of the pterodactyls overachieve for their height, and vice versa.


arturogalletti

10/08/2010

We looked at reach as one of the variables and it didn’t really correlate strongly. The combine data has actually been a big waste of time so far; the only questions that matter are:
Can you play?
What position?
Are you tall?
How old are you?
Everything else resembled noise. I will however revisit the combine data and the can you play question in the future.


Neal Frazier

10/09/2010

When looking at the age of the draftee, is the problem with younger players more that they aren’t mature enough to compete with men yet or is it that we haven’t seen them enough to figure out how good they will be yet? Not sure how you would tease this out in the numbers…


arturogalletti

10/09/2010

Actually, the model favors younger players. If you have two players with similar numbers, go younger.


Shawn Ryan

10/09/2010

Damn Arturo! I want to be just like you when I grow up!


arturogalletti

10/09/2010

Thanks. Just wait till the sequel! 🙂


Fred Bush

10/10/2010

So, height is bad? Am I misreading your equation or are you burying the lede?


arturogalletti

10/10/2010

It’s a combined effect. College performance is devalued by height and increases with youth, so the performance number is more likely to correlate if you’re shorter and younger. A 19-year-old 6’6” center who lit it up is more likely to have success. If you’re tall and old, you have to dominate in college to dominate in the pros.


Fred Bush

10/10/2010

If that’s true, I’m going to guess that’s a highly exploitable flaw in teams’ valuations of players. I would assume that most teams think that, all things being equal, a taller player would be better. How much of the difference between actual draft position and your algorithm’s draft position is explained by that single variable being (-) rather than (+)?


arturogalletti

10/10/2010

Tough to tell, but it’s significant. I’ll run some numbers. The point is, height should lead to production or it’s worthless.


Evanz

10/11/2010

I see Horford and Speights on the list. Was Noah a miss?


arturogalletti

10/11/2010

Yogi missed him, Boo Boo got him (see part 2)


9 Trackbacks For This Post

Podcast News and borrowing generously from a post by Arturo « Nerd Numbers the Blog
October 9th, 2010, 00:03
[…] that I want to talk up one of the show’s hosts Arturo. He recently released a draft model. In it he shows he hit 36/50 times on players he predicted to be good and dings himself for missing […]
Prove me wrong, Rook (Part 2) « Arturo's Silly Little Stats
October 10th, 2010, 01:34
[…] I unveiled one of those, my rookie model and it was deemed awesome. Again, Frank Quietly = […]
Prove me wrong,Rook (Finale): Projecting the 2010 Rookies « Arturo's Silly Little Stats
October 11th, 2010, 01:07
[…] performance (Yogi and Boo Boo). For the math behind it see the Basics . For the model build see parts 1 & part 2. Now, we get to the payoff where we feed the numbers for the 2010 rookies into the […]
Rookie Preseason Statistics | The Wages of Wins Journal
October 25th, 2010, 00:45
[…] Stats” – has offered a few studies of rookies recently (see his “Prove me Wrong” series HERE, HERE, and HERE). His latest – reposted below – is a quick look at the preseason rookie […]
2012 NBA Draft Extravaganza: The Rankings | The Wages of Wins Journal
June 26th, 2012, 12:50
[…] built two models to predict the future performance of NBA draft picks (go here for the model build parts 1 & part 2 ). In very general terms the models use the available data to predict future […]
2012 NBA Draft Extravaganza: The Rankings | Epou
June 27th, 2012, 04:50
[…] future performance of NBA draft picks (go here for the model build parts 1 & part 2 ). In very general terms the models use the available […]
2013 NBA Draft Extravaganza: The Worst Draft Ever!
June 25th, 2013, 01:58
[…] built two models to predict the future performance of NBA draft picks (go here for the model build parts 1 & part 2 ). In very general terms the models use the available data to predict future […]
2013 NBA Draft Extravaganza rev.2 A Pretty Good Draft
June 25th, 2013, 12:44
[…] built two models to predict the future performance of NBA draft picks (go here for the model build parts 1 & part 2 ). In very general terms, the models use the available data to predict future […]
2013 NBA Draft Extravaganza Rev 3: Eliminating the Big Man Bias, the Euro Numbers and the Cheat Sheet
June 27th, 2013, 02:03
[…] original build in detail is achived here (parts 1 & part 2 ). In very general terms, the models use the available data to predict future […]
