A Guide to jumping to conclusions in the NBA and Employee of the Year for 2009-2010

Posted on 11/18/2010 by




My first move as the manager of the machine shop was to introduce standardized work. –Taiichi Ohno, father of the Toyota Production System

Quality over Quantity. Consistency. These are the hallmarks on which truly excellent organizations are founded. The centerpiece of the Toyota Production System is the reduction of variability. Variability leads to waste and loss, and that is anathema to true success.

This is going to get very real, very fast. Before we start slinging tables and numbers, feel free to go read and review the Basics.

Real vs. Fantasy (Image courtesy of xkcd.com)

Success in the NBA should be no different. I’ve talked before (see here for example) about the value of consistency night in and night out. Were I a GM, a player/employee who consistently delivers night in and night out would be the ideal employee. One of my priorities would be to have a system to evaluate total productivity and productivity variation for all the players on my roster. Luckily, Wins Produced provides just such a framework for this analysis. The quicker I can complete this analysis and determine the value and potential of my roster, the better my edge against other GMs in trades. My enemies here are time and small sample size. I want to reach valid conclusions on player talent ahead of the market, but I am aware that the quicker I reach conclusions, the larger my error will be.

This post will focus on two things: player variability/reliability and the size of the error introduced by sample size. I want to rate the players and I want to know how quickly I can do it. For this I’m going to need a hell of a lot of data. Luckily I have Andres Alvarez and his mad skills (All Powered by Nerd Numbers) at my disposal. Andres went out and did splits for every player and every game for last season. Did you know that 442 players combined for 24796 individual games played in the NBA last season? Now you do, thanks to Andres. With all this Wins Produced data in hand for the 2009-2010 season, I can go off and do an analysis of value, variability and predictive value of the numbers by chronological sample size.

Now, not every game qualifies for this analysis. To qualify as a sample, I’m requiring ten minutes played for a game. For the player to qualify for the ranking/evaluation sample, I want at least 20 game samples and 800 minutes played. For the correlation analysis, I’m going to use players with at least 50 game samples. This leaves 232 players and 16616 game samples for correlation (all players for 2009-2010 with at least 50 games with >10 MP, Avg is 71 games) and 286 players and 18936 game samples for the Reliability Value (or Employee of the Year) rankings (Minimum 20 Games with >10 Minutes Played & >799 Minutes total). Enough with the talking, let’s get to variability.
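For anyone who wants to reproduce the cut, here is a minimal Python sketch of those filters. The thresholds come from the paragraph above; everything else is an assumption on my part: the `(player, minutes)` input shape, the function name, counting total minutes over qualifying games only, and reading the per-game cutoff as "10 or more" (the post says both "ten minutes" and ">10 MP"). The player names in the example are made up.

```python
from collections import defaultdict

MIN_GAME_MINUTES = 10     # per-game cutoff for a game to count as a sample
RANK_MIN_GAMES = 20       # ranking sample: at least 20 qualifying games...
RANK_MIN_MINUTES = 800    # ...and at least 800 minutes
CORR_MIN_GAMES = 50       # correlation sample: at least 50 qualifying games


def qualifying_players(game_logs):
    """game_logs: iterable of (player, minutes) tuples, one per game played.

    Returns (ranking_set, correlation_set) of player names.
    """
    games = defaultdict(int)
    minutes = defaultdict(float)
    for player, mp in game_logs:
        if mp >= MIN_GAME_MINUTES:          # assumption: ">=" rather than ">"
            games[player] += 1
            minutes[player] += mp           # assumption: only qualifying games count
    ranking = {p for p in games
               if games[p] >= RANK_MIN_GAMES and minutes[p] >= RANK_MIN_MINUTES}
    correlation = {p for p in games if games[p] >= CORR_MIN_GAMES}
    return ranking, correlation


# Made-up logs, for illustration only.
logs = ([("Starter", 36.0)] * 60          # 60 games, 2160 minutes
        + [("Reserve", 41.0)] * 20        # 20 games, 820 minutes
        + [("Benchwarmer", 8.0)] * 70)    # never reaches the per-game cutoff
ranking, correlation = qualifying_players(logs)
```

With these logs, Starter and Reserve make the ranking sample, but only Starter has enough games for the correlation sample.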

How good is that sample in the internet?

I’ve said before that the beginning of any season is a tantalizing time full of promise and expectations, but mostly questions. Will my favorite team/player be better or worse than expected? Will a team’s surprising/disappointing start prove to be a mirage or be sustained through 82 games? How fast can we start jumping to conclusions? For the media, which has no real conscience or memory, the answer can be measured in nanoseconds.

For the hopefully rational group of people that are my readers, this is a much tougher question. We know of things like the law of large numbers (LLN). As the number of samples in a data set increases, we get closer and closer to the real value of something and, conversely, the error (or more accurately, the possibility of it) gets larger and larger the smaller the sample. So rushing to judgement based on a small sample is premature. A larger sample size is called for before we can make any solid conclusions.
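To put a rough number on that: if a player’s games were independent draws with the same game-to-game spread (a textbook simplification, not something the data here confirms), the typical error of an n-game average shrinks with the square root of n. A minimal Python sketch, where the 0.200 per-game standard deviation is a made-up illustration value and not a figure from the data:

```python
from math import sqrt

SAMPLE_SIZES = [5, 10, 15, 20, 25, 30, 40]   # the game counts examined in this post


def standard_error(per_game_std, n_games):
    """Standard error of an n-game average, assuming independent games
    that all share the same per-game standard deviation."""
    return per_game_std / sqrt(n_games)


HYPOTHETICAL_STD = 0.200   # made-up per-game ADJP48 spread, for illustration only

for n in SAMPLE_SIZES:
    print(f"{n:>2} games: +/- {standard_error(HYPOTHETICAL_STD, n):.3f}")
```

Under that assumption, quadrupling the sample only halves the error, which is why the first few weeks of a season feel so noisy.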

We know that already. Frankly, just saying “get a larger sample size” is a little fuzzy for my tastes. Luckily, you know me, data, Excel and math by now. If there’s some sort of answer to be found, I’m going to give it the ol’ Harvard try.

Let’s take a look at last year’s game data. We’ll look at numbers for the full year and for chronological game samples for 5, 10, 15, 20, 25, 30 and 40 games. We’ll look at:

  • Raw Productivity (ADJP48) correlation to final Season Number: This is how closely the sample correlates to the final full season number for the player
  • Average Total Error in ADJP48 from Sample to total for season (ADJP48): This is the difference from the sample value to the final full season number for the player expressed in Raw Productivity per 48 minutes (ADJP48)
  • % Avg. Total Error in ADJP48 from Sample to total for season: This is the difference from the sample value to the final full season number for the player expressed as % of Final season number
  • Avg. Absolute Error in ADJP48 from Sample to total for season (ADJP48) : This is the absolute difference from the sample value to the final full season number for the player expressed in Raw Productivity per 48 minutes (ADJP48)
  • % Avg. Absolute Error in ADJP48 from Sample to total for season: This is the absolute difference from the sample value to the final full season number for the player expressed as % of Final season number
  • Std Deviation of Raw Productivity (ADJP48) correlation to final Season Number: This is how closely the sample variation correlates to the final full season variation for the player
  • % Avg. Absolute Error in stddev of ADJP48 from Sample to total for season: This is the absolute difference from the sample variation to the final full season variation for the player expressed as % of Final season variation
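The measures in that list can be sketched in a few lines of Python. This is a simplified illustration, not the spreadsheet behind the tables: it assumes each player’s season number is the plain average of his per-game ADJP48 values (no weighting by minutes), it averages per-player percentages for the “%” rows, and it uses the population standard deviation for the variability rows. The function names and the three toy players are made up.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def sample_vs_season(games_by_player, n, stat=mean):
    """Compare each player's first n games with his full season.

    games_by_player: {player: [per-game ADJP48, in chronological order]}
    stat: mean for productivity, pstdev for game-to-game variability.
    Note: the percentage row divides by the season value, so it fails
    for a player whose season value is exactly zero.
    """
    sample = [stat(g[:n]) for g in games_by_player.values()]
    season = [stat(g) for g in games_by_player.values()]
    errors = [s - f for s, f in zip(sample, season)]
    return {
        "correlation": pearson(sample, season),
        "avg_error": mean(errors),                       # signed errors can cancel out
        "avg_abs_error": mean(abs(e) for e in errors),   # absolute errors cannot
        "pct_avg_abs_error": mean(abs(e) / abs(f) for e, f in zip(errors, season)),
    }


# Toy data, for illustration only.
games = {
    "A": [0.30, 0.50, 0.40, 0.40],
    "B": [0.20, 0.20, 0.30, 0.30],
    "C": [0.10, 0.30, 0.10, 0.10],
}
early = sample_vs_season(games, 2)                     # productivity, first 2 games
spread = sample_vs_season(games, 2, stat=pstdev)       # variability, first 2 games
```

In the toy data, player B’s early sample runs low and player C’s runs high by the same amount, so the signed average error is zero while the absolute error is not. That is the distinction between the “Total” and “Absolute” rows.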

Table Time!

Fascinating. Let’s analyze.

The second column tells us that once we have a 15-game sample, the correlation is above 75% (which is good). 20 games is above 80%, 30 gives us 90% and 40 games is almost a lock at 94%. So at this point in the season, player productivity for the full year can be predicted with about 70% accuracy (depending on sample size). In two more weeks this should be close to 80%.

In terms of overall error (columns 3 & 4), we do not see a lot of variation. This means that league-wide productivity and things like position adjustments and replacement levels for players can be very accurately set with a very small sample (5 games yields a 2% variation).

As for absolute error (columns 5 & 6), we see a similar story as with the correlation data. Right now you’d expect player productivity variation for the rest of the year to be about 15% to 20%. By the middle of December, this’ll be down to about 10%.

For the actual variability (columns 7 & 8), the results are a little different. Correlation increases more linearly the larger the sample. However, for absolute population variation the percentages track absolute error. So at this point you have a fair idea of a player’s game-to-game variation.

So to synthesize, at this point in the season there’s about a 30% uncertainty in the numbers (assuming the data follows the 2009-2010 pattern, but this is a safe assumption). By the end of the year this’ll be down to 15% and by the All-Star break to 5%. I expect this might be improved by eliminating injured players and rookies.

That covers the hard math portion of our program. Let’s do some fun rankings!

Employee of the Year for the NBA in 2009-2010

A lot of you out there are going through your own year-end evaluations. Hopefully, you feel these are a fair assessment of your contribution to the success of your enterprise. Your value and your consistency were measured, and you were rated fairly in comparison to your peers. So the total opposite of the typical All-NBA balloting.

How'd this get here?

What I will attempt here is to evaluate players based on the guidelines set above. I’ll look at four numbers for 2009-2010: WP48, Wins Produced, WP48 standard deviation, and the WP48 I can expect 85% of the time. I’ll rank each player in each category and average the ranks. The player with the lowest average rank gets the highest overall rating. If you remember, we have 286 players and 18936 game samples for the Employee of the Year rankings (Minimum 20 Games with >10 Minutes Played & >799 Minutes total).
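The rank-and-average step looks roughly like this in Python. It is a sketch under a few assumptions of mine: higher is better for WP48, Wins Produced and the 85% floor, lower is better for standard deviation, ties are broken by input order, and the 85% floor is taken as a given input because its calculation is not spelled out here. The dictionary keys and the three players are made up.

```python
def rank(values, higher_is_better=True):
    """Return 1-based ranks (1 = best). Ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def employee_of_the_year(players):
    """players: list of dicts with keys name, wp48, wins, std_dev, floor_85.

    Returns (name, average rank) pairs, best (lowest average rank) first.
    """
    columns = [
        rank([p["wp48"] for p in players]),
        rank([p["wins"] for p in players]),
        rank([p["std_dev"] for p in players], higher_is_better=False),
        rank([p["floor_85"] for p in players]),
    ]
    avg_rank = [sum(col[i] for col in columns) / len(columns)
                for i in range(len(players))]
    standings = sorted(zip(avg_rank, (p["name"] for p in players)))
    return [(name, avg) for avg, name in standings]


# Made-up players, for illustration only.
roster = [
    {"name": "Steady",  "wp48": 0.200, "wins": 12.0, "std_dev": 0.15, "floor_85": 0.05},
    {"name": "Streaky", "wp48": 0.250, "wins": 10.0, "std_dev": 0.40, "floor_85": -0.15},
    {"name": "Scrub",   "wp48": 0.020, "wins": 1.0,  "std_dev": 0.30, "floor_85": -0.28},
]
standings = employee_of_the_year(roster)
```

Streaky has the best WP48 here, but Steady wins the average rank because he is first in the other three categories. That is the whole point of rewarding consistency.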

Table time again:

So LeBron James in a landslide. The top ten is rounded out by: Jason Kidd, Rajon Rondo, Mike Miller, Andre Iguodala, Pau Gasol, Al Horford, Ben Wallace, David Lee and Steve Nash.

The bottom ten reflects guys who should not by any means be on your team (sorry Mr. Pargo, Rasheed and others, but ball don’t lie).

If we use this to build my own All-NBA teams we get:

Thabo Sefolosha was a huge surprise on the first team, as was Zach Randolph on the second team. But I guess together with everyone else on this list, they got the job done night in and night out.

 

Table columns: WP48, Wins Produced, Std dev, >WP48 85% of Time, Worst Game, Best Game, Rank WP48, Rank Wins, Rank Variability, Rank Worst Day, Avg Rank, Rank