I was looking at some of the regression models for wins in the NBA and decided to take a stab at building my own for fun. Just for the hell of it I’m going to post my final model and talk through the process of building it.
Project: Build a DIY Wins from Stats model for the NBA
Tools:
2002 to 2009 data from BasketballReference.com
Microsoft Excel
Minitab
The love and patience of a good woman.
Step 1: The Data Build:
The first thing to do for this and most analysis projects is to build the data set. For reasons of time and availability, I’m going to focus on box score information, offensive and defensive, from 2002 to 2009. Here are the stats:
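FG (Field Goals Made), FGA (Field Goal Attempts), 3P (3-Pointers Made), 3PA (3-Point Attempts), FT (Free Throws Made), FTA (Free Throw Attempts), ORB (Offensive Rebounds), DRB (Defensive Rebounds), AST (Assists), STL (Steals), BLK (Blocks), TOV (Turnovers), PF (Personal Fouls)
Each of these is collected both for the team and for its opponents (the Opp columns, e.g. OppFGM, OppFGA).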
Please note that all stats used will be similarly scaled for easy comparison.
Step 2: The First Regressions (Offense/Team Stats and Defense/Opponent Stats Separately):
So now that I have all the data, I’m going to go ahead and shove it into Minitab. I’m going to start with just the offensive (own team) stats:
The regression equation is
W = 84.0 + 0.0445 FG – 0.0583 FGA + 0.0550 3P – 0.00866 3PA + 0.0176 FT
– 0.0170 FTA + 0.0635 ORB + 0.0555 DRB + 0.0118 AST + 0.0683 STL
+ 0.0112 BLK – 0.0620 TOV + 0.00656 PF
R-Sq(adj) = 85.6%
So looking at offense alone, a team’s own stats explain 85.6% of the variability in team wins.
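For reference, each of these Minitab fits is an ordinary linear regression of wins on the box score columns, and R-Sq(adj) is the adjusted R², which discounts the plain R² for the number of predictors so that models with different numbers of variables can be compared more fairly. In the standard definition below, n is the number of team-seasons and p is the number of predictors (13 in the offense-only regression above, 26 in the all-variable one):

```latex
W_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```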
Let’s do just the defensive (opponent) stats:
The regression equation is
W = 18.9 – 0.0509 OppFGM + 0.0423 OppFGA – 0.0215 OppFTM + 0.0055 OppFTA
– 0.0516 Opp3PM + 0.0205 Opp3PA – 0.0468 OppORB – 0.0220 OppDRB
– 0.00509 OppAsst + 0.00591 OppSTL – 0.0648 OppBlk + 0.0289 OppTOV
+ 0.00569 OppPF
R-Sq(adj) = 58.1%
So looking at defense alone, opponent stats explain 58.1% of the variability in team wins.
This would lead to the conclusion that what you do is more important than what the opponent does. I guess those soundbites about playing your game are right: offense is a more important factor in winning in the NBA than defense. Paul Westhead and Mike D’Antoni may have been on to something.
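If you don’t have Minitab, the same kind of fit can be reproduced with ordinary least squares. The sketch below is a minimal Python/NumPy version, not the workflow used here; `fit_ols` is a hypothetical helper name, and the demo runs on made-up random data rather than the Basketball-Reference numbers. It returns the intercept and coefficients along with the adjusted R², so you could run it once on the team columns and once on the Opp columns to repeat the Step 2 comparison.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns (coefficients, adjusted R^2); coefficients[0] is the intercept.
    Requires more rows than predictors + 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals               # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()         # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, adj_r2


if __name__ == "__main__":
    # Synthetic demo only: 240 fake team-seasons, 3 "team" and 3 "opponent" columns.
    rng = np.random.default_rng(0)
    team = rng.normal(size=(240, 3))
    opp = rng.normal(size=(240, 3))
    wins = (41 + team @ np.array([4.0, 2.0, 1.0])
            - opp @ np.array([2.0, 1.0, 0.5])
            + rng.normal(0.0, 2.0, 240))
    for label, X in [("team", team), ("opponent", opp), ("all", np.hstack([team, opp]))]:
        coef, adj_r2 = fit_ols(X, wins)
        print(f"{label:8s} adj R^2 = {adj_r2:.3f}")
```

Fitting the subsets separately and then together mirrors Steps 2 and 3: the combined model’s adjusted R² can be compared directly with each half’s.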
Step 3: The All-Variable Regression:
Now let’s put all the variables together:
The regression equation is
W = 64.6 + 0.0743 FG – 0.0307 FGA + 0.0194 3P + 0.00513 3PA + 0.0397 FT
– 0.0155 FTA + 0.0364 ORB + 0.00278 DRB + 0.00332 AST + 0.0100 STL
+ 0.00308 BLK – 0.0169 TOV – 0.00484 PF – 0.0605 OppFGM + 0.0113 OppFGA
– 0.0275 OppFTM + 0.00692 OppFTA – 0.0378 Opp3PM + 0.00461 Opp3PA
– 0.0105 OppORB + 0.0135 OppDRB + 0.00032 OppAsst + 0.00181 OppSTL
– 0.00751 OppBlk + 0.00544 OppTOV + 0.00214 OppPF
R-Sq(adj) = 93.9%
So together, the box score variables explain 93.9% of the variability in team wins from 2002 through 2009.
What’s interesting here is the weight of the variables:
If I’m a GM or coach looking at this data, it’s clear that some of these variables are more important than others. With this in hand, I can start planning a strategy that focuses my money and my time where they give the best return. Let’s table this up:
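The “wins per 100 additional” figures can be recomputed from the Step 3 equation by multiplying each coefficient by 100. A minimal Python sketch follows; `COEFFICIENTS` and `wins_per_100` are illustrative names, and the numbers are copied from the regression output above. Keep in mind that a per-unit coefficient does not account for how much each stat actually varies between teams, so this ranking says what 100 more of a stat is worth, not how much of the variation in wins it explains.

```python
# Step 3 (all-variable) regression coefficients, in wins per unit of each stat.
COEFFICIENTS = {
    "FG": 0.0743, "FGA": -0.0307, "3P": 0.0194, "3PA": 0.00513,
    "FT": 0.0397, "FTA": -0.0155, "ORB": 0.0364, "DRB": 0.00278,
    "AST": 0.00332, "STL": 0.0100, "BLK": 0.00308, "TOV": -0.0169,
    "PF": -0.00484, "OppFGM": -0.0605, "OppFGA": 0.0113,
    "OppFTM": -0.0275, "OppFTA": 0.00692, "Opp3PM": -0.0378,
    "Opp3PA": 0.00461, "OppORB": -0.0105, "OppDRB": 0.0135,
    "OppAsst": 0.00032, "OppSTL": 0.00181, "OppBlk": -0.00751,
    "OppTOV": 0.00544, "OppPF": 0.00214,
}


def wins_per_100(coefs):
    """Wins gained per 100 additional units of each stat, largest magnitude first."""
    scaled = {name: 100 * c for name, c in coefs.items()}
    return sorted(scaled.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = wins_per_100(COEFFICIENTS)
for name, wins in ranking:
    print(f"{name:8s} {wins:+6.2f}")
```

Sorting by absolute value puts the helpful and harmful stats on one scale; the sign still tells you which direction moves wins.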
If I summarize this by facet of the game (wins gained per 100 additional units of each stat):
Improving your shooting efficiency is the best thing you can do to win more in the NBA. Inefficient shooting (hello, Iverson) kills your team. FG defense comes in second, followed by free throws and then rebounding. Everything else is relatively worthless.
marparker
07/15/2010
The link from WoW didn’t work, but I was able to find this post by searching through the archives. The next interesting step would be to figure out which individual player variables most affect team variables. I know I’ve read a study which found that point guard 3-point percentage was very important to team offensive efficiency.
Guy
07/15/2010
Interesting analysis. However, you’ve mistakenly included three defensive variables as “offense”: DRB, blocks, and steals. If you count those as defense, you will find that offense and defense are of approximately equal importance.
In fact we know this must be true, because the variance of points scored and points allowed at the team level is about the same. Teams vary in offensive and defensive quality by the same amount. So by definition, defense explains as many wins and losses as offense does.
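Guy’s argument can be checked with a small simulation. The sketch below uses synthetic data, not NBA numbers: it assumes points scored and points allowed vary by the same amount across teams, are independent of each other, and that wins depend only on their difference (the 2.7 wins-per-point slope and the noise level are illustrative choices, not fitted values). Under those assumptions, each side on its own explains about the same share of the variance in wins.

```python
import numpy as np


def share_explained(x, y):
    """R^2 of a one-variable linear fit, i.e. the squared Pearson correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2


rng = np.random.default_rng(42)
n = 10_000  # synthetic team-seasons, far more than the real data set has

# ASSUMPTION: offense and defense vary by the same amount (same standard deviation).
points_scored = rng.normal(100.0, 3.0, n)
points_allowed = rng.normal(100.0, 3.0, n)

# ASSUMPTION: wins depend only on point differential plus noise.
wins = 41 + 2.7 * (points_scored - points_allowed) + rng.normal(0.0, 2.0, n)

r2_offense = share_explained(points_scored, wins)
r2_defense = share_explained(points_allowed, wins)
print(f"offense R^2 = {r2_offense:.3f}, defense R^2 = {r2_defense:.3f}")
```

If you make one standard deviation larger than the other, the shares diverge accordingly, which is why the equal-variance observation carries the argument.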
arturogalletti
07/15/2010
Guy,
You’re totally right. I’ll probably revisit this at some point in the future.
brgulker
07/21/2010
Excellent post, Arturo.
arturogalletti
07/21/2010
Thanks,
I’ve actually been thinking that I’m going to revisit this in the next few weeks. As I’ve played more and more with Wins Produced, the idea of building a similar marginal-value model that fully incorporates offensive and defensive box score stats intrigues me. Maybe I can settle the David Lee question once and for all.
Chicago Tim
07/26/2010
Wait, turnovers and assists can be ignored? Pure point guards don’t matter? That doesn’t sound right.
arturogalletti
07/26/2010
Tim,
Assists have to be modeled indirectly, as they are embedded in scoring. Turnovers are the last two. I’ll get back to this at some point and expand.