Computer crashes. A long holiday weekend. A random flash flood in a shopping mall parking lot. Forgetting your iPod at work. A last-minute decision to visit the in-laws. All seemingly unrelated events that led to the present state of reality, where I’m sitting in my mother-in-law’s house on my wife’s computer writing a post which, since I don’t have all my files, is not the one I originally planned. This post is the result of an odd series of circumstances. It is an outlier. Fittingly so, since I’m going to tell the story of the worst team in the Wins Produced database, the biggest outlier in the model, and how no matter how good a predictive model is, sometimes weird s#%^ just happens.
Some background first. As part of our ongoing attempts to improve the Wages of Wins network, Andres Alvarez has been working on expanding and improving his excellent automated Wins Produced site (see here). I have the honor of helping with the beta testing of the data, and as part of that testing I got to pool Andres’ database with Prof. Berri’s for comparison. Both have their limitations (the automated build’s algorithm for position allocation needs some further refinement, and Prof. Berri has to enter his data manually, to our ever-loving gratitude and appreciation), but the results are strikingly similar. Over the period since the merger, both models correlate with team wins at 94%, and both agree on their largest outlier: the worst team since the merger.
The Worst Team since the Merger
The worst team according to Wins Produced is clearly the 1993 Dallas Mavericks. This team is tied for the worst record since the merger (the 1998 Denver Nuggets, before you ask), but its win projections were off the charts. The automated model actually predicted negative wins for this team. To put it in perspective, here are the ten worst teams by projected Wins Produced:
You’ll note that no team is even within 10 projected wins of this team. Going by the projection, this team could reasonably have expected to lose every game. This team is the super team of losers. They are, in fact, a bizarro version of the Miami Heat and their out-of-control win projections for next year.
See, a funny thing happened with that Mavs team: they won 11 games. They beat some teams with winning records. What happened? Random s%$^ happened, but this team helps point out a limitation of our model. The Wins Produced model treats all wins as created equal, as equally valued events, and while this is a reasonable approximation in most cases, it is still just that: an approximation. Wins (and losses) are a limited resource, and logically, getting the 72nd win (or loss) of a season is incrementally harder than getting the 41st.
Let me expand on this point a bit. If we look at wins since the merger and exclude the 1999 lockout season, we find that wins are approximately normally distributed, with a standard deviation of 12.6 wins. Since the average team wins 41 of 82 games, this means we can expect roughly 95% of all seasons to fall within two standard deviations of the mean, between about 16 and 66 wins. It also supports the theory that each additional win (or loss) beyond this range requires an increasingly good (or bad) performance.
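A quick check of the two-standard-deviation arithmetic, as a minimal Python sketch. It assumes a league mean of exactly 41 wins (every win is another team’s loss) and treats the 12.6-win spread quoted above as a true normal distribution, which is itself only an approximation.

```python
from statistics import NormalDist

GAMES = 82
MEAN_WINS = GAMES / 2  # league-average wins: every win is another team's loss
SD_WINS = 12.6         # standard deviation quoted in the post (post-merger, excluding 1999)

# Normal approximation of the win distribution (an assumption, not exact).
wins = NormalDist(mu=MEAN_WINS, sigma=SD_WINS)

low = MEAN_WINS - 2 * SD_WINS
high = MEAN_WINS + 2 * SD_WINS
share = wins.cdf(high) - wins.cdf(low)  # fraction of seasons inside +/- 2 sd
z_11 = (11 - MEAN_WINS) / SD_WINS       # how far an 11-win season sits from the mean

print(f"+/- 2 sd: {low:.1f} to {high:.1f} wins, about {share:.1%} of seasons")
print(f"an 11-win season is {z_11:.2f} standard deviations from the mean")
```

This prints a range of 15.8 to 66.2 wins covering about 95.4% of seasons, and places an 11-win season about 2.38 standard deviations below the mean.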
So a team as monumentally bad as the Mavericks can still win games, because conditions are too variable from game to game to guarantee the result. Injuries, off days, bad refereeing, back-to-backs and meaningless end-of-season games all happen and can play havoc with our models. So all insanely high or insanely low projections have to be taken with a grain of salt, and when we look at the 2010-2011 season in general, and the Heat in particular, we will have to take care to temper our expectations.




Guy
07/25/2010
Arturo: The differences between projected and actual wins at the extremes are exactly what you’d expect, not an anomaly. Wins Produced assumes a linear relationship between points and wins (33 points = 1 win). That’s a fine approximation for measuring individual players, and will even work for teams 95% of the time. But the real relationship between points and wins is described by the “pythagorean expectation”, PS^x / (PS^x + PA^x), where PS is points scored, PA is points allowed and x is an exponent. This means that at extreme negative point differentials (worse than -10 per game) the linear model will predict too few wins, and at high positive differentials it will predict too many.
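To see where the two curves diverge, here is a minimal Python sketch comparing the linear rule (33 points of season differential per win) with the pythagorean formula, using the exponent of 14 that this comment settles on. The 100-points-per-game scoring level is a hypothetical assumption, needed because the pythagorean formula depends on points scored and allowed rather than just their difference; the numbers shift somewhat with that choice.

```python
GAMES = 82
POINTS_PER_WIN = 33  # linear rule of thumb: ~33 points of season differential = 1 win
EXPONENT = 14        # pythagorean exponent used in the comment; other values are debated


def linear_wins(diff_per_game: float) -> float:
    """Linear model: a .500 team plus one win per 33 points of season differential."""
    return GAMES / 2 + diff_per_game * GAMES / POINTS_PER_WIN


def pythag_wins(points_for: float, points_against: float, exponent: float = EXPONENT) -> float:
    """Pythagorean expectation PS^x / (PS^x + PA^x), scaled to an 82-game season."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return GAMES * pf / (pf + pa)


# Hypothetical scoring level: split each differential evenly around 100 points per game.
for diff in (-15, -12, -10, 10, 12, 15):
    pf, pa = 100 + diff / 2, 100 - diff / 2
    print(f"{diff:+4d}  linear {linear_wins(diff):5.1f}  pythag {pythag_wins(pf, pa):5.1f}")
```

Under these assumptions the linear rule gives about 3.7 wins for a -15 differential while the pythagorean version gives close to 9, and the gap mirrors at +15 (about 78 linear versus about 73 pythagorean).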
There is some disagreement over the best value of x for basketball, but let’s use an exponent of 14. Here are the projections:
Team / Point differential per game / Linear wins / Pythagorean wins / Actual wins
Phi96 / -10 / 16 / 15 / 18
Den98 / -12 / 11 / 12 / 11
Dal93 / -15 / 4 / 8 / 11
You can see that the actual records are well within the range of random variation given their pythagorean projections. The same thing happens in reverse at the high end: a +15 differential will yield about 74 wins in reality, but WP will project 78. If you want to project extreme teams (like the Heat), just convert WP into a projected point differential, then use pythag to estimate wins.
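The conversion suggested in that last sentence can be sketched the same way: invert the linear rule to turn a projected win total into a per-game point differential, then feed that differential back through the pythagorean formula. This is a hypothetical Python sketch of the recipe, not Wins Produced itself; the 78-win input and the 100-points-per-game scoring level are illustrative assumptions.

```python
GAMES = 82
POINTS_PER_WIN = 33  # linear rule of thumb from the comment
EXPONENT = 14        # pythagorean exponent from the comment


def diff_from_linear_wins(projected_wins: float) -> float:
    """Invert the linear rule: wins above .500 times 33 points, spread over 82 games."""
    return (projected_wins - GAMES / 2) * POINTS_PER_WIN / GAMES


def pythag_wins_from_diff(diff_per_game: float,
                          points_per_game: float = 100.0,  # hypothetical scoring level
                          exponent: float = EXPONENT) -> float:
    """Split the differential around the scoring level, then apply PS^x / (PS^x + PA^x)."""
    pf = points_per_game + diff_per_game / 2
    pa = points_per_game - diff_per_game / 2
    return GAMES * pf ** exponent / (pf ** exponent + pa ** exponent)


projected = 78.0  # hypothetical linear (WP-style) projection for an extreme team
diff = diff_from_linear_wins(projected)
print(f"{projected:.0f} linear wins -> {diff:+.1f} pts/game "
      f"-> {pythag_wins_from_diff(diff):.1f} pythag wins")
```

Under these assumptions a 78-win linear projection converts to roughly +14.9 points per game and comes back out at about 73 pythagorean wins, in line with the comment’s +15 example.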
arturogalletti
07/25/2010
Guy,
I didn’t get into it because I didn’t have Minitab handy, but that was precisely my point. If you plot predicted wins against actual wins there are differences, and they get worse at the edges. The point differential conversion is a valid transformation of the model for dealing with extremes. My point was that at the extremes the skew of the model (caused by its linearity) becomes more apparent, but the model is good in the vast majority of cases. Good reply, though.
Chicago Tim
07/25/2010
Very interesting point. There are many reasons why the Heat may fall short of their sky-high projected win total. But I still think, barring major injuries, they should easily win enough to secure home-court advantage in the playoffs, and that’s all they really need.
arturogalletti
07/25/2010
It might matter if you’re thinking of going to Vegas though.
Jimbo
07/26/2010
Arturo – can you explain something for me? How does your (or Prof. Berri’s) model deal with the point margin of a game? From what I read in Hollinger’s power rankings, this is just as important in predicting success as the actual number of wins…
arturogalletti
07/26/2010
Jimbo,
It’s Prof. Berri’s bike; I just like to take it out for spins and he very kindly lets me. As for point margin, I’m going to point you at the explanation of Wins Produced first. Wins Produced uses all the listed variables to produce a linear model for wins. Point margin is built into three of the numbers in the calculation: PROD (Player’s Production), DEFTM48 (the defensive adjustment, which just normalizes defenses to the league average) and the position adjustment (Opponent’s Productivity). Pythagorean models look at points scored relative to points allowed and use that to predict wins (and do a very good job at it). Wins Produced is in essence looking at productivity differential as it relates to wins, which allows for better accuracy in measuring player value. So points models are very accurate at predicting team wins, but not at separating value by player or as predictive/forecasting tools. Wins Produced does a good job at predicting wins, allocating value to individual players and predicting value over time for those players, but as a linear model it loses accuracy at the extremes, which are better handled by something like a points model. In the future I may look at some sort of conversion from WP to point differential for extreme cases.