Steve: I lied. Um… All that stuff I said about being a crack head? It just helps me sell magazines. I’m actually an unemployed… software engineer.
Peter Gibbons: You’re a software engineer?
Steve: Yup.
Samir: Things, uh… it must be very rough for you.
Steve: Actually man, I make more money selling magazine subscriptions, than I ever did at Intertrode!
Quote from Office Space
So it’s not quite Sunday morning, but it’s Labor Day (which explains the quotes and some of the images), so it’s time for a Sunday-paper-style feature and my attempt at an in-depth statistical piece. As always, 100% satisfaction guaranteed or your money back.
For this piece I will be talking about my magical flying unicorn.
Not that kind of unicorn. Let me explain. My Wages of Wins network colleague Andres Alvarez has been doing a super fun series called Time Machine Teams. In it, Andres takes a team’s roster from a given year and allows the team to use the best season from any player’s career. Andres asked me to help model the series, so I sat down and built a model. I’ve posted bits of it already (see here and here). To quote Andres: “It is more like having a friend and asking to borrow their bike and instead they lend you a flying horse.” I lol’ed.
So for this post, we’re going to spend some time detailing how I built a probability model for the NBA playoffs and some of the interesting things I’ve discovered along the way.
Straight up Win Percentage
The first part of the build is figuring out how teams match up and who would win. There are multiple ways to do this, but here’s what I chose (for the playoffs). I need to know the following:
- The Wins Produced over the course of the regular season by each team’s top 6 players, ranked by playoff minutes.
- The average possessions per game for both teams.
I chose the top 6 because of my half-baked theory (see here), which states that only your top 6 players (rotation-wise) matter come playoff time. For the purposes of this exercise I’m going to look at playoff numbers for the last three seasons (2008-2010). The table is as follows:
So the top 6 in the rotation accounted for 80% of the minutes and 99.6% of the wins. The next step is to work out the average WP48 of each team’s playoff top 6 and adjust it to the standard number of possessions in an NBA game in 2010 (188.9 possessions a game, counting both teams). If I were being anal here, I’d do this on a matchup-by-matchup basis (but I’m not, and the difference isn’t that large). A typical calculation for a team (say, Milwaukee) looks like this:
Minutes Played (Playoff Top 6): 10,892
Wins Produced (Playoff Top 6): 27.3
Avg WP48 (Playoff Top 6): 27.3 * 48 / 10,892 = .120
Pace: 187.65 possessions a game
Pace-Adjusted Avg WP48 (Playoff Top 6): .121
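The Milwaukee numbers above can be reproduced with a short sketch. WP48 is just Wins Produced per 48 minutes. For the pace adjustment, I am assuming a simple proportional scaling (multiply by league pace over team pace); that assumption is mine, though it does reproduce the .121 figure. The function names are illustrative only.

```python
LEAGUE_PACE_2010 = 188.9  # possessions per game, both teams combined (from the post)


def wp48(wins_produced: float, minutes: float) -> float:
    """Wins Produced per 48 minutes for a group of players."""
    return wins_produced * 48.0 / minutes


def pace_adjust(wp48_value: float, team_pace: float,
                league_pace: float = LEAGUE_PACE_2010) -> float:
    """ASSUMPTION: scale WP48 proportionally from the team's pace to league pace."""
    return wp48_value * league_pace / team_pace


raw = wp48(27.3, 10892)            # Milwaukee playoff top 6
adjusted = pace_adjust(raw, 187.65)
print(f"{raw:.3f} {adjusted:.3f}")  # 0.120 0.121
```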
If we look at the 2010 teams:
Now for the matchup calculation. Take the Deer (the Bucks) vs. the Hawks, and assume for now that they’re playing at a neutral site. We can work out the probability of the better team winning by comparing the WP48 of their playoff rotations. The calculation looks like this:
Atlanta WP48 (Playoff Top 6): .162
Milwaukee WP48 (Playoff Top 6): .121
Probability of the Hawks winning at a neutral site = 5 * (WP48 margin) * (% of Minutes Played / % of Wins Generated) + .500
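Since the top 6 play 80% of the minutes and generate 99.6% of the wins, the multiplier is 5 * .80/.996 ≈ 4, so this simplifies to:
% Hawks win at a neutral site ≈ 4*(.162-.121) + .500 = 66%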
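Here is a minimal sketch of the neutral-site formula, with the top-6 shares from the three-season table (80% of minutes, 99.6% of wins) as default parameters. The function name is hypothetical. Note that the formula is linear and unclamped, so a very large WP48 gap could push it outside [0, 1].

```python
def neutral_site_prob(wp48_a: float, wp48_b: float,
                      minutes_share: float = 0.80,
                      wins_share: float = 0.996) -> float:
    """Probability team A beats team B at a neutral site (linear model, unclamped)."""
    multiplier = 5.0 * minutes_share / wins_share  # about 4.02 with the top-6 shares
    return multiplier * (wp48_a - wp48_b) + 0.500


full = neutral_site_prob(0.162, 0.121)      # Hawks vs. Bucks, full multiplier
simplified = 4 * (0.162 - 0.121) + 0.500    # the rounded "4" version
print(f"{full:.1%} {simplified:.1%}")       # 66.5% 66.4%
```

Both versions round to the 66% quoted above; the rounded multiplier of 4 costs only about a tenth of a percentage point here.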
So, based on their playoff rotations and regular-season performance, Atlanta would be expected to beat Milwaukee at a neutral site 66% of the time. If we expand this to the entire group of 2010 playoff teams:
The next step is to figure out homecourt advantage.
For homecourt advantage I had a few options. I could:
- Ignore it. That isn’t reasonable, so I discarded this option.
- Use the playoff data. But there is no guarantee that this data set has an even mix of teams on both sides (home and away) over time, since the better-seeded team holds homecourt.
- Use the regular season data. This data set is guaranteed to have an even sample of teams, since every team plays as many home games as road games. If homecourt advantage did not matter, over a large enough sample the home team’s win rate would converge to .500. It doesn’t: in games from 1999 through 2008, the home team won 60.6% of the time, so that is the number we will use.
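Estimating the homecourt number from regular-season results amounts to counting home wins. A minimal sketch, assuming each game is recorded as a hypothetical (home points, away points) pair; the scores below are made-up toy data, not real NBA results.

```python
from typing import Iterable, Tuple


def home_win_rate(games: Iterable[Tuple[int, int]]) -> float:
    """Fraction of games won by the home team; each game is (home_points, away_points)."""
    games = list(games)
    home_wins = sum(1 for home, away in games if home > away)
    return home_wins / len(games)


# Hypothetical toy scores, NOT real data
sample = [(101, 95), (88, 92), (110, 99), (97, 90), (85, 100)]
print(home_win_rate(sample))  # 0.6
```

Run over a full decade of regular-season games, this kind of count is what yields a figure like .606; with no homecourt effect it would drift toward .500.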
So to adjust for homecourt, we can simply substitute .606 for .500 in the neutral-site equation for the home team and .394 (1 - .606) for the away team. Calculating for the Hawks and the Deer again:
Probability of the Hawks winning at Home = 4*(.162-.121) + .606 = 77.0%
Probability of the Hawks winning on the Road = 4*(.162-.121) + .394 = 55.8%
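The venue adjustment only swaps the baseline constant, so a sketch can keep the three baselines in a lookup. It uses the simplified multiplier of 4; the names are illustrative, not from the original model.

```python
HOME_WIN_RATE = 0.606  # regular-season home win rate, 1999-2008 (from the post)

BASELINES = {
    "home": HOME_WIN_RATE,
    "neutral": 0.500,
    "road": 1.0 - HOME_WIN_RATE,  # .394
}


def game_prob(wp48_a: float, wp48_b: float, venue: str,
              multiplier: float = 4.0) -> float:
    """Probability team A wins a single game at the given venue from A's view."""
    return multiplier * (wp48_a - wp48_b) + BASELINES[venue]


for venue in ("home", "neutral", "road"):
    print(venue, f"{game_prob(0.162, 0.121, venue):.1%}")
# home 77.0%
# neutral 66.4%
# road 55.8%
```

Because .606 and .394 sum to 1, the two sides always agree: the Bucks’ chance of winning in Atlanta, game_prob(0.121, 0.162, "road"), is exactly 1 minus the Hawks’ home probability.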
If we again work this out for every team in the 2010 playoffs:
So at this point, I’m going to stop for the day. Tomorrow I will go into part 2 and discuss building the probability model for the 7-game series and, finally, putting it all together.
See you tomorrow for part 2.