A Sunday Kind of Piece (Special Labor Day Edition Parte Dos): Building Probability Models for the NBA Playoffs

Posted on 09/06/2010 by


Peter Gibbons: So I was sitting in my cubicle today, and I realized, ever since I started working, every single day of my life has been worse than the day before it. So that means that every single day that you see me, that’s on the worst day of my life.
Dr. Swanson: What about today? Is today the worst day of your life?
Peter Gibbons: Yeah.
Dr. Swanson: Wow, that’s messed up.

Quote from Office Space

Happy Labor Day, everybody! Enjoy the vacation from being wage slaves. No worries though, you and your stapler will be reunited tomorrow.

They get my red Swingline over my dead body!

It’s time for part two of this week’s in-depth statistical piece. As always, 100% satisfaction guaranteed or your money back :-). Today we continue with the explanation of how I built a probability model for the NBA playoffs (or, as Andres called it in his super fun series Time Machine Team, my magical flying unicorn) and some of the interesting things I’ve discovered along the way (see here for Part 1).

PAX is still this weekend (http://www.penny-arcade.com/)

The Customary Recap of Part 1

For those of you too lazy to click over to part one (you can’t be bothered?), I will quickly recap.

  • The first part of the build is figuring out how teams match up and who’d win. For the playoffs, I chose the top 6 because of my half-baked theory (see here). The theory states that only your top 6 (rotation-wise) matter come playoff time. For the purposes of this exercise, I’m going to look at playoff numbers for the last three seasons (2008-2010), where the top 6 in the rotation accounted for 80% of the minutes and 99.6% of the wins.
  • We then take the wins produced over the course of the season by each team’s top 6 players, ranked by minutes in the playoffs (the playoff rotation), along with the average possessions per game for both teams. From these we work out the average WP48 of each team’s top 6 (playoff version) and adjust it to the standard number of possessions in an NBA game in 2010 (188.9 possessions a game for both teams). This gives us a pace-adjusted WP48 for the playoff rotation.
  • Now that we have a baseline for the playoff rotation and an adjusted win production, we can come up with an equation to predict who will win when Team A faces Team B at a neutral site:

%TeamA beats TeamB @ neutral site = 4*(Pace Adjusted WP48 Team A - Pace Adjusted WP48 Team B) + .500 (seriously, go here for more detail)

  • The next step is to figure out homecourt advantage. Here we use the regular season data. If we look at games from 1999 through 2008, we see that the home team wins 60.6% of the time, so this is the number we will use. The adjusted equations for Team A at home and on the road become:

%TeamA beats TeamB @ Home = 4*(Pace Adjusted WP48 Team A - Pace Adjusted WP48 Team B) + .606

%TeamA beats TeamB @ Road = 4*(Pace Adjusted WP48 Team A - Pace Adjusted WP48 Team B) + .394
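The three single-game equations collapse into one small function. This is a minimal sketch: the function name and the `site` keyword are made up for illustration, while the 4x multiplier and the .500/.606/.394 baselines come from the equations above. No 10%/90% cap is applied here.

```python
HOME_WIN_RATE = 0.606  # home teams' regular-season win rate, 1999 through 2008 (from the post)

# Baseline chance for Team A at each kind of site, before the talent gap is added.
BASELINES = {"neutral": 0.500, "home": HOME_WIN_RATE, "road": 1 - HOME_WIN_RATE}


def single_game_prob(wp48_a: float, wp48_b: float, site: str = "neutral") -> float:
    """Chance that Team A beats Team B in one game (hypothetical helper).

    wp48_a, wp48_b: pace-adjusted WP48 of each team's playoff rotation.
    site: "neutral", "home" (A at home) or "road" (A on the road).
    """
    return 4 * (wp48_a - wp48_b) + BASELINES[site]


print(round(single_game_prob(0.15, 0.10, "home"), 3))  # 0.806
```

A nice property: Team A at home and Team B on the road against A always add up to 1, which is the Ah + Br = 1 relationship the series math relies on.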

So now we can use WP48, the playoff rotation and possession numbers to calculate the probability of Team A beating Team B in a single game played at a neutral site, at home or on the road. Next, we can discuss building the probability model for the 7 game series and finally putting it all together.
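Going back to the first two recap bullets for a second, here's a rough Python sketch of how the playoff rotation's WP48 could be computed: rank players by playoff minutes, keep the top 6, and pool their regular-season wins produced into one minute-weighted WP48. The `Player` fields are hypothetical, the minute weighting is our reading of "average WP48", and `pace_adjust` is an outright assumption (a plain proportional rescale to 188.9 possessions), since the exact adjustment formula isn't spelled out here.

```python
from dataclasses import dataclass


# Hypothetical player record; field names are illustrative, not from the post.
@dataclass
class Player:
    name: str
    playoff_minutes: float       # used only to rank the playoff rotation
    season_wins_produced: float  # regular-season Wins Produced
    season_minutes: float        # regular-season minutes played


LEAGUE_POSSESSIONS_2010 = 188.9  # possessions per game, both teams (from the post)


def rotation_wp48(players: list[Player], size: int = 6) -> float:
    """Minute-weighted WP48 of the `size` players with the most playoff minutes."""
    rotation = sorted(players, key=lambda p: p.playoff_minutes, reverse=True)[:size]
    wins = sum(p.season_wins_produced for p in rotation)
    minutes = sum(p.season_minutes for p in rotation)
    return wins * 48 / minutes


def pace_adjust(wp48: float, game_possessions: float) -> float:
    """ASSUMPTION: simple proportional rescale to the 2010 league pace.

    game_possessions is the average possessions per game for the matchup.
    """
    return wp48 * LEAGUE_POSSESSIONS_2010 / game_possessions
```

Pooling wins and minutes this way is the same as averaging each player's WP48 weighted by his minutes, so a deep-bench guy with a gaudy WP48 in 100 minutes can't skew the number.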

The Seven Game Series

To model the typical seven game series, I’m using a method straight from my probability classes: the binomial theorem.

The binomial theorem can be used to work out the probability of a series of events with two distinct outcomes whose probabilities add up to 100%. Let’s stay with basketball to explain. First, define the following variables:

x: Probability of Home team winning

y: Probability of Road team winning

If the two teams play n games, every possible outcome shows up in the expansion:

\begin{align} (x+y)^n & = {n \choose 0}x^n y^0 + {n \choose 1}x^{n-1}y^1 + {n \choose 2}x^{n-2}y^2 + {n \choose 3}x^{n-3}y^3 + \cdots \\ & {} \qquad \cdots + {n \choose n-1}x^1 y^{n-1} + {n \choose n}x^0 y^n, \end{align}

The binomial coefficient \tbinom{n}{k} can be interpreted as the number of ways to choose k elements from an n-element set (or to win k games out of n). It is:

{n \choose k} = \frac{n (n-1) \cdots (n-k+1)}{k (k-1) \cdots 1} = \prod_{\ell=1}^k \frac{n-\ell+1}{\ell}
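The product formula translates directly into a few lines of Python. The `choose` function is just an illustration; the standard library's `math.comb` (Python 3.8+) already does this and makes a handy cross-check.

```python
from math import comb


def choose(n: int, k: int) -> int:
    """n choose k via the product formula prod_{l=1..k} (n - l + 1) / l."""
    result = 1
    for ell in range(1, k + 1):
        # After this step result == C(n, ell), so the integer division is exact.
        result = result * (n - ell + 1) // ell
    return result


# Cross-check against the standard library for every 7-game case.
for k in range(8):
    assert choose(7, k) == comb(7, k)
print([choose(7, k) for k in range(8)])  # [1, 7, 21, 35, 35, 21, 7, 1]
```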

This is a little complicated, so let me simplify.

For sets of three to seven games, the expansions look like this:

\begin{align} (x+y)^3 & = x^3 + 3x^2y + 3xy^2 + y^3, \\[8pt] (x+y)^4 & = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4, \\[8pt] (x+y)^5 & = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5, \\[8pt] (x+y)^6 & = x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6, \\[8pt] (x+y)^7 & = x^7 + 7x^6y + 21x^5y^2 + 35x^4y^3 + 35x^3y^4 + 21x^2y^5 + 7xy^6 + y^7. \end{align}
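To see that these expansions really are probability tables, here's a small sketch that evaluates each term for a given home-win probability x (with y = 1 - x). It assumes every game in the set has the same home-win probability; the function name is hypothetical.

```python
from math import comb


def expansion_terms(x: float, n: int) -> list[float]:
    """Terms of (x + y)^n with y = 1 - x, from all home wins down to none.

    Term k is C(n, k) * x**(n - k) * y**k: the chance of exactly n - k home
    wins in n games, assuming every game has the same home-win probability x.
    """
    y = 1 - x
    return [comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1)]


# With x + y = 1 the terms always add up to 1 (100%).
print(expansion_terms(0.5, 3))  # [0.125, 0.375, 0.375, 0.125]
print(round(sum(expansion_terms(0.606, 7)), 12))  # 1.0
```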

What this actually means is that for a three game set there are:

  • 1 way for the home team to win all three games, and the probability is simply x^3
  • 3 ways for the home team to win two of three games (1&2, 2&3, 1&3), with probability 3x^2y
  • 3 ways for the home team to win one of three games (1, 2 or 3), with probability 3xy^2
  • 1 way for the home team to lose all three games, with probability y^3
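If you'd rather count than trust the algebra, this brute-force sketch lists every win/loss sequence for a three game set (including the one way to lose all three) and confirms the coefficients. The 0.606 home-win probability is just the homecourt baseline from earlier, used as an example value.

```python
from collections import Counter
from itertools import product

# Every home-team result sequence over three games: "W" = win, "L" = loss.
sequences = list(product("WL", repeat=3))

# Count how many sequences give each number of home wins.
ways = Counter(seq.count("W") for seq in sequences)
print(sorted(ways.items(), reverse=True))  # [(3, 1), (2, 3), (1, 3), (0, 1)]

# Probability of exactly two home wins, summed sequence by sequence.
x = 0.606  # example home-team win probability (the 60.6% baseline)
y = 1 - x
p_two = sum(x ** s.count("W") * y ** s.count("L") for s in sequences if s.count("W") == 2)
print(abs(p_two - 3 * x**2 * y) < 1e-12)  # True: matches the 3x^2y term
```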

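Each term gives the probability of that result. Now, a seven game series can be treated as a 4 game home set plus a 3 game road set for the higher seed (to simplify the math, no really). This works because we can pretend all seven games are always played: once a team has four wins, the leftover games can’t change who wins the series. Let’s switch the variables a little bit. For Team A and Team B playing a seven game playoff series, with A being the higher seed (and so getting four home games), we have: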

Ah: Probability of Team A winning @ home

Br: Probability of Team B winning on the road

and Ah+Br =1

Ar: Probability of Team A winning on the road

Bh: Probability of Team B winning @ home

and Ar+Bh =1
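If the "pretend all seven games are played" trick sounds fishy, here's a quick numerical check in the Ah/Ar notation. It assumes a 2-2-1-1-1 schedule (an assumption for illustration; the argument holds for any order that gives Team A four home games) and arbitrary example values Ah = 0.7 and Ar = 0.45. One version stops at four wins; the other always plays seven and checks whether A got to four.

```python
from functools import lru_cache
from itertools import product

# ASSUMPTION: 2-2-1-1-1 format; H = Team A (higher seed) at home, R = on the road.
SCHEDULE = "HHRRHRH"


def series_stop_early(ah: float, ar: float) -> float:
    """P(Team A wins the series) when play stops once either team reaches 4 wins."""

    @lru_cache(maxsize=None)
    def p_win(game: int, a_wins: int, b_wins: int) -> float:
        if a_wins == 4:
            return 1.0
        if b_wins == 4:
            return 0.0
        p = ah if SCHEDULE[game] == "H" else ar
        return (p * p_win(game + 1, a_wins + 1, b_wins)
                + (1 - p) * p_win(game + 1, a_wins, b_wins + 1))

    return p_win(0, 0, 0)


def series_play_all_seven(ah: float, ar: float) -> float:
    """P(Team A wins at least 4 of 7) if all seven games were always played."""
    total = 0.0
    for outcome in product((True, False), repeat=7):  # True = Team A wins that game
        prob = 1.0
        for venue, a_won in zip(SCHEDULE, outcome):
            p = ah if venue == "H" else ar
            prob *= p if a_won else 1 - p
        if sum(outcome) >= 4:
            total += prob
    return total


print(abs(series_stop_early(0.7, 0.45) - series_play_all_seven(0.7, 0.45)) < 1e-12)  # True
```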

The equation for the 4 game set is:

(Ah+Br)^4= Ah^4+4*Ah^3*Br+ 6*Ah^2*Br^2+4*Ah*Br^3+Br^4

The equation for the 3 game set is:

(Ar+Bh)^3= Ar^3+3*Ar^2*Bh+ 3*Ar*Bh^2+Bh^3

As for the equation for team A winning the series?

Prob of Team A winning the series =

Prob of winning 4 home games +

Prob of winning 3 home games * Prob of winning at least 1 away game +

Prob of winning 2 home games * Prob of winning at least 2 away games +

Prob of winning 1 home game * Prob of winning all 3 away games


Prob of Team A winning the series =

Ah^4 +

(4*Ah^3*Br)*(1-Bh^3) +

(6*Ah^2*Br^2)*(1-3*Ar*Bh^2-Bh^3) +

(4*Ah*Br^3)*Ar^3
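Here's the same series equation as a Python function, a sketch using the Ah/Ar notation above with Br = 1 - Ah and Bh = 1 - Ar. The function name is hypothetical.

```python
def series_prob(ah: float, ar: float) -> float:
    """Team A's chance of winning a best-of-7 with 4 home and 3 road games.

    ah: P(A wins at home), ar: P(A wins on the road).
    """
    br = 1 - ah  # Team B winning on the road (at A's place)
    bh = 1 - ar  # Team B winning at home
    return (
        ah ** 4                                                 # all 4 home games
        + (4 * ah ** 3 * br) * (1 - bh ** 3)                    # 3 home, at least 1 road
        + (6 * ah ** 2 * br ** 2) * (1 - 3 * ar * bh ** 2 - bh ** 3)  # 2 home, at least 2 road
        + (4 * ah * br ** 3) * ar ** 3                          # 1 home, all 3 road
    )
```

A handy sanity check: with Ah = Ar = 0.6 this gives about 0.710, the same as the chance of winning at least 4 of 7 games at a flat 60%, and with Ah = Ar = 0.5 it gives exactly a coin flip.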


Still there? Good. Now that we have an equation, let’s put it all together.

Putting it all together

If you remember from yesterday, we worked out home and away matchups for every team in the 2010 playoffs:

Home team in blue in the left hand column

Road team in blue in the left hand column

One important note: I capped single-game win probabilities at no better than 90% and no worse than 10%. I did this because the best team ever, record-wise, still managed to lose 10 of 82 games (and the worst team ever still won 9 of 82).
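The cap is just a clamp on each single-game probability before it goes into the series formula. A minimal sketch, with a hypothetical function name:

```python
def capped(p: float, low: float = 0.10, high: float = 0.90) -> float:
    """Clamp a single-game win probability to the [10%, 90%] range."""
    return min(high, max(low, p))


# Cap each single-game probability before feeding it into the series formula.
ah = capped(0.95)  # -> 0.9
br = 1 - ah        # Team B's road chance stays consistent with Ah + Br = 1
```

Capping Ah (and Ar) first and deriving Br and Bh from them keeps each pair summing to 1, so the series equation still works unchanged.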

To finish this exercise, if we run the numbers for the 2010 playoffs, we get the following (home team in blue in the left-hand column):

There are some interesting findings that come up related to the homecourt advantage but these will wait for a future post.
