**Steve**: I lied. Um… All that stuff I said about being a crack head? It just helps me sell magazines. I’m actually an unemployed… software engineer.

**Peter Gibbons**: *You’re a software engineer?*

**Steve**: *Yup.*

[*sighs*]

**Samir**: *Things, uh… it must be very rough for you.*

**Steve**: *Actually man, I make more money selling magazine subscriptions than I ever did at Initrode!*

Quote from Office Space

So it’s not quite Sunday morning, but it’s Labor Day (which explains the quotes and some of the images), so it’s time for a Sunday-paper-style feature and my attempt at an in-depth statistical piece. As always, 100% satisfaction guaranteed or your money back.

For this piece I will be talking about my magical flying unicorn.

Not that kind of unicorn. Let me explain. My Wages of Wins network colleague Andres Alvarez has been doing a super fun series called Time Machine Teams. In it, Andres takes a team’s roster from a given year and allows the team to use the best season from any player’s career. Andres asked me for help modeling his time travel series, so I sat down and came up with a model. I’ve posted bits of it already (see here and here). To quote Andres: *“It is more like having a friend and asking to borrow their bike and instead they lend you a flying horse.”* I lol’ed.

So for this post, we’re going to spend some time detailing how I built a probability model for the NBA playoffs and some of the interesting things I’ve discovered along the way.

**Straight up Win Percentage**

The first part of the build is figuring out how teams match up and who’d win. There are multiple ways to do this, but here’s what I chose (for the playoffs). I need to know the following:

- The Wins Produced over the course of the season by each team’s top 6 players (by minute allocation in the playoffs).
- The average possessions per game for both teams.

I chose the top 6 because of my half-baked theory (see here). The theory states that only your top 6 (rotation-wise) matter come playoff time. For the purposes of this exercise I’m going to look at playoff numbers for the last three seasons (2008-2010). The table is as follows:

So the top 6 in the rotation accounted for 80% of the minutes and 99.6% of the wins. The next step is to work out the average WP48 of the top 6 (playoff version) for each team and adjust it to the standard number of possessions in an NBA game in 2010 (188.9 possessions a game for both teams). If I was being anal here, I’d do this on a matchup-by-matchup basis (but I’m not, and the difference isn’t that large). A typical calculation for a team (say Milwaukee) looks like:

Minutes Played (Playoff Top 6): 10892

Wins Produced (Playoff Top 6): 27.3

Avg WP48 (Playoff Top 6): .120

Pace: 187.65 possessions a game

Pace Adjusted Avg WP48 (Playoff Top 6): .121
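Here is a minimal Python sketch of the Milwaukee arithmetic. WP48 is just Wins Produced per 48 minutes. The pace adjustment is my assumption about how the step works: scaling by league pace over team pace reproduces the .121 quoted for Milwaukee, but the exact formula isn't spelled out in the post. The function names are made up for illustration.

```python
LEAGUE_PACE = 188.9  # possessions a game for both teams, 2010 (from the text)


def wp48(wins_produced: float, minutes_played: float) -> float:
    """Wins Produced per 48 minutes."""
    return wins_produced / minutes_played * 48


def pace_adjusted_wp48(raw_wp48: float, team_pace: float,
                       league_pace: float = LEAGUE_PACE) -> float:
    """Scale WP48 to the league-standard pace.

    Assumption: a simple ratio of league pace to team pace. This matches
    the Milwaukee numbers but is inferred, not stated in the post.
    """
    return raw_wp48 * league_pace / team_pace


# Milwaukee's playoff top 6, using the figures quoted above
bucks_raw = wp48(wins_produced=27.3, minutes_played=10892)
bucks_adj = pace_adjusted_wp48(bucks_raw, team_pace=187.65)

print(f"Avg WP48: {bucks_raw:.3f}")            # Avg WP48: 0.120
print(f"Pace adjusted WP48: {bucks_adj:.3f}")  # Pace adjusted WP48: 0.121
```

The adjustment is tiny here because Milwaukee's pace sits so close to the league figure.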

If we look at the 2010 teams:

Now for the match-up calculation. If we take the Deer vs. the Hawks (assume for now that they’re playing at a neutral site), we can work out the probability of the better team winning by comparing the WP48 of their playoff rotations. The calculation looks like so:

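Atlanta WP48 (Playoff Top 6): .162

Milwaukee WP48 (Playoff Top 6): .121

Probability of the Hawks winning at a neutral site = 5 * (Margin per 48 MP) * (% of Minutes Played / % of Wins Generated) + .500

With the top 6 playing 80% of the minutes and generating 99.6% of the wins, 5 * .80 / .996 is roughly 4, so we can simplify this to: Probability of the Hawks winning at a neutral site ≈ 4 * (.162 - .121) + .500 = 66%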
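A rough Python sketch of the neutral-site calculation, showing both the full form and the simplified multiplier of 4. I'm assuming the minutes and wins percentages in the formula are the 80% and 99.6% top-6 shares quoted earlier, since that is what makes the multiplier land near 4. The function names are illustrative only.

```python
def neutral_win_prob_full(wp48_team: float, wp48_opp: float,
                          minutes_share: float = 0.80,
                          wins_share: float = 0.996) -> float:
    """Full form: 5 * margin * (% minutes / % wins) + .500.

    Assumption: the shares are the top-6 figures (80% of minutes,
    99.6% of wins) from the playoff table discussed above.
    """
    multiplier = 5 * minutes_share / wins_share  # about 4.016
    return multiplier * (wp48_team - wp48_opp) + 0.5


def neutral_win_prob(wp48_team: float, wp48_opp: float) -> float:
    """Simplified form: 4 * margin + .500."""
    return 4 * (wp48_team - wp48_opp) + 0.5


hawks, bucks = 0.162, 0.121
print(f"Full form:  {neutral_win_prob_full(hawks, bucks):.1%}")
print(f"Simplified: {neutral_win_prob(hawks, bucks):.1%}")  # Simplified: 66.4%
```

Swapping the arguments gives Milwaukee's chances, and the two always sum to 100% because the margin just flips sign.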

So Atlanta would be expected to beat Milwaukee 66% of the time at a neutral site, based on their playoff rotations and regular season performance. If we expand to look at the entire group of playoff teams for 2010:

The next step is to figure out home-court advantage.

**Home-Court Advantage**

For home-court advantage I had a few options. I could:

- Ignore it. But this is not reasonable so it was discarded.
- Use the playoff data. But this data set doesn’t guarantee an even mix of teams over time on both sides of the equation.
- Use the regular season data. This data set is guaranteed to have an even sample in terms of teams. If home-court advantage did not matter, the home win percentage would converge to .500 over a large enough sample. It doesn’t. If we look at games from 1999 through 2008, we see that the home team wins 60.6% of the time, and therefore this is the number we will use.

So to adjust for home court we can simply substitute .606 for .500 in our neutral site equation for the home team and .394 for the away team. Calculating for the Hawks and Deer again:

Probability of the Hawks winning at Home = 4*(.162-.121) + .606 = 77.0%

Probability of the Hawks winning on the Road = 4*(.162-.121) + .394 = 55.8%
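The home and road versions can be sketched the same way in Python, swapping the .500 baseline for .606 or .394. The clamp to the 0-1 range is my own addition, not part of the model as described: a linear formula like this can drift past 100% for a lopsided matchup. The names are hypothetical.

```python
BASELINES = {
    "neutral": 0.500,
    "home": 0.606,  # home team win rate, regular season 1999 through 2008
    "road": 0.394,  # 1 - 0.606
}


def win_prob(wp48_team: float, wp48_opp: float, venue: str = "neutral") -> float:
    """4 * margin + venue baseline, from the team's point of view."""
    p = 4 * (wp48_team - wp48_opp) + BASELINES[venue]
    # Assumption: clamp so the result is always a valid probability.
    return min(max(p, 0.0), 1.0)


hawks, bucks = 0.162, 0.121
for venue in ("home", "neutral", "road"):
    print(f"Hawks {venue}: {win_prob(hawks, bucks, venue):.1%}")
```

Because the baselines mirror each other, the Hawks' chance at home and the Deer's chance on the road add up to 100%, as they should for the same game.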

If we again work this out for every team in the 2010 Playoffs:

So at this point, I’m going to stop for the day. Tomorrow I will go into part 2 and discuss building the probability model for the 7-game series and finally putting it all together.

**See you tomorrow for part 2**


reservoirgod

09/06/2010

Arturo:

Very interesting! Wish this existed when I was writing my bleacher report article for Team USA. How does this method compare to the log5 method developed by Bill James that was adapted and improved for basketball by Ken Pomeroy (from kenpom.com & Basketball Prospectus) as well as Ed Kupfer (from APBRmetrics)?

arturogalletti

09/06/2010

RG,

I’d have to take a look. I’m using a method straight from my probability classes. I’m figuring out the probability of two discrete events (winning at home and away). I’m using win production adjusted for pace, but I’d have to see what they use. In Part 2, I’m using the combinatorial interpretation of the binomial theorem to work out the solution to winning a seven-game series (and I know that might be a little obscure, but I’ll explain soon :-)).

arturogalletti

09/06/2010

So log5 does:

Probability of Team A beating Team B = [(Team A win%) * (1 - Team B win%)] / [(Team A win%) * (1 - Team B win%) + (1 - Team A win%) * (Team B win%)]

and this is good and comes from Bayes’ theorem. For home-court advantage it does:

Probability of Team A beating Team B = [(Team A win%) * (1 - Team B win%) * HCA] / [(Team A win%) * (1 - Team B win%) * HCA + (1 - Team A win%) * (Team B win%) * (1 - HCA)]

with HCA set at .606.

I’m doing marginal win production per game adjusted for possessions and home court to calculate the probability of Team A beating Team B. I’d be interested to see how these line up. I suspect that the correlation will be stronger the more homogeneous the group of opponents is.
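For anyone who wants to line the two methods up, here is a small Python sketch of the log5 formulas as written in this comment. The win percentages in the example are hypothetical values picked for illustration, not real team records. Team A is treated as the home team whenever an HCA is supplied.

```python
def log5(win_pct_a: float, win_pct_b: float, hca: float = 0.5) -> float:
    """Probability of Team A beating Team B under log5.

    With hca=0.5 the home-court terms cancel and this reduces to the
    plain neutral-site formula. With hca=0.606, Team A is at home.
    """
    a_wins = win_pct_a * (1 - win_pct_b) * hca
    b_wins = (1 - win_pct_a) * win_pct_b * (1 - hca)
    if a_wins + b_wins == 0:
        # Happens when both teams are at 0% or both at 100%.
        raise ValueError("log5 is undefined for these inputs")
    return a_wins / (a_wins + b_wins)


# Hypothetical win percentages, for illustration only
team_a, team_b = 0.600, 0.400
print(f"Neutral:     {log5(team_a, team_b):.3f}")
print(f"Team A home: {log5(team_a, team_b, hca=0.606):.3f}")
print(f"Even teams, A at home: {log5(0.5, 0.5, hca=0.606):.3f}")  # 0.606
```

One sanity check on the home-court version: two evenly matched teams come out at exactly the HCA value, which is the same thing the .606 baseline does in the post's linear model.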

nerdnumbers

09/06/2010

Arturo,

Again I love this model. The first round of the playoffs should go in this week. As I’m sure you notice (especially with your Clone article) it is really hard to keep up with all the good ideas readers + other articles give.

Something I really like about this model is its simplicity: what it does, why you did each step, and what each step does are all very clear. Reading this article made me say “that all makes sense!”, whereas reading an article on PER makes me suspicious that a lab animal smashed the keys to make a formula and a grad student typed up a random key.

arturogalletti

09/06/2010

We did warn you about working with PER. :-)

neal frazier

09/06/2010

I have a question I just thought of about your ‘half baked notion’ – Even though the top 6 produce ~99% of the wins of playoff teams on average, the rest of the team can still be vital to winning. I am imagining a scenario where the playoff pool consists of 2 types of teams: a deep type where the 7+ guys produce +2 wins collectively, and a shallow type where the 7+ guys produce -2 wins collectively. The average would be 0 wins produced by the 7+ guys, but the reality for each team individually would be that the 7+ guys would have a big impact on team outcomes.

When you look at the wins produced of the 7+ guys on a team by team basis are they mostly close to 0 or are there teams with strong negative and strong positive production?

arturogalletti

09/06/2010

Great question. The short answer is not really. Only about 6% of the 7th and 8th players made any impact. The long answer will be upcoming in a future post :-)