# Re-examining myths and explaining how regression works

Posted on 11/24/2010 by

You know, sometimes it’s fun to poke the bear.

Did you know that when the bomb squad wants to defuse a bomb and they don’t know how to, they take it out to an empty field somewhere and blow it up? That’s kinda the opposite of what we do here. My defining trait is that I’m an engineer and a scientist. I don’t want to ignore questions and problems; I want to take a tool to them and try to solve or understand them.

Yesterday’s post was a result of this. I knew I was kicking an anthill. Today is no different.

Grab a drink, take a bathroom break, because this will take a while.

One of the hallmarks of society is intelligent, polite discussion. If any argument, tool or theory is worth anything, it can stand up to scrutiny and review (such as Wins Produced; see The Basics). I may not agree with you, but this is why I take the time to respond to the questions. I typically arrive at surprising and unexpected conclusions.

The point of having a blog is to invite discourse. As long as everyone, including me, has a thick skin and is prepared to be wrong (and right), we’ll keep advancing our understanding. Together.

And you know in this case, Guy is right.

What!!!!!

Not like you think (sorry, couldn’t help myself 😉). We do have the data to answer some of these questions, and some of the answers are really surprising.

Before we get to that, let’s talk a bit about linear regression.

Let's put that Ivy League education to good use

Linear regression is one of the most commonly used approaches to modeling the relationship between a variable Y (say, wins) and one or more variables X (say, box score stats). Linear regression is simple, useful and well understood, and in a lot of cases it works. It’s typically used for two things:

• Predicting or forecasting Y (say, wins) based on a known set of X’s. This is something we continually do here at ASLS.
• Given a variable Y (say, wins again) and a number of variables X (say, I don’t know, points, defensive rebounds, offensive rebounds, assists, etc.), linear regression analysis can be applied to quantify the strength of the relationship between them. This is what Prof. Berri did. He did it again in Stumbling on Wins. I did it here and at least five other times.
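To make those two uses concrete, here is a minimal sketch in plain Python: `fit_ols` estimates the coefficients by ordinary least squares (solving the normal equations), and `predict` plugs a new set of X’s in to forecast Y. The function names, the two predictors and the toy numbers are made up for illustration; this is not the data or the code behind Wins Produced.

```python
from typing import List, Sequence


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: fit y ≈ b0 + b1*x1 + ... + bk*xk.

    Returns [b0, b1, ..., bk] by solving the normal equations
    (X^T X) b = X^T y with Gaussian elimination (fine for a few predictors).
    """
    # Design matrix with a leading column of ones for the intercept b0.
    X = [[1.0, *map(float, r)] for r in rows]
    n, p = len(X), len(X[0])
    # Augmented system [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(X[t][i] * X[t][j] for t in range(n)) for j in range(p)]
        + [sum(X[t][i] * y[t] for t in range(n))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(A[row][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X^T X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def predict(b: Sequence[float], x: Sequence[float]) -> float:
    """Forecast y for one new row of X's using fitted coefficients."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Toy, made-up team seasons: two hypothetical box-score predictors per row.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
wins = [41 + 2 * a + 1 * b for a, b in rows]  # exact linear rule, for checking
coef = fit_ols(rows, wins)  # recovers roughly [41.0, 2.0, 1.0]
print(coef, predict(coef, (2, 2)))  # forecast is about 47.0
```

Because the toy wins come from an exact linear rule, the fit recovers that rule; with real box-score data the coefficients carry noise, which is why the strength of the relationship matters.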

What does this mean for this discussion? I’m happy with the coefficients the math gave me for rebounding and the other variables, and I’m not changing them.

You shut your mouth when math is talking! (Image courtesy of xkcd.com)

Ok, let’s get to the question-and-answer portion of the program. First, let’s look at rebounding. Are there diminishing returns for rebounding? As I said, I have the data (yeah, every single player and season since 1979; I made Excel tap out today). Let’s take a look:

If I plot the rebound rate per 48 minutes of every team’s best rebounder (by quantity, i.e. the player with the most rebounds on that team) against the rebound rate of the rest of that team, there appears to be a diminishing effect. But a correlation of 3.4% does not hold much water. Did I just prove diminishing returns for rebounds? Not so fast there, Kemosabe:

If I take out only the best rebounder (by quantity) for every team, I may not be getting the best rebounder by rate (per 48 minutes). However, if I take out the best two rebounders on each team, I get a surprising result: there is a positive relationship between the two best rebounders on each team. So having two good rebounders next to each other increases returns (everyone else does see some mild dropoff at the extreme). Who knew?

Phil Jackson knew that's who!

Again, a fairly weaksauce correlation, though.

The second question had to do with above-average rebounders. Can I find one with more than 15 total rebounds per 48 minutes whose teammates were above average? Here’s a list of all the players since 1979 whose rebound rate is above 15 per 48 and who led their team in total rebounds:

So the average rebound rate per 48 for teams without their best rebounder is 7.65. By my count, 40% of the players on this list qualify, i.e. their teammates rebounded above that mark. This concludes the rebounding portion of our program.
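The list-and-count step can be sketched like this. Assumptions worth flagging: “above average” is read as the teammates’ pooled rate beating the 7.65 figure quoted above, the cutoff is strictly greater than 15 per 48, and the field names (`team`, `name`, `reb`, `min`) and rosters are hypothetical.

```python
from collections import defaultdict

# Rebounds per 48 for teams minus their top rebounder (figure quoted in the post).
REST_OF_TEAM_AVG = 7.65


def per48(total: float, minutes: float) -> float:
    return 48.0 * total / minutes


def high_rate_leaders(players, threshold=15.0):
    """Yield (name, own_rate, teammates_rate) for each team's leader in total
    rebounds whose own rate exceeds `threshold` per 48 minutes.

    Keys 'team', 'name', 'reb', 'min' are hypothetical.
    """
    by_team = defaultdict(list)
    for p in players:
        by_team[p["team"]].append(p)
    for roster in by_team.values():
        leader = max(roster, key=lambda p: p["reb"])
        own = per48(leader["reb"], leader["min"])
        if own <= threshold:
            continue
        mates = [p for p in roster if p is not leader]
        mates_rate = per48(sum(p["reb"] for p in mates), sum(p["min"] for p in mates))
        yield leader["name"], own, mates_rate


def share_with_good_teammates(players, baseline=REST_OF_TEAM_AVG):
    """Fraction of those leaders whose teammates' pooled rate beats `baseline`."""
    rows = list(high_rate_leaders(players))
    if not rows:
        return 0.0
    return sum(1 for _, _, mates in rows if mates > baseline) / len(rows)


# Toy, made-up rosters.
players = [
    {"team": "A", "name": "A1", "reb": 1000, "min": 3000},  # 16.0 per 48
    {"team": "A", "name": "A2", "reb": 250, "min": 1200},
    {"team": "A", "name": "A3", "reb": 150, "min": 1200},
    {"team": "B", "name": "B1", "reb": 600, "min": 2400},   # 12.0, not on the list
    {"team": "B", "name": "B2", "reb": 200, "min": 1200},
    {"team": "C", "name": "C1", "reb": 800, "min": 2400},   # 16.0 per 48
    {"team": "C", "name": "C2", "reb": 140, "min": 1200},
]
for row in high_rate_leaders(players):
    print(row)
print(share_with_good_teammates(players))  # 0.5 for this toy data
```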

Let’s talk Wins Produced. If I repeat the exercise I did for rebounding, comparing the best player on each team (in terms of Wins Produced) against the rest of the team:

No diminishing returns here. The data is a little stratified, so let’s repeat the trick of looking at the second-best player separately from the rest, but this time let’s add the third best as well:

Now color-coded for your protection

Hmm. Fascinating. Playing with good players makes you a better player. Instead of diminishing returns for WP48, we see increasing returns, and it holds for your top three. And it doesn’t affect the rest of the team that much. Again, who knew?

Anything is Possible