The DIY Experience

Posted on 12/10/2010 by


I remember the first comment I ever wrote on the WOW blog. Sadly, I don’t remember which post it was written on, but I remember the gist of it. In it I stated that the admirable part of Wins Produced was the fact that it was open source in nature. Prof. Berri made a clear and mathematically sound model and put it out for the world to criticize. This is in stark contrast to others who’ve undertaken similar endeavours, and I was impressed by that. See, the trick to this is peer review.

So when I started this blog, to mess around and try to contribute something, I decided that whatever I did, I was always going to pay attention to the feedback I receive. Keep in mind that I have a full-time job and a life outside of this blog, so the input of intelligent readers is always appreciated (and everyone else, thanks for the pageviews 🙂 ). I’ve always been fascinated by the ongoing discussions in the comments tab, and I try to respond to comments as much as I can (and, as a lot of you know, by e-mail as well).

Now, the latest ongoing discussion is fascinating (and will probably generate at least one more post), but I’ve been without a web connection at home for two days and very busy with some upcoming projects, so I’m going to be brief. I’m going to respond to one particular comment that I think addresses most of my take on the ongoing discussion. Here it is, reader Benjamin writes:


I haven’t posted before but I’ve read a lot on this site leading up to this season. I want to start by saying thanks not just for putting all the time into thinking about quantifying basketball, which is obviously a labor of love, but also for sharing those thoughts with the world, which takes another big helping of hard work and can be frustrating at times.

Thank you for reading. I really thought when I started that I was totally insane for doing this. So did my wife, by the way.

I hope there is some value with my sharing my perspective on this debate.

Thank you for reading.

When I first found Dr. Berri’s site his wins produced model appealed to me because it does seem that most quantitative analysis of basketball overemphasized scoring to this point. I found Dr. Berri’s writing compelling and intuitive. So when Dr. Berri linked over to your site and I was able to follow, day in and day out, your work putting the model to work to make predictions and find intriguing nuggets of info, I was thrilled. I didn’t do a lot of thinking about the ins and outs of what you were saying because, like Dr. Berri’s, I found your writing authoritative and intuitive.

I try to lay out a story as I write, but I’m also very aware of the limitations of building a set of tools out in public. Much like in real-life work situations, some things work out great the first time (say, for example, the rookie model or the point margin model) and some things are a lot harder and take time. Laying a beta out for the world to see is really stressful and rewarding at the same time. You get to see me get it really right and really wrong. Thankfully, I’m somewhat thick-skinned.

When the season started and you started comparing the predictive success of the various models a concern crept into my thinking. This Simmons guy (who I read and listen to frequently) was doing pretty well. This made me think, if Simmons, a guy who spends a considerable amount of time analyzing other sports, could do this well, it’s not inconceivable that someone who is paid to analyze basketball could do as well or perhaps better than the model without all the quant stuff. Maybe it wasn’t time to fire all the GMs yet ;)

Simmons (who I’ve been reading since the Boston Sports Guy days) has consistently outperformed pretty much every analyst when it comes to basketball. Someone did an actual study on this and his eye for picking win totals is great. He’s actually what inspired me to take a stab at building a predictive model to try to automate and quantify what the experts who do this for a living do. I fully realize that this is not the work of a few months but probably of years.

I began to think about what I did and didn’t like about the model.
* On the offensive side of the ball I love the emphasis on effectiveness over volume and in particular how much turnovers are penalized.
* I’m not entirely thrilled about the value of assists. I don’t like ‘em because their recording is subjective, so if you play more games for a generous stat crew you do better. And as a basketball player I’ve been in plenty of situations where the shooter does more of the work that leads to the basket (i.e. choosing when and where to cut, coming off a screen correctly, etc.) than the assister does, and it seems odd to get the same credit for that play as for one where there is some real passing magic.
* On the other side of the ball I had some reservations about defensive rebounds not taking into account the defensive effort that led to the rebound.

I’ve played with these myself over time. The point margin analysis really nailed it for me though. Across 1230 samples, using the linear conversion from marginal WP to Point Margin based on straight boxscore stats, the results showed a 2-point differential 6 times (.48%) and a 1-point differential 392 times (31.8%). I think we can definitely argue granularity of the stats. We can argue the effect of teammates and consistency. We can even argue the allocation of value, but I’m fairly sure that the value assignment at the team level is spot on.

When Guy and Some Dude started beating the rebounding drum I thought, great, we’re going to get somewhere on one of the areas I thought the model might need some refinement. Unfortunately it doesn’t seem that way. In all fairness I haven’t read either of Dr. Berri’s books. But it seems to me that rather than address Guy/Some Dude’s specific criticisms you have both tended to point back to previous work or generate new data that looks synonymous with old data. In other words I don’t feel like you’ve said, what if we were wrong, how would we go about proving that based on these criticisms? Instead you’ve said, look at how self-consistent our data is, and here is more data that demonstrates consistency. As an engineer I know how easy it is to generate self-consistent data and still miss something huge, so I haven’t found this especially convincing. On the other hand, as this has dragged on I think Guy has taken the less useful role of being a critic rather than a member of the community. As a critic I don’t think someone has to offer an alternative, only demonstrate flaws. But as a member of a community, I think you have a responsibility to follow up flaws with at least attempts at solutions.

Ben, there’s been a tremendous amount of work on all of these subjects over time. The first e-mail exchange I ever had with Prof. Berri touches on the diminishing returns issue. Read this post for a really nice take on that. Prof. Berri’s doing a FAQ on a lot of these (I may have needled him a bit on this, even though, in fairness to him, this has been covered before in his blog, published articles and books). I’ve done oodles of rebounding posts. I’ve really tried to look at it from multiple angles. Let me try to state it one more way: when you have an above-average rebounder, the rest of the team without him on the court does not have to be at Position Average, so expecting a linear gain is a bit strange, and above-average performance from the rest of the team is skewed. The rebounding edge is really going to be a function of the roster. For team A to benefit “fully” (Player+ <= Team+) from a player’s rebounds (or any other statistic for that matter), all his teammates have to produce at a league-average level. There are very few teams where this happens once you take away their best player in any stat.
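To put some numbers on that last point, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `team_surplus` is made up, the figures are invented rather than taken from any real roster, and it treats each player's rebounding surplus over his position average as simply additive at the team level, which is a simplification.

```python
def team_surplus(player_surplus, teammate_surpluses):
    """Team rebounding edge over average, assuming surpluses simply add up.

    player_surplus: the star's rebounds above his position average (hypothetical units).
    teammate_surpluses: the same measure for each teammate (negative = below average).
    """
    return player_surplus + sum(teammate_surpluses)


# Invented numbers: a star who is +4.0 over his position average.
star = 4.0

# If every teammate is exactly average, the team keeps the whole edge.
average_teammates = [0.0, 0.0, 0.0, 0.0]

# If the teammates are collectively below average, part of the edge is
# spent just getting the team back to average.
weak_teammates = [-1.0, -0.5, -1.0, 0.0]

print(team_surplus(star, average_teammates))
print(team_surplus(star, weak_teammates))
```

Under these made-up inputs the team only sees the star's full edge in the first case. In the second, the team-level edge is smaller than the player's own, with no rebounds being "taken" from anyone.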

The best test is lineups with and without the player. Barring that, what I did here, which is look at the Reb48 of the 2nd best player and of the rest of the team as a function of the Reb48 of the best player (for every single team since 1979), should have shown something in terms of depressed stats. It didn’t. Rebounding seems to be fairly inelastic on average. In fact, what it really looks like is that you almost need two great rebounders to severely affect the rest of the team’s Reb48.
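For anyone who wants to try this kind of check themselves, here is a rough sketch in Python of how it could be set up. It is not the code behind the analysis above: the helper names, the minutes-weighted definition of "rest of the team" (everyone after the top two rebounders), and the three team-seasons are all assumptions invented for illustration.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def split_team(players):
    """players: list of (reb48, minutes) tuples for one team-season.

    Returns (best Reb48, second-best Reb48, minutes-weighted Reb48 of the rest).
    Treating "the rest" as everyone after the top two is an assumption.
    """
    ranked = sorted(players, key=lambda p: p[0], reverse=True)
    best, second, rest = ranked[0], ranked[1], ranked[2:]
    rest_minutes = sum(minutes for _, minutes in rest)
    rest_reb48 = sum(reb * minutes for reb, minutes in rest) / rest_minutes
    return best[0], second[0], rest_reb48


# Invented team-seasons, for illustration only.
teams = [
    [(15.0, 2800), (10.0, 2500), (6.0, 3000), (5.0, 2900), (4.0, 2600)],
    [(12.0, 2700), (9.5, 2600), (6.5, 2900), (5.0, 3000), (3.5, 2500)],
    [(18.0, 2900), (9.0, 2400), (6.0, 2800), (5.5, 2700), (4.0, 3000)],
]

splits = [split_team(team) for team in teams]
best = [s[0] for s in splits]
second = [s[1] for s in splits]
rest = [s[2] for s in splits]

# A clearly negative slope would suggest the best rebounder depresses his
# teammates' numbers; a slope near zero would suggest he does not.
print("2nd best vs best:", ols_slope(best, second))
print("rest vs best:", ols_slope(best, rest))
```

A real run would need every team-season and some minimum-minutes cutoff so that end-of-bench players don't end up as the "best" rebounder on per-48 numbers; the sketch leaves both out.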

The bigger point is that you’re kinda looking for a unicorn. To illustrate, I’ve worked out that CP3 is a +11.6 Point Margin per 48 player. Does that mean that the Hornets will win 71 games? No, because he plays on a team (which really isn’t very good without him).

That said, I think that I’ve done a decent job of looking for it. You have my promise that I’ll keep looking for it and other unicorns. Just not all the time because then it gets to be too much like work.

One final note: thanks for reading and commenting. I’m really having fun, and I’ll do my best to keep it fun for you too. I promise the next post will have some cool tables.
