*I’m going to try science.*

*Words to live by. But trying science isn’t for the faint of heart or the thin-skinned. Science is cool and dangerous.*

*Science is Epic but epics traditionally end badly for the participants. Truly trying science means asking the tough questions and being willing to put yourself out there. It means not only being willing to risk being wrong but being assured that you will be wrong often. Because Science is as much about disproving something as it is about proving something.*

*Sometimes it’s about doing cool stuff too :-).*

*Why the long intro? What follows is a guest post from one of my favorite bloggers (Alex Konkel, Sport Skeptic). I both love and hate his articles. I love them for the enjoyment I get as a reader and hate them because of the professional jealousy they inspire. The best and most concise introduction I can give Alex is this: he’s a scientist and I hope he keeps blowing s$%# up for a good long time.*

*For science of course.*

Hey everyone – Arturo asked if I would do a guest post talking about some of the stuff I’ve been doing over at my site, Sport Skeptic. Some of it has to do with fooling around with numbers and seeing what happens; sometimes it’s with actual data and sometimes it’s with made up data that is sort of like real data. Here’s a quick tour through some of the posts along with ideas that have come from the comments. Even when Arturo isn’t writing, we’re all about fanservice.

The second piece I wrote that got any attention was one on how a player metric, or rating system, would only be as reliable as the statistics that went into it. I made up player data for two seasons; the variables correlated across seasons as much as some of the actual NBA stats do. For example, field goal percentage is somewhat noisy or hard to predict from one year to the next while rebounding is more consistent. Then I made up a few metrics that put different weights on these variables. What I find is that the metrics that give relatively high weight to consistent variables are themselves consistent; in my example, the metric that gave a lot of value to shooting was noisy from year to year while the metric that gave more value to rebounding was more consistent. The upside of a consistent metric is that it allows you to have a better idea of how your players will perform next year.
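
That simulation is easy to reproduce. Here’s a minimal sketch in Python with made-up numbers (the 0.4 and 0.9 year-to-year correlations and the 80/20 metric weights are illustrative assumptions, not Alex’s actual values): a metric that leans on the noisy “shooting” stat is itself noisy across seasons, while one that leans on the consistent “rebounding” stat stays consistent.

```python
import random, math

random.seed(0)
N = 500  # made-up players

def correlated_pair(r, n):
    """Generate n (year1, year2) values with cross-season correlation r."""
    y1 = [random.gauss(0, 1) for _ in range(n)]
    y2 = [r * a + math.sqrt(1 - r * r) * random.gauss(0, 1) for a in y1]
    return y1, y2

def corr(x, y):
    """Pearson correlation between two lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# "Shooting" is noisy year to year; "rebounding" is consistent
shoot1, shoot2 = correlated_pair(0.4, N)
reb1, reb2 = correlated_pair(0.9, N)

# Two made-up metrics: one weighted toward shooting, one toward rebounding
m_shoot1 = [0.8 * s + 0.2 * r for s, r in zip(shoot1, reb1)]
m_shoot2 = [0.8 * s + 0.2 * r for s, r in zip(shoot2, reb2)]
m_reb1 = [0.2 * s + 0.8 * r for s, r in zip(shoot1, reb1)]
m_reb2 = [0.2 * s + 0.8 * r for s, r in zip(shoot2, reb2)]

print(corr(m_shoot1, m_shoot2))  # noisier year to year
print(corr(m_reb1, m_reb2))      # much more consistent
```

The metric inherits the reliability of whatever it weights most heavily, which is the whole point of the post.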

A few good points came out of the comments on this one. For example, consistency alone doesn’t have to be a good thing; you could credit players with wins due to their height and it would be very consistent. That leads to the next point, which is that consistency doesn’t matter if your model isn’t any good. I would make two points here. First, you obviously have to have a decent model; if you gave credit to players according to their height, you probably wouldn’t do a very good job predicting future wins. That might be a good way to check the quality of your model. Second, consistency is indeed nice – as long as everything else is equal. If your model is closer to the absolute truth and less consistent, it should be preferred over a less accurate but more consistent model. The questions are: how do you know which model is more accurate, and how consistent should we expect players to be? Those questions don’t have easy answers.

A little later I looked at what happens when you leave variables out of your model. This is an especially big problem when variables are correlated with each other, which is often true in sports. One of the strengths of regression is that if you have all the relevant information, you can figure out what weights all of your variables should get. But if you don’t, your weights can dance around like crazy; perhaps just as bad, the errors on those weights will definitely dance around and give you a mistaken impression of how important they are. This issue is a big reason why looking at simple correlations is typically a bad idea. A simple correlation is just a regression with one predictor. If you leave out all those other predictors the correlation can not only be inaccurate but simply wrong. Deciding what variables to use in a model falls under the umbrella of model selection, and there are rarely ‘right’ answers. Should you use true shooting percentage or effective field goal percentage and free throw percentage? If all you’re interested in is the effect of rebounding, should you include the kitchen sink of available variables? Two reasonable people could come to different conclusions, and a single person might use different models depending on what exactly their goal is. But in the complicated world of sports statistics, you probably wouldn’t be too wrong to always include as much information as possible.
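
You can watch the omitted-variable problem happen with a quick simulation. In this sketch (the 0.7 correlation between predictors and the equal true weights are made-up values chosen for illustration), regressing on one predictor alone inflates its weight, because it soaks up credit for the correlated variable that was left out:

```python
import random, math

random.seed(1)
N = 2000

x1 = [random.gauss(0, 1) for _ in range(N)]
# x2 correlates with x1, the way box-score stats often do
x2 = [0.7 * a + math.sqrt(1 - 0.49) * random.gauss(0, 1) for a in x1]
# True model: both variables matter equally (weight 1.0 each)
y = [a + b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

def slope_simple(x, y):
    """Slope from regressing y on x alone (a simple correlation, rescaled)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def slopes_two(x1, x2, y):
    """OLS slopes with both predictors, via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

b1_full, b2_full = slopes_two(x1, x2, y)  # both land near the true 1.0
b1_alone = slope_simple(x1, y)            # inflated toward 1.7
print(b1_full, b2_full, b1_alone)
```

With both predictors in, the weights come out near their true values; drop one and the surviving weight absorbs 0.7 of its neighbor’s effect.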

I followed that up with a more thorough description of a model in the previous post. Here’s another comparison so you have something new. I used the player data I have, converted to per 48 minutes, and predicted WP48 from position (which essentially means we’re predicting adjusted P48), true shooting percentage, fouls, turnovers, blocks, steals, assists, defensive rebounds, and offensive rebounds, each scaled to normalized scores (except position and WP48, obviously). I find that rebounds are the top predictors of WP48, followed by TS%, assists, fouls and turnovers (negative), steals, and blocks. It explains 97% of the variance (R squared is .97), meaning that these variables tell us virtually everything about why players vary in their WP48 values. I ran a similar model at the team level (excluding position, since teams don’t have positions) predicting win percentage. TS% and turnovers come out on top, followed by defensive rebounds, steals, blocks, and offensive rebounds; fouls and assists are actually not significant (I guess they don’t help teams win?). This model only explains about 67% of why teams win. What does this mean? I have no idea. Players and teams have different standard deviations for the same variable, so changing one standard deviation at the player level is not the same as changing one standard deviation at the team level. Also, the correlations between variables change; for example, teams with better TS% tend to have more assists, but TS% and assists are completely unrelated at the player level (whether or not you account for position). So you would probably expect the models to be different. This is relevant to a challenge I took from commenter Guy (who, I will repeat, won the challenge).
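
One concrete way to see the scaling issue: when variables are normalized, a one-standard-deviation change means a different raw amount at the player level than at the team level, because teams average over players and have less spread. A toy sketch (the spreads here are made-up numbers, purely for illustration):

```python
# Made-up illustration: the same raw change is a different number of
# standard deviations at the player level vs. the team level.
player_sd_ts = 0.05  # assumed spread of TS% across players
team_sd_ts = 0.02    # assumed (smaller) spread across teams

raw_change = 0.03    # a three-point jump in TS%
print(raw_change / player_sd_ts)  # about 0.6 player-level SDs
print(raw_change / team_sd_ts)    # about 1.5 team-level SDs
```

So identical normalized coefficients at the two levels would not imply identical real-world effects, and vice versa.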

Finally, I had a couple run-ins with Phil Birnbaum. Most of this had to do with R squared and model interpretation. We decided that more is better; if possible, have more data. Even if you’re compiling across some variable, you should have as much of that variable as possible. For example, if you look at something across teams (such as win percentage, or salary, or something else), you’ll always have 30 points, one for each team. But you can make sure you have as many observations (such as games played) as possible per team before you run your analysis. This will make sure your estimates are as accurate as possible. But looking at the R squared is key. It tells you how much of the variable of interest your model explains (just the way I described earlier). Sometimes, especially if you have a lot of data, you can have a significant variable that just doesn’t tell you that much about what you’re interested in. Salary and team wins is one example; salary is indeed significantly correlated with wins, but the R squared is only about .25 (depending on what season you look at). That lets you know that most (75%) of why teams differ in wins is *not* explained by salary. And everyone agrees that salary only explains wins because it stands in a bit for talent; GMs have some ability to pay better players more money, but it isn’t anywhere near perfect. Some of this has to do with the rules (like the rookie salary scale) and some of it has to do with player evaluation (thank you, Joe Dumars).
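
As an aside on r versus R squared, which tends to trip people up: in a one-predictor regression, R squared is exactly the correlation squared. Here’s a quick sketch with fake salary and wins numbers (the .5 correlation is an assumption chosen so the R squared lands near the .25 figure above):

```python
import random, math

random.seed(3)
N = 500

# Fake "salary" and "wins", built so the true correlation is about .5
salary = [random.gauss(0, 1) for _ in range(N)]
wins = [0.5 * s + math.sqrt(0.75) * random.gauss(0, 1) for s in salary]

def pearson(x, y):
    """Pearson correlation r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(salary, wins)

# R squared of the simple regression of wins on salary
m_s, m_w = sum(salary) / N, sum(wins) / N
slope = sum((a - m_s) * (b - m_w) for a, b in zip(salary, wins)) / sum((a - m_s) ** 2 for a in salary)
resid = [b - m_w - slope * (a - m_s) for a, b in zip(salary, wins)]
r2 = 1 - sum(e * e for e in resid) / sum((b - m_w) ** 2 for b in wins)

print(r, r2)  # R squared equals r squared in a one-predictor regression
```

So a correlation of .5 corresponds to an R squared of .25: a variable can be decently correlated with wins and still leave most of the variance unexplained.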

To pick another fight that will get me in lots of trouble, you could also look at an old post by Eli Witus looking at usage and efficiency. His regression coefficient tells him that a high-usage line-up gains in offensive efficiency beyond expected (that is, when you put five guys on the court who use a lot of possessions, they seem to have a better offensive rating than you would expect from that line-up). And the converse is true, showing that if you put low usage players together, their efficiency drops. The conclusion is taken to be that increasing your usage lowers your efficiency. However, the R squared for the models never gets above .04; he never explains more than 4% of efficiency differences across line-ups. That means you could predict a line-up’s efficiency nearly as well by saying they’re all average. A better conclusion would be that if someone put a gun to your head and made you guess you would say that there is a connection between usage and efficiency, but the evidence is pretty weak.
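
The tension between a significant coefficient and a useful R squared is easy to demonstrate with made-up data. In this sketch (the slope of 0.2 and the noise level are illustrative assumptions), a small true effect buried in a lot of noise comes out highly significant with enough observations, yet explains only a few percent of the variance:

```python
import random, math

random.seed(2)
N = 2000

# A small true effect (slope 0.2) swamped by noise
x = [random.gauss(0, 1) for _ in range(N)]
y = [0.2 * a + random.gauss(0, 1) for a in x]

mx, my = sum(x) / N, sum(y) / N
sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
slope = sxy / sxx

resid = [b - my - slope * (a - mx) for a, b in zip(x, y)]
ss_res = sum(e * e for e in resid)
ss_tot = sum((b - my) ** 2 for b in y)
r2 = 1 - ss_res / ss_tot                     # fraction of variance explained
se = math.sqrt(ss_res / (N - 2) / sxx)       # standard error of the slope
t = slope / se                               # t-statistic

print(r2, t)  # R squared near .04, t-statistic well above 2
```

Whether you read that as “usage clearly matters” (the coefficient is many standard errors from zero) or “usage barely helps you predict anything” (the R squared is tiny) is exactly the argument in the comments below.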

I always try to end my posts with a summary. I guess for this one I would say 1) statistics, properly used, are your friend; 2) even if you don’t have real-life data you can learn a lot about sports stats by making your own; and 3) always take your vitamins.


John

01/28/2011

Very interesting. Did you get the percentages for r squared correct? For example, when you mention a correlation of .25 you say that’s 25%. Aren’t you supposed to do this: .25 x .25 x 100 = 6.25%?

Seriously though, I loved the post Alex.

Alex

01/28/2011

I think everything I have in there is reported as an R squared, so the percentages should be fine. I definitely didn’t mix any within an example (like correlation at team level and R squared at player level), so if I did miss something it won’t change the conclusion.

entityabyss

01/28/2011

Yea, Alex is the man. I’d be jealous if I were Arturo. Lol. I hadn’t checked his site until recently and he has some great stuff.

Like, for the people who talk about large diminishing returns with line-ups with regards to rebounding percentage (particularly defensive), it’s almost ignored that there’s even larger diminishing returns with assist percentage in line-ups. However, both stats remain very consistent.

His site is just great. A lot of great information.

Guy

01/28/2011

Alex says this about Eli Witus’ study: “However, the R squared for the models never gets above .04; he never explains more than 4% of efficiency differences across line-ups. That means you could predict a line-up’s efficiency nearly as well by saying they’re all average.”

Unfortunately, this is just the latest example of Alex having no idea what he’s talking about. Witus’ model is not predicting each lineup’s efficiency, he is predicting the DIFFERENCE between its actual efficiency and the efficiency one would predict simply by combining the individual players’ efficiency ratings (weighted for each player’s usage). So the claim that you could predict lineup efficiency as well by predicting average performance is irrelevant nonsense, because Witus isn’t trying to predict lineup efficiency. Witus could have greatly increased his R^2 by changing the model to predict efficiency, and using each player’s rating as a predictor variable. Then he would have a much larger R^2, but usage would still have the exact same predictive power.

That just shows that the R^2 is irrelevant to the question at hand, which is trying to understand how usage impacts efficiency. There are many reasons a lineup’s efficiency might differ from what players’ individual rates predict, starting with the fact that the large majority of lineups have a very small sample size. Plus, different lineups face different quality defenses, have different proportions of garbage time, etc. So most of the variance is just noise, and we don’t care if usage explains 1%, 4%, 30%, or 70% of this variance. We just want to know if usage impacts efficiency, and if so, what the coefficient is. Witus’ study is an ingenious way to do just that.

Alex has learned some simple statistical rules in graduate school, which he applies in his blog, like:

*High correlations in stats indicate reliability;

*High R^2 is good, low R^2 is bad;

*You can’t use residuals in your predictive model.

As rules of thumb, these are all useful. Unfortunately, he tries to apply them in evaluating very sophisticated analyses by people who completely understand these principles (like Rosenbaum, or Witus, or Birnbaum), but are doing more advanced work. These grad school rules of thumb are not helpful at this level, and often lead Alex astray. It’s like trying to critique quantum physics based on the physics you learned in 9th grade. The results are not pretty….

Alex

01/28/2011

My apologies; it is compared to expected, not absolute efficiency. I should change the sentence to ‘you would do nearly as well guessing everyone’s expected efficiency’. In either case, the very low fit says that you aren’t learning very much by using relative usage as a predictor.

Eli W

01/28/2011

As Guy mentioned, the low R^2 is mainly due to the very small sample sizes. Many of the lineups are together for only a few possessions, and the randomness of a few makes or misses can be overwhelming. But the R^2 isn’t really important to what I was trying to look at. The key figures are the coefficient estimate on the independent variable (i.e. the slope) along with its standard error. That’s what indicates that low-usage lineups had a lower efficiency than would be expected, and high-usage lineups were more efficient than expected. If you are worried about the significance of that finding, instead of looking at the R^2, look at the SE of the coefficient. In all of the different models I tried, the coefficient value was more than two SE’s away from zero, which means the results were statistically significant at the .05 level.

entityabyss

01/28/2011

Correct me if I’m wrong with what I’m about to say.

I don’t see how what you say proves him wrong. With small sample sizes, you say you have statistically significant evidence showing that raising your usage lowers your efficiency. Doesn’t the R^2 of those small samples suggest that the efficiencies, compared to what was expected, are not explained by usage? Even in the small sample, isn’t there still too much noise to conclude that usage and efficiency are connected?

Maybe I missed something.

Eli W

01/28/2011

The low R^2 is mainly a function of working on a lineup level. Any regression run using such low possession observations (such as an adjusted plus/minus regression) is going to have a low R^2 due to randomness. No matter what you do you’re not going to get an R^2 of .70 in this context (it is important to remember that R^2 is relative, and what’s considered “good” or “useful” in one context may be “bad” or “useless” in another). But certainly the R^2 could be increased somewhat by including other relevant variables, e.g. quality of the defense faced, offensive fit of the players in the lineup (if you could somehow quantify that), etc. Those things are definitely worth looking into, and adding in such variables could change the sign of the coefficient value I found, or make it no longer significantly different from zero. However, it is still noteworthy that on the regressions I ran, the coefficient was positive and significant.

To step back from the advanced statistical methods, you can just think of it as a simple grouping of lineups into low-usage and high-usage ones. The low-usage lineups were found to be less efficient than one would expect based on the efficiency of the players, while the high-usage lineups were found to be more efficient than one would expect. This is a simple correlation suggesting that players’ efficiency decreases as their usage increases. But it could be that there is some omitted variable that is the real cause of what we see (for instance, maybe the low-usage lineups tend to face tougher defenses than the high-usage ones, and that is why they underperform).

Alex

01/28/2011

Eli – you report three different analyses in that post, each grabbing smaller sets of line-ups based on how many possessions they played together. The R squared does head in the right direction (it goes from .004 for everyone to .04 when only looking at groups with at least 100 possessions together), but it isn’t like you’re heading to a really good predictive value. Yes, if we had to guess, we would lower expected efficiency for low-usage groups. But our guesses would be pretty inaccurate, and we could do nearly as well guessing their efficiency without knowing their usage.

Guy

01/28/2011

Alex/Abyss: Forget about the R^2. It tells you nothing useful here. What you care about is the coefficient. Is it statistically significant? Yes. Does it describe a large impact of usage on efficiency? Well, that depends on what you think is “large.” But that’s what you should focus on. It’s the size of the impact on efficiency we care about. You could have a small effect that explains a lot of the variance (if every lineup got to play 10,000 minutes), or a large effect that explains only a little of the variance. R^2 is irrelevant in this case. And it’s rather shocking that Alex doesn’t know that.

Alex

01/28/2011

Could you give me a non-sports example of an important variable that doesn’t explain a lot of variance? Maybe we’re just quibbling about semantics; to me, ‘important’ means that it increases predictive power some reasonable amount. As Eli says, what that amount is may differ from situation to situation. But I can’t accept 4% as reasonable.

Westy

01/28/2011

Good post. You note, “I ran a similar model at the team level (excluding position since teams don’t have positions) predicting win percentage. TS% and turnovers are number one, followed by defensive rebounds, steals, blocks, offensive rebounds, and fouls and assists.”

If this is the case, why shouldn’t any individual player valuation model also similarly consider these statistics hierarchically ranked the same? Said another way, why should a player valuation model assign greater importance to a statistic than its value at the team level?

Alex

01/28/2011

The quick answer is that optimally we’d want to include defense, which I didn’t do for that model; I wanted to have the same variables at the team and player levels. That might change the order of the statistics. But say we did include that and got player-level measures of defense so we could have fully comparable models; I’m still not entirely happy with a completely empirical measure of value. The model says that at the team level fouls and assists don’t impact winning. Could that possibly be true? The relative values of the statistics also change depending on what you include; if you used effective FG% or true shooting % instead of individual shooting components (threes made, threes missed, etc) you’ll have a different story. A more theory-based (or perhaps a more involved data-driven) model should be able to suss out what statistic has what value. More likely, a combination of theory and data would be best.

Alex

01/28/2011

To complete that thought: what it means is that the first step is to create a team-level model that everyone agrees is “right”. If that were to happen, then we could ask how to transfer that down to the player level. But there are more problems there; if we used normalized variables at the team level, they don’t mean the same thing to players who have different means and variances. If we simply give credit for the statistics a player collects (as in Wins Produced), everyone raises a fuss. Everyone would need to agree on how to transform the team-level effects to the player level. It’s possible that some disproportionate transform is appropriate, I don’t know.

Guy

01/28/2011

Great question, Westy. The answer is the hierarchy should generally be the same. Even if we lack data on player defense, for example, the relative ranking of those things we do measure — like rebounds and TS% — should be the same. The fact that they are not reveals a problem with WP.

Nerdnumbers

01/28/2011

Guy and Westy,

You got me. What the heck are you talking about? The weights used for individual factors are the same ones used for the team factors. Think fast: can either of you tell me the Wins Produced weights for their corresponding box score stats?

Guy

01/30/2011

“What the heck are you talking about?”

This explains so much…..