I’m a daily reader of RotoGraphs, a FanGraphs joint, and the hoopla is currently centered on a rankings piece (one, two, and three) and the methodology behind Z-Scores.
The working premise is that the players within the fantasy pool are normally distributed, but each position has a different mean and a different standard deviation. This is essentially quantifying the idea of “Player Scarcity” that we’ve dealt with repeatedly. If a lot of players are bunched around the average, the standard deviation will be lower, the curve narrower, and their value relative to their competition lower. Alternatively, if the players are widely spread out, the standard deviation will be higher, the curve wider, and their value relative to the competition higher.
To compare position players to each other, we use something called Z-Scores. A Z-Score is the number of standard deviations a player sits away from the mean, or average. In a normally distributed sample, there’s a little rule called the 68/95/99.7 rule.
68% of the players should fall within 1 standard deviation of the mean, 95% of the players should fall within 2 standard deviations, and 99.7% of the players should fall within 3 standard deviations. Thus, almost every player will be within 3 standard deviations of the mean.
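If you want to see where those three numbers come from, here is a quick sketch using Python’s standard library. The helper name `share_within` is mine, not anything from the FanGraphs piece. It simply measures the area under a standard normal curve within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The rule only holds when the data really is bell-shaped, which is exactly the assumption in question for small position groups.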
Using something like this, we can tell that Albert Pujols is more valuable relative to first basemen than Joe Mauer is relative to catchers.
Over at FanGraphs, there are a couple of problems with simply adding Z-Scores to determine a player’s overall value relative to the player pool. The first problem: a lot of our samples just aren’t going to be normally distributed. The entire player pool is normally distributed, but the twelve catchers? Maybe if we’re lucky.
Let’s look at shortstops. FanGraphs said that 16 of them were drafted:
| Player | R | H | HR | RBI | SB | AVG |
| Hanley Ramirez | 97 | 175 | 24 | 95 | 32 | 0.312 |
| Troy Tulowitzki | 95 | 165 | 30 | 110 | 12 | 0.295 |
| Derek Jeter | 104 | 184 | 12 | 61 | 19 | 0.301 |
| Alexei Ramirez | 84 | 169 | 18 | 71 | 14 | 0.290 |
| Jose Reyes | 85 | 161 | 11 | 54 | 36 | 0.279 |
| Elvis Andrus | 94 | 170 | 2 | 48 | 37 | 0.282 |
| Jimmy Rollins | 85 | 148 | 15 | 54 | 28 | 0.271 |
| Starlin Castro | 90 | 178 | 4 | 55 | 13 | 0.294 |
| Stephen Drew | 86 | 154 | 15 | 71 | 9 | 0.270 |
| Rafael Furcal | 83 | 146 | 10 | 49 | 22 | 0.286 |
| Yunel Escobar | 83 | 150 | 8 | 53 | 5 | 0.284 |
| Orlando Cabrera | 72 | 152 | 6 | 56 | 13 | 0.284 |
| Jason Bartlett | 78 | 138 | 7 | 43 | 19 | 0.277 |
| Alcides Escobar | 80 | 157 | 4 | 47 | 11 | 0.282 |
| Alex Gonzalez | 66 | 143 | 20 | 80 | 2 | 0.252 |
| Ian Desmond | 70 | 144 | 11 | 59 | 14 | 0.265 |
Here we have the frequency distribution for RUNS scored by the sixteen chosen shortstops.
There is a large number of average players, smaller numbers of “good” and “bad” players, and finally even smaller numbers of “great” and “awful” players.
The Z-Scores should be fine here: they won’t be skewed, and they’ll give us an idea of how each player does relative to his peers.
When it comes to runs, we have a standard deviation of 10.2 and a mean of 84.5. There’s a minimum of 66 runs and a maximum of 104 runs.
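For anyone following along at home, here is a rough sketch of that calculation in Python. It uses the runs column from the table above and the sample standard deviation (dividing by n - 1). That choice is an assumption on my part, though it is the version that reproduces the 10.2 figure. The variable names are just for illustration.

```python
from statistics import mean, stdev

# Runs for the sixteen shortstops, copied from the table above.
runs = {
    "Hanley Ramirez": 97, "Troy Tulowitzki": 95, "Derek Jeter": 104,
    "Alexei Ramirez": 84, "Jose Reyes": 85, "Elvis Andrus": 94,
    "Jimmy Rollins": 85, "Starlin Castro": 90, "Stephen Drew": 86,
    "Rafael Furcal": 83, "Yunel Escobar": 83, "Orlando Cabrera": 72,
    "Jason Bartlett": 78, "Alcides Escobar": 80, "Alex Gonzalez": 66,
    "Ian Desmond": 70,
}

values = list(runs.values())
run_mean = mean(values)
run_sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Z-Score: how many standard deviations a player sits from the positional mean.
z_scores = {name: (r - run_mean) / run_sd for name, r in runs.items()}

print(f"mean={run_mean:.1f} sd={run_sd:.1f}")
for name, z in sorted(z_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {z:+.2f}")
```

Each Z-Score is (runs - mean) / standard deviation, so Derek Jeter’s 104 runs comes out a bit under +2 and Alex Gonzalez’s 66 a bit under -2.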
However, when we look at RBIs for shortstops, it’s a completely different ballgame.
The data isn’t anywhere near normally distributed, and while we can still calculate the Z-Scores for our fantasy shortstops, they’ll be skewed. There’s a mean of 62.875 and a standard deviation of 18.478.
Combining this data with all of the other data will lead to problems, especially when we start to compare across positions.
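One rough way to put a number on that lopsidedness is the skewness coefficient, which is close to 0 for a symmetric set and positive when a long tail stretches to the right. The sketch below uses the simple moment-based version, fed with the R and RBI columns from the table. The `skewness` helper is my own, and other definitions with small-sample corrections exist.

```python
from statistics import mean, median

# R and RBI columns from the shortstop table, in table order.
runs = [97, 95, 104, 84, 85, 94, 85, 90, 86, 83, 83, 72, 78, 80, 66, 70]
rbi = [95, 110, 61, 71, 54, 48, 54, 55, 71, 49, 53, 56, 43, 47, 80, 59]


def skewness(values):
    """Moment-based skewness: third central moment over the second to the 1.5 power."""
    n = len(values)
    center = mean(values)
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


for label, column in (("R", runs), ("RBI", rbi)):
    print(f"{label:4s} mean={mean(column):.3f} median={median(column):.1f} "
          f"skewness={skewness(column):+.2f}")
```

Runs have a mean and a median that match at 84.5, while the RBI mean of 62.875 sits well above the RBI median of 55.5. That gap is the signature of a few big totals dragging the average up.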
After we get past the fact that much of the data in the smaller sets isn’t going to be normally distributed, we’re left with the fact that we’re only comparing a player’s utility against his peers.
Albert Pujols’ SBs are worthy of a tremendous Z-Score, but that really doesn’t mean that much when you compare his 10 SB to an outfielder’s stolen base totals.
In the end, this is about counting statistics and points. Sometimes it just doesn’t matter that a first baseman steals three times as often as the average first baseman.
What to Do?
When you deal with distributions, you need the standard deviation and the mean. One doesn’t mean much without the other. The same holds true for Z-Scores: Pujols can be 2.5 standard deviations away from the mean, but what if that mean is 4 stolen bases? Is that still worthy of 2.5 points in our ranking system? No.
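To make that concrete, here is a small sketch that turns a Z-Score back into raw stolen bases. The first-base numbers reuse the hypothetical mean of 4 and pick a standard deviation of 2.4 so that 2.5 standard deviations lands on Pujols’ 10 SB. The outfield mean and standard deviation are made up purely for illustration, and `raw_from_z` is just a name I chose.

```python
def raw_from_z(z, mean, sd):
    """Convert a Z-Score back into the raw counting stat it represents."""
    return mean + z * sd


# Hypothetical positional distributions for stolen bases (illustrative only).
positions = {
    "1B": {"mean": 4.0, "sd": 2.4},    # mean of 4 from the text; sd assumed
    "OF": {"mean": 15.0, "sd": 10.0},  # both values invented for the example
}

z = 2.5
for pos, dist in positions.items():
    steals = raw_from_z(z, dist["mean"], dist["sd"])
    print(f"{pos}: Z-Score of {z:+.1f} is about {steals:.0f} SB")
```

Under these assumed numbers the same +2.5 is worth 10 steals at first base and 40 in the outfield, which is why the Z-Score alone can’t be treated as 2.5 points of value.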
The whole concept of Z-Scores is to properly define a player’s value against a normal distribution. We’re trying to plan for the future and know what kind of other options are available if we select a certain player. In cases where the distribution isn’t normal or we’re not considering the mean, it makes just as much sense to sit down and rank each player based on his position in that category.
In a future article, I’ll throw some ideas at you involving adding context to our Z-Scores (or tossing out Z-Scores altogether).
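As a sketch of what that ranking might look like, the snippet below hands out points by position in the RBI column of the shortstop table: the worst total gets 1 point, the best gets 16, and tied players split the points for the spots they occupy. This is only one possible scheme, and the `rank_points` helper is my own invention rather than anything FanGraphs uses.

```python
# RBI column from the shortstop table above.
rbi = {
    "Hanley Ramirez": 95, "Troy Tulowitzki": 110, "Derek Jeter": 61,
    "Alexei Ramirez": 71, "Jose Reyes": 54, "Elvis Andrus": 48,
    "Jimmy Rollins": 54, "Starlin Castro": 55, "Stephen Drew": 71,
    "Rafael Furcal": 49, "Yunel Escobar": 53, "Orlando Cabrera": 56,
    "Jason Bartlett": 43, "Alcides Escobar": 47, "Alex Gonzalez": 80,
    "Ian Desmond": 59,
}


def rank_points(stats):
    """Give 1 point to the lowest value up to len(stats) for the highest; ties share the average."""
    ordered = sorted(stats.items(), key=lambda kv: kv[1])  # ascending by stat
    points = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every player tied with the player at index i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # average of the tied positions
        for name, _ in ordered[i:j + 1]:
            points[name] = shared
        i = j + 1
    return points


rbi_points = rank_points(rbi)
for name, pts in sorted(rbi_points.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {rbi[name]:4d} RBI  {pts:4.1f} pts")
```

Ranks ignore how big the gaps are, so Tulowitzki’s 110 RBI earns only one more point than Hanley Ramirez’s 95. That is a trade-off, not a free lunch, but it does keep one outlier from dominating the category.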



