How To Destroy Perfectly Good Pitch f/x Data
Sometimes you have to step back and think about what the data you’re crunching really means. Shi Davidi’s latest piece using pitch f/x comes to the conclusion that pitchers are “getting the calls” because more balls in the zone are called strikes than the other way around. But let’s think about that for a millisecond, shall we?
There’s an area around the fringe of the strike zone where blown calls get made. Every strike is by definition no more than 10 inches away from that danger zone (because it’s somewhere in the zone), while balls can be anywhere: bounced, two feet outside. If there was a super simple call to be made, it was a ball.
So by using the overall number of balls and strikes, close or not (instead of, say, only pitches within 3 inches of the border), to find a percentage of incorrect calls, all this “study” has really proven is that strikes are more likely to be blown calls, which is logically obvious and not stunning at all. If you actually do the hard work and concentrate only on fringe pitches (or something similar: the classic John Walsh article at the Hardball Times graphed the overall % of pitches called strikes and vice versa by inches from the plate to find where they crossed on both planes), you find the real zone is slightly larger than the rulebook zone, and hitters generally get the shaft a little. Just like everybody thought. Shi, you’ve got a bunch of major-leaguers to go back to and unconfuse (while pleading they give the new technology another chance, please).
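If you want to see the denominator problem in numbers, here is a toy sketch. Everything in it is my assumption and not real pitch f/x data: a one-dimensional zone with the edge 10 inches from the centre of the plate, pitches spread evenly out to 30 inches, and an umpire who is equally fuzzy on both sides of the edge (a logistic curve with a made-up `UMP_NOISE` of one inch).

```python
import math

ZONE_HALF_WIDTH = 10.0  # inches from the centre of the plate to the zone edge (assumed)
UMP_NOISE = 1.0         # inches of blur around the edge (assumed, not measured)

def p_called_strike(x):
    """Chance a taken pitch at horizontal location x is called a strike."""
    d = ZONE_HALF_WIDTH - abs(x)  # positive inside the zone, negative outside
    return 1.0 / (1.0 + math.exp(-d / UMP_NOISE))

def blown_rates(pitches):
    """Return (share of in-zone pitches called balls,
               share of out-of-zone pitches called strikes)."""
    in_zone = [x for x in pitches if abs(x) < ZONE_HALF_WIDTH]
    out_zone = [x for x in pitches if abs(x) >= ZONE_HALF_WIDTH]
    missed_strikes = sum(1 - p_called_strike(x) for x in in_zone) / len(in_zone)
    extra_strikes = sum(p_called_strike(x) for x in out_zone) / len(out_zone)
    return missed_strikes, extra_strikes

# Hypothetical pitch locations: an even spread from -30 to +30 inches
all_pitches = [-30 + (i + 0.5) * 0.1 for i in range(600)]
# Same pitches, but only those within 3 inches of the zone edge
fringe = [x for x in all_pitches if abs(ZONE_HALF_WIDTH - abs(x)) < 3.0]

all_missed, all_extra = blown_rates(all_pitches)
fringe_missed, fringe_extra = blown_rates(fringe)
print(f"all pitches:  {all_missed:.1%} of strikes blown, {all_extra:.1%} of balls blown")
print(f"within 3 in.: {fringe_missed:.1%} of strikes blown, {fringe_extra:.1%} of balls blown")
```

With this perfectly even-handed umpire, the all-pitches version still shows strikes being blown about twice as often as balls, purely because the out-of-zone bucket is padded with easy calls. Restrict both buckets to the fringe and the two rates come out the same.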
The one interesting thing to be gleaned from this outrageous mess is corroboration that the corners are generally clipped: mid/away is where all the “called strikes out of the strike zone” (I would call them “extra strikes”) are, and down and away is where most of the extra balls are. But then Shi ignores that fact when coming to the rest of his conclusions:
Mauer, in fact, gets the fifth fewest ball calls on pitches in the zone at 11.96 per cent. Oddly, Blue Jays slugger Adam Lind, widely credited for his discipline this season, gets the least calls in this category at 10.24 per cent.
What about their drastically different swing rates? What this actually reflects is how many close strikes on the corners a hitter takes (since that’s almost exclusively where ‘extra balls’ come from, as shown by Shi’s own charts). Lind may have been credited with taking more walks this year, but his discipline hasn’t improved much — so he fouls off or puts in play balls placed right on the corner where umps are more likely to not call them, and doesn’t get as many of them to begin with as a player of Mauer’s calibre. Trying to guess how he is being treated by the umps without considering the flip side (how many strikes out of the zone are called on him — could be just as low), or how often he swings the bat, is pointless.
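Here is a rough sketch of why swing rate matters. The numbers are invented for illustration, not Lind’s or Mauer’s actual rates: two hypothetical hitters see identical in-zone pitches from the same assumed umpire (a logistic curve around the edge), and differ only in how often they swing at strikes within 3 inches of the edge.

```python
import math

def p_called_ball(depth):
    """Chance a taken pitch `depth` inches inside the zone edge is called a ball
    (assumed logistic umpire, about 50/50 right at the edge)."""
    return 1.0 / (1.0 + math.exp(depth))

def in_zone_ball_rate(swing_rate_edge, swing_rate_middle):
    """Expected share of a hitter's TAKEN in-zone pitches that get called balls."""
    depths = [0.05 + 0.1 * i for i in range(100)]  # 0 to 10 inches inside the edge
    taken = 0.0
    called_ball = 0.0
    for d in depths:
        swing = swing_rate_edge if d < 3.0 else swing_rate_middle
        take = 1.0 - swing                 # only taken pitches can be called
        taken += take
        called_ball += take * p_called_ball(d)
    return called_ball / taken

# Hypothetical hitters: same pitches, same umpire, different swing habits
patient = in_zone_ball_rate(swing_rate_edge=0.40, swing_rate_middle=0.70)
aggressive = in_zone_ball_rate(swing_rate_edge=0.70, swing_rate_middle=0.70)
print(f"patient hitter:    {patient:.1%} of taken strikes called balls")
print(f"aggressive hitter: {aggressive:.1%} of taken strikes called balls")
```

The hitter who takes more corner strikes ends up with the higher “balls called in the zone” percentage, even though the umpire treats both of them identically.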
Janssen’s percentage of pitches in the strike zone called for balls is 10.9 per cent, nearly half the big-league average, while his ratio for strikes outside the zone is 15.4 percent, more than double the average.
In much the same way, you have to look at where these pitches are, instead of assuming that you can lump them together and the distribution is average. That’s not a sign that the umpires are giving him calls, it’s a sign of where Janssen is throwing the ball: he mixes his straight fastball and cutter at the belt and just on or off the plate instead of purely hunting for the corners with sliders, etc. It’s a great approach because it takes advantage of where umpires give the calls, but we’ve got a chicken and egg thing going here. Casey gets better calls because he knows (or his natural approach has blundered across) where to go for them; they aren’t given to him because the umpires respect him.
It goes on and on… one blatant misinterpretation to fit the pre-existing narrative after the other. I know I sound like a statistical crank here, but sometimes you gotta question if the first thing that pops into your mind that makes for a great story is right, or if something else could be causing the phenomenon. I cannot tell you how much hair it makes me lose to listen to Zaun completely dismiss the system and its “laser-guided gizmos” because the dome vibrates when it gets loud, and then watch Shi absolutely brutalize the data like this. Way to make science look stupid, guys. It’s not.
(Incidentally, the one issue that almost gets a complete pass here is that the strike zone is of course three-dimensional, while the graphic you see on TV is just the front edge. Apparently before giving the results back to the umpires, MLB cleans up the technically-should-be-strikes that just clip the edges/corners of the zone because everyone knows they’re never really called. On the flip side, there are clearly some pitches that cross the front of the plate just off it and then curve or tail to clip the back, and those should rightly be called strikes. That’s what Zaun should be grumbling about.)
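To make the front-edge problem concrete, here is a minimal sketch under some loud assumptions: horizontal location only, a straight-line path over the few inches of plate, a zone edge 10 inches from centre (plate half-width plus roughly a ball radius), and only the first 8.5 inches of plate depth, where the plate is still full width. The function names and the drift value are mine, not anything from the pitch f/x system.

```python
HALF_WIDTH = 10.0        # inches from plate centre to zone edge (assumed)
FULL_WIDTH_DEPTH = 8.5   # inches of plate depth before it tapers to the point

def crosses_front(x_front):
    """What the TV graphic checks: is the ball in the zone at the front edge?"""
    return abs(x_front) <= HALF_WIDTH

def clips_zone(x_front, drift_per_inch):
    """Is the ball in the zone anywhere over the full-width part of the plate?
    drift_per_inch is sideways movement per inch of travel (hypothetical)."""
    x_back = x_front + drift_per_inch * FULL_WIDTH_DEPTH
    lo, hi = sorted((x_front, x_back))
    # A straight path touches [-HALF_WIDTH, HALF_WIDTH] if the two ranges overlap
    return lo <= HALF_WIDTH and hi >= -HALF_WIDTH

# Hypothetical pitch: half an inch off the edge at the front, tailing back in
x_front, drift = 10.5, -0.08
print("front-edge graphic says strike:", crosses_front(x_front))
print("clips the zone further back:   ", clips_zone(x_front, drift))
```

A pitch like this one is a ball on the two-dimensional graphic and a strike by the three-dimensional reading, which is the disagreement worth arguing about.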