Fielding Statistics are Pretty Useless
Fielding statistics are the cutting edge in baseball statistics right now, full of complicated math and constant developments. There’s been a lot of progress since fielding percentage was the best thing out there. First came Range Factor, a simplistic but surprisingly effective way of looking at a player’s range based on how many times they touch the ball. But the big revolution came when STATS started tracking every batted ball manually (three staff members record every game independently) and recording which of a number of zones it was hit into.
There have been a ton of defensive metrics developed since then (PMR, UZR and +/- are the most popular), but the data they crunch is all gathered in the same way. There are now two companies, STATS and Baseball Info Solutions (BIS), that collect play-by-play data and sell it for thousands of dollars to teams and organizations. Here’s an article talking about the difference between the two and how much they correlate. It mentions that:
During this year’s MIT Sloan Sports Business Conference, Rob Neyer told attendees that the evaluation of major league player hitting, pitching and fielding performance has been adequately addressed, and Bill James agreed with him.
OK, so we’re done, right? Problem solved? Hardly! While I agree that the systems are ingenious, you don’t have to poke around much to find some huge inconsistencies between the two sources of data they’re analyzing, and it’s not the sort of thing that you can get around by comparing or weighting both sources or multiple systems. I was hoping that I’d totally overlooked something, so I sent the following question to the Hardball Times (ignore my typo, I meant RZR; they returned the favour by calling me Jonathan G.):
I have a question about UZR. A lot of sites have Troy Glaus’ zone rating at .737, which is the worst in the American League. That makes sense seeing that he has been hobbled by plantar fasciitis this season. However, Hardball Times has his UZR at .706, which is among the best in the AL. He also has more balls fielded out-of-zone than most players, which makes his range look like the best in the AL other than Brandon Inge. I thought that UZR was just ZR separated into two different components. How could it give such a different impression of a players’ range?
– Jonathan G.
I was rather disappointed that the answer wasn’t just that I was being an idiot. As the Hardball Times said in their reply, the difference in Troy Glaus’ zone ratings comes from STATS and BIS (ESPN uses STATS, the Hardball Times uses BIS) recording very different totals for both the number of balls hit into Troy’s zone and the number he fielded. The gap is enough to swing his ranking between the second-worst and the second-best third baseman in the league (those are his rankings as of now, not when I originally asked the question).
First, the two companies have significantly different definitions of the size of a player’s fielding zone. STATS gives him a total of 281 chances, while BIS shows 204 balls hit into his zone and 48 plays made outside of it, for a total of 252. That’s a difference of 29 chances, or about 10%, but the STATS zone doesn’t have to be that much larger. It makes more sense that it’s only a little bigger and that a lot of balls were hit just outside BIS’s zone: under the BIS system Glaus leads the league in balls fielded out of zone, and he wasn’t exactly known for his diving plays or lightning-quick first step this year.
As long as everyone used the same zones, using larger ones wouldn’t make a difference for figuring out a player’s relative ability. However, the two systems also differ by 15 on how many plays Glaus made, with STATS crediting him with 207 plays (281 chances, .737 ZR) and BIS 192 (144 in zone, 48 outside of it). So who’s right? In the article by Sean Smith mentioned in their reply, he points out that most putouts (which are mostly fly balls, line drives, etc.) don’t count as “plays” for the purposes of Zone Rating (unless a player fields a grounder and steps on the bag). Troy Glaus had 197 assists this season, so according to STATS he made an additional 10 plays by way of the putout. However, according to BIS, he completed 5 fewer plays than assists. In that case there had to be some unusual assists, such as a deflection to John McDonald that would give Glaus an assist but not credit for a play.
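For anyone who wants to check the arithmetic, here is a rough sketch in Python that uses only the numbers quoted above. The variable names are mine, and treating BIS's 204 in-zone balls plus 48 out-of-zone plays as its "chances" is an assumption made just to line it up against the STATS total; it is not how either company defines anything.

```python
# Troy Glaus figures as quoted in this post. Names are illustrative only.
stats_chances = 281          # STATS: total chances
stats_plays = 207            # STATS: plays made
bis_in_zone_balls = 204      # BIS: balls hit into his zone
bis_in_zone_plays = 144      # BIS: plays made in zone
bis_out_of_zone_plays = 48   # BIS: plays made out of zone
assists = 197                # official assists this season

# Zone ratings as each source would report them.
stats_zr = stats_plays / stats_chances
bis_rzr = bis_in_zone_plays / bis_in_zone_balls

# Assumption: BIS "chances" = in-zone balls + out-of-zone plays,
# used here only so the two totals can be compared.
bis_chances = bis_in_zone_balls + bis_out_of_zone_plays
bis_plays = bis_in_zone_plays + bis_out_of_zone_plays

chance_gap = stats_chances - bis_chances
play_gap = stats_plays - bis_plays
stats_plays_beyond_assists = stats_plays - assists
bis_plays_short_of_assists = assists - bis_plays

print(f"STATS ZR:  {stats_zr:.3f}")
print(f"BIS RZR:   {bis_rzr:.3f}")
print(f"Chances:   STATS {stats_chances} vs BIS {bis_chances} (gap {chance_gap})")
print(f"Plays:     STATS {stats_plays} vs BIS {bis_plays} (gap {play_gap})")
print(f"STATS plays beyond assists:  {stats_plays_beyond_assists}")
print(f"BIS plays short of assists:  {bis_plays_short_of_assists}")
```

The two ratings are only about three hundredths apart; what flips the ranking is where each number falls relative to the other third basemen in its own system, plus the 48 out-of-zone plays that BIS counts separately.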
Either there’s a lot of human error, or there’s a really different definition of what counts as making a play. The BIS number is closer to the number of assists, but having watched Troy limp around out there and gaze wistfully at balls he would have dived for last year, I find the excellent ranking given by their system a little suspect, especially the huge number of balls ‘out of zone’ (48) he got to. But who knows? That’s one of the problems with secret, proprietary statistics. Unless someone has 10-20 grand lying around to delve into the raw data, there’s no way to break it down and see what’s causing the difference or whose system could be introducing inaccuracies. And if the only two available sources of play-by-play data can put the same player on absolutely opposite ends of the fielding spectrum, how can you take the results of all the fancy analysis based on them seriously? GIGO: garbage in, garbage out.