
# The Statcast Revolution

10/06/17 00:07 Filed in: Sports

There are more statistics about hitters than ever before. Thanks to Statcast, a baseball fan can learn how fast the ball comes off the bat on any hit, the angle at which it leaves the bat, how far it travels, and more.

These statistics can help characterize and differentiate hitters. A natural follow-up question is whether they can also predict a hitter's success. For instance, if a hitter averages a higher exit velocity, does that mean he is generally a better hitter?

Fangraphs keeps a database with averages of these Statcast statistics for every hitter. Some of the data is missing, but Jeff Zimmerman made the necessary corrections based on the type of balls in play fielded by certain positions. For the 2016 season, the variables include:

- Hit speed
- Hit angle
- Hit distance
- A ratio of home runs to fly balls
- An expected value of the same ratio
- Isolated power

It makes intuitive sense for the second half of this list to be relevant to a hitter's success, but what about the first half? To answer that question, I merged this dataset with other advanced offensive statistics for the same hitters (this data came from Baseball Reference). Offensive wins above replacement (oWAR) would be the obvious choice for a dependent variable, but there is a problem: WAR is a cumulative total, so it grows with additional plate appearances. Because the Statcast variables are already per-hitter averages, and I want to look at the average impact each one has on a hitter's overall performance, I divided oWAR by plate appearances and then multiplied by 1,000 so as not to have too many zeroes after the decimal point. This variable is named oWARavg.
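The rescaling described above is simple arithmetic, and a minimal sketch makes the motivation concrete. The function name `owar_per_1000_pa` and the two hitters below are hypothetical, chosen only for illustration; they are not taken from the Fangraphs or Baseball Reference data.

```python
def owar_per_1000_pa(owar: float, plate_appearances: int) -> float:
    """Convert a cumulative oWAR total into a rate per 1,000 plate appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return owar / plate_appearances * 1000


# Two made-up hitters with the same oWAR total but different playing time.
full_time = owar_per_1000_pa(owar=3.0, plate_appearances=600)
part_time = owar_per_1000_pa(owar=3.0, plate_appearances=300)

# The part-time hitter produced the same value in half the opportunities,
# so his rate is twice as high even though the raw totals are identical.
print(full_time, part_time)
```

Putting every hitter on the same per-plate-appearance footing is what lets the rate be compared against averaged Statcast measurements.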

The next step is to determine which of the first group of variables are statistically significant at the 95% confidence level. I am using a backward elimination technique: I start with a regression on all three variables, then remove any that are not significant. With this approach, the only variable eliminated was speed. In other words, the average exit velocity of a batted ball is not a significant indicator of how successful a hitter is. However, the angle of the batted ball and the distance it travels are both significant.
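For readers who want to try backward elimination themselves, here is a rough sketch using only NumPy. It is not the exact procedure or software used for this post. It makes two simplifying assumptions: it drops one variable at a time (the weakest), and it judges significance by comparing each t-statistic to 1.96, the large-sample approximation for a two-sided 95% test, rather than computing exact p-values. The data is synthetic, generated only so that the example runs.

```python
import numpy as np


def ols_t_stats(X, y):
    """Fit OLS with an intercept; return coefficients and their t-statistics."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]                  # n minus number of coefficients
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance matrix
    return beta, beta / np.sqrt(np.diag(cov))


def backward_eliminate(X, y, names, t_crit=1.96):
    """Drop the weakest predictor until every remaining |t| is at least t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        _, t = ols_t_stats(X[:, keep], y)
        t_pred = np.abs(t[1:])                     # ignore the intercept's t-stat
        worst = int(np.argmin(t_pred))
        if t_pred[worst] >= t_crit:
            break                                  # everything left is significant
        keep.pop(worst)
    return [names[i] for i in keep]


# Synthetic stand-ins for the three Statcast averages (NOT real data).
rng = np.random.default_rng(0)
n = 500
speed = rng.normal(size=n)
angle = rng.normal(size=n)
distance = rng.normal(size=n)
# The fake response depends on angle and distance only.
y = -0.5 * angle + 0.8 * distance + rng.normal(size=n)

X = np.column_stack([speed, angle, distance])
selected = backward_eliminate(X, y, ["speed", "angle", "distance"])
print(selected)
```

Removing one variable per pass and refitting matters because the t-statistics of the remaining variables can change once a correlated predictor is gone.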

The angle has a negative coefficient, meaning hitters who hit the ball less steeply on average tend to be more successful. Distance has a positive coefficient, which makes intuitive sense: the farther a ball travels, the likelier it is to become a hit or maybe even a home run. Although both variables are statistically significant, the adjusted R-squared is only .1687, meaning these two variables together explain only about 17% of the variability in oWARavg.
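As a reminder of what the adjusted figure measures, adjusted R-squared takes the ordinary R-squared and penalizes it for the number of predictors in the model. A small sketch of the standard formula follows. The inputs are made up, since the sample size behind the .1687 figure is not stated here.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof


# Hypothetical inputs: R^2 of 0.20 from 101 hitters and 2 predictors.
adj = adjusted_r_squared(r_squared=0.20, n_obs=101, n_predictors=2)

# The adjusted value is slightly below the raw R^2 because of the penalty.
print(round(adj, 4))
```

Because of that penalty, adjusted R-squared is the fairer number to use when comparing models with different numbers of variables.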

Just for fun, let's see what impact angle and distance have when the second group of variables is included in the regression. Again using backward elimination, here are the results:

Once again, backward elimination took out exit velocity. It also took out the expected ratio of home runs to fly balls. While it kept the actual ratio, the negative coefficient does not make intuitive sense. The logic is that the more of a hitter's fly balls leave the park, the more successful he is, yet this model suggests the opposite. The positive coefficient on isolated power, however, does make logical sense, and the adjusted R-squared is approximately 40%, so this model does a better job of explaining what makes for a successful hitter.

Obviously there are many more advanced offensive variables that could be included in a model like this. Still, this gives a statistical approach for determining which of the Statcast variables help explain offensive success. A similar study could be conducted for baserunning, pitching, defense, etc.