By: Edward Egros

Predicting Pitching Performance

Image-1Noah Syndergaard made his Major League debut last year for the New York Mets and made an immediate impact (3.24 ERA and 9.96 K/9). While his 9-7 record may not have been overly impressive, there were signs this was only the beginning. Now, Syndergaard has multiple National League player of the week awards and is one of the more reliable hurlers in the game.

But not every pitcher lives up to predictions. How can someone better determine which pitchers will become successful the following season? One of the more intriguing presentations concerning the future of baseball predictions involved creating a pitcher projection system based upon Pitch F/X (to read the paper and/or watch the presentation, click
here). The traditional ways to gauge a successful pitcher do not always perform well when forecasting how he'll do the following year. According to this research, if next season's Earned Run Average (or Runs Averaged/9 innings) is regressed onto one of these traditional metrics, here are the following R^2:

Metric R^2
K% 0.67
SIERA 0.52
xFIP 0.46
BB% 0.45
FIP 0.35
HR% 0.18
ERA 0.14
BABIP 0.04

Strikeout percentage is the most successful traditional metric when determining future success. Here are the top ten pitchers in K% in 2015:

  1. Clayton Kershaw (33.82%)
  2. Chris Sale (32.08%)
  3. Max Scherzer (30.7%)
  4. Carlos Carrasco (29.59%)
  5. Chris Archer (29.03%)
  6. Corey Kluber (27.65%)
  7. Jacob deGrom (27.03%)
  8. Jake Arrieta (27.13%)
  9. Madison Bumgarner (26.93%)
  10. Francisco Liriano (26.52%)

MLB is through 1/4 of the 2016 season. As it stands, here are the top ten pitchers in K% this year:

  1. Jose Fernandez (35.9%)
  2. Clayton Kershaw (33.7%)
  3. Noah Syndergaard (32.6%)
  4. Max Scherzer (31.5%)
  5. Stephen Strasburg (30.9%)
  6. Danny Salazar (30.3%)
  7. David Price (29.4%)
  8. Vincent Velasquez (28.8%)
  9. Drew Smyly (28.4%)
  10. Drew Pomeranz (28.3%)

While many on the 2015 list currently rank just outside of the top ten this year, it shows two things: the difficulty of predicting pitcher success given any traditional metric and it shows just how consistently dominant Clayton Kershaw and Max Scherzer really are.

This paper discussed combining the aforementioned statistics with Arsenal/Zone rating. This metric uses PitchF/X data which tracks the speed, movement and placement of every pitch relative to the strike zone. The idea is, with more data about the specifics of each pitch a pitcher throws, the pitch sequence and which pitches are most sustainable over time, it will be easier to predict success the following season.

Data scientists should always be careful about having too much data because of overfitting. In other words, too much data and too many variables mean watering down the prediction to where it is hard to find actual trends that are meaningful. Still, this is an intriguing paper and hopefully this Arsenal/Zone rating can be more readily available to baseball fans but in an easily digestible way.