By: Edward Egros

May 2016

Predicting Pitching Performance

Noah Syndergaard made his Major League debut last year for the New York Mets and made an immediate impact (3.24 ERA and 9.96 K/9). While his 9-7 record may not have been overly impressive, there were signs this was only the beginning. Now, Syndergaard has won multiple National League Player of the Week awards and is one of the more reliable hurlers in the game.

But not every pitcher lives up to predictions. How can someone better determine which pitchers will be successful the following season? One of the more intriguing presentations concerning the future of baseball predictions involved creating a pitcher projection system based upon Pitch F/X (to read the paper and/or watch the presentation, click here). The traditional ways to gauge a successful pitcher do not always perform well when forecasting how he'll do the following year. According to this research, if next season's Earned Run Average (or runs allowed per nine innings) is regressed on each of these traditional metrics one at a time, the resulting R^2 values are:

Metric R^2
K% 0.67
SIERA 0.52
xFIP 0.46
BB% 0.45
FIP 0.35
HR% 0.18
ERA 0.14
BABIP 0.04
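
To make the table concrete, here is a minimal sketch of how an R^2 like those above can be computed: fit a one-variable least-squares line of next season's ERA on this season's metric, then measure how much of the variation in ERA the line explains. The data below are synthetic and made up for illustration (they are not the paper's data), and the names r_squared, k_pct, next_era and unrelated_metric are hypothetical.

```python
import random


def r_squared(x, y):
    """R^2 of an ordinary least-squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares (what the line misses) vs. total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic pitcher-seasons (assumption: invented numbers, not real players).
rng = random.Random(0)
k_pct = [rng.uniform(0.12, 0.34) for _ in range(200)]
# Assumed relationship: more strikeouts this year, lower ERA next year, plus noise.
next_era = [5.5 - 8.0 * k + rng.gauss(0, 0.4) for k in k_pct]
# A metric with no built-in relationship to next year's ERA, for contrast.
unrelated_metric = [rng.uniform(0.25, 0.35) for _ in range(200)]

r2_k = r_squared(k_pct, next_era)
r2_noise = r_squared(unrelated_metric, next_era)
print("R^2 using K%%: %.2f" % r2_k)
print("R^2 using unrelated metric: %.2f" % r2_noise)
```

An R^2 near 1 means the metric explains most of the season-to-season variation in ERA, while a value near 0 means it explains almost none.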

Strikeout percentage is the most successful traditional metric when determining future success. Here are the top ten pitchers in K% in 2015:

  • Clayton Kershaw (33.82%)
  • Chris Sale (32.08%)
  • Max Scherzer (30.7%)
  • Carlos Carrasco (29.59%)
  • Chris Archer (29.03%)
  • Corey Kluber (27.65%)
  • Jacob deGrom (27.03%)
  • Jake Arrieta (27.13%)
  • Madison Bumgarner (26.93%)
  • Francisco Liriano (26.52%)

MLB is a quarter of the way through the 2016 season. As it stands, here are the top ten pitchers in K% this year:

  • Jose Fernandez (35.9%)
  • Clayton Kershaw (33.7%)
  • Noah Syndergaard (32.6%)
  • Max Scherzer (31.5%)
  • Stephen Strasburg (30.9%)
  • Danny Salazar (30.3%)
  • David Price (29.4%)
  • Vincent Velasquez (28.8%)
  • Drew Smyly (28.4%)
  • Drew Pomeranz (28.3%)

Many on the 2015 list currently rank just outside the top ten this year. Still, the turnover shows two things: how difficult it is to predict pitcher success from any traditional metric, and just how consistently dominant Clayton Kershaw and Max Scherzer really are.

The paper discussed combining the aforementioned statistics with an Arsenal/Zone rating. This metric uses Pitch F/X data, which tracks the speed, movement and placement of every pitch relative to the strike zone. The idea is that with more data about the specifics of each pitch a pitcher throws, the pitch sequence and which pitches are most sustainable over time, it will be easier to predict success the following season.

Data scientists should always be careful about adding too many variables because of overfitting. In other words, a model with too many variables relative to the number of pitcher-seasons can end up fitting noise, which makes it hard to find trends that are actually meaningful. Still, this is an intriguing paper, and hopefully this Arsenal/Zone rating can become more readily available to baseball fans in an easily digestible form.
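
The overfitting concern can be illustrated with a small sketch. It fits two least-squares models on the same 30 synthetic pitcher-seasons: one using a single meaningful variable, the other using that variable plus 20 columns of pure noise. It then scores both on 200 fresh observations. Everything here is an assumption made for illustration (the data, the coefficients and the helper names fit_ols, predict, r_squared and make_data); it is not the paper's model.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def make_data(rng, n, n_noise):
    # Column 0 carries the real signal; the remaining columns are pure noise.
    X = [[rng.gauss(0, 1) for _ in range(1 + n_noise)] for _ in range(n)]
    y = [4.0 - 0.5 * r[0] + rng.gauss(0, 0.5) for r in X]
    return X, y


rng = random.Random(1)
X_train, y_train = make_data(rng, 30, 20)
X_test, y_test = make_data(rng, 200, 20)

results = {}
for label, n_cols in (("1 variable", 1), ("21 variables", 21)):
    train = [r[:n_cols] for r in X_train]
    test = [r[:n_cols] for r in X_test]
    beta = fit_ols(train, y_train)
    results[label] = (r_squared(y_train, predict(beta, train)),
                      r_squared(y_test, predict(beta, test)))
    print(label, "train R^2 = %.2f, test R^2 = %.2f" % results[label])
```

The larger model always fits the training seasons at least as well, because extra variables can only reduce in-sample error. The comparison that matters is on the held-out seasons, where the noise columns tend to hurt rather than help.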

You've Drafted Your Team, Now What?

For three days, there is a frenetic pace to the NFL Draft, where specific needs are addressed (or not), value is assigned to each pick (or not), opponents' draft boards are analyzed and countered (or not) and undrafted free agents are debated and signed. After all of these minuscule details, there comes a bigger-picture question: What the heck just happened?

Pundits, and perhaps those within each franchise, grade these draft classes before anyone attends a rookie minicamp or a single contract gets signed. Though there are always inexplicable human elements, grading draft classes is becoming easier and more analytically sound.

Last year, the Harvard Sports Analysis Collective published a blog post about predicting success based upon combine numbers. Some positions are much easier to predict than others. For instance, cornerbacks and outside linebackers with good combine numbers tend to do well in the NFL, whereas quarterbacks and wide receivers are much harder to predict.

Value also matters. Franchises can be crippled if an average player is chosen in the first or second round, while others can be bolstered by picking stellar athletes in later rounds. Many of these problematic picks happen when a team drafts an offensive skill player. One research paper suggested 60% of running backs and receivers taken in the 3rd-7th rounds have better average career statistics than those taken in the first and second rounds. The trend with receivers makes sense, given that combine numbers do not have as much predictive power for them, but perhaps the trend with running backs is a sign of the times (i.e. the NFL evolving into a passing league). As for quarterbacks, the more highly touted ones tend to have better careers.

So who drafted well in 2016? There are no clear answers yet but, perhaps surprisingly, the perennially bottom-dwelling Cleveland Browns did well. They took defensive players with good combine numbers, such as Emmanuel Ogbah and Carl Nassib, and for the most part waited until later rounds for skill players, including Ricardo Louis and Jordan Payton. Another possible surprise is the Jacksonville Jaguars. Their first-round pick, Jalen Ramsey, is a cornerback with excellent combine numbers. They also took defensive risks in Myles Jack and Sheldon Day, which suggests they are focused on addressing some of their bigger needs. As for the uncertainties, the Rams and Eagles obviously gave up a lot for quarterbacks who may or may not pan out, but the Cowboys, Texans and Redskins also drafted skill players in the first round whose contributions to wins will be tougher to predict.

What we're learning from the latest research is that the NFL Draft is not the crapshoot it once was. It will never be perfect, but it is also becoming clearer which teams are heeding this research and which prefer non-analytical advice.