By: Edward Egros

...One More Thing About the PGA Championship

(Courtesy: Stuart Franklin/Getty Images)

At one point during the back nine of the final round of the 99th PGA Championship, there was a five-way tie atop the leaderboard. Then Justin Thomas carded a birdie on the 13th hole and played the Green Mile with a par on 16, a birdie on 17 and an insignificant bogey on 18. While the rest of the field struggled to finish, Thomas blazed through the toughest closing stretch at a major this year to capture his first Wanamaker Trophy.

My pick to win, Hideki Matsuyama, fared more than respectably, finishing tied for 5th. But as I watched the television coverage of the moments when he struggled, one of the commentators pointed out that his performance mirrored last year's PGA Championship, where he was the best hitter of the golf ball but could not make any putts. He finished tied for 4th in that tournament.

This year, Matsuyama missed a few critical putts, but he was 12th in Strokes Gained: Putting. However, he ranked 20th in SG: Approach the Green and 27th in SG: Around the Green. As for the champion, Thomas was tied for 15th in SG: Approach the Green, 22nd in SG: Around the Green and 4th in SG: Putting. Overall, Thomas' numbers are only slightly better, yet they added up to a commanding win.

I am reminded of a paper by Dr. George Kondraske of UT Arlington titled "General Systems Performance Theory and its Application to Understanding Complex System Performance". In it, Kondraske attempts to explain human systems through complex machines. Regressions have a number of components that are often treated as additive (which is why we have a lot of "+" signs in our equations). But if one explanatory variable is largely deficient, it is not satisfactory to say the dependent variable simply decreases by a corresponding amount. The output depends upon everything working together; components are so interconnected that any one piece that does not work or is largely deficient means the entire system might fail to perform.

What does this have to do with golf? If someone cannot putt at all, they will post a high score and have no chance of winning a tournament; they cannot simply compensate with a longer drive or a more accurate iron shot. Granted, professional golfers are at least competent in every component of the game, but any significant deficiency is a bigger setback than a model would suggest by simply lowering the odds of winning in proportion to a negative strokes gained metric.
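To make the contrast concrete, here is a minimal sketch comparing an additive score with a multiplicative one. The multiplicative form is only one simple way to illustrate non-additivity; it is my own assumption and not Kondraske's formulation. The skill ratings (0 to 1, with 1 as tour-best) and both function names are hypothetical.

```python
# Hypothetical skill ratings on a 0-to-1 scale; these are not real data.
balanced = {"off_the_tee": 0.7, "approach": 0.7, "around_green": 0.7, "putting": 0.7}
lopsided = {"off_the_tee": 1.0, "approach": 1.0, "around_green": 0.7, "putting": 0.1}


def additive_score(skills):
    """Average of the skills: a weakness can be offset by a strength."""
    return sum(skills.values()) / len(skills)


def bottleneck_score(skills):
    """Product of the skills: one very weak skill drags the whole score down."""
    score = 1.0
    for value in skills.values():
        score *= value
    return score


for name, golfer in (("balanced", balanced), ("lopsided", lopsided)):
    print(name, round(additive_score(golfer), 3), round(bottleneck_score(golfer), 3))
```

Under the additive score the two golfers look essentially identical, while the multiplicative score penalizes the golfer who cannot putt far more heavily.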

This approach is intuitive to golf enthusiasts. It is why golfers work on everything, not just the skills at which they excel. What matters here is that when data scientists put together models for forecasting winners, it may be important to think less linearly. Maybe it has less to do with the sum of skills coming together and how they fit with a particular course, and more to do with whether every skill is adequate for the demands of a specific tournament. Justin Thomas' skills certainly were.

Who Will Win the 2017 PGA Championship?

This year, the Wanamaker Trophy will be claimed at Quail Hollow Club, the same course that hosts the Wells Fargo Championship (previously the Wachovia Championship). No analysis of this year's PGA Championship would be robust without discussing Rory McIlroy's domination there.

A favorite to win the last major of the season, McIlroy has two victories and one playoff loss in seven appearances there. He also made the cut six of seven times and owns the course record, shooting a 61 in 2015. Also, as I mentioned in a previous article, McIlroy is not only successful in PGA Championships, he is one of the more dominant golfers at any specific event on Tour (even if that major is a hodgepodge of characteristics where no particular abilities stand out). Add to his resume a pair of Top 5 finishes in his last two tournaments, and McIlroy seems poised to win the PGA Championship for the third time.

However, as we have learned with other tournaments, Strokes Gained statistics have incredible predictive power. When it comes to who has won in North Carolina before, sometimes an already dominant golfer came in and continued his momentum to victory. More recently, Strokes Gained: Around-the-Green has become more crucial to success:

[Figure: past winners at Quail Hollow and their rankings in SG: Around-the-Green]

There are two periods when a player needed to rank in the Top 40 in SG: Around-the-Green to win: 2005-2007 and 2014-2016. This season, the Wells Fargo Championship was played elsewhere so Quail Hollow could be redone for a major. The two important changes are the removal of trees and a reworking of the front nine that leaves the final yardage shorter but likely more challenging. It's possible these two details make SG: Around-the-Green all the more important.

At this point, the players leading in this statistic are Ian Poulter, Jason Day, Bill Haas, Pat Perez and Cameron Smith. McIlroy barely cracks the Top 80. Jordan Spieth, another favorite who could complete the career Grand Slam at age 24, is 18th. As for Strokes Gained: Off-the-Tee, another stat with some predictive power, the current leaders are Jon Rahm, Dustin Johnson and Sergio Garcia. In terms of skills shown this season, there are several players who are perhaps more suited to win at a revamped Quail Hollow than the favorites.

Perhaps the one player who seems to have put it all together, at this point, is Hideki Matsuyama. Fresh off a win at the WGC-Bridgestone Invitational, he is one of only four players with three wins on Tour this season. He also ranks 11th in Strokes Gained: Around-the-Green and 11th in Strokes Gained: Off-the-Tee. Lastly, he finished fourth in last year's PGA Championship and has two Top 20 finishes in the last four seasons. In other words, his statistical rankings are slightly lower than those of the aforementioned players, but he makes up for it with overwhelming momentum and overall success at this specific event. While I expect solid games from the favorites, I am picking Hideki Matsuyama to capture his first major.

The Statcast Revolution

There are more statistics about hitters than ever before. Thanks to Statcast, a baseball fan can learn how fast the ball comes off the bat on any hit, the angle at which it leaves the bat, an accurate measure of the distance it travels, etc.

These statistics can help characterize and differentiate hitters. A natural next question is whether they can also predict a hitter's success. For instance, if a hitter averages a higher exit velocity, does that mean he is generally a better hitter?

Fangraphs has kept a database with averages of these Statcast statistics for every hitter. Even though there is some missing data, Jeff Zimmerman made necessary corrections based upon the type of balls in play fielded by certain positions. Using 2016 season data, the variables include:


It makes intuitive sense for the second half of this list to be relevant to a hitter's success, but what about the first half? To answer that question, I merged this same dataset with other advanced offensive statistics for these same hitters (this data came from Baseball Reference). While it would make sense to choose offensive wins above replacement (oWAR) as my dependent variable, there is a problem. WAR is an aggregate, meaning it accumulates with additional plate appearances. Because I am already using averaged statistics for hitters and want to look at the average impact each statistic has on a hitter's overall performance, I divided oWAR by plate appearances and then multiplied by 1,000, so as not to have too many zeroes after the decimal point (this variable is named oWARavg).
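The rescaling described above can be sketched in a few lines. The function name `owar_avg` and the two example hitters are hypothetical and are not taken from the dataset; only the arithmetic (oWAR divided by plate appearances, times 1,000) comes from the text.

```python
def owar_avg(owar, plate_appearances):
    """Offensive WAR per 1,000 plate appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return owar / plate_appearances * 1000


# Two hypothetical hitters: same rate of production, different playing time.
full_season = owar_avg(5.0, 650)
half_season = owar_avg(2.5, 325)
print(round(full_season, 2), round(half_season, 2))
```

Both hitters get the same oWARavg even though one has twice the raw oWAR, which is the point of removing the playing-time effect.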

The next step is to determine which of the first group of variables is significant at the 95% confidence level. I am using a backward elimination technique, where I start with a regression with all three variables, then remove any of them that are not significant. By executing this approach, the only variable eliminated was speed. In other words, the average exit velocity of a batted ball is not a significant indicator of how successful a hitter is. However, the angle of the batted ball and the distance it travels are significant:
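For readers who want to see the mechanics, here is a minimal, dependency-free sketch of backward elimination. It runs on synthetic data, not the Fangraphs data, and it makes two assumptions worth flagging: it drops one variable at a time (the common variant), and it uses |t| >= 1.96 as a large-sample stand-in for significance at the 95% level instead of exact p-values. All function names are my own.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_t_stats(columns, y):
    """Fit y on an intercept plus the given columns; return one t-statistic per column."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)
    t_stats = []
    for j in range(1, k):  # skip the intercept
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve(xtx, unit)[j]  # j-th diagonal of sigma^2 (X'X)^-1
        t_stats.append(beta[j] / var_j ** 0.5)
    return t_stats


def backward_eliminate(data, y, t_crit=1.96):
    """Repeatedly drop the weakest variable until every survivor has |t| >= t_crit."""
    kept = list(data)
    while kept:
        t_stats = ols_t_stats([data[name] for name in kept], y)
        weakest = min(range(len(kept)), key=lambda i: abs(t_stats[i]))
        if abs(t_stats[weakest]) >= t_crit:
            break
        kept.pop(weakest)
    return kept


# Synthetic, standardized data: speed has no true effect on y by construction.
random.seed(0)
n = 300
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("speed", "angle", "distance")}
y = [-0.5 * a + 0.8 * d + random.gauss(0, 1) for a, d in zip(data["angle"], data["distance"])]
print(backward_eliminate(data, y))
```

Because speed is pure noise in this synthetic setup, it will usually be the variable that gets dropped, while angle and distance survive with a negative and a positive coefficient respectively. A statistics package would report exact p-values and is the better choice for real work.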

[Figure: regression results for oWARavg on angle and distance]

The angle has a negative coefficient, meaning that, for a given distance, hitters whose batted balls are not hit as steeply tend to be more successful. Distance has a positive coefficient, which makes intuitive sense, because the farther a ball travels, the likelier it is to become a hit or maybe even a home run. Although both variables are significant, the adjusted R-squared is only .1687, meaning these two variables explain only about 17% of the variability in average offensive WAR.
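As a reminder of what the adjusted figure measures, here is a small sketch of the standard adjustment, which penalizes R-squared for the number of predictors. The sample size in the example is hypothetical, since the number of hitters in the dataset is not stated here.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R-squared for the number of predictors (intercept not counted)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof


# Hypothetical: a raw R-squared of 0.20 with 2 predictors and 200 hitters.
print(round(adjusted_r_squared(0.20, 200, 2), 4))
```

The adjusted value is always at or below the raw R-squared, and the gap widens as predictors are added without a matching gain in fit.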

Just for fun, let's see what impact angle and distance have when the second group of variables is included in a regression. Again, using the backward elimination technique, here are the results:


[Figure: regression results for oWARavg with the second group of variables included]

Once again, backward elimination took out exit velocity. It also took out the expected ratio of home runs to fly balls. While it kept the original ratio, the negative coefficient does not make intuitive sense. The logic is that the more fly balls a hitter turns into home runs, the more successful he is. Instead, this model suggests the opposite. However, a positive coefficient for isolated power does make logical sense, and the adjusted R-squared is approximately .40, making for a model that does a better job explaining what makes for a successful hitter.

Obviously there are a lot more advanced offensive variables that could be included in a model like this. At the least, this is a statistical approach for determining which Statcast variables help explain offensive success. A similar study can be conducted when looking at baserunning, pitching, defense, etc.

Who Will Win the Dean & DeLuca Invitational?

Before offering a prediction for who will wear the plaid jacket as the winner of the Dean & DeLuca Invitational, here is a quick recap of the Byron Nelson.

Sergio Garcia, my pick, did have his moments. He carded a 29 on his back nine on Saturday. But several mistakes led to an incredible unraveling in his Sunday round. Also, Billy Horschel could have been a more credible dark horse pick: his Strokes Gained: Off-the-Tee, which I concluded was the most telling statistic for the Nelson, had him in the Top 50 on the PGA Tour. He had missed the cut in his last four tournaments, but on a course that emphasizes the tee shot, and given the unpredictability of the tournament, it should not be as big a surprise that Horschel won.

And now, the Tour heads to Colonial. This tournament is much easier to predict because history is a better indicator of success. Jordan Spieth finished 2nd, 14th and 7th there before winning the event last year. Eleven men have won multiple titles at Colonial, compared with five at the Nelson.

Once again, let's look at the winners from 2004-2016, the years "Strokes Gained" statistics are readily available using ShotLink data. The most predictive component for the Dean & DeLuca is Strokes Gained: Approach-the-Green, which measures how golfers do on tee shots on par-3's and approach shots on par-4's and par-5's. In fact, Spieth, when he won last year, is the only champion in that span to rank outside of the Top 75 in this statistic. He made up for it with his knowledge of and previous success on the course. Strokes Gained: Off-the-Tee is also an important indicator, with most winners ranking in the Top 50 before competing.

It might be shocking, but the golfer who currently ranks 2nd in Approach-the-Green is Jordan Spieth. Even though he has missed the last two cuts, his approach shots have rarely let him down. The next best golfer in the tournament field is Webb Simpson. He has only played this event three times. Though he missed the cut in his first two appearances, he finished tied for third last year. Spieth has had recent struggles, while Simpson has Top 20 finishes in two of his last three tournaments. It would not be a surprise for Spieth to repeat as champion, but my pick is Webb Simpson.

Who Will Win the Byron Nelson?

Last year, Sergio Garcia became just the fifth golfer ever to win multiple titles at the Byron Nelson. Given that the tournament has been around since 1944, that shows just how difficult it is to predict.

It does help that the field is stronger than usual; eight of the top 20 golfers in the world will participate, including Dustin Johnson, Jason Day, Jordan Spieth, and of course Sergio. In fact, Vegas Insider is giving these highly ranked golfers the best odds to win, most notably Johnson at 5/1. On the surface, this mark makes sense, given he has already won three times this year, more than anyone else on Tour.

But as with most golf predictions I have done, I place an emphasis on strokes gained statistics. These measurements look at how well a golfer does in each phase of his game, compared with the rest of the field. For instance, strokes gained putting looks at how many putts a golfer needs to complete a hole from a specific distance. If the average golfer needs 1.5 putts to complete a hole from seven feet, 10 inches, the golfer who sinks the putt gains 0.5 strokes, but a two-putt means he loses 0.5 strokes. These totals are then aggregated for the season.
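The arithmetic in that example can be sketched as follows. The 1.5-putt baseline comes from the example above; the function name and the third entry in the season log (a 2.0-putt baseline from longer range) are hypothetical.

```python
def strokes_gained_putting(expected_putts, actual_putts):
    """Strokes gained on one hole: the field's expected putts minus the putts taken."""
    return expected_putts - actual_putts


print(strokes_gained_putting(1.5, 1))  # 0.5: holing the putt gains half a stroke
print(strokes_gained_putting(1.5, 2))  # -0.5: a two-putt loses half a stroke

# Aggregating over a hypothetical log of (expected putts, actual putts) pairs.
season_log = [(1.5, 1), (1.5, 2), (2.0, 1)]
season_total = sum(strokes_gained_putting(e, a) for e, a in season_log)
print(season_total)  # 1.0
```

The other strokes gained categories follow the same idea, with the baseline defined for tee shots, approach shots and shots around the green.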

ShotLink data has made this information readily available since the 2004 season. Given the renovations TPC Four Seasons has made to the course since that year, this time frame may be enough data for us to have a glimpse into what qualities a golfer needs to be successful at this particular tournament. I am using four statistics: Strokes Gained: Off-the-Tee, Approach-the-Green, Around-the-Green and Putting.

The statistic in which winners have ranked best is Off-the-Tee. In other words, how well a golfer does from the tee box on all par-4's and par-5's is the best predictor for winning the Byron Nelson. Here is how the winners ranked in this statistic just before competing in the Nelson:

[Figure: Byron Nelson winners and their rankings in Strokes Gained: Off-the-Tee entering the tournament]

Other than Steven Bowditch in 2015, every winner ranked in the Top 100, often in the Top 60. As of the end of the PLAYERS Championship, here are the top ten golfers in Strokes Gained: Off-the-Tee:

1. Sergio Garcia
2. Dustin Johnson
3. Jon Rahm
4. Tony Finau
5. Bubba Watson
6. Kyle Stanley
7. Patrick Cantlay
8. Justin Rose
9. Hideki Matsuyama
10. Hudson Swafford

Of these ten, only Garcia, Johnson, Finau and Swafford are competing. Finau and Swafford have played this event far fewer times and Swafford has never finished in the Top 30. As for the other two players, Johnson has played at the Nelson seven times and has averaged a score of 68.54, including four "Top Ten" finishes. Garcia has played the event 12 times, has averaged a score of 69.07 and has the same number of "Top Ten" finishes. The difference is, Garcia has won the Byron Nelson twice and also has a third-place finish.

The volatility of this tournament might make this exercise seem foolish, but history does show that three of the five multiple winners won in back-to-back years. I am picking Sergio Garcia to become the fourth to win back-to-back Byron Nelson championships.