By: Edward Egros

Aug 2017

No Need to Establish the Run

David Johnson

Arizona Cardinals running back David Johnson (left) may understand the importance of balancing between rushing and passing about as well as anybody. Last season, he finished with the most touches, all-purpose yards and rushing/receiving touchdowns of anyone in the NFL. For an encore, his head coach says he wants Johnson to average 30 touches per game.

It's one thing to strike the right balance between how to use Johnson as a rusher and as a receiver; it's another to make these decisions relative to the time of the game. Conventional wisdom in football has always championed the idea of "establishing the run", meaning that no matter how long it takes to create an effective run game, it should be a point of emphasis early in a contest. More recently, rushing plays are called less frequently, regardless of what the clock reads. Knowing this recent trend, there is a way to explain, at least analytically, why attempting to establish the run is unnecessary.

I took NFL play-by-play data from the 2010 through the 2015 seasons. This information included which team won and lost. Then, using only rushing plays, I summed up the rushing yards each team had per quarter, per game. (In this analysis, I am not including overtime rushing yards, both because they appeared so infrequently and because they swayed the results so much: a large number of rushing yards in overtime will essentially end the game.) Using a logit regression with "win" as a binary dependent variable and rushing yards per quarter as my explanatory variables, here is the output:

=========================================
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8447  -0.9786  -0.5544   1.0545   2.0701

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)     -1.747385   0.105946 -16.493  < 2e-16 ***
yards.gained.1   0.006508   0.001922   3.386 0.000708 ***
yards.gained.2   0.007091   0.001953   3.632 0.000282 ***
yards.gained.3   0.015546   0.001910   8.137 4.05e-16 ***
yards.gained.4   0.035783   0.002156  16.594  < 2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 4251.8 on 3066 degrees of freedom
Residual deviance: 3711.2 on 3062 degrees of freedom
AIC: 3721.2

Number of Fisher Scoring iterations: 4
==========================================
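The aggregation step described before the regression (rushing plays only, summed per team, per game, per quarter, with overtime dropped) can be sketched roughly as follows. This is only an illustration in Python, not the code used for the article: the rows and the field names (`game_id`, `team`, `quarter`, `play_type`, `yards_gained`) are hypothetical, and real play-by-play data will be organized differently.

```python
from collections import defaultdict

# Hypothetical play-by-play rows. The field names and values are assumptions
# made for illustration, not the schema of the data set used in the article.
plays = [
    {"game_id": 1, "team": "ARI", "quarter": 1, "play_type": "run", "yards_gained": 7},
    {"game_id": 1, "team": "ARI", "quarter": 1, "play_type": "pass", "yards_gained": 12},
    {"game_id": 1, "team": "ARI", "quarter": 1, "play_type": "run", "yards_gained": 4},
    {"game_id": 1, "team": "ARI", "quarter": 4, "play_type": "run", "yards_gained": 3},
    # Quarter 5 stands in for overtime, which the analysis excludes.
    {"game_id": 1, "team": "ARI", "quarter": 5, "play_type": "run", "yards_gained": 20},
    {"game_id": 1, "team": "DAL", "quarter": 2, "play_type": "run", "yards_gained": -2},
]


def rushing_yards_by_quarter(rows):
    """Return {(game_id, team): [q1, q2, q3, q4]} using rushing plays only."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for play in rows:
        if play["play_type"] != "run":
            continue  # keep rushing plays only
        if not 1 <= play["quarter"] <= 4:
            continue  # drop overtime
        key = (play["game_id"], play["team"])
        totals[key][play["quarter"] - 1] += play["yards_gained"]
    return dict(totals)


features = rushing_yards_by_quarter(plays)
for key, yards in sorted(features.items()):
    print(key, yards)
```

Each resulting row of four quarter totals, paired with a win/loss flag for that team and game, is the shape of input a logit regression like the one above expects.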

First, all of these variables are statistically significant at the 99% level, which makes logical sense: the more yards a team has, no matter the type, the likelier it is to win. Second, there is a direct relationship between the time of the game and the magnitude of the coefficient. In other words, the later in the game rushing yards are gained, the more strongly they are tied to the game's outcome. Having the largest coefficient for the fourth quarter makes sense because teams that are leading are trying to take time off the clock, and rushing makes that motive easier to fulfill. However, the fact that the third-quarter coefficient is larger than either first-half coefficient could suggest there is no statistical advantage to "establishing the run".
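To see what the growing coefficients imply, here is a small Python sketch that plugs the estimates from the output into the logistic function for two made-up stat lines with the same 100 rushing yards: one front-loaded, one back-loaded. The yardage splits are hypothetical, and the result describes an association in the fitted model, not proof that running late causes wins.

```python
import math

# Estimates copied from the regression output above.
INTERCEPT = -1.747385
COEFS = [0.006508, 0.007091, 0.015546, 0.035783]  # quarters 1 through 4


def win_probability(yards_by_quarter):
    """Fitted win probability for a list of four quarterly rushing totals."""
    log_odds = INTERCEPT + sum(b * y for b, y in zip(COEFS, yards_by_quarter))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two hypothetical games, each with 100 total rushing yards.
front_loaded = win_probability([40, 40, 10, 10])
back_loaded = win_probability([10, 10, 40, 40])

print(f"front-loaded: {front_loaded:.3f}")
print(f"back-loaded:  {back_loaded:.3f}")
```

Under these assumptions the fitted probability is about 33% for the front-loaded game and about 61% for the back-loaded one, which is the pattern the coefficients describe.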

It is also important to convert these coefficients to odds ratios to know how important each rushing yard is to winning. Specifically, an extra first quarter yard increases the odds of winning by a factor of 1.0065. In the second quarter, it's 1.0071, a small difference. In the third quarter, it is 1.0157 and in the fourth, it is 1.0364.
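The conversion is simply the exponential of each coefficient. A minimal Python sketch, using the estimates from the output (the ten-yard figures are an added illustration, not something reported above):

```python
import math

# Logit coefficients copied from the regression output.
coefs = {
    "yards.gained.1": 0.006508,
    "yards.gained.2": 0.007091,
    "yards.gained.3": 0.015546,
    "yards.gained.4": 0.035783,
}

# Odds ratio for one extra rushing yard in each quarter: exp(coefficient).
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# Odds ratio for ten extra yards: exp(10 * coefficient), i.e. the
# one-yard ratio compounded ten times.
ten_yard_ratios = {name: math.exp(10 * b) for name, b in coefs.items()}

for name in coefs:
    print(f"{name}: {odds_ratios[name]:.4f} per yard, "
          f"{ten_yard_ratios[name]:.4f} per ten yards")
```

Because the ratios multiply, small per-yard differences compound over a drive's worth of yardage.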

There may be value in wearing down a defense by running the ball earlier in a game, but this data and regression do not capture it. It may also be that a running back needs several carries before knowing how to dissect a defense later in a game; but again, this idea is not captured in the aggregate. So establishing the run may not be as crucial an idea as originally thought.

However, one conventional bit of wisdom that is reflected is the idea that a team controls the game more effectively by running the ball later in the contest. A study like this one can quantify how a team controls a game. In fact, I plan to use this analysis in my weekly Cowboys postgame graphics that explain why Dallas either won or lost a particular contest. I will go over these upgraded graphics in a later blog post.

(Special thanks to Luke Stanke for providing the data and helping me with the code!)

...One More Thing About the PGA Championship

(Courtesy: Stuart Franklin/Getty Images)

At one point, there was a five-way tie atop the leaderboard during the back nine of the final round of the 99th PGA Championship. Then Justin Thomas carded a birdie on the 13th hole and played the Green Mile with a par on 16, a birdie on 17 and an inconsequential bogey on 18. While the rest of the field struggled to finish, Thomas blazed through the toughest closing stretch at a major this year to capture his first Wanamaker Trophy.

My pick to win, Hideki Matsuyama, fared more than respectably, finishing tied for 5th. But as I watched the television coverage of the moments he struggled, one of the commentators pointed out that his performance mirrored that of last year's PGA Championship, where he was the best hitter of the golf ball but could not make any putts. That year, he finished tied for 4th.

This year, Matsuyama missed a few critical putts, but he was 12th in Strokes Gained: Putting. However, he ranked 20th in SG: Approach the Green and 27th in SG: Around the Green. As for the champion, Thomas was tied for 15th in SG: Approach the Green, 22nd in SG: Around the Green and 4th in SG: Putting. Overall, Thomas' numbers were only slightly better, yet they added up to a commanding win.

I am reminded of a paper by Dr. George Kondraske of UT Arlington titled: "General Systems Performance Theory and its Application to Understanding Complex System Performance". In it, Kondraske attempts to explain human systems through complex machines. Regressions have a number of components that are often considered additive (which is why we have a lot of "+" signs in our equations). But if one explanatory variable is largely deficient, it is not satisfactory to say the dependent variable decreases by the same amount. The output depends upon everything working together; components are so interconnected that any one piece that does not work or is largely deficient means the entire system might fail to perform.

What does this have to do with golf? If someone cannot putt at all, they will post a high score and have no chance of winning a tournament; they cannot simply overcompensate with a longer drive or a more accurate iron shot. Granted, professional golfers are at least competent in every component of a golf game, but any significant deficiency makes for a bigger setback than simply subtracting odds to win based upon a negative strokes gained metric.

This approach is intuitive to golf enthusiasts. It is why golfers work on everything, not just the skills at which they excel. What matters here is that when data scientists put together models for forecasting winners, it may be important to think less linearly. Maybe it has less to do with the sum of skills coming together and how they fit with a particular course, and more to do with whether every skill is adequate for the demands of a specific tournament. Justin Thomas' skills certainly were.
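The contrast between an additive view and an "every skill must be adequate" view can be sketched in a few lines of Python. This is only loosely inspired by the idea described above and is not Kondraske's formulation; the two players, their strokes gained values and the per-skill demand thresholds are all hypothetical.

```python
# Hypothetical strokes gained per round for two made-up players.
balanced = {"off_tee": 0.3, "approach": 0.4, "around_green": 0.2, "putting": 0.3}
lopsided = {"off_tee": 1.0, "approach": 1.0, "around_green": 0.4, "putting": -1.0}

# Hypothetical minimum level each skill must reach at a given tournament.
demands = {"off_tee": -0.5, "approach": -0.5, "around_green": -0.5, "putting": -0.5}


def additive_score(skills):
    """Additive view: skills simply sum, so a strength offsets a weakness."""
    return sum(skills.values())


def meets_demands(skills, required):
    """Threshold view: every skill must clear its own demand."""
    return all(skills[name] >= level for name, level in required.items())


def limiting_skill(skills, required):
    """Skill with the smallest margin over (or largest shortfall below) its demand."""
    return min(required, key=lambda name: skills[name] - required[name])


for label, player in (("balanced", balanced), ("lopsided", lopsided)):
    print(label,
          round(additive_score(player), 2),
          meets_demands(player, demands),
          limiting_skill(player, demands))
```

In this toy setup the lopsided player has the higher additive total but fails the threshold check because of putting, while the balanced player passes it. A forecasting model built on the second view would rank the two differently from a plain sum.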

Who Will Win the 2017 PGA Championship?

This year, the Wanamaker Trophy will be claimed at Quail Hollow Club, the same course that hosts the Wells Fargo Championship (previously the Wachovia Championship). No analysis of this year's PGA Championship would be robust without discussing Rory McIlroy's domination there.

A favorite to win the last major of the season, McIlroy has two victories and one playoff loss in seven appearances there. He also made the cut six of seven times and owns the course record, shooting a 61 in 2015. Also, as I mentioned in a previous article, McIlroy is not only successful in PGA Championships, he is one of the more dominant golfers of any specific event on Tour (even if that major is a hodgepodge of characteristics where no particular abilities stand out). Add to his resume a pair of Top 5 finishes in his last two tournaments, and McIlroy seems poised to win the PGA Championship for the third time.

However, as we have learned with other tournaments, Strokes Gained statistics have incredible predictive power. When it comes to who has won in North Carolina before, sometimes an already dominant golfer came in and continued his momentum to victory. More recently, Strokes Gained: Around-the-Green has become more crucial to success:

[Graphic: SG: Around-the-Green rankings of past winners at Quail Hollow]

There are two periods when a player needed to rank in the Top 40 in SG: Around-the-Green: 2005-2007 and 2014-2016. This season, the Wells Fargo Championship was played elsewhere so Quail Hollow could be redone for a major. The two important changes here are the removal of trees and the adjusting of the front nine to where the final yardage is shorter but likely more challenging. It's possible these two details make SG: Around-the-Green all the more important.

At this point, the players leading in this statistic are: Ian Poulter, Jason Day, Bill Haas, Pat Perez and Cameron Smith. McIlroy barely cracks the Top 80. Jordan Spieth, another favorite who could complete the career Grand Slam at age 24, is 18th. As for Strokes Gained: Off-the-Tee, another stat with some predictive power, the current leaders are Jon Rahm, Dustin Johnson and Sergio Garcia. In terms of skills shown this season, there are several players who are perhaps more suited to win a revamped Quail Hollow than the favorites.

Perhaps the one player who seems to have put it all together at this point is Hideki Matsuyama. Fresh off a win at the WGC-Bridgestone Invitational, he is one of only four players with three wins on Tour this season. He also ranks 11th in Strokes Gained: Around-the-Green and 11th in Strokes Gained: Off-the-Tee. Lastly, he finished fourth in last year's PGA Championship and has two Top 20 finishes in the last four seasons. In other words, his overwhelming momentum and overall success at this specific event make up for statistical rankings that are slightly lower than those of the aforementioned players. While I expect solid games from the favorites, I am picking Hideki Matsuyama to capture his first major.