By: Edward Egros

football

Gary Patterson is the Most Hated Man in College Football

(Courtesy: Getty Images)

It's not Nick Saban, Urban Meyer or some college football pundit who polarizes fan bases to insanity, just for that monthly paycheck.

It's TCU head coach Gary Patterson, who's led the program since 2000, including a pair of conference transitions and two New Year's Six Bowl victories. Despite few controversial issues within his program, Patterson earns this distinction because of who he is and where he works.

Who he is, is a winner. Perhaps most notable among his accomplishments, his teams are 43-5 when ranked in the Top 10. This record reflects the longevity of having played so many games near the top of the poll du jour, as well as a near-perfect winning percentage when expected to succeed.

Where he works is a small, private university with roughly 10,000 students. To compare, this student body is 1/4 the size of Alabama's and roughly 1/5 the size of other highly touted college football schools like Penn State and Ohio State. Also, many of these schools are flagships of their own state, meaning their fan bases extend well beyond those who actually attend the university. Not only can TCU not boast being a flagship, it operates in a state home to some of the largest followings in America, such as Texas and Texas A&M.

Gary Patterson is a successful coach who works for a small school with a smaller fan base, trying to get his team into the College Football Playoff in its fourth year. He came close during the inaugural year of the playoff, but was pushed aside for Ohio State (Baylor, another small private university, also finished ahead of TCU but was also left out). Some will argue the Buckeyes' eventual championship vindicated the committee, but how TCU would have performed in the playoff that year remains a mystery, made all the more tantalizing by its 39-point victory over 9th-ranked Ole Miss in the Peach Bowl. The gripes only grow louder knowing TCU controlled games better than Ohio State, had a better defensive efficiency (a metric that predicts success better than offensive efficiency) and played a strength of schedule roughly equal to the Buckeyes'.

TCU's lone loss that season was to Baylor, and committees have historically ranked good losses worse than mediocre defeats. The trend seems counterintuitive, but it serves as an acceptable rhetorical argument within college football. Also, because the Frogs and Bears shared the Big 12 title despite Baylor winning the head-to-head meeting, they could have "canceled each other out," opening the door for Ohio State.

Still, the only other school resembling TCU that has enjoyed this kind of success over the last four years is Stanford, with an enrollment roughly 50% larger than the Frogs'. In 2015, the Cardinal won the Pac-12 Championship, but two losses locked them out. The last two-loss team to win a National Championship was LSU in 2007, so opportunities for teams in Stanford's position have always been limited.

Today, TCU is in a more advantageous position than three years ago. The latest College Football Playoff poll has TCU ranked 6th. The Frogs will face 5th-ranked Oklahoma and could face the Sooners again in the Big 12 Championship Game, something that did not exist during the TCU/Baylor controversy. The conference added this contest because its analytics suggest the game gives a Big 12 team a greater likelihood of making the Final Four. Two wins over a highly ranked Sooners squad would give the Horned Frogs an undisputed league championship, which is a statistically significant variable for making the playoff. Their strength of schedule ranking would also increase, and their defensive efficiency may rise as well, because a win would require containing Sooner quarterback and Heisman hopeful Baker Mayfield.

Despite the lone loss, if TCU wins its remaining games, the Frogs' resume would arguably be as bulletproof as any one-loss team's. The committee admits to wanting geographic diversity, and no other program in that region of the country would have a more attractive resume. If TCU is still left out, something should be considered amiss, and fans could reasonably suspect a smaller following was a factor. Gary Patterson would then spotlight a problem with this era of determining a National Champion: he has done virtually everything he can to put his team in a position to play for a title, and yet gets left out for a second time. A conspiracy theory, true or otherwise, that undermines the validity of the selection process is something the sport and the committee would hate.

The Truth About 3rd Down

Anyone paying attention to stats during an NFL broadcast has noticed 3rd down conversions being reported. It is an easy way for commentators to critique how clutch a team is and whether an offense can sustain a drive when the pressure is at its peak. Obviously, a team converting 100% of its 3rd down attempts is probably winning the game, but otherwise it is not nearly as helpful a statistic as broadcasts suggest.

For this exercise, I took 10 seasons' worth of NFL data (2007-2016) and looked at each team's conversion rates on 1st, 2nd and 3rd down, along with the number of regular-season wins that team accumulated. Logically, conversion percentages should increase with later downs because there are often fewer yards to go before moving the chains. The numbers reflect this trend: on average, teams convert 20% of the time on 1st down, 30.3% on 2nd down and 38.1% on 3rd down.
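A minimal sketch of how per-down conversion rates could be tallied from play-by-play data. It assumes each play record has hypothetical `down` and `first_down` fields, where `first_down` is true when the play itself earned a new set of downs; the field names and sample plays are placeholders, not the dataset used here. Grouping the plays by team and season before calling it would give one row per team-season.

```python
from collections import defaultdict

def conversion_rates(plays):
    """Return, for each down, the share of plays that earned a new first down."""
    attempts = defaultdict(int)
    conversions = defaultdict(int)
    for play in plays:
        attempts[play["down"]] += 1
        if play["first_down"]:
            conversions[play["down"]] += 1
    return {down: conversions[down] / attempts[down] for down in sorted(attempts)}

# Hypothetical plays for one team-season; field names are placeholders
sample = [
    {"down": 1, "first_down": False},
    {"down": 1, "first_down": True},
    {"down": 2, "first_down": False},
    {"down": 2, "first_down": True},
    {"down": 3, "first_down": True},
    {"down": 3, "first_down": False},
    {"down": 3, "first_down": True},
]
print(conversion_rates(sample))
```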

To make things simple, I then ran a linear regression, treating wins as my dependent variable and keeping it continuous so as not to lose information. Here are the results:

[Chart: linear regression of regular-season wins on 1st, 2nd and 3rd down conversion rates]

As expected, the conversion rate on every down is significant to wins at the 99% level, because the more you convert, the greater your chances of success. The degree to which each down matters also goes up, as reflected by coefficients that increase with each successive down. Even though later downs should be easier to convert, the coefficient still increases, perhaps suggesting 3rd down conversions do matter more than 1st and 2nd down conversions.
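For readers who want to reproduce this kind of fit, here is a minimal, dependency-free sketch of ordinary least squares solved through the normal equations. The team-season numbers are made up for illustration; in practice a statistics package (R's `lm`, for example) would be used, and it would also report the standard errors and significance levels this sketch omits.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows; an intercept column is prepended.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical team-seasons: (1st, 2nd, 3rd down conversion rates) and wins
rates = [(0.18, 0.28, 0.35), (0.22, 0.33, 0.44), (0.19, 0.30, 0.37),
         (0.21, 0.29, 0.41), (0.17, 0.31, 0.33), (0.23, 0.32, 0.46)]
wins = [5, 12, 7, 10, 4, 13]
intercept, b1, b2, b3 = ols(rates, wins)
print(f"intercept={intercept:.1f}  1st={b1:.1f}  2nd={b2:.1f}  3rd={b3:.1f}")
```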

However, the R-squared and adjusted R-squared hover only around 28%. In other words, conversion rates account for only about 28% of the variation in wins, so a 3rd down conversion percentage by itself explains even less (22% if 3rd down rate is the only explanatory variable). While these rates are statistically significant (especially on 3rd down), they are also noisy.
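The two fit statistics mentioned here can be computed as in this sketch. The sample size is an assumption on our part: 32 teams over 10 seasons gives n = 320 team-seasons, with k = 3 down-level predictors.

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Assumed: n = 320 team-seasons (32 teams x 10 seasons), k = 3 predictors
print(round(adjusted_r_squared(0.28, 320, 3), 4))  # about 0.273
```

With that many observations the penalty barely moves the figure, which fits R-squared and adjusted R-squared both landing near 28%.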

In previous blog posts, I have outlined which factors best determine the outcome of football games (and they are detailed in my Cowboys data visualizations). One reason I never brought up 3rd down conversion rates is how noisy the variable is and how it distracts from 1st and 2nd down. Many others have their own ways of determining success based not only on the down but also on the distance. For the sake of ease, I would suggest promoting the discussion of 1st and 2nd down success rates, both as a pair and as a bridge to what constitutes a reasonable 3rd down to convert when those plays occur.
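One way to make "success" depend on both down and distance is to require a play to gain a set share of the yards to go. The thresholds below (45% on 1st down, 60% on 2nd, all of it on 3rd and 4th) are an assumption modeled on commonly cited conventions; analysts use different cutoffs, and this is a sketch, not the method behind the Cowboys graphics.

```python
# Assumed thresholds: share of the yards to go a play must gain to count as a success
THRESHOLDS = {1: 0.45, 2: 0.60, 3: 1.00, 4: 1.00}

def is_success(down, yards_to_go, yards_gained, thresholds=THRESHOLDS):
    """Return True if the play gained enough of the distance for its down."""
    return yards_gained >= thresholds[down] * yards_to_go

def success_rate(plays, downs=(1, 2)):
    """Success rate over plays on the given downs; plays are (down, to_go, gained)."""
    relevant = [p for p in plays if p[0] in downs]
    if not relevant:
        return 0.0
    return sum(is_success(*p) for p in relevant) / len(relevant)

# Hypothetical plays: (down, yards to go, yards gained)
plays = [(1, 10, 5), (2, 5, 2), (3, 3, 4), (1, 10, 3), (2, 7, 6)]
print(success_rate(plays))
```

Restricting the default to 1st and 2nd down mirrors the pairing suggested above; passing `downs=(3,)` gives the 3rd down equivalent.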

A New Explanation of Cowboys Graphics

For the second straight year, after every Dallas Cowboys game, I will post a recap of the game with an analytic visualization. Once again, these metrics sum up all of the important factors that determine the outcome of a football game. Some of the metrics are the same, while others are more refined and better reflect certain concepts.

Going from the top and working down, once again I will chart turnovers, one of the more impactful statistics in the game. The numbers reflect the turnover margin and the bars reflect how many turnovers were committed.

The next box looks at how the quarterbacks performed, often using net yards per pass attempt. This metric is highly predictive, and while others may be more predictive still, it is far easier to calculate.
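Net yards per pass attempt is simple to compute, which is part of its appeal. A sketch using the usual definition (passing yards minus yards lost to sacks, divided by attempts plus sacks); the argument names are our own.

```python
def net_yards_per_attempt(pass_yards, sack_yards, attempts, sacks):
    """Net yards per pass attempt: sack yardage is subtracted and sacks count as dropbacks.

    Incompletions are handled implicitly: they add an attempt but no yards.
    """
    return (pass_yards - sack_yards) / (attempts + sacks)

# Hypothetical game: 300 passing yards on 38 attempts, sacked twice for 15 yards
print(net_yards_per_attempt(300, 15, 38, 2))  # prints 7.125
```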

Perhaps the biggest change comes in the section labeled "Time of Possession/Rushing Yards". This metric was designed to determine who "controlled" the game. It has since been updated to look at how many rushing yards a team had per quarter. As noted in a previous blog post, the more rushing yards a team gains later in the game, the likelier it is to win. The larger the number, the better that team "controlled" the game.

Overachiever/Underachiever compares the Cowboys' actual record with what it should be, given their point differential for the whole season. In baseball, this idea is referred to as the Pythagorean Expectation. In football, there is debate as to how to calculate such a record, but here, the exponent is 2.37: ((Points for^2.37) / (Points for^2.37 + Points against^2.37)) * 16.
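The formula above translates directly into code. A minimal sketch with the 2.37 exponent and a 16-game season as defaults; `over_under` is a hypothetical helper name for the overachiever/underachiever gap.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins implied by points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa) * games

def over_under(actual_wins, points_for, points_against, games=16):
    """Positive: overachieving relative to point differential; negative: underachieving."""
    return actual_wins - pythagorean_wins(points_for, points_against, games)

# Hypothetical season: 400 points scored, 300 allowed, 12 wins
print(round(pythagorean_wins(400, 300), 1))  # about 10.6 expected wins
print(round(over_under(12, 400, 300), 1))    # about +1.4, so overachieving
```

Mid-season, passing the number of games played as `games` gives where the record "should" be so far.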

Finally, scoring efficiency has been tweaked. The idea here is to see how many points teams scored relative to the number of yards they gained. The larger the bar and the bigger the number, the more efficient the team was. Simply put, it's points divided by yards, then multiplied by 15.457886 so that the average is approximately 1. Using data from 2009-2016, we can also see whether a team was good, average or bad in its efficiency. If the result is less than .949394, the team was inefficient. If the result is between .949395 and 1.057116, the team was average and gets a blue bar. If the result is greater than that range, the team was efficient and gets a green bar.
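A sketch of the efficiency score and its bands, using the multiplier and cutoffs given above. The multiplier presumably corresponds to league-average yards per point, which is what pins the average near 1. Treating the exact lower cutoff with `< LOW` is our own boundary choice.

```python
SCALE = 15.457886                 # multiplier from the post
LOW, HIGH = 0.949394, 1.057116    # 2009-2016 "average" band from the post

def scoring_efficiency(points, yards):
    """Points per yard, rescaled so that an average team lands near 1."""
    return points / yards * SCALE

def efficiency_label(value):
    """Map an efficiency score onto the three bands described in the post."""
    if value < LOW:
        return "inefficient"
    if value <= HIGH:
        return "average (blue bar)"
    return "efficient (green bar)"

# Hypothetical games: same 350 yards, different point totals
for points in (20, 23, 24):
    score = scoring_efficiency(points, 350)
    print(points, round(score, 3), efficiency_label(score))
```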

Again, these metrics are meant to capture nearly everything that happened in a game that pertained to the result. Some of these metrics can also be used to forecast future games, but the intent is solely inference.

No Need to Establish the Run

David Johnson

Arizona Cardinals running back David Johnson (left) may understand the importance of balancing between rushing and passing about as well as anybody. Last season, he finished with the most touches, all-purpose yards and combined rushing and receiving touchdowns of anyone in the NFL. For an encore, his head coach says he wants Johnson to average 30 touches per game.

It's one thing to strike the right balance between using Johnson as a rusher and as a receiver; it's another to make these decisions relative to the time of the game. Conventional wisdom in football has always championed "establishing the run," meaning no matter how long it takes to create an effective run game, it should be a point of emphasis early in a contest. More recently, rushing plays are called less frequently, regardless of what the clock reads. Given this trend, there is a way to explain, at least analytically, why attempting to establish the run is unnecessary.

I took NFL play-by-play data from the 2010 through the 2015 seasons, including which team won and lost each game. Then, using only rushing plays, I summed the rushing yards each team had per quarter, per game. (In this analysis, I am not including overtime rushing yards, both because they appeared so infrequently and because they swayed the results so much: piling up rushing yards in overtime will essentially end the game.) Using a logit regression with "win" as a binary dependent variable and rushing yards per quarter as my explanatory variables, here is the output:

=========================================
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8447  -0.9786  -0.5544   1.0545   2.0701

Coefficients:
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)    -1.747385    0.105946  -16.493   < 2e-16 ***
yards.gained.1  0.006508    0.001922    3.386  0.000708 ***
yards.gained.2  0.007091    0.001953    3.632  0.000282 ***
yards.gained.3  0.015546    0.001910    8.137  4.05e-16 ***
yards.gained.4  0.035783    0.002156   16.594   < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4251.8  on 3066  degrees of freedom
Residual deviance: 3711.2  on 3062  degrees of freedom
AIC: 3721.2

Number of Fisher Scoring iterations: 4
==========================================

First, all of these variables are statistically significant at the 99% level, which makes logical sense: the more yards a team has, no matter the type, the likelier it is to win. Second, there is a direct relationship between the time of the game and the magnitude of the coefficient. In other words, as the game goes on, rushing yards become more important to the game's outcome. Having the largest coefficient in the fourth quarter makes sense because teams that are leading try to take time off the clock, and rushing makes that motive easier to fulfill. However, the fact that the third-quarter coefficient is larger than either first-half coefficient could suggest there is no statistical advantage to "establishing the run".

It is also important to convert these coefficients to odds ratios to know how important each rushing yard is to winning. Specifically, an extra first-quarter rushing yard increases the odds of winning by a factor of 1.0065. In the second quarter, it's 1.0071, a small difference. In the third quarter, it is 1.0157, and in the fourth, it is 1.0364.
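The odds ratios come from exponentiating each coefficient, and the same coefficients give a fitted win probability for any mix of quarterly rushing yards. A sketch using the estimates from the output above; the helper names are our own.

```python
import math

# Coefficients copied from the logit output in this post
INTERCEPT = -1.747385
COEFS = {"Q1": 0.006508, "Q2": 0.007091, "Q3": 0.015546, "Q4": 0.035783}

def odds_ratio(coef, yards=1):
    """Factor by which the odds of winning change for `yards` extra rushing yards."""
    return math.exp(coef * yards)

def win_probability(q1, q2, q3, q4):
    """Fitted probability of winning given rushing yards in each quarter."""
    logit = INTERCEPT + sum(c * y for c, y in zip(COEFS.values(), (q1, q2, q3, q4)))
    return 1 / (1 + math.exp(-logit))

for quarter, coef in COEFS.items():
    print(quarter, round(odds_ratio(coef), 4))  # 1.0065, 1.0071, 1.0157, 1.0364

# The same 120 rushing yards, gained early vs. late
print(round(win_probability(60, 60, 0, 0), 2), round(win_probability(0, 0, 60, 60), 2))  # about 0.28 and 0.79
```

The gap between the last two numbers is an association, not proof that late runs cause wins: teams that are already ahead run more late, which is the clock-killing effect described above.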

There may be value in wearing down a defense by running the ball earlier in a game, but this data and regression do not capture it. It may also be possible a running back needs several carries before learning how to dissect a defense later in a game; but again, this idea is not captured in the aggregate. In short, establishing the run may not be as crucial as originally thought.

However, one bit of conventional wisdom that is reflected is the idea that a team controls the game more effectively by running the ball later in the contest. Quantifying how a team controls a game can be captured using a study like this one. In fact, I plan to use this analysis in my weekly Cowboys postgame graphics that explain why Dallas either won or lost a particular contest. I will go over these upgraded graphics in a later blog post.
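The per-quarter aggregation described earlier in this post can be sketched as follows, assuming a play-by-play table with hypothetical keys (`game_id`, `team`, `quarter`, `play_type`, `yards`); the real dataset's fields may differ. Each resulting row, joined with a win/loss flag, would be one observation for the logit model.

```python
from collections import defaultdict

def rushing_by_quarter(plays):
    """Sum rushing yards per (game, team) for quarters 1-4, skipping overtime.

    Each play is a dict with hypothetical keys: game_id, team, quarter,
    play_type and yards.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for p in plays:
        if p["play_type"] != "run" or p["quarter"] > 4:
            continue  # keep regulation rushing plays only
        totals[(p["game_id"], p["team"])][p["quarter"] - 1] += p["yards"]
    return dict(totals)

plays = [
    {"game_id": "g1", "team": "DAL", "quarter": 1, "play_type": "run", "yards": 5},
    {"game_id": "g1", "team": "DAL", "quarter": 1, "play_type": "pass", "yards": 12},
    {"game_id": "g1", "team": "DAL", "quarter": 4, "play_type": "run", "yards": 12},
    {"game_id": "g1", "team": "NYG", "quarter": 3, "play_type": "run", "yards": -2},
    {"game_id": "g1", "team": "NYG", "quarter": 5, "play_type": "run", "yards": 30},
]
print(rushing_by_quarter(plays))
```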

(Special thanks to Luke Stanke for providing the data and helping me with the code!)

The Art of the Comeback

Last November, an estimated five million people attended the Chicago Cubs' victory parade, celebrating the team's first World Series Championship since 1908.

Last summer, Cleveland hosted hundreds of thousands of Cavaliers fans to celebrate that franchise's first title and the city's first pro championship in more than half a century.

This year in New England, they constantly win. We move on.

The common storyline among these three winners is "The Comeback." The Cubs overcame a 3-1 deficit in the World Series to claim their championship in an extra-inning Game 7, the Cavaliers also stormed back from down 3-1 in the NBA Finals and the Patriots rallied from 25 points down in the second half of Super Bowl LI to win in overtime. These comebacks were also nearly unprecedented. Only five teams had come back from down 3-1 to win the World Series before the Cubs. Cleveland became the first NBA team to overcome a 3-1 deficit in the Finals. And New England's 25-point comeback is the largest in Super Bowl history; the second largest is merely ten points.

This confluence of sports drama may seem like supernatural intervention, but perhaps it can be explained in earthlier terms. In 2011, Brian Skinner published "Scoring Strategies for the Underdog: A General, Quantitative Method for Determining Optimal Sports Strategies." Skinner explained how underdogs must call riskier plays to have a chance at success. Here, teams significantly trailing in a series or game count as underdogs, because their probability of winning is well below 50%. Calling riskier plays might mean getting shellacked, but determining precisely how much riskier to get might be the only way for a trailing team to win.
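Skinner's full method is more detailed, but the core intuition fits in a few lines. Assume (our simplification, not Skinner's model) that the final margin is normally distributed: for a team whose expected margin is negative, widening the spread of outcomes raises its chance of finishing ahead.

```python
import math

def win_probability(mean_margin, sd_margin):
    """P(final margin > 0), assuming the margin is normally distributed."""
    z = mean_margin / sd_margin
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A team expected to lose by 7: a conservative vs. increasingly high-variance approach
for sd in (7, 10, 14):
    print(sd, round(win_probability(-7, sd), 3))  # 0.159, 0.242, 0.309
```

The catch, and the reason "how much riskier" matters, is that risky plays often lower the expected margin too; added variance only pays when it outweighs the lost expectation. For a favorite (positive mean margin), the same math says to reduce variance.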

Baseball closers are specialists, often asked to pitch only one inning, with their team holding the lead. Aroldis Chapman, the Cubs' closer, came in to pitch 2.2 innings in Game 5, 1.1 innings in Game 6 and 1.1 innings in Game 7. Chapman pitched Game 5 on one day of rest, had another day of rest before Game 6 and no days off before Game 7. He did allow three earned runs in the last two games, but Cubs manager Joe Maddon believed the risky strategy of extending his closer was the only way to overcome a 3-1 deficit, and it left other relievers fresh for longer games. Hitters were also asked to swing for home runs, not mere singles or doubles. The Cubs ranked 13th in home runs last season, but in the World Series, they hit at least one home run in each of Games 5, 6 and 7 en route to their title.

In basketball, Skinner's paper discussed two key concepts pertinent to the Cavs: how often to shoot 3's and when to stall. The logic in the first case is that, depending upon how many possessions are left in the game, a team should resort to shooting triples once it reaches a critical threshold. In the regular season, Cleveland ranked 7th in the NBA in three-point shooting percentage and 3rd in three-point attempts, but it was going up against the Golden State Warriors, who ranked first in both categories. Two of the Cavs' three highest three-point shooting rates in that series came in Games 6 and 7, both must-win games. As for pace, while Golden State had the second-most possessions per 48 minutes in the NBA, Cleveland ranked 27th out of 30 teams. However, the Cavs played at a faster pace in Games 5 and 6, adopting a style more like the Warriors' rather than shortening the game as is suggested for underdogs. It is worth noting the pace was slower in Game 7, the most dramatic game of the entire series.
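The "critical threshold" idea can be illustrated with a toy model, our own and far simpler than Skinner's: a trailing team gets one shot per remaining possession, shots are independent, and the leader scores no more. Twos at 50% and threes at one-in-three have the same expected value per shot, so any difference comes purely from variance.

```python
from math import ceil, comb

def catch_up_probability(deficit, possessions, make_prob, shot_value):
    """Chance of outscoring a deficit with one shot per remaining possession.

    Simplifying assumptions: independent shots, a fixed make probability,
    and the leading team scores no more points.
    """
    makes_needed = ceil((deficit + 1) / shot_value)
    return sum(
        comb(possessions, k) * make_prob ** k * (1 - make_prob) ** (possessions - k)
        for k in range(makes_needed, possessions + 1)
    )

# Equal expected value per shot: 50% on twos vs. one-in-three on threes
for deficit in (5, 11):
    twos = catch_up_probability(deficit, 6, 0.5, 2)
    threes = catch_up_probability(deficit, 6, 1 / 3, 3)
    print(deficit, round(twos, 3), round(threes, 3))
```

Down 5 with six possessions left, the steadier twos come out slightly ahead; down 11, threes win easily. A real analysis would also account for the opponent's scoring and actual shooting percentages.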

Lastly, the Patriots helped themselves and the Falcons maimed themselves through their approaches to risk. Once Atlanta led 28-3, New England resorted to 40 pass plays (including sacks) and just 10 rushes. Before the deficit, the Patriots had passed the ball 34 times and run it 15 times, relying significantly more on the ground attack. Also, some of Brady's longest completions came in the 4th quarter during the comeback. Meanwhile, Matt Ryan and the Falcons leaned toward passing in the final minutes rather than sticking to the ground game, which would have taken more time off the clock. Perhaps the most egregious example came when Atlanta had the ball at the New England 22-yard line with 4:40 left in the game, leading by eight. Instead of running the ball three times and kicking a field goal for a two-possession lead, the Falcons took a sack, threw a pass (wiped away by offensive holding) and threw an incompletion, which took them out of field goal range AND gave Tom Brady 3:30 to tie the game. Overall, even the disparity in play count factored into the outcome; Brady kept the Falcons' defense on the field and Ryan could not give his teammates a break.

Teams in any sport can calculate when it is time to run riskier plays. Many recent, high-profile examples suggest comebacks are more achievable than ever before when the right tactics are implemented.

There is a postscript: win probability charts have become more popular than ever. But these games and series show that something calculated to have a 0.7% probability of happening can still occur. Because underdogs can increase their own variance with their playcalling, perhaps these charts need to be updated in some way. Fortunately, this discussion is ongoing.

Who is the NFL MVP?

This year's NFL MVP race is uniquely interesting. Many believe New England Patriots quarterback Tom Brady deserves this honor, despite missing four games because of a suspension stemming from a controversial deflated-football scandal from a few years ago. No matter your opinion as to whether Brady deserved to be suspended, it is worth noting that few MVPs have missed games during the regular season. Players like Emmitt Smith and Aaron Rodgers missed a game or two, but four games is a full quarter of the season and requires a number of assumptions as to whether Brady would have played as well as anyone during the stretch he missed.

Before going over these assumptions, let's first look at the history of the award and the other viable candidates this season. Since the Associated Press began handing out MVP honors, 18 of the recipients were running backs, 40 were quarterbacks and 3 played other positions. The most accomplished running back this season was Ezekiel Elliott. Not only do his 1,631 rushing yards and 15 touchdowns outshine other running backs this year, they outdo the numbers of some running backs who were named MVP. Because no one at any other position seemed to stand out, Zeke is the only non-quarterback worth mentioning.

As for the gunslingers, if you go by passer rating, Total QBR (ESPN's Total Quarterback Rating), yards per pass attempt (as well as net yards and adjusted net yards per pass attempt) and passing touchdown percentage, the winner is Atlanta Falcons quarterback Matt Ryan. New Orleans Saints QB Drew Brees does have an edge over Ryan in total passing yards and completed passes, but efficiency metrics almost always rank Ryan higher than Brees. Brees also did not "lead" his team to the playoffs, something nearly every MVP has done in the past. But this exercise is about Tom Brady and whether his numbers would have been superior to Ryan's had he played the entire season.
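Of the metrics listed, passer rating and adjusted net yards per attempt have public formulas (Total QBR is ESPN's proprietary metric and cannot be reproduced this way). A sketch of both, using the standard NFL passer rating formula and Pro Football Reference's ANY/A definition; the argument names are our own.

```python
def passer_rating(comp, att, yards, td, ints):
    """NFL passer rating: four components, each clamped to [0, 2.375]."""
    def clamp(x):
        return max(0.0, min(x, 2.375))
    a = clamp((comp / att - 0.3) * 5)       # completion percentage
    b = clamp((yards / att - 3) * 0.25)     # yards per attempt
    c = clamp(td / att * 20)                # touchdown rate
    d = clamp(2.375 - ints / att * 25)      # interception rate
    return (a + b + c + d) / 6 * 100

def adjusted_net_yards_per_attempt(yards, td, ints, sack_yards, att, sacks):
    """ANY/A: touchdowns are worth +20 yards, interceptions -45, sacks count as attempts."""
    return (yards + 20 * td - 45 * ints - sack_yards) / (att + sacks)

# Hypothetical stat lines
print(round(passer_rating(20, 30, 300, 3, 0), 1))
print(adjusted_net_yards_per_attempt(300, 2, 1, 15, 38, 2))  # prints 7.0
```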

The simplest way to answer this question is to take Brady's per-game averages, add four games' worth to what he did accomplish and see how they measure up:


[Charts: Brady's prorated passing yards, touchdowns and interceptions compared with Ryan's]
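The proration behind these charts amounts to adding four games at Brady's per-game pace, which is the same as scaling his 12-game totals by 16/12. A minimal sketch with a made-up stat line:

```python
def prorate(totals, games_played, season_games=16):
    """Scale counting stats from the games played to a full season."""
    return {stat: value * season_games / games_played for stat, value in totals.items()}

# Hypothetical 12-game stat line, for illustration only
partial = {"yards": 3000, "td": 24, "int": 3}
print(prorate(partial, 12))  # {'yards': 4000.0, 'td': 32.0, 'int': 4.0}
```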


By these proportions, Ryan would still have had more passing yards and touchdowns than Brady, though the Patriot would still have had fewer interceptions. However, this exercise assumes opponents are of equal quality, which we know is not true. What we should do is examine the opponents Brady did not play and project what his numbers would have been in those games. New England's first four opponents were Arizona, Miami, Houston and Buffalo, with a combined record of 33-30-1 (Atlanta's first four opponents were Tampa Bay, Oakland, New Orleans and Carolina, with a combined record of 34-30, just half a game better). Brady missed out on the 2nd, 4th, 6th and 15th best passing defenses in the NFL, using passing yards allowed as the barometer. Averaging their defensive numbers, that group allowed 219.5 yards and 1.4 touchdowns per game. To put those numbers in perspective, the dozen opponents Brady did face allowed 233.4 yards and 1.6 touchdowns per game.

In other words, the foursome Tom Brady did not play featured significantly better passing defenses than the dozen he did face. Given this logic, it is reasonable to lower Brady's projected numbers, which already trailed Matt Ryan's, even further.
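One rough way to put a number on that downgrade, under our own assumption that a quarterback's production scales with what a defense allows on average: multiply the missing games' per-game pace by the ratio of the skipped defenses' averages to the faced defenses' averages. This proportional adjustment is an illustration, not the method used above, and the per-game input is hypothetical.

```python
def opponent_adjusted(per_game, missed_allowed, faced_allowed, missed_games=4):
    """Project stats for missed games, scaled by how stingy those defenses were."""
    return per_game * (missed_allowed / faced_allowed) * missed_games

# Defensive averages per game, from the comparison above
YARDS_MISSED, YARDS_FACED = 219.5, 233.4
TD_MISSED, TD_FACED = 1.4, 1.6

print(round(YARDS_MISSED / YARDS_FACED, 3), round(TD_MISSED / TD_FACED, 3))  # 0.94 0.875
# Hypothetical 300-yard-per-game pace over the four missed games
print(round(opponent_adjusted(300, YARDS_MISSED, YARDS_FACED)))  # about 1129 instead of 1200
```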

Two more things to consider when comparing these two quarterbacks. First, Pro Football Reference says the Falcons' strength of schedule was significantly tougher than the Patriots' (18th vs. 32nd, respectively). Second, it has its own way of determining each player's Approximate Value, an attempt to show how important each player was to his team's overall success. Without getting into the specifics, Ryan led the NFL with 21 while Brady had 13, and Brady would have had to achieve a lot to make up that ground in the four games he missed.

Again, whether or not you believe Tom Brady was unjustly punished for Deflategate, it is unlikely he would have posted better statistics than Matt Ryan. And even though Ezekiel Elliott had a stellar rookie campaign, his numbers were not historic for a running back. It is Matt Ryan who deserves to be this year's Most Valuable Player.

How Predictive Is Scoring Differential?

How important is an impenetrable goalie in the NHL? How much better is it to outscore opponents throughout the season, as opposed to dominating them defensively? Overall, how important is point differential to success?

In an earlier blog post, I discussed playoff unpredictability when it comes to determining who will win a championship based upon how many games that team won. There, the NBA was the most predictable, followed by the NHL and NFL, with MLB the most unpredictable (unless, of course, you are the 2016 Chicago Cubs). But how does point differential (or run differential in baseball, or goal differential in hockey) translate to winning championships? And which league is most predictable when looking at that specific metric?

Once again, I am using logistic regressions with one explanatory variable and whether that team won a championship as the dependent variable. However, this time I am running three per sport: offensive output, defensive output and scoring differential. Also once again, here is what is noteworthy about our datasets:

- All data used begins with the 1989-90 season because the NFL made the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so those seasons are still used.

Each explanatory variable has the appropriate and logical coefficient. In other words, scoring variables have a positive coefficient, defensive variables have a negative coefficient and scoring differential variables have a much larger positive coefficient. In each case, that direction corresponds to a better probability of winning a championship. Each variable is also statistically significant with 95% confidence, which is to be expected: a better offense, defense and scoring differential will obviously increase the likelihood of winning a championship. What is not clear is which of these indicators is most predictive.
A goodness-of-fit measure called AIC (Akaike Information Criterion) can shed some light. The smaller this number, the better the model fits, explaining away more of the randomness of that sport.
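A minimal sketch of where an AIC number comes from: a one-predictor logistic regression fitted by Newton-Raphson, with AIC computed as 2k - 2 ln L (k parameters, L the maximized likelihood). The data are invented, and a real analysis would use a statistics package such as R's `glm`. Here the comparison is between two models fitted to the same invented data: one with the predictor and one with only an intercept.

```python
import math

def fit_logit(x, y, iterations=25):
    """One-predictor logistic regression fitted by Newton-Raphson.

    Returns (intercept, slope, maximized log-likelihood).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p               # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                   # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        loglik += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return b0, b1, loglik

def aic(loglik, n_params):
    """Akaike Information Criterion: lower means a better fit for the complexity."""
    return 2 * n_params - 2 * loglik

# Hypothetical team-seasons: standardized scoring differential, 1 = won the title
x = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 2, 3, 4]
y = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1]

b0, b1, loglik = fit_logit(x, y)
n1 = sum(y)
n0 = len(y) - n1
null_loglik = n1 * math.log(n1 / len(y)) + n0 * math.log(n0 / len(y))
print("model AIC:", round(aic(loglik, 2), 2))
print("intercept-only AIC:", round(aic(null_loglik, 1), 2))
```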

The first chart is points (or runs or goals) modeled against championships:

[Chart: AIC by sport for points, runs and goals scored, modeled against championships]

Before analyzing this chart, it is important to note the value of each point, goal and run compared with the other sports. In 2016, the average MLB team scored 726 runs for the season. That differs from the 325 points scored, on average, by an NFL team in 2015, the 8,419 points scored by an NBA team last season and the 222 goals scored by an NHL team last season. Fortunately, the variation in each league is not so different that comparison becomes impossible.

In the chart, we see goals in hockey as the best predictor of winning a championship, with football slightly more random, then basketball, with baseball finishing as the most random. So far, these results are consistent with the previous study, where MLB's postseason was the toughest to predict based upon regular-season wins. Basketball's placement makes intuitive sense because teams play at different paces, and it is not clear whether playing at a faster rate (which scores more points, but not necessarily more points per possession) is the best way to win a title.

The next chart illustrates runs, points, and goals allowed, modeled against winning a championship:

[Chart: AIC by sport for runs, points and goals allowed, modeled against championships]

Comparatively, the trends are almost the same as they are with offensive output: Major League Baseball is the most random, followed by the NBA. However, an NFL scoring defense is now a better indicator than an NHL scoring defense, but only slightly so.

Now, let's combine these two charts into scoring differential, modeled against a championship:

[Chart: AIC by sport for scoring differential, modeled against championships]

Here, we learn point differential is more predictive in basketball than in any other sport. Remember how teams playing at different paces obscures the importance of points alone? Including the defensive component removes the effect of pace and gives a clearer predictor. It also coincides with the earlier finding that a basketball team's win total is the most predictive of winning a championship. Football and hockey are nearly equal in predictive ability, and baseball is a distant fourth.

There are more trends to uncover if we combine all of these charts:

[Chart: AIC for offense, defense and scoring differential across all four sports]

In nearly every sport, scoring defense is more predictive than offense (with hockey being the lone exception). Scoring differential is predictably better for analysis than offense or defense by itself, but the degree to which it takes away the randomness is different for each sport. It is only a slight improvement in the NFL, but a drastic improvement for basketball.

Overall, these proportions could prove helpful when determining if a team is going in the right direction when devoting resources to offense and defense. Both are necessary, but perhaps more money should be proportionally allocated to the areas that best predict who will win a championship.

A Unique Cowboys Perspective

The Dallas Cowboys are constantly watching film and studying the playbook for that added edge. Their fans also want to know anything that can help explain why their favorite team won or lost, and if there is a way to forecast how they will do and where they need to improve. Our newest data visualizations hope to do all of the above.

Before and during every Cowboys game, I will post on my various social media accounts some analytics that explain what is going on and predict what will happen. After the game, I will have one summary detailing what happened, using explanatory variables that are the best indicators for the outcome of any football game. Here is some extra information for each highlighted variable:

  • Turnovers are perhaps self-explanatory and the team with the better turnover ratio has a significant advantage.
  • Scoring efficiency goes beyond just the scoreboard. It's the ratio of offensive yards to points. A team may have moved the ball but failed to score many points when near the end zone, so it was inefficient. Not only can each team's efficiency be compared, but each bar has a color: red for bad, blue for average and green for good. Respectively, these quality ranges are: 0-12, 12.01-18.5 and 18.51 and above. These ranges came from the last ten years of NFL data, provided by Pro Football Reference.
  • The ratio (time of possession/rushing yards) looks at who was controlling the game effectively. Time of possession is not an effective indicator for success, but how well a team controls the ball while on offense is. The team with the better ratio earns the checkmark.
  • Overachiever/underachiever is a way to look at how well a team is doing for the season, relative to its point differential. In other words, if a team has a strong record but all of its wins are close, it is overachieving. If a team has suffered a number of losses but they have been close, it is underachieving. This idea is calculated using the football version of the Pythagorean Expectation formula: ((Points for^2.37)/(Points for^2.37 + Points against^2.37)). This winning percentage can then be multiplied by the number of games played to show where a team "should" be with its record.

Periodically there will be additional metrics to explain why the Cowboys won or lost, such as net passing yards/attempt, which takes into account sacks and incompletions as well as how many passing yards each quarterback is able to accrue. As more metrics become readily available, this summary will include them. To see these visualizations in real time, follow me:


Special thanks to Fuzzy Red Panda for putting together these beautiful images and programs that advance sports analytics in such creative ways.


A New Journalism Feature

Each week, I will air a segment on Good Day on Fox 4 in Dallas/Fort Worth that takes an analytic look inside college football. First, I look at a statistical trend that explains something we saw the weekend before, the challenges of predicting games and the secrets to being a more informed fan. Second, I use data and modeling to forecast games featuring some of the favorite teams in North Texas.

I will then post these segments to YouTube and share the links in the Journalism section here. You can click Journalism at the top of the page or click here.

Who Do You Trust in the 4th Quarter?

Since being named the starting quarterback for the Dallas Cowboys, Tony Romo has been in the NFL spotlight for ten seasons and 127 games. While he has put up some of the more prolific statistics of any quarterback during this time, many argue he is the most scrutinized veteran gunslinger of the 21st century. One reason is anecdotal rather than analytical: blown opportunities to win games in the 4th quarter. Because many of these games were critical to his team's championship aspirations, they raise a bigger question: which quarterbacks have been the most reliable at winning games in the 4th quarter?

In a later article we will apply analytics and look at what constitutes a "clutch" quarterback. But first, let's look at the raw statistics. The data features 42 quarterbacks spanning all eras of the NFL, each of whom can be considered at least marginally successful (e.g., Peyton Manning, Warren Moon, Roger Staubach, Colin Kaepernick). The 4th quarter variables are: comeback attempts, comeback wins, comeback rate and career blown leads by the QB's own defense.
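The comeback rate is simply comeback wins divided by comeback attempts, and ranking quarterbacks by it is a sort. A minimal sketch with hypothetical names and made-up numbers (not the 42-quarterback dataset); note that this simple version gives tied quarterbacks consecutive ranks rather than a shared one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Quarterback:
    name: str
    comeback_attempts: int
    comeback_wins: int

    @property
    def comeback_rate(self) -> float:
        """Comeback wins divided by comeback attempts."""
        return self.comeback_wins / self.comeback_attempts


def rank_by_comeback_rate(qbs: list[Quarterback]) -> list[tuple[int, str, float]]:
    """(rank, name, rate), best rate first. Ties get consecutive ranks here."""
    ordered = sorted(qbs, key=lambda q: q.comeback_rate, reverse=True)
    return [(i + 1, q.name, q.comeback_rate) for i, q in enumerate(ordered)]


def count_above(qbs: list[Quarterback], threshold: float = 0.5) -> int:
    """How many quarterbacks have a comeback rate strictly above threshold."""
    return sum(1 for q in qbs if q.comeback_rate > threshold)


# Hypothetical quarterbacks, not the ones in the study.
sample = [
    Quarterback("QB A", 20, 12),
    Quarterback("QB B", 40, 18),
    Quarterback("QB C", 30, 9),
]
for rank, name, rate in rank_by_comeback_rate(sample):
    print(f"{rank}. {name}: {rate:.0%}")
print("Above 50%:", count_above(sample))
```

A rate alone hides the number of attempts behind it, which is why a quarterback with few tries can top the list.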

First, here is a graph of the comeback success rates:


Of the quarterbacks analyzed, Andrew Luck has the best 4th quarter comeback rate of anyone (63%). However, he also had the fewest attempts, so it is too soon to call him the most clutch we have ever seen. In second place is Joe Montana (56%), who many might be more willing to admit is the best in close games. Peyton Manning had the most attempts of anyone (94), but his rate is 47%.

Then comes the aforementioned Tony Romo. His rate is only slightly worse than Manning's. While it is below half, only five of the 42 quarterbacks studied finished better than 50%. In fact, Romo's rate is 11th best out of 42. At the other end, the worst rate among active quarterbacks belongs to Aaron Rodgers (27%). Don Meredith has the lowest success rate of anyone at 25%.

Some of these rates can be explained by analyzing blown leads by that quarterback's defense:



The quarterback saddled with the least clutch defense is Drew Brees, whose "D" has blown a 4th quarter lead on 31 occasions. Fran Tarkenton ranks second with 27. Tony Romo is tied for 10th with 17, slightly above the average among the 42 quarterbacks studied. As for those with fewer reasons to be upset with their defense, there are Kurt Warner (6) and, as expected, Andrew Luck (2).

Visually, and as expected, there is already a direct correlation between 4th quarter comeback rates and blown leads by defense. Still, it is worth investigating whether there are statistics for each quarterback that help explain why some successful quarterbacks are better than others at the end of football games. I will report my findings in a future article.
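One way to put a number on what the charts show visually is Pearson's correlation coefficient between each quarterback's comeback rate and his defense's blown leads. The sketch below computes it by hand with the standard library (Python 3.10 and later also offer statistics.correlation); the values are placeholders, not the 42-quarterback dataset.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical placeholder values, one entry per quarterback, same order in both lists.
comeback_rates = [0.60, 0.50, 0.45, 0.40, 0.30]
blown_leads = [5, 10, 15, 12, 25]

r = pearson_r(comeback_rates, blown_leads)
print(f"Pearson r = {r:.2f}")
```

An r near +1 or -1 signals a strong linear relationship, and its sign gives the direction; it says nothing on its own about why the relationship exists.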

Special thanks to Mark Lane for putting this data together. You can follow him on Twitter @therealmarklane.

Yes! Go for Two!

It's an odd feeling for football fans. After a touchdown, the exhilaration must be contained just as quickly as it erupted, as the same offense, having ground its way down the field and travailed through the defensive puzzles presented, decides to go for two. The decision is rare: during the 2015 NFL season, 1,217 extra points were attempted, but a team went for two only 94 times (7%). In fact, five teams never attempted a two-point conversion.

Pittsburgh Steelers quarterback Ben Roethlisberger suggested this week that his team should go for two every time. Though the Steelers attempted more two-point tries than anyone else, they went for two after fewer than one-fourth of their touchdowns.

Traditionally, this idea is irreverent. But analytically, it carries merit. Because 94% of extra points were converted last year, a team that always goes for two only needs to convert 47% of the time to push. It is worth noting that a defense can return the football the length of the field for two points no matter which try is being attempted. Though this happened only once, on an extra point, it could fractionally affect this expected value, even if the effect is statistically insignificant. Historically, teams convert their two-point attempts roughly 50% of the time, almost exactly what they need for it to be a push.
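The break-even arithmetic can be written out directly: a kick is worth 1 point times the extra-point rate, and a two-point try is worth 2 points times the conversion rate, so the two match when the conversion rate is half the extra-point rate. A minimal sketch with illustrative function names; like the calculation above, it ignores defensive returns.

```python
def two_point_break_even(extra_point_rate: float) -> float:
    """Two-point conversion rate needed to match kicking in expected points.

    Kicking is worth 1 * extra_point_rate; going for two is worth 2 * p.
    Setting them equal gives p = extra_point_rate / 2. Defensive returns
    are ignored here.
    """
    return extra_point_rate / 2


def expected_points(extra_point_rate: float, two_point_rate: float) -> dict:
    """Expected points per try for each choice."""
    return {"kick": 1 * extra_point_rate, "go_for_two": 2 * two_point_rate}


print(two_point_break_even(0.94))  # 0.47
print(expected_points(0.94, 0.50))
```

At a 94% kicking rate the break-even is 47%, and a 50% two-point rate is worth 1.0 expected points per try against 0.94 for the kick.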

So why always go for two if it is a push, especially when it risks injury to more valuable players? And, perhaps more importantly, would this 50% success rate hold if teams went for two more frequently? Aside from the obvious trend of NFL offenses improving and kickers' extra-point numbers worsening (mainly because the extra point was moved back 13 yards), the following chart illustrates two-point tries:


As expected, the 50% success rate remains relatively consistent regardless of how many times teams go for two. However, as stated before, this is a small sample compared with the number of times a team could have gone for two but elected to kick the extra point. Teams usually go for two only when it is almost absolutely necessary. When it is not, will the success rate be the same?
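A rough way to see why the sample size matters: with only about 94 two-point tries in a season, even a true 50% rate carries a wide margin of error. The sketch below uses the textbook normal-approximation (Wald) confidence interval, swapped in here for illustration rather than taken from the chart; the 47-for-94 season is hypothetical, chosen to match a 50% rate.

```python
import math


def conversion_interval(successes: int, attempts: int, z: float = 1.96):
    """Normal-approximation (Wald) confidence interval for a success rate."""
    p = successes / attempts
    se = math.sqrt(p * (1 - p) / attempts)  # binomial standard error
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical season: 47 conversions on 94 tries (50%).
low, high = conversion_interval(47, 94)
print(f"{low:.1%} to {high:.1%}")
```

A 47-for-94 season gives an interval of roughly 40% to 60%, wide enough to include the 47% break-even mark and rates well on either side of it.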

It's worth finding out.

You've Drafted Your Team, Now What?

For three days, the NFL Draft moves at a frenetic pace: specific needs are addressed (or not), value is assigned to each pick (or not), opponents' draft boards are analyzed and combated (or not), and undrafted free agents are debated and signed. After all of these minuscule details comes a bigger-picture question: What the heck just happened?

Pundits, and perhaps those within each franchise, grade these draft classes before anyone attends a rookie minicamp or a single contract gets signed. Though there are always inexplicable human elements, grading draft classes is becoming easier and more analytically sound.

Last year, the Harvard Sports Analysis Collective published a blog post about predicting success based on combine numbers. Some positions are much easier to predict than others. For instance, cornerbacks and outside linebackers with good combine numbers tend to do well in the NFL, whereas quarterbacks and wide receivers are much harder to predict.

Value also matters. Franchises can be crippled if an average player is chosen in the first or second round, while others can be bolstered by picking stellar athletes in later rounds. Many of these problematic picks happen when a team drafts an offensive skill player.
One research paper suggested 60% of running backs and receivers taken in the 3rd-7th rounds have better average career statistics than those taken in the first and second rounds. The trend with receivers makes sense, given that combine numbers have less predictive power for them, but the trend with running backs is perhaps a sign of the times (i.e., the NFL evolving into a passing league). As for quarterbacks, the more highly touted ones tend to have better careers.

So who drafted well in 2016? The research cannot offer clear answers yet, but perhaps surprisingly, the perennial bottom feeders in Cleveland did well. The Browns took defensive standouts with good combine numbers like Emmanuel Ogbah and Carl Nassib and, for the most part, waited until later rounds for skill players, including Ricardo Louis and Jordan Payton. Another possible surprise is the Jacksonville Jaguars. Their first-round pick was Jalen Ramsey, a cornerback with excellent combine numbers. They also took defensive risks like Myles Jack and Sheldon Day, suggesting they are focused on addressing some of their bigger needs. As for the uncertainties, the Rams and Eagles obviously gave up a lot for quarterbacks who may or may not pan out, but the Cowboys, Texans and Redskins also drafted skill players in the first round, and it will be tougher to predict whether those picks translate to wins.

What we're learning with the latest research is the NFL Draft is not the crapshoot it once was. It will never be perfect, but it is also becoming clearer which teams are heeding this research and who prefers considering non-analytical advice.

Playoff Unpredictability

Until recently, the Los Angeles Lakers were a fixture of the NBA Playoffs and, in many seasons, the Finals. They have put together dynasties in different generations of the sport, from Magic Johnson's teams to the Shaq and Kobe era. When the Lakers were not winning titles, chances are another team was enjoying its own dynasty, like the Boston Celtics, Chicago Bulls or San Antonio Spurs. Dynasties are so commonplace in the NBA that 15 franchises in the sport's history do not have a championship (and seven of those still in existence have never even made it to the Finals).

The NBA is unique in this regard: championships are won in bulk. Other leagues offer more parity, with a larger pool of contenders vying for a title. There may be dynasties in other sports, but there seem to be fewer of them, each is shorter in duration, and there is a better chance someone unexpected claims the sport's top prize.

Which of the four top professional sports leagues (NFL, NBA, MLB and NHL) offers the most playoff unpredictability? Is the NBA truly the most predictable? Is it significantly more predictable or marginally so?

One approach to answering these questions is to build a statistical model for each sport. Here, we will use logistic regressions that look only at wins (or points in hockey) and see how well they predict whether a team won a championship that year. Here are some other notes on setting up this project:

- All data used begins with the 1989-90 season because the NFL made the biggest change to its playoff format at the turn of the decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so those seasons are still used.

At first glance, every variable representing wins is statistically significant with 99% confidence, which should be obvious because teams need so many wins just to make the playoffs. What matters is how well wins alone predict championships. To answer this question, we will use a goodness-of-fit measure called AIC (Akaike Information Criterion). The smaller this number, the better the model fits. The following shows how well each model performs:

The larger the bar, the more unpredictable the league is. Again, as expected, the NBA is the most predictable, and by a considerable margin. This model also suggests Major League Baseball is the most unpredictable, with the NFL as a close second and the NHL as a close third.

There are a number of other variables that could be added to these models to help determine who will win a championship, but the simplicity of these models makes for an easier comparison across sports.
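The modeling step can be sketched in plain Python. This is a minimal, hypothetical illustration, not the code behind the chart: it fits a one-variable logistic regression (wins predicting a championship) by Newton-Raphson and reports AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The win totals and champion flags are made up, and the fit assumes the data are not perfectly separable. In practice a statistics package would do this; for example, statsmodels' Logit results expose an aic attribute.

```python
import math


def _prob(b0: float, b1: float, x: float) -> float:
    """Logistic probability 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(x, y, iterations=25):
    """Fit a one-predictor logistic regression by Newton-Raphson.

    The predictor is centered internally for numerical stability.
    Assumes the data are not perfectly separable.
    """
    x_bar = sum(x) / len(x)
    xc = [xi - x_bar for xi in x]
    c0 = c1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negated Hessian
        for xi, yi in zip(xc, y):
            p = _prob(c0, c1, xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negated Hessian) * step = gradient.
        c0 += (h11 * g0 - h01 * g1) / det
        c1 += (h00 * g1 - h01 * g0) / det
    # Undo the centering so the intercept applies to raw win totals.
    return c0 - c1 * x_bar, c1


def log_likelihood(b0, b1, x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        p = _prob(b0, b1, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def aic(b0, b1, x, y, n_params=2):
    """Akaike Information Criterion: 2k - 2 ln(L). Smaller means better fit."""
    return 2 * n_params - 2 * log_likelihood(b0, b1, x, y)


# Hypothetical playoff teams: regular-season wins and whether they won the title.
wins = [10, 11, 11, 12, 12, 13, 13, 14, 14, 15]
champion = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

b0, b1 = fit_logistic(wins, champion)
print(f"b0={b0:.2f}, b1={b1:.2f}, AIC={aic(b0, b1, wins, champion):.2f}")
```

Running the same fit on each league's data and comparing the resulting AIC values mirrors the comparison in the chart: the lower the value, the better wins alone explain who wins the title.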

Special Teams Not as Special as They Used to Be

Virtually any football fan has heard cliche after cliche about the importance of special teams.  After all, why would they be called "special" if they were anything but?  There are countless instances of momentum being seized or lost because of an impressive kickoff return, a devastating injury or a game-winning field goal.  However, analytics suggest this phase of the game may not be as special as it once was.

Many data scientists have put together linear regressions weighting the importance of a team's offense, defense and special teams for the outcome of a game.  These models say special teams account for less than 20% of the overall effect on the outcome of a game.  Some models suggest even less.  Winston (2009) put together a regression excluding any special teams variables in his book, Mathletics, and it had an R^2 of .8733 and an adjusted-R^2 of .8577 (p. 129).
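Adjusted R^2 trims R^2 to account for how many explanatory variables a regression uses, which is why the adjusted figure above is slightly lower. A sketch of the standard formula; the function name is illustrative, and the sample size and predictor count in the usage note are hypothetical, since the regression's exact inputs are not given here.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof
```

For example, adjusted_r_squared(0.5, 11, 1) returns about 0.444: with 11 observations and one predictor, the penalty shaves roughly 0.056 off the raw R^2, and it grows as more predictors are added.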

These models have been around for years, but only recently are we starting to see NFL teams deemphasize special teams:



This figure represents the touchdowns scored on kickoff returns (red) and punt returns (blue) in the NFL since 2005.  Especially in the last three years, there have been fewer kickoff returns for touchdowns.  Some of this downward trend can be attributed to the league moving kickoffs to the 35-yard line to promote touchbacks.  Punt return touchdowns spiked in 2011 and 2012, but have since leveled off and show no discernible trend over time, positive or negative.  Still, this does not detract from the overall point that fewer points are being scored in this phase of the game.

What about extra points and field goals?  This past offseason, the league moved the extra point back 13 yards.  The move reduced the extra point success rate from 99.3% to 94.2%.  However, this amounts to roughly 70 missed extra points over an entire season for the whole league, about 60 more than the old success rate would have produced.  There are even fewer examples of this move affecting the outcome of a game, though one can make an argument with a notable example in the latest AFC Championship Game.  As for going for three, many agree it behooves teams not to kick field goals as frequently as they do.  Lately, there have been fewer field goal attempts.
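The league-wide cost of the longer kick is simple arithmetic: attempts times the miss rate. A minimal sketch, assuming the 1,217 extra-point attempts from the 2015 season cited on this page; the function name is illustrative.

```python
def missed_kicks(attempts: int, success_rate: float) -> float:
    """Expected number of misses for a given number of attempts and success rate."""
    return attempts * (1 - success_rate)


attempts_2015 = 1217  # extra-point attempts in the 2015 season, as cited on this page
missed_new = missed_kicks(attempts_2015, 0.942)
missed_old = missed_kicks(attempts_2015, 0.993)
print(round(missed_new), round(missed_new - missed_old))
```

With these inputs, the league misses about 71 extra points at the new 94.2% rate, roughly 62 more than it would have at 99.3%.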

Again, most of the theoretical research here has been around for a few years, but many successful NFL teams have now heeded the findings and do not invest as much in special teams as they once did.  While many will still pay for top-notch kickers and punt returners and have important reasons for doing so, we are seeing the NFL evolving to a more analytically based approach to the not-as-special special teams.