By: Edward Egros

football

It's a Must Win for Alabama

Let's suppose Oklahoma wins the Big 12 Championship, Ohio State wins the Big Ten, Clemson takes the ACC and, wait for it, wait for it, Georgia claims the SEC title. In other words, Alabama is the only team in the Top 6 to lose AND, aside from Notre Dame, the only team not to have a conference championship.

Seth Walder of ESPN says Alabama would still make the College Football Playoff, per ESPN's Playoff Predictor. This metric estimates the likelihood that a team makes the playoff, given its resume, taking into account five variables: Strength of Record, Football Power Index, Number of Losses, Conference Championship and Independent Status. The model gives Bama a 43% chance to make it, compared with just 37% for Ohio State and 28% for Oklahoma.

With all due respect to the model, the six-percentage-point gap between Alabama and Ohio State is not large enough to feel comfortable about the prediction (confidence intervals and comparisons are not readily available). Also, if there are lessons the committee is trying to teach college football fans, in 2015 we learned conference championships are hugely significant, unless the champion has suffered two losses. That was the case in 2016, when the two-loss Big Ten Champion Penn State Nittany Lions missed out to one-loss Ohio State, and in 2017, when one-loss Alabama edged the two-loss Big Ten Champion Ohio State Buckeyes.

This year, if Alabama loses in this scenario, it would be compared with a one-loss Big 12 champion and a one-loss Big Ten champion for one available spot. Again, this is unlike last year, when the committee compared Bama with a two-loss team. There are three teams to consider instead of two, and each has one loss.

If consistency is a goal, if conference championships deserve added weight, and if geographic diversity is still a consideration, then Alabama, despite everything it has accomplished this year, is in a "must win" game Saturday afternoon to make the College Football Playoff.

Biggest Snubs for the Cowboys' Ring of Honor

There was a time when virtually the only way a Dallas Cowboy could make the Ring of Honor was to win a Super Bowl. All but two current members (Don Meredith and Don Perkins) won at least one championship. However, this week Cowboys owner Jerry Jones affirmed Tony Romo would be inducted into the Ring of Honor. Not only did the former quarterback fail to reach a Super Bowl, he would be the first Cowboy in franchise history to be inducted without even having won a conference title.

Individually, Romo may not have had as stellar a career as Ring of Honorees Troy Aikman or Roger Staubach, but he does surpass the efforts of Meredith, and after a dozen years with the Cowboys, he likely deserves a place in North Texas immortality. By including Romo, the Cowboys introduce the idea that championships should not be weighted as heavily when determining who belongs, perhaps opening the door for others.

This idea leads to a question: Who is the most deserving Dallas Cowboy yet to make the Ring of Honor? One way to evaluate individual performances is with Pro Football Reference's Approximate Value. The top eight players in Cowboys history by this measure have already been inducted, from Emmitt Smith (#1) to Staubach (#8).

The highest Approximate Value not to have his name on the ring belongs to Cornell Green, a cornerback who played 13 seasons, including for the 1971 Super Bowl team. With 34 interceptions, 171 games started and five Pro Bowl invitations, Green has a better case than anyone else not yet inducted, per this metric. His Approximate Value, ninth-best in franchise history, is higher than that of Aikman, Romo, Lee Roy Jordan, Larry Allen, et al.

Two other players who finish in the Top 20 but are not in the Ring of Honor are Ralph Neely, the left tackle on the '71 Super Bowl champions, and Nate Newton, the left guard who played during the Cowboys' dynasty of the 1990s. While this metric may not be the perfect way to compare players, it does highlight some inconsistency in why some players have already been inducted while others have had to wait.

2018 Cowboys Postgame Reports

[Graphic: sample postgame report from the Cowboys' preseason game against the Cardinals]
For the third straight year, after every Dallas Cowboys game, I will provide an analytical graphic to begin the conversation about why the Cowboys won or lost that particular game. However, this year features a new look and simplified visualizations, so it's easier to follow and compare what happened. Our graphic is an example from the Cowboys' preseason game against the Cardinals.

There are four factors:

- Turnover Margin
- Scoring Efficiency
- Net Yards/Pass Attempt
- Game Control

Our intelligent readers already know what Turnover Margin is, so we move on to Scoring Efficiency, which is essentially points divided by yards. Here, we use percentages: the more efficient team is set at 100%, and the less efficient team shows its efficiency as a fraction of its opponent's.
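As a minimal sketch of that percentage scaling (the function name and the game totals below are hypothetical, not taken from an actual report), each team's points per yard is divided by the better of the two rates:

```python
def scoring_efficiency_shares(points_a: float, yards_a: float,
                              points_b: float, yards_b: float) -> tuple[float, float]:
    """Express each team's points per yard as a percentage of the more efficient team's."""
    eff_a = points_a / yards_a  # points per yard for team A
    eff_b = points_b / yards_b  # points per yard for team B
    best = max(eff_a, eff_b)    # the more efficient team anchors the 100% mark
    return round(100 * eff_a / best, 1), round(100 * eff_b / best, 1)


# Hypothetical game: 24 points on 300 yards vs. 17 points on 340 yards
print(scoring_efficiency_shares(24, 300, 17, 340))  # (100.0, 62.5)
```

Dividing by the better rate, rather than by a league average, keeps the graphic a head-to-head comparison of the two teams in that one game.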

Net Yards/Pass Attempt is (passing yards - sack yards) / (passing attempts + times sacked). Because this metric reliably evaluates quarterback performance and also stays consistent over time, it is an important one to include.
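The formula translates directly into code. This is a minimal sketch with a hypothetical function name and a made-up stat line; "passing yards" is assumed to mean gross passing yards, before sack losses are subtracted:

```python
def net_yards_per_pass_attempt(pass_yards: int, sack_yards: int,
                               attempts: int, sacks: int) -> float:
    """(passing yards - sack yards) / (passing attempts + times sacked)."""
    dropbacks = attempts + sacks  # every pass attempt plus every sack
    if dropbacks == 0:
        raise ValueError("no pass attempts or sacks recorded")
    return (pass_yards - sack_yards) / dropbacks


# Hypothetical stat line: 270 passing yards, 33 attempts, sacked 3 times for 18 yards
print(net_yards_per_pass_attempt(270, 18, 33, 3))  # 7.0
```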

Lastly, Game Control is based upon a regression where the explanatory variables are a team's rushing yards in each quarter and the dependent variable is the likelihood of winning. My research found, predictably, that rushing yards in later quarters matter more to winning than those earlier in games. Here, we multiply each team's rushing yards in each quarter by a weight for that quarter and add up the results. We then express each team's total as a proportion of both teams' combined total to see how much each team controlled the game.
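A rough sketch of that calculation follows. The regression weights are not published here, so `QUARTER_WEIGHTS` is a set of hypothetical placeholders chosen only to reflect the finding that later quarters count more; the function names and yardage are also illustrative:

```python
# Hypothetical weights: placeholders that only encode "later quarters matter more".
# The real factors would come from the rushing-yards-by-quarter regression.
QUARTER_WEIGHTS = (1.0, 1.5, 2.0, 2.5)


def weighted_rushing(yards_by_quarter: list[float]) -> float:
    """Multiply each quarter's rushing yards by that quarter's weight and sum."""
    total = sum(w * y for w, y in zip(QUARTER_WEIGHTS, yards_by_quarter))
    return max(total, 0.0)  # clamp so a negative rushing day cannot yield a negative share


def game_control(team: list[float], opponent: list[float]) -> tuple[float, float]:
    """Return each team's share of the combined weighted rushing total."""
    team_score = weighted_rushing(team)
    opp_score = weighted_rushing(opponent)
    combined = team_score + opp_score
    if combined == 0:
        return 0.5, 0.5  # no usable rushing data: call it even
    return team_score / combined, opp_score / combined


# Hypothetical game: the first team runs more late, the second runs more early
home, away = game_control([30, 20, 40, 60], [40, 30, 20, 10])
print(f"{home:.1%} / {away:.1%}")  # 65.9% / 34.1%
```

Without the weights, the first team's raw share would be 150 of 250 rushing yards, or 60%; weighting the late-game yards pushes its share of "control" higher.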

As always, feedback is appreciated!

It May Seem Like Mayhem, But...

Though a few schools decided to start the college football season one week early, the heavyweights, the blue chippers, the ones who are constantly atop any set of rankings you can find and are in contention for that trophy…begin this weekend.

As before, we can use parts of our college football prediction model to determine who is likeliest to have the most talent and the most favorable schedule, including who plays its toughest games at home and whether those games come after ample rest and preparation.

Using all of this information, here are my picks for this year's College Football Playoff:

Alabama
Ohio State
USC
Florida State

Virtually every year, a surprise team that few predicted charges from outside the Top 10 into the Final Four. This year, I am picking two. First, while many say Washington will represent the West Coast, I like USC because it has more highly ranked sophomore and junior classes (per 247Sports). Also, Washington begins the season against Auburn (a Top 10 team in many metrics, including ours), while USC's toughest non-conference opponent is Texas, on the road, which is not as strong as Auburn. As a result, the Huskies are likelier to lose than the Trojans, while USC still earns solid strength-of-schedule numbers. The Trojans also boast one of the better receiving corps, which should help true freshman quarterback JT Daniels feel comfortable.

The other outsider is Florida State, edging out perennial contender Clemson. Again, the Seminoles have more highly ranked second-year and third-year classes, and Clemson has to play at Florida State. Last season, the Seminoles were ranked third in the AP Preseason Poll. You can make the argument that, had they not lost starting quarterback Deondre Francois for the season to an injured patellar tendon in his left knee, they would have been in contention. The running game also carried that offense, and with Cam Akers and Jacques Patrick providing depth in the backfield, this offense should not be overlooked.

This playoff is entering its fifth season. Even though USC and Florida State are outside of the AP Top 10, the Seminoles have been in the playoff before, and the Trojans are the defending Pac-12 champions. It may seem like mayhem, but it's not.

Ohio State's Less Important Question

Ohio State head coach Urban Meyer continues to face the possibility he will not coach the Buckeyes ever again. The school placed him on paid administrative leave as it investigates whether he failed to report (or do anything about) an assistant coach allegedly committing domestic violence. This assistant may have exhibited a pattern of horrific behavior, yet remained on Meyer's coaching staff at Florida and Ohio State for years after reported incidents. The school announced it would like to end its investigation in the coming days.

What matters far less than potentially covering up violent crime is football itself. Still, there is the serious reality that an entire football team will have to scramble to organize, practice and get through a gauntlet of a season, all because its leader exhibited incredibly poor judgment. There is also an unfortunate reality if no reasonable explanations can be uncovered during this investigation: doing the right thing has consequences.

Other college football programs have parted ways with their head coaches within a couple of months of the season's kickoff. In 2017, Ole Miss head coach Hugh Freeze resigned after questions were raised about phone calls made to a female escort service. One year earlier, Baylor fired head coach Art Briles after a couple of his players were convicted of sexual assault and many more women came forward alleging some within the football team committed multiple acts of violence against them. Lastly, in 2012, Arkansas fired head coach Bobby Petrino for unfairly hiring his mistress, not disclosing the nature of that relationship to his boss and not admitting to authorities she was present when he suffered a motorcycle accident.

In each case, I looked at how many games each team was projected to win prior to the scandal, according to our prediction model. This model takes into account 247Sports recruiting rankings for the sophomore and junior classes (the classes we found to be statistically significant), home and away schedules and whether any games were played on a day other than Saturday. Here are the results:

[Graphic: projected vs. actual wins for Ole Miss (2017), Baylor (2016) and Arkansas (2012)]


For Ole Miss, near the end of the season the Rebels had four games decided by one possession. In each game we projected them to win; however, they went 2-2. An 8-4 possibility became a 6-6 performance. For Baylor, there was a three-game stretch near the end of the season where things seemed to fall apart (i.e. losses to Kansas State, Texas Tech and West Virginia). The Bears could have gone 10-3, but instead finished 7-6. Lastly, for Arkansas, we suspected a dip in performance after coming off an appearance in the Sugar Bowl, but the downtick turned out to be more severe. Instead of perhaps going 7-5, they went 4-8.

Several other factors could have caused these teams to underperform their projections, so it cannot be definitively concluded that the departure of the head coach caused the unforeseen losses. However, it makes intuitive sense that a coaching change late in the offseason could mean two or three additional losses. If Ohio State does decide to fire Urban Meyer, and if that means the Buckeyes narrowly miss out on championships, only Meyer is to blame.

The Patriots...and Now the Tide

Anyone looking to tease the Atlanta Falcons mercilessly might write that 28-3 score somewhere prominently or even wear that scoreboard on a t-shirt.

Expect Alabama fans to do the same whenever they need to remind Georgia fans about their team's collapse.

Alabama's Nick Saban once worked for Bill Belichick, so, knowingly or not, Saban mirrored his former boss in how he engineered a comeback from a two-possession deficit in the second half of the National Championship Game. He switched quarterbacks at halftime (opting for a true freshman with little experience up to that game), he called for more throws down the field (Alabama completed four passes of 15+ yards in the second half, compared with none in the first half) and he demanded his defense take more gambles getting into the backfield (nine tackles for loss in the second half versus only three in the first half).

Belichick also took more risks in Super Bowl LI, knowing that these gambles were the only way he could possibly win the game. If they failed and the Patriots fell into a deeper hole, it didn't matter: they were going to lose anyway, so the size of the deficit was irrelevant.

In a previous post, I talked about a paper from Brian Skinner, "Scoring Strategies for the Underdog: A General, Quantitative Method for Determining Optimal Sports Strategies." Skinner explained how underdogs must call riskier plays to have a chance at success. It might mean getting shellacked, but determining specifically how much riskier to get might be the only way for a trailing team to win.
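The intuition behind that result can be illustrated with a toy model, which is not the method from Skinner's paper: treat a trailing team's scoring margin over the rest of the game as normally distributed and ask how often it exceeds the deficit. The deficit, means and standard deviations below are hypothetical:

```python
from statistics import NormalDist


def comeback_probability(deficit: float, mean_margin: float, spread: float) -> float:
    """P(remaining scoring margin > deficit) under a normal approximation (ties ignored)."""
    return 1 - NormalDist(mu=mean_margin, sigma=spread).cdf(deficit)


DEFICIT = 10  # hypothetical two-possession hole

# Conservative plan: neutral expected margin, low variance
safe = comeback_probability(DEFICIT, mean_margin=0, spread=7)
# Risky plan: slightly worse expected margin, but twice the spread
risky = comeback_probability(DEFICIT, mean_margin=-2, spread=14)

print(f"conservative: {safe:.1%}, risky: {risky:.1%}")
```

In this toy setup, the riskier plan gives up expected points yet more than doubles the chance of erasing the deficit. When losing by a little and losing by a lot both count as a loss, variance works in the trailing team's favor.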

Nick Saban had to make the change at quarterback because there was almost no chance he was going to win with Jalen Hurts. He had to attempt more throws down the field because he did not have the time to grind it out with the rushing attack. Lastly, he had to take more chances defensively because if Georgia mounted lengthy drives, there wouldn't be any time left for Alabama to complete the comeback.

Some fans still seem surprised by these comebacks, calling them "improbable" or "unbelievable". While they are fantastic for football, it is not a coincidence that the coaches mounting these comebacks have not only won championships but have also been with their respective employers for years, with job security that seems undeniably stable. Coaches without this kind of job security may be nervous about being blown out in any game, much less a contest for a championship. A boss who insinuates that the margin of defeat matters can devastate a team's likelihood of a comeback.

Hopefully, coaches will become more confident when facing a deficit and take more risks, so football fans can watch even more competitive games.

Georgia or Alabama?

The field is set inside Mercedes-Benz Stadium in Atlanta for college football's national championship game. Aside from the playoff logo in the center, it looks a lot like what the SEC Championship will probably look like for years to come. Alabama has shown few signs of slowing down from its dynastic pace, while Georgia's achievements on the field and in recruiting suggest it may be the next major program to become a staple of the playoff.

Those games in the future will never have the stakes of tonight. So who will win?

As previously mentioned, Charles South and I put together a prediction model using advanced analytical techniques (you can see our poster presentation here). Quick warning: you are about to see a long list. The significant variables, pertinent to tonight, that determine the outcome of a football game are:

- Yards per Pass Attempt
- Yards per Rush Attempt
- Rush Attempts
- Total Yards
- Yards per Play
- Turnovers
- Opponent Points Scored
- Opponent Yards per Rush Attempt
- Opponent Total Yards
- Opponent Turnovers
- Opponent Penalty Yards
- Average Point Differential
- Opponent Offense Passing Yards
- Opponent Offense Yards per Rush Attempt
- Opponent Offense Total Yards
- Opponent Offense Yards per Play
- Opponent Defense Total Rush Yards
- Opponent Defense Yards per Rush Attempt
- Opponent Defense Total Yards
- Opponent Defense Yards per Play
- Opponent Defense Turnovers
- Opponent Average Point Differential
- Difference in Win %
- Recruiting Rankings

If you survived reading that long, congratulations! What's important to learn is that the Bulldogs and Crimson Tide excel in just about every category. The differences in yards, points and other statistical measures are razor thin, no matter your perspective. Without going into every variable, we can summarize several of them into overall offense, defense, schedule and recruiting.

Georgia's rushing attack, led by Sony Michel and Nick Chubb, makes up most of its offense. The duo helped overcome the massive deficit in the Rose Bowl and makes the game manageable for a freshman quarterback, and Georgia averages more yards per carry and more rushing attempts than Alabama. Neither team throws it much, though Georgia is more efficient through the air, by roughly one-third of a yard per attempt. Though Alabama is less efficient overall, some of that can be attributed to building big leads early in games and then cruising the rest of the way. That is also why the Tide have more total yards than the Dawgs and why Bama quarterback Jalen Hurts, running to preserve those leads, is the second-leading rusher on his own team.

Defensively, there seem to be few weaknesses for Alabama, though outside linebacker Anfernee Jennings will not play because of a knee injury. Near the end of the regular season, injury problems mounted, but they were under control in the Sugar Bowl, where Alabama limited the top-ranked team to just six points and 188 total yards. Its rushing defense is the best in America, allowing 2.7 yards per carry. Its pass efficiency defense also gives Bama the edge. Led by safety Minkah Fitzpatrick, the unit has allowed just seven passing touchdowns and posts an efficiency mark a full 17 points better than Georgia's (1st in college football vs. 13th nationally).

These statistics can be misleading given the small sample sizes in college football. Georgia has played one more game, and historically an additional contest can often help a team. Alabama has only a slightly better point differential this season than Georgia. The Bulldogs faced the best offense in passing efficiency (Oklahoma), while the best Alabama went up against was 13th-ranked Auburn, a game the Tide lost (Georgia split its two meetings with Auburn). The Bulldogs also faced a Top 10 rushing attack in Notre Dame, while the Tide never faced a Top 25 rushing attack. The best passing efficiency defense Alabama faced was in the Sugar Bowl (5th), while the best Georgia saw was 19th (Auburn). The schedule favors Alabama, but only slightly.

Finally, our study used 247Sports Composite Class Rankings to determine who has the best talent. Our study highlights the second-year and third-year classes, but also analyzes the average ranking of the first three classes. In this case, Alabama had the top class in each of the past three years, while Georgia consistently fielded a Top 10 group.

Again, it is clear how evenly matched these teams are and how similar their approaches and philosophies are. It promises to be an exciting game, and while unpredictable events like turnovers or missed field goal attempts could make all of the difference, if what's controllable decides this game, Alabama should earn a narrow victory.

Predicting the College Football Committee

The penultimate College Football Playoff rankings are out, and the teams conceivably in the running are:

1. Clemson
2. Auburn
3. Oklahoma
4. Wisconsin
5. Alabama
6. Georgia
7. Miami
8. Ohio State

Before predicting how the playoff will develop, it is important to keep a couple of things in mind. First, the College Football Playoff committee has outlined some of the things it hopes to accomplish in picking the four teams. Among the most relevant items:

- Consider geography
- Avoid rematches in the regular-season
- Consider strength of schedule
- Consider conference championships won

It is also important to remember some of the things the committee has never done in its three years:

- Taken two teams from one conference
- Taken a two-loss team
- Taken three teams from the same region of the country

Using these guidelines, here is how the playoff will be decided:

- The winner of the ACC Championship between Clemson and Miami gets in, the loser is out.
- The winner of the SEC Championship between Georgia and Auburn gets in, the loser is out.
- Oklahoma gets in if they win the Big 12 Championship, TCU cannot get in.
- Wisconsin gets in if they win the Big Ten Championship. If Ohio State wins, they get in if TCU wins.
- Alabama gets in if Oklahoma loses OR Wisconsin loses.
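Those rules can be written as a small decision function. This is a sketch that simply encodes the five bullet points above; the function name and argument format are hypothetical. Looping over every combination of conference-title outcomes shows the rules always produce exactly four teams:

```python
from itertools import product


def predicted_field(acc: str, sec: str, big12: str, big_ten: str) -> set[str]:
    """Apply the post's playoff rules to the four conference-title winners."""
    field = {acc, sec}  # ACC and SEC champions get in; the losers are out
    if big12 == "Oklahoma":
        field.add("Oklahoma")  # TCU cannot get in
    if big_ten == "Wisconsin":
        field.add("Wisconsin")
    elif big_ten == "Ohio State" and big12 == "TCU":
        field.add("Ohio State")  # Ohio State needs a TCU win
    if big12 != "Oklahoma" or big_ten != "Wisconsin":
        field.add("Alabama")  # Alabama needs Oklahoma or Wisconsin to lose
    return field


for outcome in product(("Clemson", "Miami"), ("Georgia", "Auburn"),
                       ("Oklahoma", "TCU"), ("Wisconsin", "Ohio State")):
    print(outcome, "->", sorted(predicted_field(*outcome)))
```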

It is possible that point differential matters in any of these league championship games (it is the committee; it is omnipotent). But chances are, we have our blueprint for who will compete for the national title in January.

Forced Into Success

Pasted Graphic
(Courtesy: Getty Images)

An odd thing happened to the Dallas Cowboys in their last couple of games: their opponents' starting kickers exited early with injuries. Philadelphia's Jake Elliott suffered a head injury, and Los Angeles' Nick Novak experienced back problems. Both teams had to resort to emergency backups during the game, with less than ideal results. Each backup was seen missing the practice net on the sideline while warming up.

The difference between the Eagles and Chargers is how they adjusted to losing their kickers. Philadelphia opted to avoid kicking altogether, not attempting field goals and going for two instead of kicking extra points. Los Angeles remained conventional, playing as if it still had its kicker. The results were drastically different. The Eagles went for 2-point conversions on four occasions, converting three of them. They also faced a fourth-and-5 from the Dallas 17-yard line and scored a touchdown on the play. Even if you assume Philadelphia would have made that field goal (and every extra point attempt), the team gained five points by not using a kicker. As for the Chargers, Drew Kaser missed two extra points, and Novak still attempted one more, which he also missed. Had Los Angeles gone for two after all four of its second-half touchdowns, and assuming it converted half of them (the league average), it would have netted three more points.
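The point accounting in that paragraph can be checked with a few lines of arithmetic. This sketch uses the assumptions stated above plus two more that the totals imply: the fourth-down touchdown is counted as six points (its try is already among the four 2-point attempts), and one of the Chargers' four extra points was made.

```python
# Philadelphia: four 2-point tries (three converted) and a fourth-and-5 touchdown
eagles_two_point = 3 * 2           # points actually scored on the four tries
eagles_kicked = 4 * 1              # assumes all four extra points would have been good
eagles_td_over_fg = 6 - 3          # assumes the field goal would have been good
eagles_gain = (eagles_two_point - eagles_kicked) + eagles_td_over_fg

# Los Angeles: four second-half touchdowns, three missed extra points
chargers_actual = 1                # assumption: the one remaining extra point was made
chargers_two_point = 4 * 0.5 * 2   # expected points at a 50% conversion rate
chargers_cost = chargers_two_point - chargers_actual

print(eagles_gain, chargers_cost, eagles_gain + chargers_cost)  # 5 3.0 8.0
```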

As a result, Los Angeles' conventional wisdom cost them three points, while Philadelphia gained five points with aggressive play calling. In other words, the Eagles were eight points better with their approach.

There is plenty of analytical research suggesting NFL teams should kick fewer field goals and attempt more 2-point conversions. While these findings have been published repeatedly for years, they haven't changed the sport very much. Teams are still attempting roughly as many field goals and extra points as ever, even though offenses have improved and extra points have become more difficult. While teams resist implementing this research, a real-life example happened in the span of one week, where one team put itself in a better position by kicking less. It doesn't explain everything, but it can spotlight one reason why Philadelphia has the best record in the NFL, while Los Angeles is on the fringe of the playoffs.

Gary Patterson is the Most Hated Man in College Football

Pasted Graphic
(Courtesy: Getty Images)

It's not Nick Saban, Urban Meyer or some college football pundit who polarizes fan bases to insanity, just for that monthly paycheck.

It's TCU head coach Gary Patterson, who's led the program since 2000, including a pair of conference transitions and two New Year's Six Bowl victories. Despite few controversial issues within his program, Patterson earns this distinction because of who he is and where he works.

Who he is, is a winner. Perhaps most notable among his accomplishments, his teams are 43-5 when ranked in the Top 10. This record reflects longevity, given how many games his teams have played near the top of the poll du jour, but also a near-perfect winning percentage (nearly 90%) when expected to succeed.

Where he works is a small, private university with roughly 10,000 students. To compare, this student body is 1/4 the size of Alabama's and roughly 1/5 the size of other highly touted college football schools like Penn State and Ohio State. Also, many of these schools are flagships of their own state, meaning their fan bases extend well beyond those who actually attend the university. Not only is TCU not a flagship, it operates in a state with some of the larger followings in America, like Texas and Texas A&M.

Gary Patterson is a successful coach who works for a small school with a smaller fan base, trying to get his team into Year 4 of the College Football Playoff. He came close during the inaugural year of the playoff, but was pushed aside for Ohio State (Baylor, another small private university, also finished ahead of TCU but was left out as well). Some will argue the eventual champion Buckeyes were vindicated, but how TCU would have performed in the playoff that year remains a mystery, made all the more intriguing by its 39-point victory over 9th-ranked Ole Miss in the Peach Bowl. The gripes only grow louder knowing TCU controlled games better than Ohio State, had a better defensive efficiency (a metric that predicts success better than offensive efficiency) and had a strength of schedule roughly equal to the Buckeyes'.

TCU's lone loss that season was to Baylor, and committees have historically treated good losses more harshly than mediocre defeats. The trend seems counterintuitive, but it serves as an acceptable rhetorical argument within college football. Also, because the Frogs and Bears shared the Big 12 title despite the head-to-head result, they could have "canceled each other out", opening the door for Ohio State.

Still, the school most like TCU to have had a successful season these last four years is Stanford, with an enrollment roughly 50% larger than the Frogs'. In 2015, the Cardinal won the Pac-12 Championship, but two losses locked them out. The last two-loss team to win a National Championship was LSU in 2007, so opportunities for teams in Stanford's position have always been limited.

Today, TCU is in a more advantageous position than three years ago. The latest College Football Playoff poll has TCU ranked 6th. The Frogs will face 5th-ranked Oklahoma and could face the Sooners again in a separate Big 12 Championship Game, something that did not exist during the TCU/Baylor controversy. The conference added this contest because its analytics suggested the game gives a Big 12 team a greater likelihood of making the Final Four. Two wins over a highly ranked Sooners squad would give the Horned Frogs an undisputed league championship, which is a statistically significant variable for making the playoff. Their strength of schedule ranking would also increase, and their defensive efficiency may rise as well, because a win would include containing Sooner quarterback and Heisman hopeful Baker Mayfield.

Despite the lone loss, if TCU wins its remaining games, the Frogs' resume would be arguably as bulletproof as any one-loss team's. The committee admits to wanting geographic diversity, and there would not be another program in that region of the country with a more attractive resume. If TCU is still left out, something would seem amiss, and having a smaller following could be seen as a factor. Gary Patterson would then spotlight a problem with this era of determining a National Champion: he has done virtually everything he can to put his team in a position to play for a title, and yet he would be left out for a second time. A conspiracy theory, true or otherwise, that undermines the validity of the selection process is something the sport and the committee would hate.

The Truth About 3rd Down

Anyone paying attention to stats during an NFL broadcast has noticed 3rd down conversions being reported. It is an easy way for commentators to critique how clutch a team is and if an offense can maintain a drive when the pressure is at its peak. Obviously a team converting on 100% of its 3rd down attempts is probably winning the game, but otherwise it is not nearly as helpful a statistic as suggested.

For this exercise I took 10 seasons' worth of NFL data (2007-2016) and looked at conversion rates for 1st down, 2nd down, 3rd down and the number of regular season wins that team accumulated. Logically, it would make sense to have an increasing percentage with later downs because you often have fewer yards to go before moving the chains. The numbers reflect this trend: on 1st down, teams on average convert 20% of the time, on 2nd down it's 30.3% and on 3rd down it's 38.1%.

To keep things simple, I then ran a linear regression, treating wins as my dependent variable and keeping it continuous so as not to lose information. Here are the results:

[Graphic: linear regression of regular-season wins on 1st, 2nd and 3rd down conversion rates, 2007-2016]

As expected, every down is significant to wins at the 99% level, because the more you convert, the greater your chances of success. The coefficients also increase with each successive down, so the degree to which each down matters goes up. And even though later downs should be easier to convert, that increase perhaps suggests third down conversions do matter more than first and second.

However, the R-squared and adjusted R-squared only hover around 28%. In other words, conversion rates explain only about 28% of the variation in team wins, and 3rd down conversion percentage by itself explains less than that (22% if the 3rd down rate is the only explanatory variable). While these rates are statistically significant (especially on 3rd down), they are also noisy.
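For readers curious where figures like these come from, here is a minimal sketch of R-squared for a one-variable regression (like the 3rd-down-only model) and the standard adjusted R-squared correction. The function names are illustrative, and the win totals and conversion rates are made up; they are not the 2007-2016 data:

```python
from statistics import mean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(x: list[float], y: list[float]) -> float:
    """Share of the variance in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Penalize R-squared for k explanatory variables fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up team seasons: 3rd down conversion rate vs. regular-season wins
third_down = [0.32, 0.35, 0.36, 0.38, 0.40, 0.41, 0.44, 0.46]
wins = [6, 4, 9, 7, 5, 10, 8, 11]
print(round(r_squared(third_down, wins), 3))
```

With noisy data like this, the fitted line picks up part of the trend but leaves most of the variation in wins unexplained, the same pattern the full regression shows.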

In previous blog posts, I have outlined which factors best determine the outcome of football games (and they are detailed in my Cowboys data visualizations). One reason I never brought up 3rd down conversion rates is how noisy the variable is and how it takes attention away from 1st and 2nd down. Many others have their own ways of determining success based on both the down and the distance. For the sake of simplicity, I would suggest promoting the discussion of 1st and 2nd down success rates, both as a pair and as a bridge to what makes for a reasonable 3rd down to convert when those plays occur.

A New Explanation of Cowboys Graphics

For the second straight year, after every Dallas Cowboys game, I will post a recap of the game with an analytic visualization. Once again, these metrics sum up all of the important factors that determine the outcome of a football game. Some of the metrics are the same, while others are more refined and better reflect certain concepts.

Going from the top and working down, once again I will chart turnovers, one of the more impactful statistics in the game. The numbers reflect the turnover margin and the bars reflect how many turnovers were committed.

The next box looks at how the quarterbacks performed, often using net yards per pass attempt. This metric is highly predictive, and while others may be more predictive, it is far easier to calculate.

Perhaps the biggest change comes where it is labeled "Time of Possession/Rushing Yards". This metric was designed to determine who "controlled" the game. It has since been updated to look at how many rushing yards a team had per quarter. As noted in a previous blog post, the more rushing yards a team gains later in the game, the likelier it is to win. The larger the number, the better that team "controlled" the game.

Overachiever/Underachiever refers to what the Cowboys' record should be, relative to their point differential for the whole season. In baseball, this idea is referred to as the Pythagorean Expectation. In football, there is debate as to how to calculate such a record, but here, the exponent is 2.37: ((Points For^2.37) / (Points For^2.37 + Points Against^2.37)) * 16.
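Here is a minimal sketch of that formula, with the 2.37 exponent and 16-game season as defaults; the function names and the sample point totals are hypothetical. Comparing the result with the actual win total gives the overachiever/underachiever gap:

```python
def pythagorean_wins(points_for: float, points_against: float,
                     exponent: float = 2.37, games: int = 16) -> float:
    """Expected wins: PF^x / (PF^x + PA^x) * games."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa) * games


def achievement_gap(actual_wins: int, points_for: float, points_against: float) -> float:
    """Positive: the team overachieved its point differential; negative: it underachieved."""
    return actual_wins - pythagorean_wins(points_for, points_against)


# Hypothetical season: 400 points scored, 350 allowed, 10 wins
print(round(pythagorean_wins(400, 350), 2), round(achievement_gap(10, 400, 350), 2))
```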

Finally, scoring efficiency has been tweaked. The idea here is to see how many points teams scored, relative to the number of yards they needed. The larger the bar and the bigger the number, the more efficient the team was. Simply put, it's points divided by yards, multiplied by 15.457886 so that the average is approximately 1. Using data from 2009-2016, we can also see whether a team was good, average or bad in its efficiency. If the result is less than .949394, the team was inefficient. If the result is between .949395 and 1.057116, the team was average and gets a blue bar. If the result is greater than that range, the team was efficient and gets a green bar.
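Here is a minimal sketch of that scaling and color-coding, using the constants given above. The function names and sample games are hypothetical, and because the stated bands leave a tiny gap between .949394 and .949395, this version treats anything from .949394 up to 1.057116 as average:

```python
SCALE = 15.457886        # rescales points per yard so the 2009-2016 average is about 1
LOW_CUTOFF = 0.949394    # below this: inefficient
HIGH_CUTOFF = 1.057116   # above this: efficient (green bar)


def scaled_efficiency(points: int, yards: int) -> float:
    """Points divided by yards, multiplied by the scaling constant."""
    return points / yards * SCALE


def efficiency_band(score: float) -> str:
    """Classify a scaled efficiency score into the three bands."""
    if score < LOW_CUTOFF:
        return "inefficient"
    if score <= HIGH_CUTOFF:
        return "average"    # blue bar
    return "efficient"      # green bar


# Hypothetical games: (points, yards)
for points, yards in [(20, 330), (21, 330), (28, 350)]:
    score = scaled_efficiency(points, yards)
    print(points, yards, round(score, 3), efficiency_band(score))
```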

Again, these metrics are meant to capture nearly everything that happened in a game that pertained to the result. Some of these metrics can also be used to forecast future games, but the intent is solely inference.

No Need to Establish the Run

Arizona Cardinals running back David Johnson (left) may understand the importance of balancing the run and the pass about as well as anybody. Last season, he led the NFL in touches, all-purpose yards and combined rushing and receiving touchdowns. For an encore, his head coach says he wants Johnson to average 30 touches per game.

It's one thing to strike the right balance between using Johnson as a rusher and as a receiver; it's another to make these decisions based on the game clock. Conventional wisdom in football has always championed "establishing the run": no matter how long it takes to create an effective run game, it should be a point of emphasis early in a contest. More recently, rushing plays have been called less frequently, regardless of what the clock reads. Given this trend, there is an analytical case for why attempting to establish the run is unnecessary.

I took NFL play-by-play data from the 2010 through the 2015 seasons, including which team won and lost each game. Then, using only rushing plays, I summed the rushing yards each team gained per quarter, per game. (I excluded overtime rushing yards, both because they appeared so infrequently and because they skewed the results: enough rushing yards in overtime will essentially end the game.) Using a logit regression with "win" as a binary dependent variable and rushing yards per quarter as the explanatory variables, here is the output:

=========================================
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8447  -0.9786  -0.5544   1.0545   2.0701
Coefficients:
                 Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     -1.747385    0.105946  -16.493   < 2e-16 ***
yards.gained.1   0.006508    0.001922    3.386  0.000708 ***
yards.gained.2   0.007091    0.001953    3.632  0.000282 ***
yards.gained.3   0.015546    0.001910    8.137  4.05e-16 ***
yards.gained.4   0.035783    0.002156   16.594   < 2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 4251.8  on 3066  degrees of freedom
Residual deviance: 3711.2  on 3062  degrees of freedom
AIC: 3721.2
Number of Fisher Scoring iterations: 4
==========================================

First, all of these variables are statistically significant at the 99% level, which makes logical sense: the more yards a team gains, the likelier it is to win. Second, the later the quarter, the larger the coefficient. In other words, as the game goes on, rushing yards matter more to the game's outcome. The fourth quarter having the largest coefficient makes sense because teams that are leading are trying to take time off the clock, and rushing makes that easier. However, the fact that the third-quarter coefficient is more than double either first-half coefficient could suggest there is no statistical advantage to "establishing the run".

It is also important to convert these coefficients to odds ratios to know how important each rushing yard is to winning. Specifically, an extra first quarter yard increases the odds of winning by a factor of 1.0065. In the second quarter, it's 1.0071, a small difference. In the third quarter, it is 1.0157 and in the fourth, it is 1.0364.
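
Both conversions can be reproduced from the published coefficients: exponentiating a coefficient gives its odds ratio, and passing the fitted log-odds through the logistic function gives a win probability. This sketch reuses the numbers from the output above rather than refitting the model; the function names and the two rushing lines are hypothetical. Because leading teams run late to kill the clock, these probabilities describe an association, not the payoff from deliberately calling more late runs.

```python
import math

# Coefficients from the logit output above (log-odds per rushing yard).
INTERCEPT = -1.747385
COEFS = {1: 0.006508, 2: 0.007091, 3: 0.015546, 4: 0.035783}


def odds_ratio(quarter: int) -> float:
    """Factor by which one extra rushing yard in this quarter multiplies the odds of winning."""
    return math.exp(COEFS[quarter])


def win_probability(yards_by_quarter: dict) -> float:
    """Fitted win probability given rushing yards in each regulation quarter."""
    log_odds = INTERCEPT + sum(
        COEFS[q] * yards_by_quarter.get(q, 0) for q in COEFS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


for q in COEFS:
    print(q, round(odds_ratio(q), 4))

# Hypothetical: the same 100 rushing yards, front-loaded vs. back-loaded.
early = win_probability({1: 40, 2: 30, 3: 20, 4: 10})
late = win_probability({1: 10, 2: 20, 3: 30, 4: 40})
print(round(early, 3), round(late, 3))
```

The odds ratios printed match the ones quoted above, and the back-loaded 100 yards comes out at roughly a 0.59 fitted win probability versus about 0.35 for the front-loaded version.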

There may be value in wearing down a defense by running the ball earlier in a game, but this data and regression do not capture it. It may also be possible that a running back needs several carries before he knows how to dissect a defense later in a game; but again, this idea is not captured in aggregate. Establishing the run may not be as crucial as originally thought.

However, one piece of conventional wisdom the data does support is that a team controls the game more effectively by running the ball later in the contest. A study like this one can quantify how a team controls a game. In fact, I plan to use this analysis in my weekly Cowboys postgame graphics that explain why Dallas either won or lost a particular contest. I will go over these upgraded graphics in a later blog post.

(Special thanks to Luke Stanke for providing the data and helping me with the code!)

The Art of the Comeback

Last November, an estimated five million people attended the Chicago Cubs' victory parade, celebrating the team's first World Series championship since 1908.

Last summer, Cleveland hosted hundreds of thousands of Cavaliers fans celebrating that franchise's first title and the city's first pro championship in more than half a century.

This year in New England, they constantly win. We move on.

The common storyline among these three winners is "The Comeback". The Cubs overcame a 3-1 deficit in the World Series to claim their championship in an extra-inning Game 7, the Cavaliers also stormed back from down 3-1 in the NBA Finals and the Patriots trailed Atlanta by 25 in the second half of Super Bowl LI before winning in overtime. These comebacks were also nearly unprecedented. Only five teams had come back from down 3-1 to win the World Series before the Cubs. Cleveland became the first NBA team to overcome a 3-1 deficit in the Finals. And New England's 25-point comeback is the largest in Super Bowl history; the second largest is merely ten points.

This confluence of sports drama may seem like supernatural intervention, but perhaps it can be explained in earthlier terms. In 2011, Brian Skinner published "Scoring Strategies for the Underdog: A General, Quantitative Method for Determining Optimal Sports Strategies". Skinner explained how underdogs must call riskier plays to have a chance at success. Here, we can call teams trailing significantly in a series or game underdogs, because their probability of winning is well below 50%. Riskier play calling might mean getting shellacked, but calibrating exactly how much risk to take may be the only way for a trailing team to win.
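
Why risk helps the underdog can be shown with a toy model. This is not Skinner's method; it is a simplifying assumption that the final margin is normally distributed, with the strategy changing only the spread. When the expected margin is negative, widening the spread pushes more probability above zero. The function names and numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def underdog_win_probability(expected_margin: float, spread: float) -> float:
    """P(final margin > 0) if the margin is Normal(expected_margin, spread)."""
    return normal_cdf(expected_margin / spread)


# Hypothetical underdog expected to lose by 7, under safer vs. riskier strategies.
for spread in (8.0, 12.0, 16.0):
    print(spread, round(underdog_win_probability(-7.0, spread), 3))
```

For a favorite (a positive expected margin), the same widening lowers the win probability, which is why the two sides of a lopsided game want opposite levels of risk.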

Baseball closers are niche pitchers, often asked to pitch only one inning with their team holding the lead. Aroldis Chapman, the Cubs' closer, came in to pitch 2.2 innings in Game 5, 1.1 innings in Game 6 and 1.1 innings in Game 7. Chapman had one day of rest before Game 5, another day of rest before Game 6 and no days off before Game 7. While he did allow three earned runs in the last two games, Cubs manager Joe Maddon believed the risky strategy of extending his closer was the only way to overcome a 3-1 deficit, and it left other relievers fresh for longer games. Hitters were also asked to swing for home runs, not mere singles or doubles. The Cubs ranked 13th in home runs last season, but in the World Series, they hit at least one home run in each of Games 5, 6 and 7, en route to their title.

In basketball, Skinner's paper discussed two key concepts pertinent to the Cavs: how often to shoot 3's and when to stall. The logic in the first case is that, depending upon how many possessions are left in the game, a team should resort to shooting triples once it reaches its critical threshold. In the regular season, Cleveland ranked 7th in the NBA in three-point shooting percentage and 3rd in three-point attempts, but it was going up against the Golden State Warriors, who ranked first in both categories. Two of the Cavs' three highest rates of three-point shooting in that series came in Games 6 and 7, both must-win games. As for pace, while Golden State had the second-most possessions per 48 minutes in the NBA, Cleveland ranked 27th out of 30 teams. However, the Cavs played at a faster pace in Games 5 and 6, adopting a style more like the Warriors' instead of shortening the game as the paper suggests underdogs should. It is worth noting that Game 7, the most dramatic of the entire series, was played at a slower pace.
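
The case for an underdog shortening the game can be sketched with the same kind of toy model. Assuming (as a simplification, not Skinner's formulation) that each possession adds an independent, identically distributed amount to the margin, the expected margin grows in proportion to the number of possessions while its spread grows only with the square root, so fewer possessions keep a trailing team's chances higher. Names and numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability_over_possessions(edge_per_possession: float,
                                     sd_per_possession: float,
                                     possessions: int) -> float:
    """Approximate P(win) when each possession adds an independent,
    identically distributed amount to the scoring margin."""
    mean = possessions * edge_per_possession
    sd = math.sqrt(possessions) * sd_per_possession
    return normal_cdf(mean / sd)


# Hypothetical underdog: -0.05 points per possession, 1.2 points of spread per possession.
for n in (85, 95, 105):
    print(n, round(win_probability_over_possessions(-0.05, 1.2, n), 3))
```

The effect per game is modest, but it points the same way as the advice to stall: a slower pace protects the weaker team, and a faster pace favors the stronger one.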

Lastly, risk-taking helped the Patriots, while the Falcons maimed themselves with the wrong risks. Once Atlanta led 28-3, New England ran 40 pass plays (including sacks) and just 10 rushes. Before falling behind 28-3, the Patriots had passed the ball 34 times and run it 15 times, relying significantly more on the ground attack. Also, some of Brady's longest completions came in the 4th quarter during the comeback. On the other side, Matt Ryan and the Falcons leaned toward passing in the final minutes instead of sticking with the ground game, which would have taken more time off the clock. Perhaps the most egregious example came when Atlanta had the ball at the New England 22-yard line with 4:40 left in the game, leading by eight. Instead of running the ball three times and going for a two-possession lead, the Falcons took a sack, completed a pass wiped away by offensive holding and threw an incompletion, which took them out of field goal range AND gave Tom Brady 3:30 to tie the game. Overall, even the disparity in play counts factored into the outcome: Brady kept the Falcons' defense on the field, and Ryan could not give his teammates a break.

Teams in any sport can calculate when it is time to run riskier plays. Many recent and high-profile examples suggest comebacks are more possible than ever before, when the right tactics are implemented.

There is a postscript: win probability charts have become more popular than ever. But these games and series show that something calculated to have a 0.7% probability of happening can occur. Because underdogs can increase their own variance with their play calling, perhaps these charts need to be updated in some way. Fortunately, this discussion is ongoing.

Who is the NFL MVP?

This year's NFL MVP race is uniquely interesting. Many believe New England Patriots quarterback Tom Brady deserves this honor, despite missing four games while serving a suspension from the controversial deflated-football scandal of a few years ago. Whatever your opinion about whether Brady deserved to be suspended, it is worth noting that few MVPs have missed games during the regular season. Players like Emmitt Smith and Aaron Rodgers missed a game or two, but four games is a full quarter of the season, and crediting Brady requires a number of assumptions about whether he would have played as well as anyone during the stretch he missed.

Before going over these assumptions, let's first look at the history of the award and who else are viable candidates this season. Since the Associated Press began handing out MVP honors, 18 of the recipients were running backs, 40 were quarterbacks and 3 played other positions. The most accomplished running back this season was Ezekiel Elliott. Not only do his 1,631 rushing yards and 15 touchdowns outshine other running backs this year, they also outdo some running backs who were named MVP. Because no one at any other position seemed to stand out, Zeke is the only non-quarterback worth mentioning.

As for the gunslingers, if you go by passer rating, QBR (ESPN's Total Quarterback Rating), yards per pass attempt (as well as net yards and adjusted net yards per pass attempt) and passing touchdown percentage, the winner is Atlanta Falcons quarterback Matt Ryan. New Orleans Saints QB Drew Brees does have an edge over Ryan in total passing yards and completed passes, but efficiency metrics almost always rank Ryan above Brees. Brees also did not "lead" his team to the playoffs, something nearly every MVP has done in the past. But this exercise is about Tom Brady and whether his numbers would have been superior to Ryan's had he played the entire season.

The simplest way to answer this question is to prorate Brady's 12-game statistics to a full 16-game season and see how they measure up:


By these proportions, Ryan would still have had more passing yards and touchdowns than Brady, though the Patriot would still have had fewer interceptions. However, this exercise assumes opponents are of equal quality, which we know is not true. What we should do is examine the opponents Brady did not play and project what his numbers would have been in those games. New England's first four opponents were Arizona, Miami, Houston and Buffalo, with a combined record of 33-30-1 (Atlanta's first four opponents, Tampa Bay, Oakland, New Orleans and Carolina, went 34-30, only a half-game better). Brady missed the 2nd, 4th, 6th and 15th best passing defenses in the NFL, using passing yards allowed as the barometer. On average, that group allowed 219.5 passing yards and 1.4 passing touchdowns per game. To put those numbers in perspective, the dozen opponents Brady did face allowed 233.4 yards and 1.6 touchdowns per game.

In other words, the foursome Tom Brady did not play featured noticeably better passing defenses than the dozen he did face. Given this logic, it is reasonable to lower Brady's projected numbers even further, and the simple projection already fell short of Matt Ryan's.
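
Both projections can be sketched in a few lines. The season totals used here (Brady: 3,554 yards and 28 touchdowns in 12 games; Ryan: 4,944 yards and 38 touchdowns) are the official 2016 regular-season numbers, and the per-game allowances are the ones quoted above. The opponent adjustment, which scales Brady's per-game rate by the ratio of what the missed defenses allowed to what the faced defenses allowed, is an assumption of this sketch, not a method described in the post.

```python
def prorate(total: float, games_played: int, season_games: int = 16) -> float:
    """Scale a partial-season total to a full season at the same per-game rate."""
    return total / games_played * season_games


def opponent_adjusted(total: float, games_played: int, missed_games: int,
                      allowed_by_missed: float, allowed_by_faced: float) -> float:
    """Add the missed games at the player's per-game rate, scaled by how much
    the missed opponents allowed relative to the opponents actually faced."""
    per_game = total / games_played
    return total + per_game * (allowed_by_missed / allowed_by_faced) * missed_games


# 2016 regular-season totals: Brady (12 games), Ryan (16 games).
brady = {"yards": 3554, "touchdowns": 28}
ryan = {"yards": 4944, "touchdowns": 38}
# Per-game passing yards and touchdowns allowed: (missed opponents, faced opponents).
allowed = {"yards": (219.5, 233.4), "touchdowns": (1.4, 1.6)}

for stat in ("yards", "touchdowns"):
    simple = prorate(brady[stat], 12)
    adjusted = opponent_adjusted(brady[stat], 12, 4, *allowed[stat])
    print(stat, round(simple, 1), round(adjusted, 1), ryan[stat])
```

The simple proration gives Brady about 4,739 yards and 37.3 touchdowns; the opponent-adjusted version drops to about 4,668 yards and 36.2 touchdowns, both short of Ryan's 4,944 and 38.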

Two more things to consider when comparing these two quarterbacks. First, Pro Football Reference says the Falcons' strength of schedule was significantly tougher than the Patriots' (18th vs. 32nd, respectively). Second, the site has its own way of determining each player's Approximate Value, an attempt to show how important that player was to a team's overall success. Without getting into the specifics, Ryan led the NFL with 21, Brady had 13, and he would have had to achieve a lot to make up that ground in the four games he missed.

Again, whether or not you believe Tom Brady was unjustly punished for Deflategate, it is unlikely he would have posted better statistics than Matt Ryan. And while Ezekiel Elliott had a stellar rookie campaign, his numbers were not historic by running back standards. It is Matt Ryan who deserves to be this year's Most Valuable Player.

How Predictive Is Scoring Differential?

How important is an impenetrable goalie in the NHL? How much better is it to outscore opponents throughout the season, as opposed to dominating them defensively? Overall, how important is point differential to success?

In an earlier blog post, I discussed playoff unpredictability: how well the number of games a team won predicts whether it wins a championship. There, the NBA was the most predictable, followed by the NHL and the NFL, with MLB the most unpredictable (unless, of course, you are the 2016 Chicago Cubs). But how does point differential (or run differential in baseball, or goal differential in hockey) translate to winning championships? And which league is most predictable by that specific metric?

Once again, I am using logistic regressions with one explanatory variable and whether that team won a championship as the dependent variable. This time, however, I am running three per sport: offensive output, defensive output and scoring differential. Also once again, here is what is noteworthy about our datasets:

- All data used begins with the 1989-90 season because the NFL had the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so they are still used.

Each explanatory variable has the appropriate, logical coefficient: scoring variables have a positive coefficient, defensive variables (points allowed) have a negative coefficient and scoring differential variables have a much larger positive coefficient. In each case, a better number equates to a better probability of winning a championship. Each variable is also statistically significant with 95% confidence, which is to be expected: a better offense, defense and scoring differential will obviously increase the likelihood of winning a championship. What is not clear is which of these indicators is most predictive. A goodness-of-fit measure called AIC (Akaike Information Criterion) can shed some light. As this number gets smaller, the model fits better, explaining away more of the randomness in that sport.
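
To make the AIC comparison concrete, here is a minimal sketch of a one-predictor logistic regression fit by maximum likelihood (Newton-Raphson, which for the logit link coincides with the Fisher scoring used by R's glm), followed by AIC = 2k - 2 ln(L). The team-season data are hypothetical, and the post does not say which software produced its charts. Fitting the offense, defense and differential variables to the same team-seasons and comparing their AICs is the within-sport comparison the charts make.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(x: list, y: list, max_iter: int = 100) -> tuple:
    """One-predictor logistic regression by Newton-Raphson (maximum likelihood).

    Returns (intercept, slope). Assumes the data are not perfectly separable;
    otherwise the maximum-likelihood slope does not exist.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < 1e-10:
            break
    return b0, b1


def log_likelihood(x: list, y: list, b0: float, b1: float) -> float:
    return sum(yi * math.log(sigmoid(b0 + b1 * xi))
               + (1 - yi) * math.log(1.0 - sigmoid(b0 + b1 * xi))
               for xi, yi in zip(x, y))


def aic(x: list, y: list) -> float:
    """AIC = 2k - 2 ln(L), with k = 2 parameters (intercept and slope)."""
    b0, b1 = fit_logit(x, y)
    return 2 * 2 - 2 * log_likelihood(x, y, b0, b1)


# Hypothetical team-seasons: per-game scoring differential and champion flag (1 = won title).
diff = [-6.0, -3.0, -1.0, 0.5, 2.0, 3.5, 4.5, 5.5, 7.5, 10.0]
champ = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
print(fit_logit(diff, champ), aic(diff, champ))
```

A positive slope means a larger differential raises the odds of a title; a lower AIC for one variable than another, on the same seasons, means that variable explains more of who won.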

The first chart is points (or runs or goals) modeled against championships:

Before analyzing this chart, it is important to note that the value of each point, goal and run differs from sport to sport. In 2016, the average MLB team scored 726 runs for the season. That compares with 325 points scored, on average, by an NFL team in 2015, 8,419 points by an NBA team last season and 222 goals by an NHL team last season. Fortunately, the variation within each league is not so different that comparison becomes impossible.

In the chart, we see goals in hockey are the best predictor for winning a championship, with football slightly more random, then basketball, with baseball the most random. So far, these results are consistent with the previous study, where MLB's postseason was the toughest to predict from regular-season wins. Basketball's showing makes intuitive sense because teams play at different paces, and it is not clear whether playing at a faster rate, which scores more points but not necessarily more points per possession, is the best way to win a title.

The next chart illustrates runs, points, and goals allowed, modeled against winning a championship:

Comparatively, the trends are almost the same as with offensive output: Major League Baseball is the most random, followed by the NBA. However, NFL scoring defense is now a better indicator than NHL scoring defense, but only slightly.

Now, let's combine these two charts into scoring differential, modeled against a championship:

Here, we learn point differential is more predictive in basketball than in any other sport. Remember how teams playing at different paces obscures the importance of points alone? Including the defensive component largely cancels out pace of play and gives a clearer predictor. It also coincides with the earlier finding that a win total is most predictive of a championship in basketball. Football and hockey are nearly equal in predictive ability, and baseball is a distant fourth.

There are more trends to uncover if we combine all of these charts:

In nearly every sport, scoring defense is more predictive than offense (hockey is the lone exception). Scoring differential is, predictably, better for analysis than offense or defense alone, but how much randomness it removes differs by sport: it is only a slight improvement in the NFL, but a drastic one in basketball.

Overall, these proportions could prove helpful when determining if a team is going in the right direction when devoting resources to offense and defense. Both are necessary, but perhaps more money should be proportionally allocated to the areas that best predict who will win a championship.

A Unique Cowboys Perspective

The Dallas Cowboys are constantly watching film and studying the playbook for that added edge. Their fans also want to know anything that can help explain why their favorite team won or lost, and if there is a way to forecast how they will do and where they need to improve. Our newest data visualizations hope to do all of the above.

Before and during every Cowboys game, I will post on my various social media accounts some analytics that explain what is going on and predict what will happen. After the game, I will have one summary detailing what happened, using explanatory variables that are the best indicators for the outcome of any football game. Here is some extra information for each highlighted variable:

  • Turnovers are perhaps self-explanatory and the team with the better turnover ratio has a significant advantage.
  • Scoring efficiency goes beyond just the scoreboard. It's a ratio of (offensive yards/points). A team may have moved the ball but failed to score many points when near the end zone, so it was inefficient. Not only can each team's efficiency be compared, but each bar has a color: green for good, blue for average and red for bad. Respectively, these quality ranges are: 0-12, 12.01-18.5 and 18.51 or more yards per point. These ranges came from the last ten years of NFL data, provided by Pro Football Reference.
  • The ratio (time of possession/rushing yards) looks at who was controlling the game effectively. Time of possession is not an effective indicator for success, but how well a team controls the ball while on offense is. The team with the better ratio earns the checkmark.
  • Overachiever/underachiever is a way to look at how well a team is doing for the season, relative to its point differential. In other words, if a team has a strong record but its wins have all been close, it is overachieving. If it has suffered a number of losses but they have been close, it is underachieving. This idea is calculated using a Pythagorean Expectation formula, a concept that originated in baseball and has been adapted for football: ((Points for^2.37)/(Points for^2.37 + Points against^2.37)). This winning percentage can then be multiplied by the number of games played to show where a team "should" be with its record.

Periodically there will be additional metrics to explain why the Cowboys won or lost, such as net passing yards/attempt, which takes into account sacks and incompletions as well as how many passing yards each quarterback is able to accrue. As more metrics become readily available, this summary will include them. To see these visualizations in real time, follow me:


Special thanks to Fuzzy Red Panda for putting together these beautiful images and programs that advance sports analytics in such creative ways.

A New Journalism Feature

Each week, I will air a segment on Good Day on Fox 4 in Dallas/Fort Worth that takes an analytic look inside college football. First, I look at a statistical trend that explains something we saw the weekend before, the challenges of predicting games and the secrets to being a more informed fan. Second, I use data and modeling to forecast games featuring some of North Texas's favorite teams.

I will then post these segments to YouTube and share the links in the Journalism section here. You can click Journalism at the top of the page or click here.

Who Do You Trust in the 4th Quarter?

Since being named the starting quarterback for the Dallas Cowboys, Tony Romo has been in the NFL spotlight for ten seasons and 127 games. While he has put up some of the more prolific statistics of any quarterback during this time, many argue he is the most scrutinized veteran gunslinger of the 21st century. One reason has little to do with analytics: blown opportunities to win games in the 4th quarter. Many of these games have been critical to his team's championship aspirations, and they raise a bigger question: which quarterbacks have been the most reliable at winning games in the 4th quarter?

In a later article we will apply analytics and look at what constitutes a "clutch" quarterback. But first, let's look at the raw statistics. The data features 42 quarterbacks spanning all eras of the NFL but who can be considered, at a minimum, marginally successful (e.g. Peyton Manning, Warren Moon, Roger Staubach, Colin Kaepernick, etc.). The 4th quarter variables are: comeback attempts, comeback wins, comeback rate and career blown leads by the QB's own defense.

First, here is a graph of the comeback success rates:

Of the quarterbacks analyzed, Andrew Luck has the best 4th quarter comeback rate (63%). However, he also has the fewest attempts, so it is too soon to call him the most clutch we have ever seen. In second place is Joe Montana (56%), whom many would more readily call the best in close games. Peyton Manning had the most attempts of anyone (94), but his rate is 47%.

Then comes the aforementioned Tony Romo. His rate is only slightly worse than Manning's. While it is below half, only five of the 42 quarterbacks studied finished better than 50%; in fact, Romo's rate is 11th best out of 42. At the other end, the worst rate among active quarterbacks belongs to Aaron Rodgers (27%). Don Meredith has the lowest success rate of anyone at 25%.

Some of these rates can be explained by analyzing blown leads by that quarterback's defense:


The quarterback saddled with the least clutch defense is Drew Brees, whose "D" has blown a 4th quarter lead on 31 occasions. Fran Tarkenton ranks second with 27. Tony Romo is tied for 10th with 17, slightly above the average among the 42 quarterbacks studied. As for those with fewer reasons to be upset with their defense, there is Kurt Warner (6) and, as expected, Andrew Luck (2).

Visually, and as expected, there is already a direct correlation between 4th quarter comeback rates and blown leads by defense. Still, it is worth discovering whether there are statistics for each quarterback that help explain why some successful quarterbacks are better than others at the end of football games. I will report my findings in a future article.

Special thanks to Mark Lane for putting this data together. You can follow him on Twitter @therealmarklane.

Yes! Go for Two!

It's an odd feeling for football fans. After scoring a touchdown, the exhilaration must be contained just as quickly as it erupted, as this same offense, grinding down the field and travailing through the defensive puzzles presented, decides to go for two. The decision is rare: during the 2015 NFL season, 1,217 extra points were attempted, but only 94 times did a team go for two (7%). In fact, five teams never attempted a two-point conversion.

Pittsburgh Steelers quarterback Ben Roethlisberger suggested this week that his team should go for two, every time. Though his team attempted more two-point tries than anyone else, the Steelers' offense stayed on the field after fewer than one-fourth of its touchdowns.

Traditionally, this idea is unorthodox. But analytically, it carries merit. Because 94% of extra points were converted last year, a team that always goes for two needs to convert only 47% of the time to push. It is worth noting that a defense can return the football the length of the field for two points no matter what is being attempted. Though this has happened only once, on an extra point, it could fractionally affect this expected value even if the effect is statistically insignificant. Historically, teams convert their two-point attempts roughly 50% of the time, almost exactly what they need for it to be a push.
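
The break-even arithmetic is simple enough to check directly: going for two matches the extra point in expected points when twice the two-point success rate equals the extra-point success rate. This sketch ignores the rare defensive return for two points mentioned above; the function names are hypothetical.

```python
def expected_points(success_rate: float, points: int) -> float:
    """Average points per try at a given success rate."""
    return success_rate * points


def two_point_break_even(extra_point_rate: float) -> float:
    """Two-point success rate that matches the extra point in expected points:
    2 * p_two == 1 * p_xp, so p_two == p_xp / 2."""
    return extra_point_rate / 2.0


print(two_point_break_even(0.94))                           # 0.47
print(expected_points(0.50, 2) - expected_points(0.94, 1))  # gain per try at a 50% rate
```

At a 50% two-point rate, going for two gains about 0.06 expected points per touchdown over a 94% kicker, small enough that the choice is close to a push.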

So why always go for two if it is a push, and risk injury to more valuable players? And, perhaps more importantly, would this 50% success rate hold if teams went for two more frequently? Aside from the obvious trends of NFL offenses improving and extra-point success falling (mainly because the extra-point snap was moved back to the 15-yard line), the following chart illustrates two-point tries:

As expected, the 50% success rate remains relatively consistent regardless of how many times teams go for two. However, as stated before, this is a small sample compared with the number of times a team could have gone for two but elected to kick the extra point. Usually teams go for two only when it is almost absolutely necessary. When it is not absolutely necessary, will the success rate be the same?
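
One way to see how little a season's worth of two-point tries pins down the true rate is a normal-approximation (Wald) confidence interval. The 47-of-94 split below is hypothetical, since the post gives the number of 2015 attempts but not the number converted, and the function name is illustrative. An interval like this only captures sampling noise; it says nothing about the selection effect raised above, where teams go for two mostly in must-convert situations.

```python
import math


def wald_interval(successes: int, attempts: int, z: float = 1.96) -> tuple:
    """Normal-approximation 95% confidence interval for a success rate."""
    p = successes / attempts
    half_width = z * math.sqrt(p * (1.0 - p) / attempts)
    return p - half_width, p + half_width


# Hypothetical: 47 conversions in 94 two-point tries (a 50% rate).
low, high = wald_interval(47, 94)
print(round(low, 3), round(high, 3))
```

The interval runs from roughly 40% to 60%, comfortably including the 47% break-even rate.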

It's worth finding out.

You've Drafted Your Team, Now What?

For three days, there is a frenetic pace to the NFL Draft, where specific needs are addressed (or not), value is weighed for each pick (or not), opponents' draft boards are analyzed and countered (or not) and undrafted free agents are debated and signed. After all of these minuscule details comes a bigger-picture question: What the heck just happened?

Pundits, and perhaps those within each franchise, grade these draft classes before anyone attends a rookie minicamp or signs a single contract. Though there are always inexplicable human elements, grading draft classes is becoming easier and more analytically sound.

Last year, the Harvard Sports Analysis Collective posted a blog about predicting success based upon combine numbers. Some positions are much easier to predict than others. For instance, cornerbacks and outside linebackers with good combine numbers tend to do well in the NFL, whereas quarterbacks and wide receivers are much harder to predict.

Value also matters. Franchises can be crippled if an average player is chosen in the first or second round, while others can be bolstered by picking stellar athletes in later rounds. Many of these problematic picks happen when a team drafts an offensive skill player. One research paper suggested 60% of running backs and receivers taken in the 3rd-7th rounds have better average career statistics than those taken in the first and second rounds. The trend with receivers makes sense, given that combine numbers have less predictive power at that position, but perhaps the trend with running backs is a sign of the times (i.e. the NFL evolving into a passing league). As for quarterbacks, the more highly touted ones tend to have better careers.

So who drafted well in 2016? There are no clear answers yet, but perhaps surprisingly, the perennial bottom feeders in Cleveland did well. The Browns took defensive players with good combine numbers like Emmanuel Ogbah and Carl Nassib and, for the most part, waited until later rounds for skill players, including Ricardo Louis and Jordan Payton. Another possible surprise is the Jacksonville Jaguars. Their first-round pick was a cornerback with excellent combine numbers in Jalen Ramsey. They also took defensive risks like Myles Jack and Sheldon Day, suggesting they are focused on addressing some of their bigger needs. As for the uncertainties, the Rams and Eagles obviously gave up a lot for quarterbacks who may or may not pan out, and the Cowboys, Texans and Redskins drafted skill players in the first round whose impact on winning will be tougher to predict.

What we're learning from the latest research is that the NFL Draft is not the crapshoot it once was. It will never be perfect, but it is becoming clearer which teams are heeding this research and which prefer non-analytical advice.

Playoff Unpredictability

Until recently, the Los Angeles Lakers were one of the fixtures of the NBA Playoffs and, in many seasons, the Finals. They have put together dynasties in different generations of the sport, from Magic Johnson's teams to the Shaq and Kobe era. When the Lakers were not winning titles, chances are another team was enjoying its own dynasty, like the Boston Celtics, Chicago Bulls or San Antonio Spurs. Dynasties are so commonplace in the NBA that 15 franchises in the sport's history do not have a championship (and seven of those still in existence have never even made it to the Finals).

The NBA is unique in this regard: championships are won in bulk. Other leagues offer more parity, with a larger pool of contenders vying for a title. There may be dynasties in other sports, but there seem to be fewer of them, each is shorter in duration, and there is a better chance someone unexpected claims the sport's top prize.

Which of the four top professional sports leagues (NFL, NBA, MLB and NHL) offers the most playoff unpredictability? Is the NBA truly the most predictable? Is it significantly more predictable or marginally so?

One approach to answering these questions is to build a statistical model for each sport. Here, we will use logistic regressions that look only at wins (or points in hockey) and see how well they predict whether a team won a championship that year. A few other notes on setting up this project:

- All data used begin with the 1989-90 season because the NFL made its biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so those seasons are still used.
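
In symbols, a wins-only logistic regression of the kind described above can be sketched as follows. The notation is ours, not the author's: for team i in a given league, w_i is its regular-season wins (points in the NHL), y_i is 1 if it won the championship that season and 0 otherwise, and the coefficients are chosen to maximize the log-likelihood. A separate model of this form is fit for each league, and the significance test discussed next concerns the wins coefficient.

```latex
p_i = \Pr(y_i = 1 \mid w_i) = \frac{1}{1 + \exp\!\left[-(\beta_0 + \beta_1 w_i)\right]},
\qquad
\ell(\beta_0, \beta_1) = \sum_{i} \Big[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \Big]
```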

At first glance, every variable representing wins is statistically significant with 99% confidence, which is unsurprising because a team needs so many wins just to make the playoffs. What matters is how well wins alone predict championships. To answer this question, we will use a goodness-of-fit measure called AIC (Akaike Information Criterion). The smaller this number, the better the model fits. The following shows how well each model performs:
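
A minimal Python sketch of the procedure for one league: fit the wins-only logistic regression by Newton-Raphson, then score it with AIC = 2k - 2 ln L, where k = 2 (intercept and wins) and ln L is the maximized log-likelihood. The function names and the eight-team toy dataset are hypothetical illustrations, not the author's code or data; in practice a statistics package such as statsmodels (whose Logit results expose an aic attribute) would do the same job. Toy data like this must not be perfectly separable by wins, or the maximum-likelihood estimate does not exist.

```python
import math


def fit_logistic(wins, champ, iters=100, tol=1e-10):
    """Fit P(champion) = 1 / (1 + exp(-(b0 + b1 * wins))) by Newton-Raphson.

    Assumes the data are not perfectly separable by wins; otherwise the
    maximum-likelihood estimate does not exist and the steps keep growing.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # X'WX, i.e. the negated Hessian
        for w, y in zip(wins, champ):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * w)))
            g0 += y - p
            g1 += (y - p) * w
            v = p * (1.0 - p)
            h00 += v
            h01 += v * w
            h11 += v * w * w
        # Newton step: solve (X'WX) * step = gradient for the 2x2 case.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def log_likelihood(b0, b1, wins, champ):
    """Bernoulli log-likelihood of the fitted model."""
    ll = 0.0
    for w, y in zip(wins, champ):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * w)))
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


def aic(loglik, k):
    """Akaike Information Criterion: smaller means a better fit."""
    return 2 * k - 2 * loglik


if __name__ == "__main__":
    # Hypothetical toy data for one league: NOT real results.
    wins = [20, 30, 40, 45, 50, 55, 60, 65]
    champ = [0, 0, 0, 1, 0, 0, 1, 1]
    b0, b1 = fit_logistic(wins, champ)
    ll = log_likelihood(b0, b1, wins, champ)
    print(f"intercept={b0:.3f}  wins coef={b1:.4f}  AIC={aic(ll, k=2):.2f}")
```

Repeating the fit for each league and lining up the AIC values gives the kind of comparison the chart summarizes.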

[Figure: bar chart of AIC for the wins-only logistic regression in each league (NFL, NBA, MLB, NHL)]
The larger the bar (that is, the higher the AIC and the worse wins alone fit), the more unpredictable the league is. As expected, the NBA is the most predictable, and by a considerable margin. The model also suggests Major League Baseball is the most unpredictable, with the NFL a close second and the NHL a close third.

There are a number of other variables that could be added to these models to help determine who will win a championship, but the simplicity of these models makes for an easier comparison across sports.

Special Teams Not as Special as They Used to Be

Virtually any football fan has heard cliche after cliche about the importance of special teams.  After all, why would they be called "special" if they were anything but?  There are countless instances of momentum being seized or lost because of an impressive kickoff return, a devastating injury or the excitement of a game-winning field goal.  However, analytics suggest this phase of the game may not be as special as it once was.

Many data scientists have built linear regressions weighting the importance of a team's offense, defense and special teams to the outcome of a game.  These models generally attribute less than 20% of a game's outcome to special teams, and some suggest even less.  Winston (2009), in his book Mathletics, built a regression that excluded special teams variables entirely and still had an R^2 of .8733 and an adjusted R^2 of .8577 (p. 129).
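
The gap between Winston's R^2 (.8733) and adjusted R^2 (.8577) comes from the penalty adjusted R^2 applies for each predictor: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with n observations and p predictors. A small Python helper makes that concrete. The function name and example inputs are hypothetical, since the book's sample size and predictor count are not given here.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear regression with n observations and
    p predictors (not counting the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical inputs: same R^2, more predictors -> larger penalty.
print(round(adjusted_r2(0.85, n=100, p=5), 4))
print(round(adjusted_r2(0.85, n=100, p=10), 4))
```

With no predictors the two measures coincide; each added predictor pulls adjusted R^2 further below R^2 unless it earns its keep.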

These models have been around for years, but only recently are we starting to see NFL teams deemphasize special teams:


[Figure: NFL touchdowns on kickoff returns (red) and punt returns (blue) per season since 2005]

This figure shows touchdowns scored on kickoff returns (red) and punt returns (blue) in the NFL since 2005.  There have been fewer kickoff-return touchdowns, especially in the last three years.  Some of this downward trend can be attributed to the league moving kickoffs to the 35-yard line to promote touchbacks.  Punt-return touchdowns spiked in 2011 and 2012, but have since leveled off and show no discernible trend over time, positive or negative.  That does not detract from the overall notion that fewer points are being scored in this phase of the game.
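
One simple way to put a number on "no discernible trend" is to regress each season's return-touchdown count on the year and compare the slope with its standard error. This is a swapped-in check, not the author's method; the function name is hypothetical, and the counts in the example are placeholders because the figure's values are not reproduced here. With only about a decade of seasons, any such test is rough.

```python
import math


def trend(years, counts):
    """Least-squares slope of counts on years, plus its t-statistic."""
    n = len(years)
    if n < 3:
        raise ValueError("need at least three seasons")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares -> standard error of the slope (n - 2 d.o.f.).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, counts))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, t_stat


# Hypothetical counts per season, NOT the figure's actual values.
years = list(range(2005, 2016))
counts = [8, 6, 9, 7, 8, 10, 9, 7, 8, 6, 8]
slope, t_stat = trend(years, counts)
print(f"slope={slope:.3f} TDs/season, t={t_stat:.2f}")
```

As a rough rule of thumb, a |t| well below 2 offers little evidence of a trend in either direction.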

What about extra points and field goals?  This past offseason, the league moved the extra point back 13 yards, and the success rate on extra point attempts dropped from 99.3% to 94.2%.  However, this amounts to only about 80 missed extra points over an entire season for the entire league.  There are even fewer examples of this change affecting the outcome of a game, though the latest AFC Championship Game offers a notable one.  As for going for three, many agree teams would be better off not kicking field goals as often as they do, and lately there have been fewer field goal attempts.
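
The arithmetic behind that drop is straightforward: expected misses = attempts x (1 - success rate), so the added misses equal attempts x (old rate - new rate). The sketch below uses 1,000 attempts as a hypothetical round number, since the league's actual attempt total is not given here; the function names are likewise illustrative.

```python
def expected_misses(attempts, make_rate):
    """Expected missed kicks for a number of attempts and a success rate (0-1)."""
    return attempts * (1.0 - make_rate)


def added_misses(attempts, old_rate, new_rate):
    """Extra misses caused by a drop in success rate from old_rate to new_rate."""
    return attempts * (old_rate - new_rate)


# Hypothetical: per 1,000 extra point attempts (not the league's real total).
attempts = 1000
print(round(expected_misses(attempts, 0.993)))      # 7
print(round(expected_misses(attempts, 0.942)))      # 58
print(round(added_misses(attempts, 0.993, 0.942)))  # 51
```

In other words, every 1,000 attempts at the new distance produce roughly 51 more misses than before; scaling by the league's actual attempt count gives the season total.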

Again, most of the theoretical research here has been around for a few years, but many successful NFL teams have now heeded the findings and do not invest as much in special teams as they once did.  While many will still pay for top-notch kickers and punt returners, and have important reasons for doing so, the NFL is evolving toward a more analytically based approach to the not-as-special special teams.