By: Edward Egros

Sports

Gary Patterson is the Most Hated Man in College Football

Pasted Graphic
(Courtesy: Getty Images)

It's not Nick Saban, Urban Meyer or some college football pundit who polarizes fan bases to insanity, just for that monthly paycheck.

It's TCU head coach Gary Patterson, who's led the program since 2000, including a pair of conference transitions and two New Year's Six Bowl victories. Despite few controversial issues within his program, Patterson earns this distinction because of who he is and where he works.

Who he is, is a winner. Perhaps most notable among his accomplishments, his teams are 43-5 when ranked in the Top 10. This record suggests the longevity of having played so many games near the top of the poll du jour, but also a near perfect winning percentage when expected to succeed.

Where he works is a small, private university with roughly 10,000 students. To compare, this student body is 1/4 the size of Alabama's and roughly 1/5 the size of other highly touted college football schools like Penn State and Ohio State. Also, many of these schools are flagships of their own state, meaning their fan bases extend well beyond those who actually attend the university. Not only can't TCU boast being a flagship, it operates in a state that is home to some of the larger followings in America, like Texas and Texas A&M.

Gary Patterson is a successful coach who works for a small school with a smaller fan base trying to get his team into Year 4 of the College Football Playoff. He came close during the inaugural year of the playoff, but was pushed aside for Ohio State (Baylor, another small private university, also finished ahead of TCU but was also left out). Some will argue vindication for the eventual champion Buckeyes, but how TCU would have performed in the playoff that year remains a mystery, even more shrouded given its 39-point victory over 9th-ranked Ole Miss in the Peach Bowl. The gripes only grow louder knowing TCU controlled games better than Ohio State, had a better defensive efficiency (a metric that predicts success better than offensive efficiency) and had roughly the same strength of schedule as the Buckeyes.

TCU's lone loss that season was to Baylor, and committees historically rank good losses worse than mediocre defeats. The trend seems counterintuitive, but rhetorically serves as an acceptable argument within college football. Also, because the Frogs and Bears split the Big 12 Championship, despite the head-to-head result, they could have "canceled each other out", opening the door for Ohio State.

Still, the school most like TCU with a successful season these last four years is Stanford, with an enrollment roughly 50% larger than the Frogs'. In 2015, they won the Pac-12 Championship, but two losses locked them out. The last two-loss team to win a National Championship was LSU in 2007, so opportunities for those in Stanford's position have always been limited.

Today, TCU is in a more advantageous position than three years ago. The latest College Football Playoff poll has TCU ranked 6th. They will face 5th-ranked Oklahoma and could face the Sooners again in a separate Big 12 Championship Game, something that did not exist during the TCU/Baylor controversy. The conference added this contest because its analytics suggest the game gives a Big 12 team a greater likelihood of making the Final Four. Two wins over a highly ranked Sooners squad would give the Horned Frogs an undisputed league championship, something that is a statistically significant variable for making the playoff. Their strength of schedule ranking would also increase and defensive efficiency may also rise because a win would include containing Sooner quarterback and Heisman hopeful Baker Mayfield.

Despite the lone loss, if TCU wins its remaining games, the Frogs' resume would be arguably as bulletproof as any one-loss team. The committee admits to wanting geographic diversity, but there would not be another program in that region of the country with a more attractive resume. If TCU is still left out, something should be considered amiss. Having a smaller following could be assumed to be a factor for being left out. Gary Patterson would then spotlight a problem with this era of determining a National Champion: he has done virtually everything he can to put his team in a position to play for a title, and yet gets left out for a second year. A conspiracy theory, true or otherwise, that undermines the validity of the selection process is something the sport and the committee would hate.

The Truth About 3rd Down

Anyone paying attention to stats during an NFL broadcast has noticed 3rd down conversions being reported. It is an easy way for commentators to critique how clutch a team is and if an offense can maintain a drive when the pressure is at its peak. Obviously a team converting on 100% of its 3rd down attempts is probably winning the game, but otherwise it is not nearly as helpful a statistic as suggested.

For this exercise I took 10 seasons' worth of NFL data (2007-2016) and looked at each team's conversion rates for 1st down, 2nd down and 3rd down, along with the number of regular season wins that team accumulated. Logically, it would make sense to have an increasing percentage with later downs because you often have fewer yards to go before moving the chains. The numbers reflect this trend: on 1st down, teams on average convert 20% of the time, on 2nd down it's 30.3% and on 3rd down it's 38.1%.

To make things simple, I then calculated a linear regression, treating wins as my dependent variable and keeping it continuous so as not to lose information. Here are the results:

Pasted Graphic 1

As expected, every down is significant to wins at the 99% level, because the more you convert, the greater your chances of success. The degree to which each down matters does go up, as reflected by the coefficients increasing with each successive down. And, even though later downs should be easier to convert, the coefficient is still increasing, perhaps suggesting third down conversions do matter more than first and second.

However, the R-squared and adjusted R-squared only hover around 28%. In other words, conversion rates only account for 28% of why a team wins or loses, so a 3rd down conversion percentage by itself accounts for less than that figure (22% if 3rd down rate is the only explanatory variable). While these rates are statistically significant (especially on 3rd down) they are also noisy.

In previous blog posts, I have outlined which factors best determine the outcome of football games (and they are detailed in my Cowboys data visualizations). One reason why I never brought up 3rd down conversion rates is because of how noisy the variable is and how it takes away from 1st and 2nd down. Many others have their own ways of determining success based upon the down, but also the distance. I would suggest, for sake of ease, promoting the discussion of 1st and 2nd down success rates, both as a pair, but also as a bridge to what is a reasonable 3rd down to convert when those plays occur.

A New Explanation of Cowboys Graphics

For the second straight year, after every Dallas Cowboys game, I will post a recap of the game with an analytic visualization. Once again, these metrics sum up all of the important factors that determine the outcome of a football game. Some of the metrics are the same, while others are more refined and better reflect certain concepts.

Going from the top and working down, once again I will chart turnovers, one of the more impactful statistics in the game. The numbers reflect the turnover margin and the bars reflect how many turnovers were committed.

The next box will look at how the quarterbacks performed, often looking at net yards per pass attempt. This metric is highly predictive; and while others may be more predictive, it is also far easier to calculate.

Perhaps the biggest change comes where it is labeled "Time of Possession/Rushing Yards". This metric was designed to determine who "controlled" the game. It has since been updated to look at how many rushing yards a team had per quarter. As noted in a previous blog post, the more rushing yards a team gains later in the game, the likelier they are to win. The larger the number, the better that team "controlled" the game.

Overachiever/Underachiever refers to what the Cowboys' record should be, relative to their point differential for the whole season. In baseball, this idea is referred to as the Pythagorean Expectation. In football, there is debate as to how to calculate such a record, but here, the exponent is 2.37: ((Points For^2.37) / (Points For^2.37 + Points Against^2.37)) * 16.
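For anyone who wants to try the formula above, here is a minimal Python sketch. The function names and the sample point totals are my own illustrations, not values from any Cowboys season; only the exponent (2.37) and the 16-game multiplier come from the formula.

```python
def pythagorean_wins(points_for, points_against, exponent=2.37, games=16):
    """Expected wins from season point totals, using the formula above.

    Assumes at least one of the two point totals is greater than zero.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa) * games


def over_under_achievement(actual_wins, points_for, points_against):
    """Positive = overachiever, negative = underachiever (hypothetical helper)."""
    return actual_wins - pythagorean_wins(points_for, points_against)


# Hypothetical season: a team that scores exactly as much as it allows
# is expected to win half of its 16 games.
print(pythagorean_wins(400, 400))            # 8.0
print(over_under_achievement(10, 400, 400))  # 2.0
```

A team that finishes with more wins than the expectation shows up as an overachiever, and one with fewer as an underachiever.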

Finally, scoring efficiency has been tweaked. The idea here is to see how many points teams scored, relative to the number of yards they needed. The larger the bar and the bigger the number, the more efficient the team was. Simply put, it's points divided by yards, then multiplied by 15.457886 so that the average is approximately 1. Using data from 2009-2016, we can also see if a team was overall good, average or bad in its efficiency. If the result is less than .949394, the team was inefficient. If the result is between .949395 and 1.057116, the team was average and gets a blue bar. If the result is greater than the aforementioned range, they were efficient and get a green bar.
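As a rough sketch of that calculation in Python: the multiplier and the two cutoffs are the ones quoted above, while the function names and the sample game are hypothetical. A result landing exactly on .949394 is not covered by the ranges above, so the sketch assumes it counts as average.

```python
SCALE = 15.457886        # multiplier quoted above so the average is about 1
LOW_CUTOFF = 0.949394    # below this: inefficient
HIGH_CUTOFF = 1.057116   # above this: efficient


def scoring_efficiency(points, yards):
    """Points per yard, rescaled so a typical team lands near 1."""
    return points / yards * SCALE


def efficiency_label(value):
    """Bucket a scoring efficiency value using the cutoffs above."""
    if value < LOW_CUTOFF:
        return "inefficient"
    if value <= HIGH_CUTOFF:
        return "average"      # blue bar
    return "efficient"        # green bar


# Hypothetical game: 28 points on 350 total yards.
value = scoring_efficiency(28, 350)
print(round(value, 4), efficiency_label(value))
```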

Again, these metrics are meant to capture nearly everything that happened in a game that pertained to the result. Some of these metrics can also be used to forecast future games, but the intent is solely inference.

No Need to Establish the Run

David Johnson

Arizona Cardinals running back David Johnson (left) may understand the importance of balancing between rushing and passing about as well as anybody. Last season, he finished with the most touches, all-purpose yards and rushing/receiving touchdowns of anyone in the NFL. For an encore, his head coach says he wants Johnson to average 30 touches per game.

It's one thing to strike the right balance between how to use Johnson as a rusher and as a receiver; it's another to make these decisions relative to the time of the game. Conventional wisdom in football has always championed the idea of "establishing the run", meaning no matter how long it takes to create an effective run game, it should be a point of emphasis early in a contest. More recently, rushing plays are called less frequently, regardless of what the clock reads. Knowing this recent trend, there is a way to explain why, at least analytically, attempting to establish the run is unnecessary.

I took NFL play-by-play data from the 2010 through the 2015 seasons. This information included which team won and lost. Then, using only rushing plays, I summed up the rushing yards each team had per quarter, per game (in this analysis, I am not including overtime rushing yards because of how infrequently they appeared, but also how much they swayed the results because so many rushing yards will essentially end the game). Using a logit regression with "win" as a binary dependent variable and rushing yards per quarter as my explanatory variables, here is the output:

=========================================
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8447  -0.9786  -0.5544   1.0545   2.0701

Coefficients:
                 Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     -1.747385    0.105946  -16.493   < 2e-16 ***
yards.gained.1   0.006508    0.001922    3.386  0.000708 ***
yards.gained.2   0.007091    0.001953    3.632  0.000282 ***
yards.gained.3   0.015546    0.001910    8.137  4.05e-16 ***
yards.gained.4   0.035783    0.002156   16.594   < 2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 4251.8 on 3066 degrees of freedom
Residual deviance: 3711.2 on 3062 degrees of freedom
AIC: 3721.2

Number of Fisher Scoring iterations: 4
==========================================

First, all of these variables are statistically significant at the 99% level, which makes logical sense. The more yards a team has, no matter the type, the likelier they are to win. Second, there is a direct relationship between the time of the game and the magnitude of the coefficient. In other words, as the game goes on, the more important rushing yards are to the game's outcome. Having the largest coefficient for the fourth quarter makes sense because teams that are leading are trying to take time off the clock, and rushing makes that motive easier to fulfill. However, that the third quarter has a greater magnitude than the first half could suggest there is no statistical advantage to "establishing the run".

It is also important to convert these coefficients to odds ratios to know how important each rushing yard is to winning. Specifically, an extra first quarter yard increases the odds of winning by a factor of 1.0065. In the second quarter, it's 1.0071, a small difference. In the third quarter, it is 1.0157 and in the fourth, it is 1.0364.
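For readers who want to check the conversion, an odds ratio is just the exponential of a logit coefficient. The short Python sketch below uses the estimates from the output above; the win_probability helper and the sample yardage splits are my own illustrations and were not part of the original analysis.

```python
import math

# Estimates copied from the regression output above.
INTERCEPT = -1.747385
COEFFICIENTS = {
    "yards.gained.1": 0.006508,
    "yards.gained.2": 0.007091,
    "yards.gained.3": 0.015546,
    "yards.gained.4": 0.035783,
}

# Odds ratio for one extra rushing yard in each quarter: exp(coefficient).
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}
for name, ratio in odds_ratios.items():
    print(name, round(ratio, 4))


def win_probability(yards_by_quarter):
    """Fitted win probability for a list of four per-quarter rushing totals."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[f"yards.gained.{quarter}"] * yards
        for quarter, yards in enumerate(yards_by_quarter, start=1)
    )
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical comparison: the same 40 yards, gained early versus late.
print(win_probability([40, 0, 0, 0]) < win_probability([0, 0, 0, 40]))  # True
```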

There may be a value to wearing down a defense by running the ball earlier in a game, but from this data and regression, it is not captured. It may also be possible a running back needs several carries before knowing how to dissect a defense later in a game; but again, this idea is not captured in aggregate. Again, establishing the run may not be as crucial an idea as originally thought.

However, one conventional bit of wisdom that is reflected is the idea a team controls the game more effectively by running the ball later in the contest. Quantifying how a team controls a game can be captured using a study like this one. In fact, I plan to use this analysis in my weekly Cowboys postgame graphics that explain why Dallas either won or lost a particular contest. I will go over these upgraded graphics in a later blog post.

(Special thanks to Luke Stanke for providing the data and helping me with the code!)

...One More Thing About the PGA Championship

Pasted Graphic
(Courtesy: Stuart Franklin/Getty Images)

At one point, there was a five-way tie atop the leaderboard during the back nine of the final round of the 99th PGA Championship. Then, Justin Thomas cards a birdie on the 13th hole, enters the Green Mile with a par on 16, a birdie on 17 and an insignificant bogey on 18. While the rest of the field struggled to finish, Thomas blazed through the toughest closing stretch at a major this year, to capture his first Wanamaker Trophy.

My pick to win, Hideki Matsuyama, fared more than respectably, finishing tied for 5th. But as I watched the television coverage of the moments he struggled, one of the commentators pointed out his performance mirrored that of last year's PGA Championship, where he was the best hitter of the golf ball, but could not make any putts. That year, he finished tied for 4th.

This year, Matsuyama missed a few critical putts, but he was 12th in Strokes Gained: Putting. However, SG: Approach the Green and SG: Around the Green were 20th and 27th, respectively. As for the champion, Thomas was tied for 15th in SG: Approach the Green, 22nd in SG: Around the Green and 4th in SG: Putting. Overall, these numbers are slightly better and equaled a commanding win.

I am reminded of a paper by Dr. George Kondraske of UT Arlington titled "General Systems Performance Theory and its Application to Understanding Complex System Performance". In it, Kondraske attempts to explain human systems through complex machines. Regressions have a number of components that are often considered additive (which is why we have a lot of "+" signs in our equations). But if one explanatory variable is largely deficient, it is not satisfactory to say the dependent variable decreases by the same amount. The output depends upon everything working together; components are so interconnected that any one piece that does not work or is largely deficient means the entire system might fail to perform.

What does this have to do with golf? If someone cannot putt at all, they will post a high score and have no chance of winning a tournament; they cannot simply overcompensate with a longer drive or a more accurate iron shot. Granted, professional golfers are at least competent in every component of a golf game, but any significant deficiency makes for a bigger setback than simply subtracting odds to win based upon a negative strokes gained metric.

This approach is intuitive to golf enthusiasts. It is why golfers work on everything, not just emphasizing the skills with which they excel. What matters here is when data scientists are putting together models for forecasting winners, perhaps it is important to think less linearly. Maybe it has less to do with the sum of skills coming together and how they fit with a particular course, and more about if every skill is adequate for the demands of a specific tournament. Justin Thomas' skills certainly were.

Who Will Win the 2017 PGA Championship?

This year, the Wanamaker Trophy will be claimed at Quail Hollow Club, the same course that hosts the Wells Fargo Championship (previously the Wachovia Championship). No analysis of this year's PGA Championship would be robust without discussing Rory McIlroy's domination there.

A favorite to win the last major of the season, McIlroy has two victories and once lost in a playoff, in seven appearances there. He also made the cut six of seven times and owns the course record, shooting a 61 in 2015. Also, as I mentioned in a previous article, McIlroy is not only successful in PGA Championships, he is one of the more dominant golfers of any specific event on Tour (even if that major is a hodgepodge of characteristics where no particular abilities stand out). Add to his resume a pair of Top 5 finishes in his last two tournaments, and McIlroy seems poised to win for the third time at the PGA Championship.

However, as we have learned with other tournaments, Strokes Gained statistics have incredible predictive power. When it comes to who has won in North Carolina before, sometimes an already dominant golfer came in and continued his momentum to victory. More recently, Strokes Gained: Around-the-Green has become more crucial to success:

Pasted Graphic 3

There are two periods when a player needed to rank in the Top 40 in SG: Around-the-Green: 2005-2007 and 2014-2016. This season, the Wells Fargo Championship was played elsewhere so Quail Hollow could be redone for a major. The two important changes here are the removal of trees and the adjusting of the front nine to where the final yardage is shorter but likely more challenging. It's possible these two details make SG: Around-the-Green all the more important.

At this point, the players leading in this statistic are: Ian Poulter, Jason Day, Bill Haas, Pat Perez and Cameron Smith. McIlroy barely cracks the Top 80. Jordan Spieth, another favorite who could complete the career Grand Slam at age 24, is 18th. As for Strokes Gained: Off-the-Tee, another stat with some predictive power, the current leaders are Jon Rahm, Dustin Johnson and Sergio Garcia. In terms of skills shown this season, there are several players who are perhaps more suited to win a revamped Quail Hollow than the favorites.

Perhaps the one player that seems to have put it all together, at this point, is Hideki Matsuyama. Fresh off a win at the WGC-Bridgestone Invitational, he is one of only four players with three wins on Tour this season. He also ranks 11th in Strokes Gained: Around-the-Green and 11th in Strokes Gained: Off-the-Tee. Lastly, he finished fourth in last year's PGA Championship and has two Top 20 finishes in the last four seasons. In other words, he overcomes the slightly lower statistical rankings than the aforementioned players with overwhelming momentum and overall success with this specific event. While I expect solid games from the favorites, I am picking Hideki Matsuyama to capture his first major.

The Statcast Revolution

There are more statistics about hitters than ever before. Thanks to Statcast, a baseball fan can learn how fast a ball comes off a bat from any hit, the angle the ball leaves the bat, an accurate distance the ball travels, etc.

These statistics can help characterize and differentiate hitters. A potential extension to these statistics is if they can predict a hitter's success. For instance, if a hitter averages a higher exit velocity, does that mean he is generally a better hitter?

Fangraphs has kept a database with averages of these Statcast statistics for every hitter. Even though there is some missing data, Jeff Zimmerman made necessary corrections based upon the type of balls in play fielded by certain positions. Using 2016 season data, the variables include:


It makes intuitive sense for the second half of this list to be relevant to a hitter's success, but what about the first half? To answer that question, I merged this same dataset with other advanced offensive statistics for these same hitters (this data came from Baseball Reference). While it would make sense to choose offensive wins above replacement (oWAR) as my dependent variable, there is a problem. WAR is an aggregate, meaning it can add up with additional plate appearances. Because I am already using averaged statistics for hitters and want to look at the average impact each statistic has on a hitter's overall performance, I divided oWAR by plate appearances and then multiplied by 1,000, so as not to have too many zeroes after the decimal point (this variable is named oWARavg).
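To make the rescaling concrete, here is a small Python sketch of the oWARavg calculation described above. The function name and the two sample hitters are hypothetical; only the divide-by-plate-appearances and multiply-by-1,000 steps come from the text.

```python
def owar_avg(owar, plate_appearances, scale=1000):
    """oWAR per plate appearance, multiplied by 1,000 for readability."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return owar / plate_appearances * scale


# Two hypothetical hitters with the same season oWAR but different playing
# time: the one who needed fewer plate appearances grades out higher.
print(round(owar_avg(3.0, 600), 2))  # 5.0
print(round(owar_avg(3.0, 400), 2))  # 7.5
```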

The next step is to determine which of the first group of variables is significant at the 95% level. I am using a backward elimination technique, where I start with a regression with all three variables, then remove any of them that are not significant. By executing this approach, the only variable eliminated was speed. In other words, the average exit velocity of a batted ball is not a significant indicator for how successful a hitter is. However, the angle of the batted ball and the distance it travels are significant:

Pasted Graphic 1

The angle has a negative coefficient, meaning batted balls not hit as steeply tend to be hits. Distance has a positive coefficient, which makes intuitive sense, because the farther a ball travels, the likelier it becomes a hit or maybe even a home run. As accurate as these findings are, the adjusted R-squared is only .1687, meaning these two variables explain only approximately 17% of the variability of average offensive WAR.

Just for fun, let's see what impact angle and distance have when the second group of variables is included in a regression. Again, using the backward elimination technique, here are the results:


Pasted Graphic 2

Once again, backward elimination took out exit velocity. It also took out the expected ratio of home runs to fly balls. While it kept the original ratio, the negative coefficient does not make intuitive sense. The logic is the more home runs hit out of fly balls, the more successful a hitter is. Instead, this model suggests the alternative. However, a positive isolated power does make logical sense and the adjusted R-squared is approximately 40%, making for a model that does a better job explaining what makes for a successful hitter.

Obviously there are a lot more advanced offensive variables that could be included in a model like this. At least there is a statistical approach for determining which variables Statcast emphasizes that explain offensive success. A similar study can be conducted when looking at baserunning, pitching, defense, etc.

Who Will Win the Dean & DeLuca Invitational?

Before offering a prediction for who will wear the plaid jacket as the winner of the Dean & DeLuca Invitational, here is a quick recap of the Byron Nelson.

Sergio Garcia, my pick, did have his moments. He did card a 29 for his Back Nine on Saturday. But several mistakes led to an incredible unraveling for his Sunday round. Also, Billy Horschel could have been a more credible dark horse pick: his Strokes Gained: Off-the-Tee, which I concluded was the most telling for the Nelson, had him in the Top 50 on the PGA Tour. He missed the cut in his last four tournaments, but for a course that emphasizes the tee shot, it should not be as big a surprise that Horschel won, given the unpredictability of the tournament.

And now, the Tour heads to Colonial. This tournament is much easier to predict because history is a better indicator for success. Jordan Spieth finished 2nd, 14th and 7th there before winning the event last year. Eleven men have won multiple titles at Colonial, compared with the five at the Nelson.

Once again, let's look at the winners from 2004-2016, the years "Strokes Gained" statistics are readily available using ShotLink data. The most predictive component for the Dean & DeLuca is Strokes Gained: Approach-the-Green. How golfers do on tee shots on Par-3's and approach shots on Par-4's and Par-5's are most predictive. In fact, Spieth, when he won last year, is the only winner to have ranked outside of the Top 75 in this statistic. He made up for it with his knowledge and previous success on the course. Strokes Gained: Off-the-Tee is also an important indicator, with most players ranking in the Top 50 before competing.

It might be shocking, but the golfer who currently ranks 2nd in Approach-the-Green is Jordan Spieth. Even though he has missed the last two cuts, his approach shots have often not let him down. The next best golfer who is in the tournament field is Webb Simpson. He has only played this event three times. Though he missed the cut his first two appearances, he finished tied for third last year. Spieth has had recent struggles, while Simpson has Top 20 finishes in two of his last three tournaments. It would not be a surprise for Spieth to repeat as champion, but my pick is Webb Simpson.

Who Will Win the Byron Nelson?

Last year, Sergio Garcia became just the fifth golfer ever to win multiple titles at the Byron Nelson. Given this tournament has been around since 1944, it shows just how difficult it is to predict this tournament.

It does help that the field is stronger than usual; eight of the top 20 golfers in the world will participate, including Dustin Johnson, Jason Day, Jordan Spieth, and of course Sergio. In fact, Vegas Insider is giving these highly ranked golfers the best odds to win, most notably Johnson at 5/1. On the surface, this mark makes sense, given he has already won three times this year, more than anyone else on Tour.

But as with most golf predictions I have done, I place an emphasis on strokes gained statistics. These measurements look at how well a golfer does in each phase of his game, compared with the rest of the field. For instance, strokes gained putting looks at how many putts a golfer needs to complete a hole at a specific distance, so if the average golfer needs 1.5 putts to complete a hole from seven feet, 10 inches, the golfer who sinks the putt gains 0.5 strokes, but a two-putt means they lose 0.5 strokes. These totals are then aggregated for the season.
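The putting example above can be written out in a few lines of Python. This is only a sketch of the arithmetic: the function names are mine, and the 1.5-putt baseline is the illustrative figure from the paragraph above, not an actual ShotLink benchmark.

```python
def strokes_gained_putting(expected_putts, putts_taken):
    """Field-average putts from a distance minus the putts a golfer took."""
    return expected_putts - putts_taken


def season_strokes_gained(holes):
    """Sum strokes gained over (expected_putts, putts_taken) pairs."""
    return sum(strokes_gained_putting(exp, taken) for exp, taken in holes)


# From seven feet, 10 inches, assuming the field averages 1.5 putts:
print(strokes_gained_putting(1.5, 1))  # 0.5  (one-putt gains half a stroke)
print(strokes_gained_putting(1.5, 2))  # -0.5 (two-putt loses half a stroke)

# Hypothetical three-hole sample from that same distance.
print(season_strokes_gained([(1.5, 1), (1.5, 2), (1.5, 1)]))  # 0.5
```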

ShotLink data has this information readily available since the 2004 season. Given the renovations TPC Four Seasons made to the course since that year, this time frame may be enough data for us to have a glimpse into what qualities a golfer needs to have to be successful at this particular tournament. I am using four statistics: Strokes Gained: Off-the-Tee, Approach-the-Green, Around-the-Green and Putting.

The statistic with the best ranking for success is Off-the-Tee. In other words, how well a golfer does from the tee box on all par-4's and par-5's is the best predictor for winning the Byron Nelson. Here is how golfers ranked in this statistic just before competing in the Nelson:

Screen Shot 2017-05-15 at 5.56.06 PM

Other than Steven Bowditch in 2015, every golfer ranks in the Top 100, often in the Top 60. As of the end of the PLAYERS Championship, here are the top ten golfers in Strokes Gained: Off-the-Tee:

1. Sergio Garcia
2. Dustin Johnson
3. Jon Rahm
4. Tony Finau
5. Bubba Watson
6. Kyle Stanley
7. Patrick Cantlay
8. Justin Rose
9. Hideki Matsuyama
10. Hudson Swafford

Of these ten, only Garcia, Johnson, Finau and Swafford are competing. Finau and Swafford have played this event far fewer times and Swafford has never finished in the Top 30. As for the other two players, Johnson has played at the Nelson seven times and has averaged a score of 68.54, including four "Top Ten" finishes. Garcia has played the event 12 times, has averaged a score of 69.07 and has the same number of "Top Ten" finishes. The difference is, Garcia has won the Byron Nelson twice and also has a third-place finish.

The volatility of this tournament might make this exercise seem foolish, but history does show that three of the five multiple winners won in back-to-back years. I am picking Sergio Garcia to become the fourth to win back-to-back Byron Nelson championships.

The Cleveland Browns Won the Draft

You may already be thinking: "Of course the Cleveland Browns had a great draft! They had the number one pick! Myles Garrett was the obvious move! You can't screw that up!"

You haven't been keeping up with the Browns, have you?

Cleveland picked a defensive end from Texas A&M who was so respected in College Station, two assistant coaches came to his draft party in Arlington to present him with a framed jersey (Garrett is also the Aggies' first-ever number one overall pick). During the combine, as NFL Research pointed out, Garrett is:

  • Taller than Julio Jones
  • Heavier than Rob Gronkowski
  • Quicker than Devonta Freeman
  • Faster than Jarvis Landry
Cleveland could have drafted a quarterback like Mitch Trubisky or Deshaun Watson, but instead went with the pass rusher. Nothing is a guarantee when it comes to who will have the best NFL career, and the Browns have had failures with top picks in the last several years (e.g. Trent Richardson, Johnny Manziel and Justin Gilbert). What matters here is how much value the Browns acquired simply with moves they made in the draft.

NFL Draft charts have been around since Jimmy Johnson and the Dallas Cowboys popularized their own in the 1990s. As sports analytics have become more commonplace, others have come out with their own. But one that is worth noting is a chart by Michael Schuckers of St. Lawrence University. Using games started, Schuckers used a LOESS function to assign value to each pick (to read his entire paper, click here). Here is the table he came up with:

Pasted Graphic 1

What Schuckers extrapolated from his study was that teams tend to overvalue earlier picks and undervalue later ones. The Cleveland Browns seemed to believe the same thing, and stockpiled multiple draft picks in the last couple of years. Here are the trades they made and how much value they acquired, using the chart:

Pasted Graphic 2

Note: + is a second round pick to be determined
++ is a first round pick to be determined

Because two of these picks are undetermined, I used the lowest possible value and added that to the Minimum Known Value Added column, when applicable. Even by doing that, every move the Browns made added value to their draft class. Here is who the Browns drafted last year and how many games they started, in parentheses:

  • WR Corey Coleman (10)
  • DE Emmanuel Ogbah (16)
  • DE Carl Nassib (3)
  • OT Shon Coleman (10)
  • QB Cody Kessler (8)
  • LB Joe Schobert (4)
  • WR Ricardo Louis (3)
  • S Derrick Kindred (5)
  • TE Seth DeValve (2)
  • WR Jordan Payton (0)
  • OT Spencer Drango (0)
  • WR Rashard Higgins (0)
  • CB Trey Caldwell (0)
  • ILB Scooby Wright III (0)
Combined, this draft class has 61 starts. Yes, this draft class was part of a 1-15 team, bad enough to acquire the top pick in the 2017 draft, but these rookies beat out more experienced players, so it might be safe to say Cleveland did not have much talent before this approach.

The Browns drafted 10 players this year, and currently have a dozen picks for next year's draft. Even if Myles Garrett turns out to be a complete bust, the Browns have enough insurance, in the form of younger players, to keep going. But if Garrett is as advertised, not only will the Browns have won this year's NFL Draft, they will start winning a lot more games.

Updates to the Site

Pasted Graphic
In the coming weeks you will see a few minor design changes to Inside Sports Analytics. There are a couple of things we have done already. The first is we have added a lot of new photos to the Photo Album that features my journey covering the Dallas Cowboys, NASCAR, college sports, that Browns mascot, etc.

The other change is more of a Call to Action. We are always looking to promote good analytic research. Already we are including QuantCoach, a site devoted to analytics in NFL coaching. If there is a series of white papers, a blog or anything else you would like us to include in our Resources page, please send me an email under Contact Edward or send me a tweet @EdwardEgrosFox4.

Thanks again for visiting Inside Sports Analytics! We'll return to journalism in our next post!

Which Golfers Dominate Where

Pasted Graphic 1
Jordan Spieth was bound to win the plaid jacket at Colonial Country Club. In the three previous times he played the Dean & Deluca Invitational, he finished in the top 15 every time, including a second-place finish in 2015. Spieth mentioned how much the win meant to him because it was a course and tournament he grew up attending.

Outside of Tiger Woods’ heyday, there often seems to be some randomness at the top of the leaderboard of any event. However, as with Spieth at Colonial, some golfers dominate specific courses and tournaments because they simply know them better.

I looked at 15 of the more lucrative tournaments in the world and analyzed how the top 25 in the Official World Golf Ranking fared at each one over their entire careers (I will analyze 46-year-old Phil Mickelson later because he has played much longer than everyone else in the group). Using a top ten finish as the qualification for success, here are six of the more dominant current performances:

Pasted Graphic 3


By this ranking, the most dominant current performance at a particular course belongs to Dustin Johnson when he plays at the Genesis Open (at Riviera). Out of ten appearances, he’s had a top ten finish seven times (and won it outright this year).

What should also stand out is how frequently Rory McIlroy appears on this chart. He has become one of the more successful golfers in the world by consistently performing well at specific tournaments, including the Wells Fargo Championship, the WGC-HSBC Champions and the PGA Championship. He has also had a high rate of top tens at the U.S. Open, WGC-Dell Match Play and Bridgestone Invitational.

It is important to note this chart groups tournaments together, not necessarily the courses. It makes Jason Day’s work at the U.S. Open perhaps more impressive, considering every top ten finish for that major has happened at a different course.

As for Lefty, his favorite tournament might be Wells Fargo, where he’s had top ten finishes 69% of the time. His second-most dominant is the Masters, at 63%. While much is made of his oh-so-close finishes at the U.S. Open, he cracks the top ten there only 38% of the time.

You may be wondering why Jordan Spieth failed to make the chart. After all, he’s finished first or second in every Masters appearance. The reason is that, in all of the lucrative tournaments analyzed, he has far fewer starts than most everyone else. However, he is on pace to be as dominant at the Masters, Tour Championship and WGC-Bridgestone Invitational as he already is at Colonial.

(Special thanks to ShotLink for providing the data)

Are We Witnessing the Best Golf Ever?

Last January, Adam Hadwin shot a 13-under 59 at the CareerBuilder Challenge in California. Though it’s a dream scorecard, sub-60 is no longer a rarity. Just in the week prior, Justin Thomas posted a 59 at the Sony Open. Last August, Jim Furyk carded 58 at the Travelers Championship. Of the nine sub-60 rounds in PGA Tour history, three of them have happened in the span of roughly six months, out of 87 years of pro golf (in more than 1.5 million rounds of play, last I counted).

Because the odds are infinitesimally small that these low rounds happened by chance, it is safe to say golfers are improving. Equipment, athletic ability and coaching all play a part. But with several months left in the season, can we predict, right now, that we are about to witness the best golf ever played?
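To put a rough number on "by chance," here is a minimal sketch that treats sub-60 rounds as a Poisson process. The constant-rate assumption is mine, not something from the Tour's data, and it ignores the fact that far more rounds are played per year now than decades ago. The counts come from the paragraph above: six sub-60 rounds in the roughly 86.5 years before the recent run, then three in about six months.

```python
import math

def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Assumption: before the recent run, sub-60 rounds arrived at a constant rate.
prior_rounds = 9 - 3        # sub-60 rounds before the recent six months
prior_years = 87 - 0.5      # years of Tour play before the recent six months
rate_per_year = prior_rounds / prior_years

window_years = 0.5
expected = rate_per_year * window_years
p_three_or_more = poisson_tail(expected, 3)

print(f"expected sub-60 rounds in six months: {expected:.4f}")
print(f"chance of three or more at the old rate: {p_three_or_more:.2e}")
```

Under those assumptions the chance comes out far below one in ten thousand, which is consistent with the idea that something other than luck has changed.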

Let’s first consider scoring average over the last 20 years, specifically, the median scoring on Tour:

Pasted Graphic 1

Beginning in 2007, scoring declined significantly, with some fluctuation, and figures remained lower as recently as last year; so far this season, however, there has been an uptick. What makes the higher median score so interesting is how much easier the early tournaments are, compared with the rest of the schedule.

Even for individual seasons, it will be difficult for anyone to match what Tiger Woods accomplished in 2000 and 2007. In both years, he finished with the lowest scoring and adjusted scoring average ever, a 67.79. This year, after the CareerBuilder Challenge and all of those historically low scores, even with the 59s, the lowest scoring average was 68.715, roughly one stroke worse than Tiger’s.

Of course, devious course designers can always stay one step ahead and adjust conditions to keep scores from approaching zero (e.g. Tiger-proofing). Other statistics could better highlight if today’s golfers are indeed the best ever. However, metrics off the tee like driving distance have remained relatively steady over the last several years, though some tournaments show professional golfers are becoming more aggressive than ever before.

Where there might be significant improvement involves the less glamorous approaches and short game. Though the top Greens in Regulation percentages have hovered around 72% each season, this year the best is 75.69%, held by Jordan Spieth. More golfers can also finish a hole with one putt. In past seasons, the best might one-putt roughly 44% of the time. In 2017, seven golfers have a one-putt rate above 44%. But again, it is worth noting how much easier the start to a season is; these golfers have not faced the toughest challenges like The Players, the Barclays and any major championship.

What seems to be happening is not the next coming of 2000 Tiger, but rather, more golfers improving at roughly the same time at roughly the same rate. There are still milestones yet to be reached, like someone shooting a 62 for one round at a major, or less notably, a golfer carding 254 for a 72-hole tournament. There have been more golfers flirting with breaking these records in recent years, but no one has broken through. Sub-60 rounds are happening at easier courses where scores are lower and competition is not as fierce. But because fields are becoming saturated with similarly talented players, some of the better golfers still have to find other events to play. When they do, the occasional golfer could be poised to achieve that coveted 59.

If you believe talented playing partners and deeper tournament fields naturally make an individual golfer better, then the play we will witness this season could very well be the best we have ever seen. There may not be the lone star of golf, but a hodgepodge of pros who will make 2017 something to behold.

Will Jordan Spieth Win a Major in 2017?

Pasted Graphic 4
Leave it up to the U.S. Open’s official Twitter handle to place tongue firmly in cheek when it comes to Jordan Spieth’s victory at the Australian Open being a sign of things to come: “We all know what came after @JordanSpieth’s first #AusOpenGolf win...” followed by a photo of him holding the major’s championship trophy. In other words, only in the years he won the Australian Open did he win majors.


At the time of publication, no major tournament participants have withdrawn based upon this logic.

There are sounder ways to predict if Jordan Spieth will earn his 3rd career major this year, like momentum. Perhaps surprisingly, in a few ways, Spieth performed better in 2016 than he did in 2015, despite not winning any majors last year. We can illustrate this idea using “Strokes Gained” statistics:

Pasted Graphic 3

For those new to “Strokes Gained”, it simply means how many strokes a player gained or lost, compared with the rest of the field, based upon how they played in the four areas: off the tee, approaching the green, around the green and putting. Spieth was actually a better putter in 2016; it was primarily his iron and hybrid clubs letting him down. Fortunately for Spieth, putting is a better predictor for overall success than other phases of the game, so as long as he can continue improving in close range, he has opportunities.

Next, let’s look at each individual major, beginning with the Masters. When looking at a host of variables, there is no better predictor for future performance than past success. It is why I publicly predicted Spieth to win the green jacket last year, and I would have gotten away with it had it not been for that pesky Amen Corner. Still, nobody has played better at Augusta National the last three years than Spieth, so he is in the best position to win there again.

This year’s U.S. Open will be at Erin Hills. It is listed as 7,823 yards, which would be longer than any PGA Tour event played last season. Though Spieth is not one of the longer drivers on Tour, his U.S. Open win was at Chambers Bay, almost as long as this year’s event. Spieth’s advantage was he knew how to putt on the unique fescue greens better than most everyone else. This setup might pose problems.

Royal Birkdale, a shorter links course, will host The Open. Perhaps one of the more underrated qualities of Spieth’s is his ability to play links courses well, compared with other Americans. As long as the momentum is there over the summer, Spieth can also contend there.

Finally, the site of the PGA Championship is Quail Hollow Club. It has hosted the Wells Fargo Championship since 2003. Predictably, familiarity with a course has helped Spieth over the years, but he has only played that tournament once, in 2013 when he finished tied for 32nd. There may simply be too many other golfers with more knowledge of the course for Spieth to have a realistic chance.

Spieth already has a few Top 10 finishes in 2017, including a victory at Pebble Beach. In the last few months, he helped the Americans claim a Ryder Cup win, earned an Australian Open victory and is 2nd on the Tour in greens in regulation percentage (one of the areas that was in need of improvement). His Strokes Gained: Putting has not been as strong this year, ranking 37th, but a few golfers ahead of him have played more tournaments, so it remains too early in the season to suggest there might be a problem.

Because of the deep fields of majors, the odds are better “not” to predict any one golfer to win one of the big four. But for Jordan Spieth, there are enough reasons to believe he can capture another green jacket, win his first Claret Jug, or both.

Subscribers of the Aussie Open theory would agree.

2017 Sloan Sports Analytics Conference

Pasted Graphic
Another installment of the Sloan Sports Analytics Conference has come and gone. An estimated 3,500-plus people attended the proceedings, learning about and presenting the latest research in the sports analytics world. While football and basketball are often the most popular sports here, there seemed to be a noticeable effort to highlight the quantitative strides made in other sports.

One panel featured golf analytics, led by Golfweek's David Dusek, who highlighted the success stories of these quantitative tools. Jeff Price, Chief Commercial Officer of the PGA, offered an example of Team USA at the Ryder Cup. At Hazeltine National Golf Club, long par 5s meant emphasizing wedge play. It's this discovery that helped the Stars and Stripes to a decisive 17-11 victory.

On a more personal level, current professional golfer Jason Gore explained how to turn research into actionable results.

"When I talked to a sports psychologist, Fred Astaire would [practice and] put chalk on the floor," said Gore. "But once he grabbed Ginger's hand, he never thought about the chalk on the floor."

There is still room for growth.

"We're in the first inning of the data revolution in golf," said
Arccos Golf CEO Sal Syed.

Dusek pointed out some major tournaments like the Masters and United States Open still do not provide the media with advanced statistics. 15th Club CEO Blake Wooster says the potential is there to analyze how golfers perform under pressure. Lastly, the group seemed to agree lasers should be used to measure distance more accurately. Even Gore believed lasers used by caddies could speed up pace of play.

Pasted Graphic 1

The seminal football panel of this year's conference unabashedly endorsed the concepts of its own sport's analytic revolution. It was even subtitled "Please Stop Punting", a concept where going for it on 4th down yields more expected points and discounts a more traditional idea valuing field position.

Almost immediately, Baltimore Ravens offensive lineman John Urschel, who is pursuing a Ph.D. in mathematics from MIT, discussed a common situation he says coaches get wrong. When a team trails by 14 late in a game and scores a touchdown, he says it is better to go for two than kick the extra point. The reasoning is that a successful conversion essentially gives you the win with another touchdown and extra point. Even if the first attempt fails, you can go for two again after the next touchdown and achieve a tie. Because most teams convert two-point attempts roughly 50% of the time, you are giving yourself a better chance at winning, with a small chance at needing a third score of some kind.
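Urschel's arithmetic can be sketched in a few lines. The simplifying assumptions here are mine: the trailing team scores both touchdowns, the opponent does not score again, extra-point kicks never miss, and overtime is a coin flip. The function names are hypothetical.

```python
import math

def go_for_two_first(p_two):
    """Regulation outcomes for a team down 14 that scores two touchdowns
    and goes for two after the first one. Returns (win, tie, loss)."""
    # First try succeeds: down 6, so the next touchdown plus a kick wins.
    win = p_two
    # First try fails: down 8, so the next touchdown needs a successful
    # two-point try just to tie.
    tie = (1 - p_two) * p_two
    # Both tries fail: still behind, needing a third score.
    loss = (1 - p_two) ** 2
    return win, tie, loss

def overall_win_chance(p_two, p_overtime=0.5):
    """Win chance including overtime after a tie."""
    win, tie, _ = go_for_two_first(p_two)
    return win + tie * p_overtime

# Kicking both extra points always ties (by assumption), then overtime.
kick_both = 0.5
aggressive = overall_win_chance(0.5)

# With a coin-flip overtime, going for two is better whenever the
# conversion rate beats this break-even value (solve p + 0.5*p*(1-p) = 0.5).
break_even = (3 - math.sqrt(5)) / 2

print(f"kick both times: {kick_both:.3f}")
print(f"go for two first: {aggressive:.3f}")
print(f"break-even conversion rate: {break_even:.3f}")
```

At a 50% conversion rate the aggressive approach wins outright half the time, ties a quarter of the time and leaves the team needing a third score the other quarter, for a 62.5% overall chance against 50% for kicking. Under these assumptions it stays the better choice for any conversion rate above roughly 38%.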

Mike Lombardi, former football executive and current analyst for Fox Sports, says analytics help with time allocation throughout the week, knowing what coaches should communicate with players and which statistics are important in determining the outcome of a game, such as 3rd down red zone defense.

"You don't establish the run, you establish the lead," said Lombardi. "Teams with the lead at halftime frequently go on to win," citing last year's Super Bowl champion Patriots as the top team with wins after leading at halftime, then citing the second-place team, the Super Bowl runner-up Falcons. The players, which included former Patriot Tedy Bruschi, explained how halftime is all about adjustments, but that they should take fewer than five minutes to implement.

From a front office perspective, analytics can help decipher whether trading players and draft picks makes fiscal and qualitative sense.

"The toughest thing to do in sports is to know what you're trading. It's why the Patriots won't trade [backup quarterback] Jimmy Garoppolo," said Lombardi.

Football discussion was not confined just to that panel. A couple of talks featured fantasy football and whether analytically minded players can gain an advantage. Here are some tips from Tauhid Zaman, the KDD Career Development Professor in Communications and Technology at MIT, and Renee Miller, a neuroscientist at the University of Rochester:

- When picking a quarterback, get one or two of his receivers as well.

- Avoid players who cancel each other out, like a defense against one of your offensive players.

- We weigh football players' performances at the start of the year too heavily. Instead, look at the bigger picture of their performance.

- Be careful of overconfidence: "The more data we have, the more confident we become in our decision making."

Pasted Graphic 2

Lastly, in basketball, while guys like Luis Scola seemed to get most of the attention from hoops fans, maybe the most direct knowledge given came from Seth Partnow, Director of Basketball Research for the Milwaukee Bucks. In his talk, "Truths and Myths of the Three Point Revolution in Basketball," Partnow offered the following bullet points:

- Defensive three-point shooting percentage is a useless stat because of the noise involved (good defenses prevent the shot).

- Long range shots in the NBA do not lead to fast breaks; it's shots around the rim that cause these.

- Ten of the last 12 NBA champions ranked in the top ten in three-point shooting.

As robust as this research might be, it does not offer a glimpse into the future of basketball analytics. However, one panel discussed solely how the sport will evolve thanks to quantitative tools. There may still be blowback from coaches and those who approach the sport more traditionally.

"When you're working with the [NBA] Draft…you end up trying to convince coaches," said Dean Oliver, a statistician who worked in the front offices of the Sacramento Kings, Seattle SuperSonics and Denver Nuggets. "You don't expect to win 100% of these arguments and that's fine."

Using analytics, a couple of panelists offered simple suggestions for improving the game. Former NBA player and coach Vinny Del Negro wants the league to add a fourth referee because the pace of the game has gone up and it is getting tougher for officials to keep up. WNBA point guard Sue Bird wants to get rid of the shootaround because of the rest players need and the lack of proof shooters develop a rhythm because of this routine. She also wants the analytics to assist in the psychology of a team.

"If I were a general manager, I'd want to know if [players] retain information well and how they handle things under pressure," said Bird.

The flexibility of these tools, spanning different sports and perhaps different fields of expertise, may explain why this conference has lasted as long as it has.

Pasted Graphic 3

(All photos courtesy of Sloan Sports Analytics Conference).

The Art of the Comeback

Pasted Graphic
Last November, an estimated five million people attended the Chicago Cubs victory parade, celebrating the team's first World Series Championship since 1908.

Last summer, Cleveland hosted hundreds of thousands of Cavaliers fans to celebrate that franchise's first title and the city's first pro championship in more than half a century.

This year in New England, they constantly win. We move on.

The common storyline among these three winners is "The Comeback". The Cubs overcame a 3-1 deficit in the World Series to claim their championship in an extra-inning Game 7, the Cavaliers also stormed back from down 3-1 in the NBA Finals and the Patriots trailed Atlanta by 25 in the second half of Super Bowl LI, to win in overtime. These comebacks were also nearly unprecedented. Only five teams had come back from down 3-1 to win the World Series before the Cubs. Cleveland became the first NBA team to overcome a 3-1 deficit in the Finals to win. And New England's 25-point comeback win is the largest in Super Bowl history. The second largest ever is merely ten points.

This confluence of sports drama may seem like supernatural intervention, but perhaps it can be explained in earthlier terms. In 2011, Brian Skinner published "Scoring Strategies for the Underdog: A General, Quantitative Method for Determining Optimal Sports Strategies". Skinner explained how underdogs must call riskier plays to have a chance at success. In this case, we can refer to teams significantly trailing in series and games as underdogs when their probability of winning is significantly below 50%. Calling riskier plays might mean getting shellacked, but by finding specifically how much riskier a team should get, it might be the only way for those trailing to win.
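The intuition behind riskier plays can be illustrated with a deliberately simple model. This is my own simplification, not a reproduction of Skinner's method: the final scoring margin is treated as normally distributed, and every number below is hypothetical.

```python
import math

def win_probability(mean_margin, sd_margin):
    """P(final margin > 0) when the margin is Normal(mean_margin, sd_margin)."""
    z = mean_margin / sd_margin
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical underdog expected to lose by 7 with conservative play calling.
underdog_safe = win_probability(mean_margin=-7.0, sd_margin=10.0)
# Riskier calls: slightly worse on average, but with a much wider spread.
underdog_risky = win_probability(mean_margin=-8.0, sd_margin=16.0)

# The same trade hurts a favorite expected to win by 7.
favorite_safe = win_probability(mean_margin=7.0, sd_margin=10.0)
favorite_risky = win_probability(mean_margin=6.0, sd_margin=16.0)

print(f"underdog: safe {underdog_safe:.3f}, risky {underdog_risky:.3f}")
print(f"favorite: safe {favorite_safe:.3f}, risky {favorite_risky:.3f}")
```

In this toy setup the underdog gains from added variance even though its average outcome gets worse, while the favorite loses from the same trade. That asymmetry is the reason a trailing team and a leading team should not call the same game.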

Baseball closers are niche pitchers, often asked to pitch only one inning with their team holding the lead. Aroldis Chapman, the Cubs' closer, came in to pitch 2.2 innings in Game 5, 1.1 innings in Game 6 and 1.1 innings in Game 7. Chapman had one day of rest before Game 5, another day of rest before Game 6 and no days off before Game 7. While he did allow three earned runs in the last two games, manager Joe Maddon believed the risky strategy of extending his closer was the only way to overcome the 3-1 deficit. Chapman did allow runs, but his workload left other relievers fresh for longer games. Hitters were also asked to swing for home runs, not mere singles or doubles. The Cubs ranked 13th in home runs last season, but in the World Series, they recorded at least one home run in Games 5, 6 and 7, en route to their title.

In basketball, Skinner's paper discussed two key concepts pertinent to the Cavs: how often to shoot 3's and when to stall. The logic in the first case is that, depending upon how many possessions are left in the game, a team should resort to shooting triples once it reaches its critical threshold. In the regular season, Cleveland ranked 7th in the NBA in three-point shooting percentage and 3rd in three-point attempts, but it was going up against the Golden State Warriors, who ranked first in both categories. Two of the Cavs' three highest rates of three-point shooting in that series happened in Games 6 and 7, two must-win games. As for pace, while Golden State had the second most possessions per 48 minutes in the NBA, Cleveland ranked 27th out of 30 teams. However, the Cavs played at a faster pace in Games 5 and 6, resorting to a style more like the Warriors' rather than shortening the game, as is suggested for underdogs. It is worth noting there was a slower pace for Game 7, the most dramatic in the entire series.

Lastly, the Patriots helped themselves and the Falcons maimed themselves because of risk-taking. Once Atlanta led 28-3, New England resorted to 40 pass plays (including sacks) and just 10 rushes. Before the deficit, the Patriots passed the ball 34 times and ran it 15 times, relying significantly more on the ground attack. Also, some of Brady's longest completions occurred in the 4th quarter during the comeback. On the other side, Matt Ryan and the Falcons leaned towards passing in the final minutes rather than sticking to the ground game, which would have taken more time off the clock. Perhaps the most egregious example was when Atlanta had the ball at the New England 22-yard line with 4:40 left in the game and leading by eight. Instead of running the ball three times and going for a two-possession lead, a sack, a pass (wiped away by offensive holding) and an incompletion took the Falcons out of field goal range AND gave Tom Brady 3:30 to tie the game. Overall, even play-count disparity factored into the outcome; Brady kept the Falcons' defense on the field and Ryan could not give his teammates a break.

Teams in any sport can calculate when it is time to run riskier plays. Many recent and high-profile examples suggest comebacks are more possible than ever before, when the right tactics are implemented.

There is a postscript: win probability charts have become more popular than ever. But these games and series show that something calculated to have a 0.7% probability of happening can occur. Because underdogs can increase their own variance with their playcalling, perhaps these charts need to be updated in some way. Fortunately, this discussion is ongoing.

A New NCAA Tournament

There's no doubting the increased awareness of analytics in predicting the NCAA tournament field in college basketball. Instead of just diagnosing a team's record against the Top 50, it is the Rating Percentage Index or Ken Pomeroy's rankings that are becoming more commonplace. It has gotten to where data scientists are actually meeting with the NCAA to determine if one metric should be used above all others to pick tournament teams.

Perhaps surprisingly, data scientists want simpler criteria for picking teams: who wins, who loses and whom you have played. This is opposed to other explanatory variables used in more advanced metrics, like margin of victory and offensive/defensive efficiency. Coaches, on the other hand, would prefer more complex formulae for determining the tournament field. Logically, this approach makes more sense from their perspective, because of competition. If a coach has figured out a style of play or way to schedule opponents that increases the likelihood of making the tournament, they develop a competitive advantage. Data scientists want to keep it simple for fans; coaches want to figure out a competitive advantage.

Perhaps in this same spirit of transparency, the tournament selection committee released "in-season" projections for the first time ever, one month before Selection Sunday. It only has the top four seeds of every region, but it is added information for where highly ranked teams really sit. As with any analytic project, more data "usually" means more robust forecasts. Already, it is easier to make more accurate assumptions and offer a better glimpse as to what the committee is looking for.

However, these in-season projections do not include the full field of 68, and what usually causes the most consternation is simply who does and does not make the dance. While it makes sense not to include the full field because you have to assume certain conference champions in mid-major conferences, something that would include all "at large" teams would provide even more information as to the criteria for inclusion.

Nothing is easy about picking 68 teams to play in a tournament, and while analytics may be helpful in forecasting a Final Four, easy-to-understand criteria can help teams and fans quell any controversy.

Who is the NFL MVP?

Pasted Graphic
This year's NFL MVP race is uniquely interesting. Many believe New England Patriots' quarterback Tom Brady deserves this honor, despite missing four games over a controversial deflated-football scandal from a few years ago. No matter your opinion as to whether Brady deserved to be suspended, it is worth noting that few MVPs have missed games during the regular season. Players like Emmitt Smith and Aaron Rodgers missed a game or two, but four games is a full quarter of the season and requires a number of assumptions as to whether Brady would have played as well as anyone during the stretch he missed.

Before going over these assumptions, let's first look at the history of the award and who else is a viable candidate this season. Since the Associated Press began handing out MVP honors, 18 of the recipients were running backs, 40 were quarterbacks and 3 played other positions. The most accomplished running back this season was Ezekiel Elliott. Not only do his 1,631 rushing yards and 15 touchdowns outshine other running backs this year, they outdo others who were proclaimed MVP. Because no one at any other position seemed to stand out, Zeke is the only "non quarterback" worth mentioning.

As for the gunslingers, if you go by passer rating, QBR (quarterback rating), yards per pass attempt (as well as net yards and adjusted net yards per pass attempt), and passing touchdown percentage, the winner is Atlanta Falcons' quarterback Matt Ryan. New Orleans Saints' QB Drew Brees does have an edge over Ryan in terms of total passing yards and completed passes, but efficiency metrics almost always list Ryan higher than Brees. Brees also did not "lead" his team to the playoffs, something nearly every MVP has done in the past. But this exercise is about Tom Brady and if his numbers would have been superior to Ryan's had he played the entire season.

The simplest way to answer this question is to take proportions of Brady's stats and add them to what he did accomplish and see how they measure up:


Pasted Graphic 1

Pasted Graphic 2

Pasted Graphic 3


By these proportions, Ryan would've still had more passing yards and touchdowns than Brady, though the Patriot would've still had fewer interceptions. However, this exercise assumes opponents are of equal quality, which we know is not true. What we should do is examine the opponents Brady did not play and project what his numbers would have been in those games. New England's first four opponents were Arizona, Miami, Houston and Buffalo. Their combined record was 33-30-1 (Atlanta's first four opponents were Tampa Bay, Oakland, New Orleans and Carolina, with a combined record of 34-30, better by just half a game). Brady missed out on the 2nd, 4th, 6th and 15th best passing defenses in the NFL, using passing yards defended as the barometer. Averaging their defensive numbers, that group allowed 219.5 yards and 1.4 touchdowns per game. To put those numbers in perspective, the dozen opponents Brady did face allowed 233.4 yards and 1.6 touchdowns.

In other words, the foursome Tom Brady did not play featured significantly better passing defenses than the dozen he did go up against. Given this logic, it is safe to lower Brady's numbers even more than what was projected, and those projections were already worse than Matt Ryan's.

There are two more things to consider when comparing these two quarterbacks. First, Pro Football Reference says the Falcons' strength of schedule was significantly tougher than the Patriots' (18th vs 32nd, respectively). It also has its own way of determining Approximate Value of each player as an attempt to show how important they were to a team's overall success. Without getting into the specifics, Ryan led the NFL with 21 and Brady had 13, so Brady would have had to achieve a lot to make up that ground in the four games he missed.

Again, whether or not you believe Tom Brady was unjustly punished for Deflategate, it is unlikely he would have posted better statistics than Matt Ryan. Even though Ezekiel Elliott did have a stellar rookie campaign, his numbers were not historic for any running back. It is Matt Ryan who deserves to be this year's Most Valuable Player.

How Predictive Is Scoring Differential?

Pasted Graphic
How important is an impenetrable goalie in the NHL? How much better is it to outscore opponents throughout the season, as opposed to dominating them defensively? Overall, how important is point differential to overall success?

In an earlier blog post, I discussed playoff unpredictability when it comes to determining who will win a championship based upon how many games that team won. There, the NBA was the most predictable, followed by the NHL and NFL, with MLB the most unpredictable (unless, of course, you are the 2016 Chicago Cubs). But how does point differential (or run differential in baseball or goal differential in hockey) translate to winning championships? And which league is most predictable when looking at that specific metric?

Once again, I am using logistic regressions, each with one explanatory variable and with whether that team won a championship as the dependent variable. However, this time I am using three per sport: offensive output, defensive output and scoring differential. Also once again, here is what is noteworthy with our datasets:

- All data used begins with the 1989-90 season because the NFL had the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.).

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so they are still used.

Each explanatory variable has the appropriate and logical coefficient. In other words, scoring variables have a positive coefficient, defensive variables have a negative coefficient and scoring differential variables have a much larger positive coefficient. All of this equates to a better probability of winning a championship. Each variable is also statistically significant with 95% confidence, which is to be expected. A better offense, defense and scoring differential will obviously increase the likelihood of winning a championship. What is not clear is which of these indicators is most predictive.
A goodness-of-fit measure called AIC (Akaike Information Criterion) can shed some light. As this number gets smaller, the model has a better fit, explaining away more of the randomness of that sport.
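
For readers who want to see the mechanics, here is a minimal sketch of a one-variable logistic regression and its AIC, written in plain Python. The team-seasons below are made up purely for illustration (they are not the dataset used in this post), and the function names are my own. The AIC is computed as 2k - 2 ln(L), where k is the number of fitted parameters (here, an intercept and one slope).

```python
import math

def fit_logistic(x, y, iterations=100):
    """Fit P(title) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # entries of the negated Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system (negated Hessian) * step = gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

def log_likelihood(x, y, b0, b1):
    """Log-likelihood of the fitted model on the data."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def aic(x, y):
    """AIC = 2k - 2 ln(L), with k = 2 parameters (intercept and slope)."""
    b0, b1 = fit_logistic(x, y)
    return 2 * 2 - 2 * log_likelihood(x, y, b0, b1)

# Hypothetical team-seasons: per-game scoring differential and a 0/1 title flag.
scoring_diff = [-6.0, -4.5, -3.0, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 6.0, 7.5]
won_title = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

slope = fit_logistic(scoring_diff, won_title)[1]
print("slope:", round(slope, 3))
print("AIC:", round(aic(scoring_diff, won_title), 2))
```

Fitting the same function to each sport's offense, defense and differential and comparing the AIC values is the kind of comparison the charts below summarize. Note that AIC depends on the size of the dataset, so comparisons are cleanest when each model is fit to a similar number of team-seasons.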

The first chart is points (or runs or goals) modeled against championships:

Pasted Graphic 1

Before analyzing this chart, it is important to note the value of each point, goal and run, compared with the other sports. In 2016, the average MLB team scored 726 runs for the season. By comparison, the average NFL team scored 325 points in 2015, the average NBA team scored 8,419 points last season and the average NHL team scored 222 goals last season. Fortunately, the variation within each league is not so different from sport to sport that comparison becomes impossible.

In the chart, we see goals in hockey as being the best predictor for winning its championship, with football being slightly more random, then basketball, then baseball finishing as the most random. So far, these results are consistent with the previous study where MLB's postseason was the toughest to predict, based upon number of wins during the regular season. Basketball makes intuitive sense because teams play at different paces, and it is not conclusive if playing at a faster rate—which scores more points but not necessarily more points per possession—is the best way to win a title.

The next chart illustrates runs, points, and goals allowed, modeled against winning a championship:

Pasted Graphic 2

Comparatively, the trends are almost the same as they are with offensive output: Major League Baseball is the most random, followed by the NBA. However, an NFL scoring defense is now a better indicator than an NHL scoring defense, but only slightly so.

Now, let's combine these two charts into scoring differential, modeled against a championship:

Pasted Graphic 3

Here, we learn point differential is more predictive in basketball than in any other sport. Remember how different teams playing at different paces obscures the importance of points alone? Including the defensive component erases pace of play and gives a clearer predictor. It also coincides with how a win total in basketball is most predictive for winning a championship. Football and hockey are nearly equal in predictive ability and baseball is a distant fourth.

There are more trends to uncover if we combine all of these charts:

Pasted Graphic 4

In nearly every sport, scoring defense is more predictive than offense (with hockey being the lone exception). Scoring differential is predictably better for analysis than offense or defense by itself, but the degree to which it takes away the randomness is different for each sport. It is only a slight improvement in the NFL, but a drastic improvement for basketball.

Overall, these proportions could prove helpful when determining if a team is going in the right direction when devoting resources to offense and defense. Both are necessary, but perhaps more money should be proportionally allocated to the areas that best predict who will win a championship.

A Unique Cowboys Perspective

The Dallas Cowboys are constantly watching film and studying the playbook for that added edge. Their fans also want to know anything that can help explain why their favorite team won or lost, and if there is a way to forecast how they will do and where they need to improve. Our newest data visualizations hope to do all of the above.

Before and during every Cowboys game, I will post on my various social media accounts some analytics that explain what is going on and predict what will happen. After the game, I will have one summary detailing what happened, using explanatory variables that are the best indicators for the outcome of any football game. Here is some extra information for each highlighted variable:

  • Turnovers are perhaps self-explanatory and the team with the better turnover ratio has a significant advantage.
  • Scoring efficiency goes beyond just the scoreboard. It's a ratio of (offensive yards/points). A team may have moved the ball but failed to score many points when near the end zone, so they were inefficient. Not only can each team's efficiency be compared, but each bar has a color: red for bad, blue for average and green for good. Respectively, these quality ranges are: 0-12, 12.01-18.5, 18.51-. These ranges came from the last ten years of NFL data, provided by Pro Football Reference.
  • The ratio (time of possession/rushing yards) looks at who was controlling the game effectively. Time of possession is not an effective indicator for success, but how well a team controls the ball while on offense is. The team with the better ratio earns the checkmark.
  • Overachiever/underachiever is a way to look at how well a team is doing for the season, relative to its point differential. In other words, if a team has a strong record but all of its wins are close, it is overachieving. If it has suffered a number of losses but they have been close, it is underachieving. This idea is calculated using a Pythagorean Expectation formula, with the exponent more commonly used for football: ((Points for^2.37)/(Points for^2.37 + Points against^2.37)). This winning percentage can then be multiplied by the number of games played to show where a team "should" be with its record.
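
As a rough illustration of the overachiever/underachiever calculation, here is a short Python sketch of the Pythagorean Expectation formula above. The team totals and the one-game margin used to assign a label are hypothetical choices of mine, not values from the visualizations.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def expected_wins(points_for, points_against, games_played, exponent=2.37):
    """Where a team 'should' be with its record after games_played games."""
    return pythagorean_win_pct(points_for, points_against, exponent) * games_played

def achievement_label(actual_wins, points_for, points_against, games_played, margin=1.0):
    """Label a team by how far its record sits from expectation.

    The margin (in wins) is an assumed cutoff for illustration only.
    """
    gap = actual_wins - expected_wins(points_for, points_against, games_played)
    if gap > margin:
        return "overachieving"
    if gap < -margin:
        return "underachieving"
    return "on track"

# Hypothetical team: 6-1 through seven games, 200 points scored, 150 allowed.
print(round(expected_wins(200, 150, 7), 2))
print(achievement_label(6, 200, 150, 7))
```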

Periodically there will be additional metrics to explain why the Cowboys won or lost, such as net passing yards/attempt, which takes into account sacks and incompletions as well as how many passing yards each quarterback is able to accrue. As more metrics become readily available, this summary will include them. To see these visualizations in real time, follow me:


Special thanks to Fuzzy Red Panda for putting together these beautiful images and programs that advance sports analytics in such creative ways.


Go Cubs Go

In just a few days, Wrigley Field's iconic scoreboard will showcase a World Series for the first time in more than seven decades. A franchise with questionable management and horrible luck has finally come within four wins of its first world championship in more than a century.

The Cubs have fielded formidable teams that have made the postseason, but never have they won the NLCS until this year. Often postseason baseball can be so unpredictable that it is difficult to explain why the Cubs could not reach the World Series until now. But there are some trends that predict success in playoff baseball that do not have as great an impact in regular-season baseball.

While I have written a paper about this and have applied those lessons to the Texas Rangers in a previous post, I would like to look at alternative research. In the book "Baseball Between the Numbers", three qualities are listed that best determine postseason success:

  • Pitcher Strikeout Rate
  • Fielding Runs Above Average (FRAA)
  • Closer Expected Wins Over Replacement Pitcher (WXRL)

The Cubs finished 3rd in the majors in strikeout percentage and strikeouts per nine innings (the Dodgers, the team Chicago beat in the NLCS, finished first in both categories). Fangraphs uses a metric called Ultimate Zone Rating to calculate fielding, and listed the Cubs as the best fielding team this season. Lastly, the Cubs finished 19th in reliever Wins Above Replacement, but keep in mind, the team traded for Aroldis Chapman late in the season.

It is also worth noting, the Indians had high rankings in all three of these categories as well (5th, 4th and 7th, respectively). While the matchup should make for a fantastic World Series, given how the Cubs have properly built this team for a postseason run, it should not come as a surprise if they can end this 108-year drought.

No Range for the Texas Rangers

It's hard not to catch shortstop Elvis Andrus smiling these days. His Texas Rangers go into the postseason with home-field advantage all the way through the World Series, having finished one victory shy of a franchise record for most wins in a season and boasting the most wins at home in the American League. Elvis himself finished the regular season as a .302/.362/.439 hitter. And yet, a few sabermetricians have spoken out, saying not only that the Rangers should not be one of the favorites to win the World Series, but that their success is virtually fraudulent.

It involves Pythagorean Expectation. This is the often-cited formula baseball guru Bill James invented to estimate how many wins a team "should" have based upon how many runs they scored and allowed. Since it became commonplace, the formula has worked quite well explaining why teams are thriving and struggling. Even this season, the formula explains all but a handful of wins or losses for every MLB team. The one team the formula has done the poorest job with is the Texas Rangers.

For much of the season, this team's Pythagorean W-L hovered around .500. The Rangers finished 13 games above what was expected, at 95-67. Why? The Rangers were 36-11 in one-run games (the .766 winning percentage is a record in modern baseball). They were also 18-24 in games decided by 5+ runs. In other words, the Rangers won a lot of close games and lost a lot of blowouts.

A discrepancy this large is unprecedented in the last decade for the Rangers:

Pasted Graphic

For most of that stretch, the Rangers performed roughly as expected, given their runs scored and allowed. But in the last two years this team has over-performed. It might be a coincidence those were the two years Jeff Banister has been the manager of the Rangers, but maybe not. Banister has a history of evaluating players and looking at skills during blowouts. He is certainly not the only manager to have this approach, but it is possible he takes it to the next level. Two years is not sufficient data to make such a conclusion, but it is a noteworthy trend to consider.

So how accurate is this formula when predicting if the Rangers will win the World Series? Not very. Since 1969, 11 teams out of 47 had the best Pythagorean Expected record and went on to win the World Series. In fact, the likelihood has decreased since the postseason expanded. Many conclude the postseason is almost impossible to predict, though there are helpful trends to consider. Most notably, "small ball" seems to be a more successful approach in the postseason than in the regular season. Among teams in the postseason, the Rangers rank 3rd in stolen bases, 5th in sacrifice flies and 3rd in hit by pitch (they are, however, last in walks and almost last in sacrifice hits).

If you believe the Rangers will eventually regress to the mean given this disparity, it has not happened through 162 games, so statistically nothing suggests this trend will automatically change after another 19 games. In a way, the Texas Rangers have just as good a chance to win the franchise's first world championship as anybody, and that smile from Elvis Andrus will be even wider.

Who Wins the FedExCup?

The PGA Tour will award its tenth FedExCup by week's end. This event has not attracted the same fanfare as majors or even other regular tournaments. The TOUR Championship is held during football season, has only been around for a decade and has a scoring system that has changed even in that short window. Still, with $10,000,000 in bonus money on the line for the winner and the best players of the season in the field, it is worth the exercise of predicting this year's champion.

Historically, there has been little fluctuation when it comes to who wins the FedExCup, based upon his ranking the prior week:

Pasted Graphic

Seven out of nine winners were ranked 5th or better heading into the final tournament. This trend bodes well for Dustin Johnson, Patrick Reed and Adam Scott. However, seven out of nine also won the TOUR Championship itself on the way to capturing the cup, so this prediction exercise is as much about who will win at East Lake Golf Club as it is about forecasting the rankings afterwards.

The course has a par 70, uses Bermudagrass and is 7,385 yards long. Last year, it was ranked the 17th toughest course by score, out of 52 tournaments (and again, this tournament features only the top 30 ranked players in the FedExCup standings). As for more specific statistics compared with the rest of the Tour:

Driving Distance: 12th shortest (284.2)
Sand Save Percentage: 14th best (53.49%)
Greens in Regulation Percentage: 13th worst (62.1%)
Putting Average: 42nd best (1.742)

So far, nothing suggests this course has unique attributes that golfers have to make major adjustments for. The next step is looking at the strokes gained statistics for the last nine winners of the golf tournament, prior to the BMW Championship:

Pasted Graphic 1

Winning this tournament seems to require a complete game, though occasionally winners have had negative strokes gained statistics in putting or driving. This idea does not necessarily eliminate anyone's chances. However, nearly all of them have needed good to great approach games, which is good news for Adam Scott (1st in SG: Approach the Green) and Hideki Matsuyama (2nd).

Often times the winner has also had a stellar World Golf Ranking, which suggests Jason Day or Dustin Johnson could win everything. Five golfers can win the TOUR Championship and hoist the FedExCup without requiring any help thanks to their point totals: Johnson, Scott, Day, Reed and Paul Casey. Given how important momentum can be for winning any golf tournament, these golfers have many reasons to feel confident about their chances.

This idea is furthered when analyzing how much of an advantage the higher-ranked players have heading into the tournament, relative to the rest of the field. Consider this: after the BMW Championship, a player's points are reset to a new number based upon his ranking (to see the updated point totals, click here). Resetting scores gives everyone a chance to win the FedExCup, even though it wipes away any commanding leads a golfer may have had leading up to the TOUR Championship. The points earned for where a player finishes at the TOUR Championship can be found here.

One way to look at the probability each golfer has for winning the FedExCup is to look at how resetting points improves or worsens each golfer's chances. The most critical assumption in this exercise is every golfer is of the same quality and has the same abilities, so everyone has an equal opportunity to win; their probability to win the TOUR Championship is 1/30, or 3.33%. But to calculate their chances of winning the FedExCup, after resetting points, requires a more rigorous approach. Using Monte Carlo simulation, I ran 5,000 tournaments and looked at how many times each golfer finished with the highest point total. Their probabilities can be found here:

Pasted Graphic

As expected, the lower the ranking, the worse the probability. Also as expected, if you were to draw a function to fit these points, it would be logarithmic (an R^2 of .9536 suggests this function captures almost all of the variation). Dustin Johnson has a significantly better probability to win than second-place golfer Patrick Reed. After Reed, the variation levels off. Still, in this exercise, golfers ranked 1st through 8th have a better probability of winning than they would if points were completely erased and whoever won the TOUR Championship also won the FedExCup.
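
For anyone curious how a simulation like this can be set up, here is a minimal Python sketch under the same equal-skill assumption: every finishing order is equally likely, each golfer's reset points are added to the points for his finishing position, and the highest total wins the FedExCup. The two point tables below are placeholders I invented so the example runs; they are not the PGA Tour's actual reset or finishing values, and this is not the exact program behind the chart.

```python
import random

def simulate_fedexcup(reset_points, finish_points, trials=5000, seed=0):
    """Estimate each golfer's chance of finishing with the most points.

    reset_points[i]  : points golfer i carries into the final event
    finish_points[k] : points awarded for finishing in place k + 1
    """
    rng = random.Random(seed)
    n = len(reset_points)
    wins = [0] * n
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)  # equal skill: every finishing order is equally likely
        totals = list(reset_points)
        for place, golfer in enumerate(order):
            totals[golfer] += finish_points[place]
        # Ties (unlikely here) go to the higher-ranked golfer.
        champion = max(range(n), key=lambda i: totals[i])
        wins[champion] += 1
    return [w / trials for w in wins]

# Hypothetical tables for a 30-player field (NOT the Tour's real values).
HYPOTHETICAL_RESET = [2000 - 60 * rank for rank in range(30)]
HYPOTHETICAL_FINISH = [round(2000 * 0.75 ** place) for place in range(30)]

probabilities = simulate_fedexcup(HYPOTHETICAL_RESET, HYPOTHETICAL_FINISH)
for rank, p in enumerate(probabilities[:5], start=1):
    print(f"Ranked {rank}: {p:.1%}")
```

Swapping in the real point tables would be the natural next step; the equal-skill assumption is the part that matters most for interpreting the output.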

No matter if you are computing probabilities using golfers of similar skill, glancing at historical results or looking at abilities using advanced quantitative measures, the lesson is clear: the top of the points list is the likeliest place to find this year's season-long champion.

A New Journalism Feature

Each week, I will air a segment on Good Day on Fox 4 in Dallas/Fort Worth that takes an analytic look inside college football. First, I look at a statistical trend that explains something we saw the weekend before, the challenges of predicting games and the secrets to being a more informed fan. Second, I use data and modeling to forecast games featuring some of the favorite teams from north Texas.

I will then post these segments to YouTube and share the links on the Journalism section here. You can click Journalism at the top of the page or click here.

Is Jordan Spieth Struggling?

Even before winning two majors (and nearly two more) in 2015, Jordan Spieth was one of the more popular golfers on the PGA Tour. Then, that popularity soared when the 22-year-old set many records beginning with the phrase: "Youngest golfer to…". But with enormous popularity and early success come high expectations. This year, Spieth has not won a major and has been in contention only once in three tries. He also fell out of the top spot in the Official World Golf Rankings and has three fewer victories overall. Given what he did accomplish and how he's performing now, is Jordan Spieth struggling?

Spieth defended his record and, during his performance at The Open at Royal Troon Golf Club, felt any questions about struggling were "unfair". Per golflink.com:

"It's been tough given I think [2016 has] been a solid year," said Spieth. "I think if last year had not happened I'd be having a lot of positive questions and instead most of the questions I get are comparing to last year and therefore negative because it's not to the same standard…So that's almost tough to then convince myself you're having a good year when nobody else really…even if you guys think it is, the questions I get make me feel like it's not. So I think that's a bit unfair to me…"

Let's take an analytical look at whether Jordan Spieth is struggling by his standards and, if so, by how much. The simplest way is to look at Strokes Gained rankings and compare last year to this year. What makes Strokes Gained so useful is that it points specifically to the parts of the game a golfer may or may not be excelling at. The following statistics compare how well Spieth has done compared with the rest of the field:

Pasted Graphic

The numbers above the bars are his rankings on Tour. What also matters here are the following equations:

Off-the-Tee + Approach-the-Green + Around-the-Green = Tee-to-Green

Off-the-Tee + Approach-the-Green + Around-the-Green + Putting = Total

First, Spieth is actually performing better off the tee, but the rest of the field has caught up. Around the green and putting have remained steady or actually improved. The glaring statistic is his approach to the green. This measures all approach shots on par-4 and par-5 holes that are NOT within 30 yards from the edge of the green and includes tee shots on par-3 holes. Spieth has gone from .618 to -.016 (moving from 11th place to 118th). This statistic is further highlighted by looking at the breakdown of his rankings compared with the rest of the field:

  • 163rd in Greens in Regulation Percentage (62.3%)
  • T107th in Approaches from 75-100 yards (17' 10")
  • T109th in Approaches from 100-125 yards (20' 5")
  • T118th in Approaches from 125-150 yards (23' 9")

This information explains the discrepancy in SG: Tee-to-Green and SG: Total. It also explains the bigger discrepancy in tee-to-green versus total, because his skill at putting is included in the total, not tee-to-green. It is also worth noting, Spieth is playing in fewer tournaments this year than last. He played in 25 last season and is only through 16 this season, prior to the PGA Championship.

Let's now look solely at majors and highlight the discrepancy in Spieth's approach game:

Pasted Graphic 1

Spieth does not have the same driving accuracy, greens in regulation numbers or sand save percentage that he did in that record-breaking year.

Here is something else to consider. Perhaps one of Spieth's strengths is adapting to links courses. PGA Tour players do not play a lot on these types of courses, and while other golfers can drive the ball farther, this skill is not an advantage on a links course. But Spieth's skills as a putter and around the green do come in handy. In 2015, the U.S. Open was on a links course. Spieth won. This year, the only two domestic tournaments that even come close to those types of conditions are the AT&T Pebble Beach Pro-Am and the Hyundai Tournament of Champions. Spieth won the latter.

What Spieth said about his game and his year requires clarification. Strokes gained statistics have helped us highlight two important things about Jordan Spieth. First, his approach game has let him down much more than it did last year. Second, he is not struggling with any other part of his game and in some ways he has improved. While his fans may have hoped Spieth would win more tournaments this year, he still has virtually as good a chance as anyone to capture the final major of the season.

Who Do You Trust in the 4th Quarter?

Since being named the starting quarterback for the Dallas Cowboys, Tony Romo has been in the NFL spotlight for ten seasons and 127 games. While he has put up some of the more prolific statistics of any quarterback during this time, many argue he is the most scrutinized veteran gunslinger in the 21st century. One reason is anti-analytical: blown opportunities to win games in the 4th quarter. While many of these games have been the most critical for his team's championship aspirations, it does bring up the bigger question of which quarterbacks have been the most reliable for winning a game in the 4th quarter.

In a later article we will apply analytics and look at what constitutes a "clutch" quarterback. But first, let's look at the raw statistics. The data features 42 quarterbacks spanning all eras of the NFL but who can be considered, at a minimum, marginally successful (e.g. Peyton Manning, Warren Moon, Roger Staubach, Colin Kaepernick, etc.). The 4th quarter variables are: comeback attempts, comeback wins, comeback rate and career blown leads by the QB's own defense.

First, here is a graph of the comeback success rates:

Pasted Graphic 1

Of the quarterbacks analyzed, Andrew Luck has the best 4th quarter comeback rate of anyone (63%). However, he also had the fewest attempts, so it is too soon to call him the most clutch we have ever seen. In second place is Joe Montana (56%), who many might be more willing to admit is the best in close games. Peyton Manning had the most attempts of anyone (94), but his rate is 47%.

Then comes the aforementioned Tony Romo. His rate is only slightly worse than Manning's. While it is below half, only five of the 42 quarterbacks studied finished better than 50%. In fact, Romo's rate is 11th best out of 42. At the other end, the worst rate among active quarterbacks belongs to Aaron Rodgers (27%). Don Meredith has the lowest success rate of anyone at 25%.

Some of these rates can be explained by analyzing blown leads by that quarterback's defense:


Pasted Graphic 2

The quarterback dealt the least clutch defense is Drew Brees, whose "D" has blown a 4th quarter lead on 31 occasions. Fran Tarkenton ranks second with 27. Tony Romo is tied for 10th with 17. This mark is slightly above the average among the 42 quarterbacks studied. As for those who have fewer reasons to be upset with their defense, there are Kurt Warner (6) and, as expected, Andrew Luck (2).

Visually and expectedly, there is already a direct correlation between 4th quarter comeback rates and blown leads by defense. Still, it is worth discovering if there are statistics for each quarterback that can help explain why some successful quarterbacks are better than others at the end of football games. I will report my findings in a future article.

Special thanks to Mark Lane for putting this data together. You can follow him on Twitter @therealmarklane.

An Upgrade to Inside Sports Analytics

Pasted Graphic
This week we made some tweaks to the website. Some of them are literally tweaks, like adding my Instagram photos to the sidebar of the "Photo Album" pages (it's edwardegrosfox4 if you would like to follow me). My LinkedIn page is also available in the sidebar of the "About" page.

But the most exciting addition is the "Journalism" page. Occasionally I submit sports analytic reports for Fox 4 in Dallas, the TV station for which I am the Weekend Sports Anchor. These stories are available on our station's YouTube page, and now, on this website. These stories focus on athletes and teams in north Texas but can include major events and tournaments; they also use the same quantitative tools the blog and podcast do.

As always, if you would like to offer feedback or ask questions, please contact me through social media or by using the "Contact Edward" page.

Yes! Go for Two!

It's an odd feeling for football fans. After scoring a touchdown, the exhilaration must be contained just as quickly as it erupted, as this same offense, grinding down the field and travailing through the defensive puzzles presented, decides to go for two. The decision is rare: during the 2015 NFL season, 1,217 extra points were attempted, but only 94 times did a team go for two (7%). In fact, five teams never attempted a two-point conversion.

Pittsburgh Steelers quarterback Ben Roethlisberger suggested this week his team should go for two, every time. Though his team attempted more two-point tries than anyone else, fewer than one-fourth of the time did the Steeler offense return to the field after a touchdown.

Traditionally, this idea is irreverent. But analytically, this idea carries merit. Because 94% of extra points were converted last year, a team that always goes for two only needs to convert 47% of the time to push. It is worth noting, a defense can return the football the length of the field for two points no matter what is being attempted. Though this happened only once and during an extra point, it could fractionally affect this expected value even if it is statistically insignificant. Lifetime, teams convert their two-point attempts roughly 50% of the time, almost exactly what they need for it to be a push.
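
The break-even arithmetic is simple enough to sketch in a few lines of Python. The function names are my own, the rates are the ones quoted above, and the sketch deliberately ignores the rare defensive return for two points.

```python
def expected_points_kick(extra_point_rate):
    """Expected points from kicking: one point times the conversion rate."""
    return 1 * extra_point_rate

def expected_points_two(two_point_rate):
    """Expected points from going for two: two points times the success rate."""
    return 2 * two_point_rate

def break_even_two_point_rate(extra_point_rate):
    """Two-point success rate at which both choices have equal expected value."""
    return extra_point_rate / 2

print(break_even_two_point_rate(0.94))   # rate needed to push
print(expected_points_kick(0.94))        # expected points from the kick
print(expected_points_two(0.50))         # expected points at the lifetime rate
```

Expected value is only part of the decision, since it says nothing about score, time remaining or variance, but it is the piece the 47% figure comes from.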

So why always go for two if it is a push and risk injury to more valuable players? And, perhaps more importantly, would this 50% success rate hold if teams went for two more frequently? Aside from the fact there is an obvious trend NFL offenses are improving and kickers are worsening (mainly because the distance of an extra point was moved back 15 yards), the following chart illustrates two-point tries:

Pasted Graphic

As expected, the 50% success rate remains relatively consistent regardless of how many times teams go for two. However, as stated before, this is a small sample size compared with the number of times a team could have gone for two, but elected for the extra point. Usually teams go for two when almost absolutely necessary. When it is not absolutely necessary, will the success rate be the same?

It's worth finding out.

Predicting Pitching Performance

Noah Syndergaard made his Major League debut last year for the New York Mets and made an immediate impact (3.24 ERA and 9.96 K/9). While his 9-7 record may not have been overly impressive, there were signs this was only the beginning. Now, Syndergaard has multiple National League player of the week awards and is one of the more reliable hurlers in the game.

But not every pitcher lives up to predictions. How can someone better determine which pitchers will become successful the following season? One of the more intriguing presentations concerning the future of baseball predictions involved creating a pitcher projection system based upon Pitch F/X (to read the paper and/or watch the presentation, click here). The traditional ways to gauge a successful pitcher do not always perform well when forecasting how he'll do the following year. According to this research, if next season's Earned Run Average (or Runs Averaged/9 innings) is regressed onto each of these traditional metrics, here are the resulting R^2 values:

Metric R^2
K% 0.67
SIERA 0.52
xFIP 0.46
BB% 0.45
FIP 0.35
HR% 0.18
ERA 0.14
BABIP 0.04
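
To make the R^2 idea concrete, here is a small Python sketch that regresses one variable on another with ordinary least squares and reports the share of variation explained. The six pitchers below are invented for illustration; they are not from the paper, and the paper's own models may well be set up differently.

```python
def r_squared(x, y):
    """R^2 from a simple least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) versus total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical pitchers: this season's K% and next season's ERA.
k_pct = [18.0, 20.5, 22.0, 24.5, 27.0, 30.0]
next_era = [4.60, 4.10, 4.25, 3.60, 3.40, 2.90]

print(round(r_squared(k_pct, next_era), 2))
```

A value near 1 means the metric explains most of the variation in next season's ERA; a value near 0, like BABIP's in the table above, means it explains almost none.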

Strikeout percentage is the most successful traditional metric when determining future success. Here are the top ten pitchers in K% in 2015:

  • Clayton Kershaw (33.82%)
  • Chris Sale (32.08%)
  • Max Scherzer (30.7%)
  • Carlos Carrasco (29.59%)
  • Chris Archer (29.03%)
  • Corey Kluber (27.65%)
  • Jacob deGrom (27.03%)
  • Jake Arrieta (27.13%)
  • Madison Bumgarner (26.93%)
  • Francisco Liriano (26.52%)

MLB is through 1/4 of the 2016 season. As it stands, here are the top ten pitchers in K% this year:

  • Jose Fernandez (35.9%)
  • Clayton Kershaw (33.7%)
  • Noah Syndergaard (32.6%)
  • Max Scherzer (31.5%)
  • Stephen Strasburg (30.9%)
  • Danny Salazar (30.3%)
  • David Price (29.4%)
  • Vincent Velasquez (28.8%)
  • Drew Smyly (28.4%)
  • Drew Pomeranz (28.3%)

While many on the 2015 list currently rank just outside of the top ten this year, the comparison shows two things: the difficulty of predicting pitcher success given any traditional metric and just how consistently dominant Clayton Kershaw and Max Scherzer really are.

This paper discussed combining the aforementioned statistics with Arsenal/Zone rating. This metric uses PitchF/X data which tracks the speed, movement and placement of every pitch relative to the strike zone. The idea is, with more data about the specifics of each pitch a pitcher throws, the pitch sequence and which pitches are most sustainable over time, it will be easier to predict success the following season.

Data scientists should always be careful about adding too many variables because of overfitting. In other words, a model with too many inputs can start fitting noise, to where it is hard to find actual trends that are meaningful. Still, this is an intriguing paper and hopefully this Arsenal/Zone rating can become more readily available to baseball fans in an easily digestible way.

You've Drafted Your Team, Now What?

For three days, there is a frenetic pace to the NFL Draft, where specific needs are addressed (or not), value is appropriated for each pick (or not), opponents' draft boards are analyzed and combated (or not) and undrafted free agents are debated and signed. After all of these minuscule details, there comes a bigger picture question: What the heck just happened?

Pundits and perhaps those within each franchise grade these draft classes before anyone attends a rookie minicamp or any single contract gets signed. Though there's always the inexplicable human elements, grading draft classes is becoming easier and more analytically sound.

Last year, the Harvard Sports Analysis Collective posted a blog about predicting success based upon combine numbers. Some positions are much easier to predict than others. For instance, cornerbacks and outside linebackers with good combine numbers tend to do well in the NFL, whereas quarterbacks and wide receivers are much harder to predict.

Value also matters. Franchises will be crippled if an average player is chosen in the first or second round, while others can be bolstered by picking stellar athletes in later rounds. Many of these problematic picks happen when a team drafts an offensive skill player. One research paper suggested 60% of running backs and receivers taken in the 3rd-7th rounds have better average career statistics than those taken in the first and second rounds. The trend with receivers makes sense, given combine numbers not having as much predictive power, but perhaps the trend with running backs is a sign of the times (i.e. the NFL evolving to a passing league). As for quarterbacks, the more highly touted ones tend to have better careers.

So who drafted well in 2016? Without offering clear answers, perhaps surprisingly, the perennial bottom feeders in the Cleveland Browns did well. They took defensive stars with good combine numbers like Emmanuel Ogbah and Carl Nassib and, for the most part, waited on skill players in later rounds, including Ricardo Louis and Jordan Payton. Another possible surprise is the Jacksonville Jaguars. Their first round pick was a cornerback with excellent combine numbers in Jalen Ramsey. They also took defensive risks like Myles Jack and Sheldon Day, also suggesting they are focusing on addressing some of their bigger needs. As for the uncertainties, obviously the Rams and Eagles gave up a lot for quarterbacks who may or may not pan out, but the Cowboys, Texans and Redskins also drafted skill players in the first round who will be tougher to predict if they can translate to wins.

What we're learning from the latest research is that the NFL Draft is not the crapshoot it once was. It will never be perfect, but it is also becoming clearer which teams are heeding this research and which prefer non-analytical advice.

Playoff Unpredictability

Until recently, the Los Angeles Lakers were one of the fixtures of the NBA Playoffs, and in many seasons, the Finals. They have put together dynasties in different generations of the sport, from Magic Johnson's teams to the Shaq and Kobe era. When the Lakers were not winning titles, chances are another team was enjoying its own dynasty, like the Boston Celtics, Chicago Bulls or San Antonio Spurs. Dynasties are so commonplace in the NBA that 15 franchises in the sport's history do not have a championship (and seven of those still in existence never even made it to the Finals).

The NBA is unique in this regard: championships are won in bulk. Other leagues offer more parity, where there is a larger pool of contenders vying for a title. There may be dynasties in other sports, but there seem to be fewer of them, each is shorter in duration, and there is a better chance someone unexpected can claim the sport's top prize.

Which of the four top professional sports leagues (NFL, NBA, MLB and NHL) offers the most playoff unpredictability? Is the NBA truly the most predictable? Is it significantly more predictable or marginally so?

One approach to answering these questions is to use a statistical model for each sport. Here, we will use logistic regressions, where we will look at only wins (or points in hockey) and see how well they predict whether a team won a championship that year. Here are some other notes for setting up this project:

- All data used begins with the 1989-90 season because the NFL had the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout considerably shortened the number of games played was removed (e.g., the 1998-99 NBA season and the 2012-13 NHL season).

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so they are still used.

At first glance, every variable representing wins is statistically significant at the 99% confidence level, which should be obvious because you need so many wins just to make the playoffs. What matters is how well wins alone predict championships. In statistical parlance, we will use a goodness-of-fit measure called AIC (Akaike Information Criterion) to answer this question. As this number gets smaller, the model has a better fit. The following shows how well each model performs:

(Figure: bar chart of the AIC for each league's model)
The larger the bar, the more unpredictable the league is. Again, as expected, the NBA is the most predictable, and by a considerable margin. This model also suggests Major League Baseball is the most unpredictable, with the NFL as a close second and the NHL as a close third.

There are a number of other variables that could be added to these models to help determine who will win a championship, but the simplicity of these models makes for an easier comparison across sports.
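
As a rough illustration of the approach described above, the sketch below fits a one-variable logistic regression (wins predicting a title) and computes its AIC as 2k minus twice the log-likelihood. It is not the code behind the chart: the win totals and champions are invented, the two "leagues" are placeholders, and the fitting routine (fit_logistic) is a bare-bones Newton's method written only for this example.

```python
import math

def fit_logistic(wins, champion, iterations=25):
    """Fit P(title) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method,
    where x is wins centered on the mean. Returns (b0, b1, log_likelihood)."""
    mean_wins = sum(wins) / len(wins)
    x = [w - mean_wins for w in wins]  # centering keeps the arithmetic stable
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, champion):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            weight = p * (1.0 - p)    # entries of the negated Hessian
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step for the intercept
        b1 += (h00 * g1 - h01 * g0) / det   # Newton step for the slope
    log_lik = 0.0
    for xi, yi in zip(x, champion):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        log_lik += math.log(p) if yi else math.log(1.0 - p)
    return b0, b1, log_lik

def aic(wins, champion):
    """AIC = 2k - 2 * log-likelihood, with k = 2 (intercept and slope)."""
    _, _, log_lik = fit_logistic(wins, champion)
    return 2 * 2 - 2 * log_lik

# Invented data: two seasons of a seven-team league (1 = won the title).
wins = [60, 55, 50, 45, 40, 35, 30, 62, 57, 52, 47, 42, 37, 32]
top_heavy = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # champions near the top
wide_open = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # champions from the middle

print("AIC, top-heavy league:", round(aic(wins, top_heavy), 2))
print("AIC, wide-open league:", round(aic(wins, wide_open), 2))
```

In this toy setup the league whose champions sit near the top of the standings gets the smaller AIC, which is the sense in which a smaller number means a more predictable league.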

Predicting the Masters

Jordan Spieth is and should be one of the favorites to win the Masters. He's had two starts at Augusta National, finished tied for second in 2014 and won it in 2015. He also has a PGA Tour victory in 2016, the Hyundai Tournament of Champions.

But the PGA Tour's website is predicting someone different. Using an analytic formula, the site says Phil Mickelson will win the green jacket. There are three variables used: the overall rankings for driving distance, putting and scrambling. Mickelson has the best ranking when combining all three variables, and by a lot. The second-place golfer, Jason Day, is 38 "points" lower than Mickelson but only ten points better than third and fourth place (Marc Leishman and Rickie Fowler, respectively). If this formula is completely accurate, Spieth will finish 7th.
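
The site does not spell out how the three rankings are combined, so the following is only a sketch under one plausible assumption: add up each golfer's rank in the three categories and treat the lowest total as the best. The golfers and ranks are made up for illustration and are not real PGA Tour data.

```python
# Hypothetical ranks (1 = best) in the three categories the formula uses.
ranks = {
    "Golfer A": {"driving_distance": 4, "putting": 10, "scrambling": 6},
    "Golfer B": {"driving_distance": 1, "putting": 40, "scrambling": 25},
    "Golfer C": {"driving_distance": 30, "putting": 2, "scrambling": 15},
    "Golfer D": {"driving_distance": 20, "putting": 22, "scrambling": 18},
}

def combined_points(golfer_ranks):
    """Assumed scoring rule: the sum of the three category ranks."""
    return sum(golfer_ranks.values())

# Lowest combined total first, i.e. the predicted order of finish.
leaderboard = sorted(ranks, key=lambda name: combined_points(ranks[name]))

for place, name in enumerate(leaderboard, start=1):
    print(place, name, combined_points(ranks[name]))
```

A rule like this rewards golfers who are solid everywhere over those who lead one category but lag in the others, which is one way a player could top the combined list "by a lot".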

Though the simplicity of the formula can be appreciated, any Masters prediction should include past performances. This variable is highly predictive. It explains why Fred Couples finished in the Top 20 in five of the last six years, even though he has played on the Champions Tour since 2010. It might also explain why the Masters remains the only major championship Rory McIlroy has yet to win (he has finished 8th or better the last two times at Augusta National).

Even when adding this variable, it does not take away from the argument for Mickelson. After all, he has won three green jackets and finished tied for second in 2015, four strokes behind Spieth. It is also worth noting that, of the 48 different golfers who have won the Masters, 17 won it multiple times (35.4%). Look for Mickelson, Spieth or Adam Scott to finish atop Sunday's leaderboard.

Evaluating Your Bracket

The Law of Conservation of Mass tells us: matter is neither created nor destroyed. When you burn your horribly incorrect college basketball bracket, remember, you never destroyed it; it is in another form somewhere in the universe. So instead of ignoring your transgressions, let's embrace what still exists and see which approaches were the best when predicting who will be in the Final Four.

There's a one-seed (North Carolina), a couple of two-seeds (Villanova and Oklahoma) and a 10-seed (Syracuse). There is not as much parity with this quartet as with some tournaments in the last few years. Still, some of the favorites to win the National Championship did not survive the first two weeks of this crucible. For instance, the top three teams in the Pythagorean Rating at the end of the conference tournaments are not playing in Houston. In fact, Syracuse did not even crack the top 25 until recently. ESPN's Basketball Power Index offers these rankings: North Carolina (1), Villanova (3), Oklahoma (6) and Syracuse (39). The LRMC Basketball Rankings still have their second-, third- and seventh-ranked teams alive, but rank the Orange 41st.

Some computer models have resorted to predictions that do not rely solely on historical data. How is this possible? Microsoft's search engine, Bing, uses social media to determine which teams will survive and advance. It has already proven successful in other sporting events like the World Cup and NFL games. But how did it fare for this tournament? Sadly for Bing, it only predicted one Final Four team correctly (North Carolina). In fact, the system predicted the Orange to lose their first game.

It should be clear by now the two schools that ruined this tournament's predictiveness: Kansas and Syracuse. The Jayhawks were the top team by nearly all accounts, yet lost in the Regional Final, perhaps uncharacteristically. At the other end of the spectrum, Syracuse could be the worst team ever to make the Final Four. There have been 11-seeds to make it to the final weekend of the season, but many debated if Syracuse even deserved to make the tournament. Their RPI was 72 at the time of selection, worse than other schools that were not chosen (e.g. Valparaiso, San Diego St. and St. Bonaventure). Instead of the favorite vying for the National Championship, it's the controversial at-large two wins away from glory.

Even listening to me would not have been wise. Using my own system, I only correctly predicted one team (and it was a different school than the one I said was coming out of that Region on Fox 4). My National Champion was knocked out during the Elite Eight (Kansas) and my second place team lost in the First Round (Michigan St.).

So what is the best way to fill out your bracket for the next tournament?

I don't know.

A Recap of the 2016 MIT Sloan Sports Analytics Conference

For the 10th time, sports analytics enthusiasts of all kinds came to Boston to attend the annual MIT Sloan Sports Analytics Conference. I was one of close to 4,000 attendees, though this was my first. Coaches, general managers, players, journalists, academics and just about anyone else in-between gave their takes on the industry and shared their research with the masses. The following stream-of-consciousness features the panels I attended and some of my bigger observations.


The War on Analytics

Goose Gossage isn’t the only one profanely fighting analytics. If you believe some of the speakers at the MIT Sloan Sports Analytics Conference, there exists a countermovement to the quantitative revolution.

Perhaps it was most appropriate that the 10th anniversary of this meeting began with a “Moneyball Reunion” panel, including the author of “Moneyball” Michael Lewis, the Godfather of sabermetrics Bill James and an assistant for the Oakland A’s, Paul DePodesta. That team’s general manager, Billy Beane, found a reason for using analytics when scouting players.

“Billy used to tell our scouts…‘I have all of this experience,’” said DePodesta, referring to Beane’s 25 years of working in some capacity in Major League Baseball. “I can’t walk into a high school game and say ‘This guy is going to be a star.’ If I can’t do it, I don’t know how anyone can do it…we have to come up with a different way.”

The team combated old school thinking by finding players who were devalued in some way by others. Sometimes it was due to their physical stature. Lewis recalled the story of the A’s considering Alabama catcher Jeremy Brown, whom many considered overweight: “He’s so fat, his thighs would rub together and set his jeans on fire.”

These stories happened more than a decade ago. Just like analytics, the criticisms and concerns have evolved. The second panel of the day focused on basketball and featured former NBA forward Shane Battier. He originally resisted analytics for a more personal reason.

Teams can quantitatively gauge a player’s health when it comes to sleeping habits, nutrition, etc. On the surface, it seems franchises would only need to know this information to maximize a player’s health, thereby making him/her more effective. But Battier’s concern was that teams would find some data to devalue him and have reason to pay him less and/or offer fewer years on a contract.

“It’s called capitalism,” said Battier.

Personal reasons or otherwise, Battier does believe there is a stigma within NBA locker rooms about what he called “the math”. Though he claims it extended his career as he aged, it’s “still not cool to be hip to the math”. He did add that if a player found analytics to be useful, they might find subtle ways to learn how to improve.

The conflict between believers and non-believers rages on. It is safe to say this conference preaches to the choir. When asked about Goose Gossage’s comments that baseball is now run by nerds, Bill James’s response received one of the louder ovations of the morning: “Back in 2002, you had to pay attention to those guys. Now, you can just ignore them.”

Talking About Playoffs

On a more personal note, one of the more interesting panel discussions of the day involved playoff analytics. Specifically, how do we devise the best system for determining a champion in each sport? It’s a philosophical question as much as an analytical one, because leagues could simply have one-game championships for every sport. Though that would be exciting, it would also be inherently unfair to teams that would win a series but lose the opener.

Each sport has its own set of challenges. While the NFL cannot play as many games as other professional leagues, college athletics must deal with other factors. NCAA executive Oliver Luck points to class time, money for travel and time commitments that, if abused, would be unrealistic for student-athletes.

However, at the forefront of these conversations is attracting the most loyal fans. They may not want to see a nine-game World Series (something I have argued for) because it is too long to retain interest. Nine games might be a truer way of determining the best team in a series—especially with expanded starting rotations—but in the end it is what the fans want, and that is something analytics can help with. NASCAR Vice President of Strategic Development Eric Nyquist pointed to how analytics helped his sport redo the Chase for the Sprint Cup so that a champion is not already determined by season’s end but it is not entirely haphazard as to who earns honors as the top driver.

Playoffs can also have other benefits when done correctly. Luck said the College Football Playoff has helped teams schedule more competitive non-conference games. It has also helped college basketball in spotlighting conference tournaments and conference games (though he admits non-conference games could be more popular than they are).

This panel also agreed on an underlying truth that analytics highlights: far more games, at least thousands, would have to be played in any sport to determine the best team. Because this notion is unachievable, the next best thing is to come up with the playoff format that is in the sport’s best interest. Who does it best? Neil Paine of fivethirtyeight.com says the NFL, because it preserves uncertainty while the winner is usually among the top teams that season. The NBA, meanwhile, has too much certainty, and only a handful of teams, if that, have a chance at a championship.
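
To get a feel for why so many games are needed, here is a small sketch of my own (not something presented by the panel). It assumes the better team wins each game independently with a fixed probability, 55% in this made-up example, and computes its chance of winning a best-of-N series from the binomial distribution.

```python
from math import comb

def series_win_probability(p, games):
    """Chance that a team with per-game win probability p wins a
    best-of-`games` series (games must be odd). This equals the chance of
    winning a majority of the games if every game were played."""
    needed = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(needed, games + 1))

# Hypothetical 55% favorite across series of different lengths.
for games in (1, 5, 7, 9, 101):
    print(games, round(series_win_probability(0.55, games), 3))
```

Under these assumptions the favorite's edge grows with every added game, but slowly, so a longer series is fairer without ever being a guarantee.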

It would be ponderous for me to go through each sport and say whether I think they conduct playoffs properly. I also understand why uncertainty must exist to keep fans interested so there are fewer things to point to that would dissuade fans from following the playoffs. Still, I would hope leagues avoid caving too much to all of the whims of fans and perhaps provide a product that is fairer to the teams competing for championships than those rooting for them. I have found it is in the long-term best interest of a sport to maintain an unaffected, traditional system and not make determining a champion seem so capricious.

As a postscript, I found professional bowling to be the worst in determining a champion. In the tournaments I covered, early rounds would be a matchup of two bowlers in a best-of-seven series of matches, but once you reach the final rounds—which are televised—it is one match determining who advances and who wins the whole thing. To prove my point, I would like to believe this is why the sport is not as popular as it once was. I am probably mistaken, and if you are adamantly opposed to this idea, might I suggest a winner-take-all debate.

Evolution of Sports Journalism

Of all of the panels at this conference, this was the one I was most looking forward to (surprising, isn’t it?). While it took a circuitous route to discussing sports analytics, it was a journey worth taking. For you young journalists, pay attention closely.

One of the more dominant voices on the panel was that of Jaymee Messler, President of the Players’ Tribune. Her company describes itself as “a new media company that provides athletes with a platform to connect directly with their fans, in their own words”. Founder Derek Jeter says he hopes the site will “transform how athletes and newsmakers share information”.

“We’re not following the news cycle,” said Messler. “We complement the media really well…driving stories that are compelling and are not getting covered by the [traditional] media.”

Here’s how it works: an athlete has a message they want to deliver. The Players’ Tribune offers a platform replete with resources to make sure it is exactly what they want to say. While traditional media might lose the ability to break the story, they gain material for questions the next opportunity they have for an interview.

The criticism involves the last part of this sequence. Why would the athlete grant an interview? Why would they talk about something if they feel everything about it has already been said? If they spend less time with reporters and more with the Tribune, how do you build trust? (My thesis alluded to many of these problems.)

“The barrier to entry is zero,” said David Dusek of Golfweek. “You can, with a few clicks, get your voice out there…the players are much more controlling in that way and they have a way to react directly to fans (sometimes the media) and to have their voice heard…it’s interesting to see how it’s becoming more challenging.”

Reporters already had challenges talking to athletes before the Players’ Tribune thanks to athletes’ social media accounts. Athletes already have a way to communicate with the public, so a reporter may seem like a middleman. Traditional media also has to compete with new media that can provide scores and highlights more quickly than they can present. Lastly, clichés have become more tired than ever.

What’s a reporter to do? One solution: analytics.

“Analytics is just one avenue to get a creative solution around limited access,” said Carl Bialik of fivethirtyeight.com. “We do want to talk to people in the sports world about what we find…some of the best interviews I’ve had are with people who are rarely asked about certain things.” These things include data trends, advanced statistics and specific forecasts.

Not all reporters can (and perhaps should) research their own analytics. It may not even be the unique route they should take to become more creative. What matters here are the conflicting forces that make the journalist’s job more challenging. Fortunately, there are solutions, hence the evolution.

Conclusions

Virtually every hour of this two-day event, there are six different panels and lectures to choose from. I attended as much as I could while still covering the event and was not present for 49 different events, and that was just on Friday. That’s not to mention the many sports science exhibits, software presentations and other technological displays I was unable to see readily.

Perhaps one of the things that has attracted more than 3,000 people to this conference is the depth of sports analytics presented. Poster presentations and white papers are available for the deeply analytical. Other events like panels speak of analytics in broader, general terms. Even if a sports fan only wants to see players and coaches discuss their craft, there is a place for that person too. There is also a variety of subjects covered, from business analytics to athletic performance measurements to sports journalism and even to the future of how we will watch and listen to games.

While sports like football, hockey and soccer were covered, there were not as many baseball presentations as one might expect, given that analytics have progressed more within that sport than any other. One reason might be a national sabermetric conference happening the same week at almost the other end of the country. It is also Spring Training with many MLB teams preparing for the season. Still, it might be a positive development for sports analytics to stress other sports so it can branch out and attract different fans. On at least two occasions, panels discussed how the NBA and basketball have the most room to grow internationally in terms of popularity.

The conference also took on developing stories. The Steph Curry phenomenon of making so many lengthy basketball shots had its share of supporters. Away from sports, Nate Silver of fivethirtyeight.com updated his political findings of who will be the major party nominees for President. Even conversations I had with presenters and attendees involved sports stories happening in the moment.

If analytics do not whet your appetite, this conference may not change your mind. After all, the pro-analytical comments were often received with at least some fanfare, a kind of “preaching to the choir”. For anyone who does have the slightest interest in sports analytics, chances are there will be at least one lecture or exhibit that will make for an informative weekend.


(All photos courtesy of MIT Sloan Sports Analytics Conference)

Special Teams Not as Special as They Used to Be

Virtually any football fan has heard cliche after cliche about the importance of special teams.  After all, why would they be called "special" if they were anything but?  There are too many instances of momentum being seized and lost because of an impressive kickoff return, devastating injuries affecting a team and the excitement caused by a game-winning field goal.  However, analytics suggest this phase of the game may not be as special as it once was.

Many data scientists have put together linear regressions weighting the importance of a team's offense, defense and special teams for the outcome of a game.  These models say special teams account for less than 20% of the overall effect on the outcome of a game.  Some models suggest even less.  Winston (2009) put together a regression excluding any special teams variables in his book, Mathletics, and had an R^2 of .8733 and an adjusted-R^2 of .8577 (p. 129).
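
For readers unfamiliar with the second statistic, the sketch below shows the standard adjusted-R^2 formula, which discounts R^2 for the number of predictors in the model.  The sample size and predictor count in the example are hypothetical, since the passage above does not give the ones Winston used.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (
        n_observations - n_predictors - 1
    )

# Hypothetical inputs: an R^2 of 0.90 from 32 observations and 4 predictors.
print(round(adjusted_r_squared(0.90, 32, 4), 4))
```

The adjusted figure is always at or below the raw R^2, and the gap widens as predictors are added without improving the fit.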

These models have been around for years, but only recently are we starting to see NFL teams deemphasize special teams:


(Figure: NFL kickoff-return and punt-return touchdowns by season since 2005)

This figure represents the touchdowns scored from kickoff returns (red) and punt returns (blue) in the NFL since 2005.  Especially in the last three years, there have been fewer kickoff returns for touchdowns.  Some of this downward trend can be attributed to the league moving the kickoff to the 35-yard line to promote touchbacks.  Punt return touchdowns had a spike in 2011 and 2012, but have since leveled off and do not have a discernible trend over time, positive or negative.  That still does not detract from the overall notion that fewer points are scored from this phase of the game.

What about extra points and field goals?  This past offseason, the league moved the extra point back 13 yards.  It resulted in a reduction in successful extra point attempts, from 99.3% to 94.2%.  However, this amounts to approximately 80 missed extra point attempts over the course of an entire season for the entire league.  There are even fewer examples of this move affecting the outcome of a game, though one can make an argument with a notable example in the latest AFC Championship Game.  As for going for three, many agree it behooves teams not to kick field goals as frequently as they do.  Lately, there have been fewer field goal attempts.

Again, most of the theoretical research here has been around for a few years, but many successful NFL teams have now heeded the findings and do not invest as much in special teams as they once did.  While many will still pay for top-notch kickers and punt returners and have important reasons for doing so, we are seeing the NFL evolving to a more analytically based approach to the not-as-special special teams.

Greetings and Welcome!

Hello and welcome to the blog portion of my website.  Here, I will write about sports analytics findings I have researched, analyze others' approaches to these quantitative tools and discuss the future of this field.

Though we are seeing players, coaches and the media become more comfortable discussing analytics openly, it also seems to be confined to specific areas like gambling and fantasy sports.  This blog will dig deeper into these areas by means of forecasting, but it will also infer how and why things happened in noteworthy games.  Models, data visualizations and other analytic tools can communicate these ideas.

One goal for this website is to bridge the gap between those who embrace analytics and those who shun the tools.  I have never been comfortable operating with the belief there are two distinct camps.  I believe analytics should be a part of a toolbox for fans and those who work in sports.  If a tool makes the job more efficient, then it should be used; if not, then find another tool or do not use any.  Attaching personal feelings one way or another does not (and should not) serve anyone's purposes.

I also hope this blog will be a call to action for those who read.  If you would like to comment, please do so.  If you would like to reach out, please click "Contact Me" at the top of this page.  Thank you for visiting and I hope you enjoy what this site has to offer.