By: Edward Egros

sports

Is Team USA THAT Dominant to Win the Ryder Cup?

Putting it as simply as possible, the Ryder Cup strategy seems to be for Team USA to design easier golf courses and for Team Europe to design tougher ones. It's why, in the last five Ryder Cups, the team with the home course advantage has won four of those five tournaments (with the lone outlier being in 2012, arguably the greatest comeback in Ryder Cup history with Team Europe needing eight points to win and Team USA needing just 4.5 points). The logic comes from analytics groups that believe keeping the action shorter (concentrating on wedges and the putter) benefits Team USA.

The host course for 2018, Le Golf National, seems to be gaining respect from golfers for how important it is to stay in the fairway, seemingly benefitting the Europeans. However, if we look at Strokes Gained: Off-the-Tee from PGA Tour events, the team averages slightly favor the Americans (.382 versus .348). Note: Sergio Garcia and Thorbjørn Olesen did not qualify for this statistic, so their Strokes Gained was assumed to be zero.

Golfers also discussed the importance of iron and hybrid shots, so it may be safe to look at Strokes Gained: Approach-the-Green for guidance. Here, we see an advantage for the Americans: .475 versus .331. In other words, if Team USA does well with tee shots they may be unstoppable. If they have trouble finding fairways, Team Europe has an opportunity.

The Americans are heavy favorites to capture their first Ryder Cup in Europe since 1993 (-150 versus +130 for Team Europe). While so many more players at the top of the Official World Golf Rankings belong to Team USA, do not be surprised if the long game becomes enough of an advantage for Team Europe to stay in contention.
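For readers less familiar with moneylines, here is a minimal sketch of how American odds translate into implied win probabilities. The function name is illustrative, and the "no-vig" normalization at the end is one common convention rather than anything the sportsbooks publish.

```python
def implied_probability(moneyline: int) -> float:
    """Convert American moneyline odds to an implied win probability."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win `moneyline`.
    return 100 / (moneyline + 100)


usa = implied_probability(-150)    # Team USA at -150
europe = implied_probability(130)  # Team Europe at +130

# The two probabilities sum to more than 1; the excess is the bookmaker's margin.
margin = usa + europe - 1

# Rescale so the two probabilities sum to exactly 1.
usa_fair = usa / (usa + europe)
europe_fair = europe / (usa + europe)

print(f"USA: {usa:.1%} implied, {usa_fair:.1%} after removing the margin")
print(f"Europe: {europe:.1%} implied, {europe_fair:.1%} after removing the margin")
```

By this reading, the odds treat Team USA as roughly a 3-to-2 favorite rather than an overwhelming one.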

Biggest Snubs for the Cowboys' Ring of Honor

There was a time when virtually the only way a Dallas Cowboy could make the Ring of Honor was to win a Super Bowl. All but two current members had at least one championship (Don Meredith and Don Perkins). However, this week Cowboys owner Jerry Jones affirmed Tony Romo would be inducted into the Ring of Honor. Not only did the former quarterback fail to reach a Super Bowl, he would be the first Cowboy in franchise history to be inducted without even having won a conference title.

Individually, Romo may not have had as stellar a career as Ring of Honorees Troy Aikman or Roger Staubach, but he does surpass the efforts of Meredith, and for being a part of the Cowboys for a dozen years, he likely deserves a place in north Texas immortality. By including Romo, the Cowboys introduce the idea that championships should not be weighted as much when determining who belongs, perhaps opening the door for others.

This idea leads to a question: Who is the most deserving Dallas Cowboy for the Ring of Honor who has yet to make it? One way to evaluate individual performances is with Approximate Value by Pro Football Reference. The top eight players in Cowboys history have already been inducted, from Emmitt Smith (#1) to Staubach (#8).

The player with the highest Approximate Value who does not have his name on the ring is Cornell Green, a cornerback who played for 13 seasons, including for the 1971 Super Bowl team. With 34 interceptions, 171 games started and five Pro Bowl invitations, Green has a better case to make it than anyone else not there, per this metric. His 9th best Approximate Value is better than those of Aikman, Romo, Lee Roy Jordan, Larry Allen, et al.

Two other players who finish in the Top 20 but who are not in the Ring of Honor are Ralph Neely, a left tackle on the '71 Super Bowl champions, and Nate Newton, the left guard who played during the Cowboys dynasty of the 1990s. While this metric may not be the perfect way to compare players, it does highlight some inconsistency in why some players have already been inducted and why others have had to wait.

TOUR Championship Preview

It may seem like ancient history, but Bryson DeChambeau was nowhere near a favorite to win the first leg of the FedExCup Playoffs, sitting obscurely in the Top 125 rankings. Two wins and a Top 20 finish at the BMW Championship later, DeChambeau will start the TOUR Championship with more points than anyone else. As we've profiled before, the points leader has roughly a 28% chance to win based upon points alone and assuming equal abilities for every golfer (even though points have been redistributed since our post).

Because of DeChambeau's two victories in the last three tournaments and three wins in the last 14 months, he is a trendy pick to win the FedExCup. However, a lot of research suggests there is no such thing as a "hot hand" in golf. In other words, just because a golfer played well the hole before or the day before does not mean he/she is drastically likelier to play successfully the following round. Christopher Cotton, Frank McIntyre and Joseph Price wrote about this phenomenon, and Alan Reifman has discussed the lack of a drastic "hot hand" in several sports including golf.

Again, referencing the previous article, those in the Top 5 in points tend to win this event, and Strokes Gained: Approach-the-Green seems to have the most predictive value of all Strokes Gained statistics. Also, because of a much smaller tournament field than a usual tournament, birdies and birdie averages seem to matter more than normally.

For being atop the birdie average list, fourth in FedExCup points and fifth in Strokes Gained: Approach-the-Green, I am choosing Dustin Johnson to win this year's FedExCup and $10 million prize.

As for Daily Fantasy lineups:

Dustin Johnson
Tony Finau
Keegan Bradley
Brooks Koepka
Tommy Fleetwood
Phil Mickelson

Justin Thomas
Bryson DeChambeau
Jon Rahm
Tiger Woods
Jason Day
Aaron Wise

2018 Cowboys Postgame Reports

Pasted Graphic
For the third-straight year, after every Dallas Cowboys game, I will provide an analytical graphic to begin the conversation as to why the Cowboys won or lost that particular game. However, this year features a new look and simplified visualizations so it's easier to follow and compare what happened. Our graphic is an example from the Cowboys preseason game against the Cardinals.

There are four factors:

- Turnover Margin
- Scoring Efficiency
- Net Yards/Pass Attempt
- Game Control

Our intelligent readers already know what Turnover Margin is, so we move on to Scoring Efficiency, which is essentially points divided by yards. Here, we include percentages, so the more efficient team earns the 100% margin, and the less efficient team shows the fraction of its efficiency compared with its opponent.
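To make the Scoring Efficiency margin concrete, here is a minimal sketch. The function names and the box-score numbers are hypothetical and only illustrate the arithmetic described above.

```python
def scoring_efficiency(points: float, yards: float) -> float:
    """Points scored per yard gained."""
    if yards <= 0:
        raise ValueError("yards must be positive")
    return points / yards


def efficiency_margins(team_a: tuple, team_b: tuple) -> tuple:
    """Express each team's efficiency as a percentage of the better team's.

    Each argument is a (points, yards) pair. The more efficient team
    gets 100 and the other gets its fraction of that efficiency.
    """
    eff_a = scoring_efficiency(*team_a)
    eff_b = scoring_efficiency(*team_b)
    best = max(eff_a, eff_b)
    if best == 0:
        # Neither team scored, so there is no margin to report.
        return 0.0, 0.0
    return 100 * eff_a / best, 100 * eff_b / best


# Hypothetical game: 24 points on 400 yards versus 17 points on 340 yards.
margin_a, margin_b = efficiency_margins((24, 400), (17, 340))
print(f"Team A: {margin_a:.0f}%  Team B: {margin_b:.0f}%")
```

In this made-up game, the first team earns the 100% mark and the second shows up at a little over 83% of that efficiency.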

Net Yards/Pass Attempt is (passing yards - sack yards) / (passing attempts + times sacked). This metric is important to include because it reliably evaluates quarterback performance and stays consistent over time.
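As a quick illustration of that formula, here is a minimal sketch. The function name and the stat line are hypothetical.

```python
def net_yards_per_attempt(pass_yards: float, sack_yards: float,
                          attempts: int, sacks: int) -> float:
    """(passing yards - sack yards) / (passing attempts + times sacked)."""
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("at least one pass attempt or sack is required")
    return (pass_yards - sack_yards) / dropbacks


# Hypothetical stat line: 280 passing yards, 20 yards lost on 4 sacks, 36 attempts.
nya = net_yards_per_attempt(pass_yards=280, sack_yards=20, attempts=36, sacks=4)
print(f"Net Yards/Pass Attempt: {nya:.2f}")
```

Here the quarterback's 280 passing yards shrink to 6.5 net yards per dropback once the sacks are counted.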

Lastly, Game Control is based upon a regression where the explanatory variables are the rushing yards in each quarter and the dependent variable is the likelihood of winning. My research found, predictably, that rushing yards in later quarters matter more to winning than those earlier in games. Here, we multiply each team's rushing yards in each quarter by a factor for that quarter and add up the results. We then express each team's total as a proportion of the combined total to see how much each team controlled the game.
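Here is a minimal sketch of how that proportion could be computed. The quarter weights below are placeholders chosen only to grow with each quarter; they are not the regression coefficients from my research, and the rushing totals are hypothetical.

```python
# ASSUMPTION: illustrative weights only, increasing by quarter.
# The actual factors come from the regression and are not reproduced here.
QUARTER_WEIGHTS = (1.0, 1.5, 2.0, 2.5)


def weighted_rushing(yards_by_quarter) -> float:
    """Sum of rushing yards in each quarter times that quarter's weight."""
    if len(yards_by_quarter) != len(QUARTER_WEIGHTS):
        raise ValueError("expected one rushing total per quarter")
    return sum(w * y for w, y in zip(QUARTER_WEIGHTS, yards_by_quarter))


def game_control(team_a, team_b) -> tuple:
    """Each team's share of the combined weighted rushing total."""
    a = weighted_rushing(team_a)
    b = weighted_rushing(team_b)
    total = a + b
    if total == 0:
        # No rushing yards at all: call it an even split.
        return 0.5, 0.5
    return a / total, b / total


# Hypothetical game: both teams rush for 140 yards, but at different times.
late_runner = (20, 30, 40, 50)
early_runner = (50, 40, 30, 20)
share_late, share_early = game_control(late_runner, early_runner)
print(f"Late runner: {share_late:.1%}  Early runner: {share_early:.1%}")
```

Both teams finish with the same rushing total in this example, yet the team that ran more in the second half gets the larger share of Game Control, which is the behavior the metric is meant to capture.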

As always, feedback is appreciated!

It May Seem Like Mayhem, But...

Though a few schools decided to start the college football season one week early, the heavyweights, the blue chippers, the ones who are constantly atop any set of rankings you can find and are in contention for that trophy…begin this weekend.

As before, we can use parts of our college football prediction model to determine who is likeliest to have the most talent and the most favorable schedule, including who has the toughest games at home and if the toughest games are on days with ample rest and preparation.

Using all of this information, my predictions for who will make this year's College Football Playoff are:

Alabama
Ohio State
USC
Florida State

Virtually every year, there is a surprise team sparingly chosen that charges from outside the Top 10 to the Final Four. This year, I am picking two. First, while many say Washington will represent the West Coast, I like USC because of more highly ranked sophomore and junior classes (per 247 Sports) and a more forgiving schedule. Washington begins the season against Auburn (a Top 10 team in many metrics including ours), while USC's toughest non-conference game is at Texas (not as strong as Auburn). The Huskies are therefore likelier to lose than the Trojans, while USC still earns solid strength of schedule numbers. The Trojans also boast one of the better receiving corps, which should help a true freshman quarterback in JT Daniels feel comfortable.

The other outsider is Florida State, edging a perennial contender in Clemson. Again, the Seminoles have more highly ranked second-year and third-year classes, and Clemson plays at Florida State. Last season, the Seminoles were ranked third in the AP preseason poll. You can make the argument that had they not lost starting quarterback Deondre Francois for the season with an injured patellar tendon in his left knee, they would have been in contention. The running game also carried that offense, and with Cam Akers and Jacques Patrick providing depth in the backfield, this offense should not be overlooked.

This playoff is entering its fifth season. Even though USC and Florida State are outside of the AP Top 10, the Seminoles have been in the playoff before, and the Trojans are the defending Pac-12 champions. It may seem like mayhem, but it's not.

Previewing the 100th PGA Championship

Pasted Graphic 3
(Courtesy: Gary Kellner Getty Images)

In some ways, the PGA Championship is the toughest to predict of all four majors. Previous performance is an enormous factor for the Masters, past results at links style courses help with the (British) Open (and when applicable the U.S. Open) and long hitters often perform well at the second major. But with golf's final major, the skill set required to win can vary significantly. One trend worth noting is those who win the Wanamaker Trophy do well at the other majors. It has the second-fewest number of winners whose only major victory was that major (the Masters has the fewest single-major champions). However, the last three winners of golf's final major are first-time major champions (Jason Day, Jimmy Walker and Justin Thomas).

To make matters even trickier, it's been 10 years since Bellerive Country Club in St. Louis has hosted a PGA Tour event, and the leaderboard does not exactly uncover a trend for success. However, because rainy weather seems to have softened the course, putting may not be as big of a factor as driving and the short game. The usual suspects appear atop the Strokes Gained: Tee-to-Green leaderboard: Dustin Johnson, Justin Thomas, Francesco Molinari and Henrik Stenson. The PGA Championship has also been known to produce some low scores. In fact, five of the last six winners finished double digits under par. After adjusting each golfer's scores for the field average at the tournaments he played, the lowest scores this season come from Johnson, Justin Rose, Jason Day and Thomas.

Including these statistics, the number-one Official World Golf Ranking and his considerable driving distance, Dustin Johnson is my pick to win the 100th PGA Championship. As for my Daily Fantasy lineups:

Dustin Johnson
Jason Day
Tony Finau
Luke List
Webb Simpson
Hao-Tong Li

Paul Casey
Bryson DeChambeau
Tommy Fleetwood
Ryan Moore
Louis Oosthuizen
Justin Thomas

Ohio State's Less Important Question

Ohio State head coach Urban Meyer continues to face the possibility he will not coach the Buckeyes ever again. The school placed him on paid administrative leave as it investigates if he failed to report (or do anything about) an assistant coach allegedly committing domestic violence. This assistant may have exhibited a pattern of horrific behavior, yet remained on Meyer's coaching staff at Florida and Ohio State for years after reported incidents. The school announced it would like to end its investigation in the coming days.

What matters far less than potentially covering up violent crime is football itself. There exists the serious reality an entire football team will have to scramble to organize, practice and get through a gauntlet of a season, all because its leader exhibited incredibly poor judgment. There also exists an unfortunate reality if no reasonable explanations can be uncovered during this investigation: doing the right thing has consequences.

Other college football programs have parted ways with their head coaches within a couple of months of the season's kickoff. In 2017, Ole Miss head coach Hugh Freeze resigned after questions were raised about phone calls made to a female escort service. One year earlier, Baylor fired head coach Art Briles after a couple of his players were convicted of sexual assault and many more women came forward alleging that some within the football team committed multiple acts of violence against them. Lastly, in 2012, Arkansas fired head coach Bobby Petrino for unfairly hiring a mistress, not disclosing the nature of that relationship to his boss and not admitting to authorities she was present when Petrino suffered a motorcycle accident.

In each case, I looked at how many games each team was projected to win prior to each scandal, according to our prediction model. This model takes into account recruiting rankings of the sophomore and junior classes from 247sports (the classes we found to be statistically significant), home and away schedules and whether any games were played on days other than Saturday. Here are the results:

Pasted Graphic 2


For Ole Miss, near the end of the season the Rebels had four games decided by one possession. In each game we projected them to win; however, they went 2-2. An 8-4 possibility became a 6-6 performance. For Baylor, there was a three-game stretch near the end of the season where things seemed to fall apart (i.e. losses to Kansas State, Texas Tech and West Virginia). The Bears could have gone 10-3, but instead finished 7-6. Lastly, for Arkansas, we suspected a dip in performance after coming off an appearance in the Sugar Bowl, but the downtick turned out to be more severe. Instead of perhaps going 7-5, they went 4-8.
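The shortfall in each case can be tallied straight from the records above. This is a minimal sketch; the dictionary layout and variable names are illustrative, and the projected records are the "could have" records mentioned in the text.

```python
# (wins, losses) taken from the paragraph above.
seasons = {
    "Ole Miss (2017)": {"projected": (8, 4), "actual": (6, 6)},
    "Baylor (2016)": {"projected": (10, 3), "actual": (7, 6)},
    "Arkansas (2012)": {"projected": (7, 5), "actual": (4, 8)},
}

# Wins lost relative to the projection for each program.
shortfalls = {
    team: record["projected"][0] - record["actual"][0]
    for team, record in seasons.items()
}
average_shortfall = sum(shortfalls.values()) / len(shortfalls)

for team, missed in shortfalls.items():
    print(f"{team}: {missed} fewer wins than projected")
print(f"Average shortfall: {average_shortfall:.2f} wins")
```

Three seasons is a tiny sample, so the average is only a rough guide to what a late coaching change might cost.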

Several other factors could have caused an underperformance of these projections, so it cannot be definitively concluded the departure of the head coach caused the unforeseen losses. However, intuitively it might make sense that a coaching change late in the offseason could mean two or three additional losses. If, indeed, Ohio State decides to fire Urban Meyer, and if it does mean the Buckeyes narrowly miss out on championships, only Meyer is to blame.

The "Reliable" Open

Colloquially, Carnoustie might be considered the toughest test of all courses that are part of the Open's rotation. Often when the course adds to the already high degree of difficulty a major provides, those who reliably do well at majors become natural favorites when picking a winner and putting together fantasy lineups.

In the last five years, when measured, the winner of the Open finished in the Top 25 in Strokes Gained: Tee-to-Green and Strokes Gained: Approach-the-Green, and the only reason why it isn't even more selective is because of Phil Mickelson's 2013 victory when he hovered around the 25th position in both metrics.

Perhaps the most intriguing option for a winner is Henrik Stenson. In the last five years, the player with the lowest score to par in the Open is Stenson, who won the 2016 championship in what was essentially match play that Sunday. However, he is battling through an elbow injury and even claimed he is not 100%. The second lowest score in the last five years belongs to the man Stenson beat that year, Mickelson. Given Stenson's price in DFS, I'm willing to take a risk on him and make him my "preliminary" favorite to win.

If, for some reason, his injury prevents him from playing to his potential, my pick to win is Justin Thomas. He currently ranks 4th in Strokes Gained: Tee-to-Green and has performed well in other majors that were played on links style courses, such as the PGA Championship at Whistling Straits in 2015 and the U.S. Open at Erin Hills in 2017.

Here are my two DFS lineups:

Justin Thomas
Patrick Cantlay
Marc Leishman
Luke List
Francesco Molinari
Henrik Stenson

Keegan Bradley
Tony Finau
Sergio Garcia
Rory McIlroy
Alex Noren
Xander Schauffele

World Cup Finale

Just like statistics themselves, the results of any model can be manipulated and spun to fit a narrative. On the one hand, our World Cup model was not perfect when it came to picking the result of every match. Again, our ground rules were to predict whether the designated "home team" would win, lose or draw in the group stage and win or lose in the knockout stage. Here are our results:

Group Stage: 27/48 (21 results were one of the two other outcomes than predicted)
Round of 16: 5/8
Quarterfinals: 2/4
Semifinals: 1/2
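To put those tallies in one place, here is a minimal sketch that totals them and compares the hit rate with blind guessing. The baseline is an assumption for illustration: picking uniformly at random among three outcomes in the group stage and two in the knockout rounds.

```python
# Correct picks and matches played per round, as listed above.
results = {
    "Group Stage": (27, 48),
    "Round of 16": (5, 8),
    "Quarterfinals": (2, 4),
    "Semifinals": (1, 2),
}

correct = sum(hits for hits, _ in results.values())
played = sum(games for _, games in results.values())
accuracy = correct / played

# ASSUMPTION: a naive baseline that guesses uniformly at random.
group_games = results["Group Stage"][1]
knockout_games = played - group_games
baseline_correct = group_games / 3 + knockout_games / 2

print(f"Model: {correct}/{played} correct ({accuracy:.1%})")
print(f"Random guessing would expect about {baseline_correct:.0f}/{played}")
```

A stronger baseline, such as always picking the betting favorite, would be a tougher comparison than random guessing.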

On the other hand, most of the misclassified matches were ones the model considered close. For instance, for the Semifinal between England and Croatia, our model gave England a 53% chance to win. That edge was small enough to suggest extra time would be a decent possibility, and in fact that was the outcome. Also, no other models I was actively monitoring forecasted the more unbelievable results, such as Russia knocking off Spain. In general, we are pleased with our results.

On that note, here are our predictions for the final weekend of the World Cup:

Final: France defeats Croatia (78%)

3rd Place: England defeats Belgium (65%)

World Cup Quarterfinals

Eight teams remain in contention for the World Cup. Our model went 5-3 predicting "Round of 16" matches. While many were predictable, few saw Russia upsetting Spain; the other two we missed (Uruguay beating Portugal and Sweden knocking off Switzerland) were essentially toss-ups. So far, we are quite pleased with our results.

Our next step is to predict Quarterfinal matches, and if you cannot wait for our social media posts or our reveals on Good Day, here they are:

Brazil defeats Belgium (76.4%)
Russia defeats Croatia (50.7%)
England defeats Sweden (58.2%)
France defeats Uruguay (67.8%)

World Cup Predictions

Once again, I'll begin this post by apologizing for so few updates in the last several weeks. However, I have been diligent with analytical research, and now I can share one of these projects with you: it pertains to the World Cup.

In collaboration with students from Southern Methodist University, we have devised a model that predicts the outcome of every match. The data used includes outcomes of international matches between national teams dating back to 2003 (this does not include friendlies), the location of where they were played, what tournament that match was a part of, the distance each country had to travel to play that match and the total market value of each team, meaning the sum of each professional contract for each player belonging to that team.
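One of those inputs, travel distance, is easy to illustrate. This is a minimal sketch rather than the project's actual code: it assumes distance is measured as the great-circle (haversine) distance between two cities, and the function name and the example coordinates are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Illustrative example: a team based in Paris traveling to a match in Moscow.
paris = (48.8566, 2.3522)
moscow = (55.7558, 37.6173)
travel_km = haversine_km(*paris, *moscow)
print(f"Paris to Moscow: about {travel_km:.0f} km")
```

Where each team's journey should start (its capital, its training base or somewhere else) is a modeling choice this sketch does not settle.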

Since the start of the tournament, I have reported these results on Fox 4, while also providing context of how the tournament is unfolding, using analytical tools. You can see these videos both at the bottom of the home page and on my YouTube page.

Lastly, I have assembled all of the data and other files onto a Github page so you can follow along with what we have been doing.
Click here for that information.

As always, I would appreciate feedback you can offer. You can also share your own models by clicking the link: "Contact Edward".

Need Reasons Not to Pick Spieth at the Byron Nelson?

Even the youngest of golf fans know how big of a favorite Jordan Spieth is at the Byron Nelson. Vegas odds have the Dallas native as a heavy favorite, he's the only golfer in the Top 20 in every Strokes Gained category except for putting (and putting is a volatile statistic), he's played at the tournament's new home at Trinity Forest a lot and the field is one of the weaker ones on the PGA Tour. In fact, only five of the Top 50 in the world will compete this weekend. Spieth is 3rd in the Official World Golf Rankings, by far the best in the field.

Then again, Spieth was also familiar with the tournament's old home at TPC Four Seasons and he failed to notch a Top 10 finish there. What makes Trinity Forest different is it's a links-style course, with wind playing a significant factor, no trees on the inside of the course (they only outline the exterior) and unusually large and detailed greens (this course even features one green with two holes). Because no PGA Tour event has been held here until now, there is no historical data to help determine who is likeliest to win. For my analysis, I am replacing course history with results at other links-style courses, and wouldn't you know it? Jordan Spieth shines in this model too, as the defending Open Champion with a 4th place finish in 2015 and a U.S. Open championship at Chambers Bay.

Here's one more perspective: as golf analyst Mark Broadie points out, winners average about 35% of their total strokes gained from their approach shots. At a links-style course, it is possible that share inflates, what with handling the unique greens and unusual winds. Those playing this week who have better Strokes Gained: Approach-the-Green numbers than Spieth are Scott Piercy and Sergio Garcia, who has handled other Texas tournaments well. There are a few players who could spoil Spieth's first win at this tournament, but it's hard to find them.

Here are my daily fantasy teams:

Jordan Spieth
Sergio Garcia
Robert Streb
J.J. Spaun
Hunter Mahan
Robert Garrigus

Adam Scott
Marc Leishman
Rory Sabbatini
Kevin Na
Scott Piercy
Bill Haas

One Personal Note About the Masters

It seemed like Sunday offered the kind of drama befitting Jim Nantz's description: "the most anticipated Masters…in our lifetime". Jordan Spieth, a former winner capable of dominating Augusta at any moment, nearly shot a course record to complete what would have been the biggest comeback leading into the final round in Masters history. Patrick Reed, who began Sunday with a three-stroke advantage, stayed around his starting score while those nearest to him were crumbling. For a few moments, it seemed like Spieth was going to catch Reed and fulfill the mantra that anything is possible at golf's first major.

Emotions are one thing, statistics are another.

DataGolf calculated its own odds for who would win the Masters, stroke-by-stroke. Even as Jordan Spieth trimmed his deficit with each passing birdie, Patrick Reed remained a sizable favorite for a number of reasons. First, no one has ever shot a 62 at the Masters before, but a few have shot a 63 and a few more have carded a 64. The probability of something unprecedented should be statistically small. But even if Spieth had pulled off that feat and we assume nothing else would have changed in terms of Reed's game, a course record would have only tied Reed, so nothing gave Spieth an advantage to win.

Second, as Spieth was approaching the end of his round, Reed had several holes remaining. Though he was in the middle of Amen Corner, which historically can be treacherous, when Reed's ball sat up on the slope after his approach on the 13th hole, avoiding the water altogether, he avoided any major disaster that would have given Spieth an opportunity. Then, Reed had easier holes where he could card more birdies, including the Par-5 15th, where he even scored an eagle the day before.

Lastly, per ShotLink, Reed was already 24th on Tour in Strokes Gained: Tee-to-Green and 41st in One-Putt Percentage, so nothing suggested one aspect of his game could cause a collapse. He would probably remain steady at worst, which is exactly what happened: Reed gained one shot en route to his first major championship.

There may have been wishful thinking by many that Spieth would complete the comeback, whether that came from fans of his, haters of Reed or consumers of incredible storylines. Often those emotions can have us thinking irrationally, that someone can do something that unprecedented. But those stoic statistics reminded us just how much of a longshot Spieth was to win, no matter how thrilling he made it seem. It's not that analytics should prevent us from enjoying the spectacle, but they should put in context what we are witnessing, as the spectacle might deceive us.

My 2018 Masters Pick Is...

A couple of months ago, I gave a talk at SportCon, a sports analytics conference in Minneapolis. There, I discussed how I come up with my predictions specifically for golf's first major of the year. If you'd like to listen to the podcast, click here.

What was not touted was, since I began my research into sports analytics, I correctly predicted two of the last three winners at the Masters (Jordan Spieth in 2015 and Sergio Garcia in 2017). Danny Willett in 2016 plays on the European Tour and given his inexperience at Augusta National and my ongoing adjustments as to how European Tour statistics translate to American courses, I missed that result completely.

I will apply the tobit model mentioned in my presentation for my picks, but will also use simpler statistics to highlight what matters most. This year may pose more uncertainty because of the number of international players who are playing well (their statistics do not always translate easily to Augusta National) and because so many big names are playing well. Since 2012, when statistics are available for the winners, every Masters champion was in the Top 5 in Strokes Gained: Tee-to-Green. Also, since 2012, every winner was in the Top 16 in the Official World Golf Rankings (OWGR) going into the tournament.

First, let's address the tiger in the room. Tiger Woods has won more purse money than anyone except Phil Mickelson at Augusta National. In fact, he's won approximately $3.5 million more than third-place Jordan Spieth. While he has shown steady improvement leading up to this week, and while I am willing to disregard his OWGR of 103rd, it is more difficult to assume his total winnings are not some sort of an outlier when analyzing the data (more technically, that there would be a perfect linear relationship between winnings and likelihood of winning the next tournament with the uppermost points that are substantially higher than everyone else). Tiger may play exceptionally well, but given he hasn't played since 2015, he remains a risky choice.

The aforementioned statistics do bode well for defending champion Sergio Garcia. He ranks first in Strokes Gained: Tee-to-Green, has three Top 10 finishes this season and historically has played well, finally putting it all together in a playoff victory. Even the player he beat in that playoff, Justin Rose, could earn a green jacket. Not only has he finished 2nd in two of his last three tries, in a dozen career appearances, Rose has finished in the Top 25 nearly every time and made the cut every time. One more honorable mention who grades highly is Adam Scott, the 2013 winner of this event. Though he has not been in contention in any of his seven events, he's had a relatively consistent game and a sterling history in majors.

But this year, my pick to win is Jordan Spieth. Yes, while his putting used to be a strength of his, it has now become problematic. In the three previous years at the time of the Masters, Spieth's Tour ranks for Strokes Gained: Putting were 39th, 17th and 5th. This time, he's tied for 185th, missing several short putts throughout the year. However, my model classifies Strokes Gained: Putting as an insignificant variable because of the variability of the metric. More specifically, a golfer may look like a worse putter because the putts are much tougher, not because of ability. Also, Spieth says an illness during the offseason completely threw off his schedule, so he knew he would need additional time to have his game where he wants it.

In four appearances, Jordan Spieth has finished second, first, second and 11th. He ranks third in the history of the tournament in total winnings in just those four appearances. Currently, all three components of Strokes Gained: Tee-to-Green rank in the Top 20 on Tour. If you believe in momentum, Spieth had his best finish of the year last week, tied for 3rd at the Houston Open. He finished tied for 2nd at that same tournament when he captured his first green jacket. It looks like he could claim his second in just a few days.

For those who assemble Daily Fantasy lineups, here are the two I am submitting:

Jordan Spieth
Paul Casey
Sergio Garcia
Kevin Chappell
Ian Poulter
Matt Kuchar

Bubba Watson
Hideki Matsuyama
Patrick Reed
Justin Rose
Adam Scott
Henrik Stenson

Prelude to the Masters

The uniqueness of the Shell Houston Open is not so much the course itself, but its timing. Some of the top players skip the event altogether so they can focus solely on next week's Masters, some may very well use the event as a tune-up, vying less for the win and more for retooling, and some are playing this event to win. There are some players with a lot of success at this event, notably Phil Mickelson, Russell Henley and Henrik Stenson. Strokes Gained: Off-the-Tee, Tee-to-Green and Approach-the-Green all have predictive value; in fact, when looking at the last 50 Top 5 finishers, the majority were in the Top 50 in the second pair of statistics. Given this information, here are my Daily Fantasy Lineups:

Keegan Bradley
Tony Finau
Luke List
Ryan Palmer
Kevin Streelman
Jhonattan Vegas

Chesson Hadley
Phil Mickelson
Henrik Stenson
Scott Piercy
Chez Reavie
Nick Watney

Tiger's Best Chance

With the Arnold Palmer Invitational on the horizon, it is easy to forget: it is still golf.

At the Valspar Championship, Tiger Woods had his best finish in five years and was one stroke away from forcing a playoff. Clearly, he is on an uptick, and seemingly it's only a matter of time before he ends his five-year drought and captures a victory. His last win was the WGC: Bridgestone Invitational.

Two victories before that? The Arnold Palmer Invitational.

One of the more significant factors for winning at Bay Hill is past success. When charting Top 5 finishes the last several years, names like Henrik Stenson and Zach Johnson come up multiple times. But as for Tiger, he has won there eight times in his career, including four times in the past decade. Even years when Tiger was slumping by his abnormal standards, he could often count on a win during the Florida portion of the schedule.

These reasons are enough for me to include him in my Daily Fantasy Lineups for this week. Here they are, also factoring in the significance of Strokes Gained: Tee-to-Green and Strokes Gained: Around-the-Green:

Tiger Woods
Henrik Stenson
Adam Scott
Scott Piercy
Kevin Chappell
Kevin Streelman

Tommy Fleetwood
Alex Noren
Keegan Bradley
Charles Howell III
Luke List
Jason Kokrak

P.S.: A reality check.

As I posited in an earlier post, the field is tougher now than it was when Tiger was dominating. In fact, last week's Valspar Championship could be proof of this idea: Paul Casey shot a final-round 65 to win by one stroke. As explained in that post, if you assume a stellar golfer gives up a full stroke when Tiger is in the field, Casey would have found himself in a playoff with Tiger, and the probability there gives a massive edge to Woods. Casey had not captured a victory in nine years, so to surge to the top of the leaderboard with one round suggests the sizable number of golfers capable of winning any given weekend.

Just because Tiger is on an uptick does not necessarily mean it is a straight line and he is guaranteed to win at Bay Hill. A lot is working in his favor, but every elite golfer stumbles at some point. It is still golf.

A Lesson in Mexico

Pasted Graphic

Even though golf did not give me anything in return after not cashing with either of my fantasy teams last week, golf gave a lot to Phil Mickelson. He won at the WGC: Mexico Championship, in a playoff, against arguably the hottest golfer at that moment, Justin Thomas.

What's more important is Lefty had not won an event in almost five years (his last victory was the 2013 Open Championship). Because of that drought, it might have made sense for many daily fantasy players not to pick Mickelson. This game is about more than picking successful players and stellar lineups; it is about picking golfers who others do not think will play well. Sometimes prices will reflect these trends, but many times they will not, and those are the moments DFS players should try to seize when putting together lineups. It is something I hope I can refine as I move forward.

This week is the Valspar Championship. It is more of a shotmaker's course, so heavy-hitters may not be favored. However, looking at top performers over the last ten years, there did not seem to be discernible trends when it came to the perfect Strokes Gained statistic, though Strokes Gained: Tee-to-Green and Strokes Gained: Approach-the-Green did seem to have some predictive value. More specifically, a player largely could not rank poorly in either metric.

These teams are designed to have a mix of those who perform at least adequately in the aforementioned statistics, those who have performed well at the Valspar Championship before and those who may not be chosen frequently by others:

Jordan Spieth
Chez Reavie
Keegan Bradley
Adam Hadwin
Chesson Hadley
Chris Kirk

Sergio Garcia
Nick Watney
Adam Scott
Charles Howell III
Kevin Streelman
Webb Simpson

Entering the Daily Fantasy Zone

Pasted Graphic 1
This adventurous soul of a webmaster is embarking on a new quest: Daily Fantasy Golf. Over at least the next few weeks, I will submit two teams of six players to a Daily Fantasy Golf website in the hopes of determining if my models have enough predictive power to finish "in the money" with enough frequency to make a profit. Though I am not spending any money of significance, I am keeping track of where each team finishes and what prizes come about.

If you are not familiar with Daily Fantasy Golf, each user has $50,000 to spend on six golfers competing in that week's tournament. Each golfer has a price and it is up to the user to find the best combination of golfers with the best finishing order at the end of the final round, all while not exceeding that $50,000 limit.
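To make the salary-cap idea concrete, here is a minimal Python sketch that searches every six-golfer combination and keeps the best one that stays under $50,000. The golfer names, prices and projected points are hypothetical placeholders invented for illustration; they are not real prices from any Daily Fantasy site, and the brute-force search is just one simple way to approach the problem.

```python
from itertools import combinations

SALARY_CAP = 50_000  # budget described in the post
ROSTER_SIZE = 6      # golfers per team, as described in the post

# Hypothetical pool: name -> (price, projected fantasy points).
# Every number here is made up for illustration only.
golfers = {
    "Golfer A": (11_500, 92.0),
    "Golfer B": (10_200, 85.5),
    "Golfer C": (9_400, 80.0),
    "Golfer D": (8_800, 77.5),
    "Golfer E": (8_100, 74.0),
    "Golfer F": (7_600, 70.5),
    "Golfer G": (7_000, 66.0),
    "Golfer H": (6_500, 61.0),
}


def best_lineup(pool, cap=SALARY_CAP, size=ROSTER_SIZE):
    """Return (names, total_cost, total_points) for the highest-projected
    lineup that does not exceed the cap, or None if no lineup fits."""
    best = None
    for combo in combinations(pool, size):
        cost = sum(pool[name][0] for name in combo)
        if cost > cap:
            continue  # over budget, so this lineup is not allowed
        points = sum(pool[name][1] for name in combo)
        if best is None or points > best[2]:
            best = (combo, cost, points)
    return best


lineup = best_lineup(golfers)
print(lineup)
```

With a real player pool of 100 or more golfers, checking every combination gets slow, so a smarter search would be needed; the sketch only shows the constraint itself.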

I began with the Genesis Open, and though one of my teams had all six players make the cut, no money was earned. Then I assembled teams for the Honda Classic, focusing primarily on Strokes Gained: Off-the-Tee and Strokes Gained: Tee-to-Green. One team of Justin Thomas, Alex Noren and others did finish "in the green". Winners of this event in the past have excelled in those statistics.

This week, the scene is the WGC-Mexico Championship. This tournament proves to be particularly tricky to predict if only because this is just the second time the World Golf Championships have been to Club de Golf Chapultepec. The elevation is high, the air is thin, the length is only 7,330 yards but heavy hitters like Dustin Johnson were successful last year. With a combination of players with high finishes the last few weeks, those excelling with their iron shots (proximity to the hole) and those who are dominant in Strokes Gained: Off-the-Tee, here are my teams:

Justin Thomas
Kevin Chappell
Francesco Molinari
Brendan Steele
Xander Schauffele
Webb Simpson

Tommy Fleetwood
Chez Reavie
Paul Casey
Alex Noren
Patton Kizzire
Charley Hoffman

One Major Challenge for Tiger

Pasted Graphic
(Courtesy: Jamie Squire/Getty Images)

Call it a comeback. At last weekend's Honda Classic, Tiger Woods finished 12th at even par and seven strokes off the pace. In his last five events on Tour, he has three Top-25 finishes, coming within eight shots of the lead at tournament's end each time. More specifically, last weekend his proximity to the hole led the field at 29 feet, 3 inches (with greens hit in regulation), and while it is not Tiger of old, there is an upward trajectory where it is safe to conclude he can be competitive again.

But with Tiger Woods, it is not about being competitive, it's about winning. At 42 years old with a number of injuries throughout his career, will Tiger ever win another PGA Tour event? A major? Multiple majors? The aforementioned uptick suggests he's given himself an opportunity, but there's one factor that's perhaps more important than Tiger's performance:

The rest of the field has improved.

Let's say the Tiger era lasted from 1997 to 2009 and the post-dominant-Tiger era is from 2010 to now. This divide makes the most sense based upon his career. In the modern era of golf, Tiger owns the four lowest average scores per season; and, if you adjust for stroke average by tournaments, Tiger owns the six lowest. Those seasons happened between 1999 and 2009. Since then, no one has posted any one season of that caliber, but as I showed in an article last year, the median golf score has gone down since 2006. And if you update 2017's median average score from the time that article was published, it's 70.94, a low score compared with the Tiger era.

Also, Tiger owns the largest margin of victory at an event during the modern era: 15 strokes at the 2000 U.S. Open. Since 2010, the largest margin of victory at a major is eight strokes, happening twice (2011 U.S. Open by Rory McIlroy and 2012 PGA Championship, also by McIlroy). Yes, Tiger's run is superior to what any golfer has mustered since, but the smaller margins of victory and greater dispersion of tournament wins reflect how many more golfers are able to challenge for golf's top prizes.

There's something else explaining stiffer competition. In 2007, Jennifer Brown of the University of California, Berkeley released a paper explaining how, on average, highly skilled golfers' scores are 0.8 strokes worse when Tiger Woods is playing in the same tournament than when he is not there. This disparity does not exist now for a few reasons. First, Tiger has not won a tournament since 2013 and hasn't won a major since 2008. Second, health continues to be a talking point about Tiger, given he has three withdrawals since 2013, has played in far fewer tournaments and has missed the cut with greater frequency. Finally, if opponents know Tiger will play well, they are likelier to play riskier golf because they know they will lose if they play their usual game. This idea is part of a paper I have frequently cited from Brian Skinner about knowing the competition and recognizing when a riskier game plan is the only way to win.

The field knows Tiger is not what he was and the field itself has improved. Jordan Spieth, for instance, tied Tiger's course record at the Masters. Dustin Johnson is consistently at the top of the leaderboard in Strokes Gained: Off-the-Tee. And Justin Thomas just earned his 8th victory before the age of 25, just the third golfer ever to accomplish that feat. An improved field is just one of many challenges for Tiger Woods, but if he does return to winning, it would make the comeback all the more impressive.

The Patriots...and Now the Tide

Pasted Graphic
Anyone looking to tease the Atlanta Falcons mercilessly might write that 28-3 score somewhere prominently or even wear that scoreboard on a t-shirt.

Expect Alabama fans to do the same whenever they need to remind Georgia fans about their team's collapse.

Alabama's Nick Saban once worked for Bill Belichick, so knowingly or not, Saban mirrored his former boss in how he engineered a comeback from a two-possession deficit in the second half of the National Championship Game. He switched quarterbacks at halftime (opting for a true freshman with little experience up to that game), he wanted more throws down the field (Alabama completed four passes of 15+ yards in the second half, compared with none in the first half) and he demanded his defense take more gambles getting to the backfield (nine tackles for loss in the second half versus only three in the first half).

Belichick also took more risks in Super Bowl LI, knowing these gambles were the only ways he could possibly win the game. If they failed and the Patriots fell into a deeper hole, it didn't matter; they were going to lose anyway, regardless of the size of the deficit.

In a previous post, I talked about a paper from Brian Skinner: "Scoring Strategies for the Underdog: A General, Quantitative Method for Determining Optimal Sports Strategies". Skinner explained how underdogs must call riskier plays to have a chance at success. It might mean getting shellacked, but by finding specifically how much riskier a team should get, it might be the only way for those trailing to win.

Nick Saban had to make the change at quarterback because there was almost no chance he was going to win with Jalen Hurts. He had to take more throws down the field because he did not have the time to grind it out with the rushing attack. Lastly, he had to take more chances defensively because if Georgia mounted lengthy drives, there wouldn't be any time left for Alabama to have a chance to complete the comeback.

Some fans still seem surprised by these comebacks, calling them "improbable" or "unbelievable". While they are fantastic for football, it is not a coincidence that the coaches mounting these comebacks not only have won championships, they have been with their respective employers for years, with job security that seems undeniably stable. It is possible coaches who do not have this kind of job security are nervous about being blown out in any game, much less a contest for a championship. Any boss insinuating that the margin of defeat matters can have devastating consequences for the likelihood of a comeback.

Hopefully coaches will be more confident when facing a deficit and take more risks, so football fans can watch even more competitive games.

Georgia or Alabama?

Pasted Graphic
The field is set inside Mercedes-Benz Stadium in Atlanta for college football's national championship game. Aside from the playoff logo in the center, it looks a lot like what the SEC Championship will probably look like for years to come. Alabama has shown few signs of slowing down from its dynastic pace, while Georgia's achievements on the field and in recruiting suggest they may be that next major program to become a staple of the playoff.

Those games in the future will never have the stakes of tonight. So who will win?

As previously mentioned, Charles South and I put together a prediction model using advanced analytical techniques (you can see our poster presentation here). Quick warning: you are about to see a long list. The significant variables, pertinent to tonight, that determine the outcome of a football game are:

- Yards per Pass Attempt
- Yards per Rush Attempt
- Rush Attempts
- Total Yards
- Yards per Play
- Turnovers
- Opponent Points Scored
- Opponent Yards per Rush Attempt
- Opponent Total Yards
- Opponent Turnovers
- Opponent Penalty Yards
- Average Point Differential
- Opponent Offense Passing Yds
- Opponent Offense Yards per Rush Att
- Opponent Offense Total Yards
- Opponent Offense YPP
- Opponent Def Total Rush Yds
- Opponent Defense YPRA
- Opponent Defense Total Yards
- Opponent Def Yards Per Play
- Opponent Defense TO
- Opponent Avg Points Differential
- Difference in Win %
- Recruiting Rankings

If you survived reading that long, congratulations! What's important to learn is the Bulldogs and Crimson Tide excel in just about every category. The differences in yards, points and statistical increments are razor thin, no matter your perspective. Without going into every variable, we can summarize several of them into overall offense, defense, schedule and recruiting.

Georgia's rushing attack with Sony Michel and Nick Chubb comprises most of its offense. They overcame the massive deficit in the Rose Bowl, they make the game manageable for a freshman quarterback and, as a backfield, they average more yards per carry and more rushing attempts than Alabama. Neither team throws it much, though Georgia is more efficient through the air, by roughly one-third of a yard per attempt. Though Alabama is less efficient overall, some of that fact can be attributed to having big leads early in games, then cruising the rest of the way; it is why the Tide have more total yards than the Dawgs and why Bama quarterback Jalen Hurts is the second-leading rusher on his own team, running to preserve those leads.

Defensively, there seem to be few weaknesses with Alabama, though outside linebacker Anfernee Jennings will not play because of a knee injury. Near the end of the regular season the injury problems mounted, but they were under control in the Sugar Bowl, where the Tide limited the number-one ranked team to just six points and 188 total yards. Its rushing defense is the best in America, allowing 2.7 yards per carry. The team passing efficiency defense also gives Bama the edge. Led by safety Minkah Fitzpatrick, they've allowed just seven passing touchdowns and have an efficiency mark a full 17 points better than Georgia (1st in college football vs 13th nationally).

These statistics can be misleading given the small sample sizes in college football. Georgia did play an additional game, and often another contest can help a team historically. Alabama has only a slightly better point differential this season than Georgia. The Bulldogs faced the best offense when it comes to passing efficiency (Oklahoma). The best Alabama went up against was Auburn at 13th; a game they lost (Georgia split the two meetings). The Bulldogs got to face a Top 10 rushing attack in Notre Dame, while the Tide never faced anyone in the Top 25. The best passing efficiency defense Alabama faced was in the Sugar Bowl (5th) while the best Georgia saw was 19th (Auburn). The schedule favors Alabama but only slightly.

Finally, our study used 247Sports Composite Class Rankings to determine who has the best talent. Our study highlights the second-year and third-year classes, but also analyzes the average ranking of the first three classes. In this case, Alabama had the top class the past three years, though Georgia consistently fielded a Top 10 group.

Again, it is clear how evenly matched these teams are and how similar they are in terms of their approaches and philosophies. It promises to be an exciting game, and while the unpredictable, like turnovers or missed field goal attempts, could prove all of the difference, if what's controllable decides this game, Alabama should have a narrow victory.

Predicting the College Football Committee

Pasted Graphic
The penultimate College Football Playoff rankings are out and those conceivably in the running are:

1. Clemson
2. Auburn
3. Oklahoma
4. Wisconsin
5. Alabama
6. Georgia
7. Miami
8. Ohio State

Before predicting how the playoff will develop, it is important to keep a couple of things in mind. First, the College Football Playoff committee has outlined some of the things they hope to accomplish when picking the four teams. Among the most relevant items:

- Consider geography
- Avoid rematches of regular-season games
- Consider strength of schedule
- Consider conference championships won

It is also important to note some of the things the committee has never done in three years:

- Taken two teams from one conference
- Taken a two-loss team
- Taken three teams from the same region of the country

Using these guidelines, here is how the playoff will be decided:

- The winner of the ACC Championship between Clemson and Miami gets in, the loser is out.
- The winner of the SEC Championship between Georgia and Auburn gets in, the loser is out.
- Oklahoma gets in if they win the Big 12 Championship; TCU cannot get in.
- Wisconsin gets in if they win the Big Ten Championship. If Ohio State wins, they get in if TCU wins.
- Alabama gets in if Oklahoma loses OR Wisconsin loses.
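
The five rules above can be written out as a small decision function. This is only a sketch of the blueprint as stated in this post, not of how the committee actually deliberates; the function name and arguments are made up for illustration.

```python
def playoff_field(acc_winner, sec_winner, big12_winner, big_ten_winner):
    """Apply the post's five rules to the four conference title results.

    Hypothetical helper: each argument is the name of the team that wins
    that league's championship game.
    """
    # ACC and SEC: the winner gets in, the loser is out.
    field = [acc_winner, sec_winner]

    # Big 12: Oklahoma gets in with a win; TCU cannot get in.
    if big12_winner == "Oklahoma":
        field.append("Oklahoma")

    # Big Ten: Wisconsin gets in with a win; Ohio State needs a win AND a TCU win.
    if big_ten_winner == "Wisconsin":
        field.append("Wisconsin")
    elif big_ten_winner == "Ohio State" and big12_winner == "TCU":
        field.append("Ohio State")

    # Alabama gets in if Oklahoma loses OR Wisconsin loses.
    if big12_winner != "Oklahoma" or big_ten_winner != "Wisconsin":
        field.append("Alabama")

    return field


print(playoff_field("Clemson", "Auburn", "Oklahoma", "Ohio State"))
```

One useful check on the rules: every combination of results produces exactly four teams, so the blueprint never leaves a spot open or overbooks the bracket.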

It is possible point differential matters in any of these league championship games (it is the committee, it is omnipotent). But chances are, we have our blueprint for who will compete for the national title in January.

Forced Into Success

Pasted Graphic
(Courtesy: Getty Images)

An odd thing happened to the Dallas Cowboys in their last couple of games: their opponents' starting kickers exited their games early with injuries. Philadelphia's kicker Jake Elliott suffered a head injury and Los Angeles' kicker Nick Novak experienced back problems. Both teams had to resort to emergency backups during the game, with less than ideal results. Each backup was seen missing the practice net on the sideline while warming up.

The difference between the Eagles and Chargers is how they adjusted to losing their kickers. Philadelphia opted to avoid kicking altogether, not attempting field goals and going for two instead of extra point tries. Los Angeles remained conventional, playing as if it still had its kicker. The results were drastically different. The Eagles went for 2-point conversions on four occasions, converting three of them. They also faced a fourth-and-5 from the Dallas 17-yard line, scoring a touchdown on the play. Even if you assume Philadelphia would have made that field goal (and every extra point attempt), by not using a kicker, the team gained five points. As for the Chargers, Drew Kaser missed two extra points, and the team still had Novak make one more attempt, which he missed. Had Los Angeles gone for two after all four of its second-half touchdowns, and if we assume they would have converted half of them (the league average), they would have netted three points.

As a result, Los Angeles' conventional wisdom cost them three points, while Philadelphia gained five points with aggressive play calling. In other words, the Eagles were eight points better with their approach.
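
For anyone who wants to check the arithmetic, here is a short Python sketch. The breakdown of the Eagles' five points is one reading that matches the totals in this post; the Chargers' three-point figure is taken directly from the post rather than recomputed, since the exact kicking results are not spelled out.

```python
# Eagles: figures from the post
two_point_tries = 4
two_point_makes = 3
points_from_twos = two_point_makes * 2   # points actually scored on conversions
points_if_kicked = two_point_tries * 1   # assumes every extra point would be good

touchdown_points = 6    # fourth-and-5 touchdown, counted before the conversion try
field_goal_points = 3   # assumes the field goal would have been made

eagles_gain = (points_from_twos - points_if_kicked) + (touchdown_points - field_goal_points)

# Chargers: expected points had they gone for two after four touchdowns
second_half_touchdowns = 4
conversion_rate = 0.5   # league average cited in the post
expected_two_point_points = second_half_touchdowns * conversion_rate * 2

# Assumption: the post's net figure of three is used as given.
chargers_missed_gain = 3

swing = eagles_gain + chargers_missed_gain
print(eagles_gain, expected_two_point_points, swing)
```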

There is plenty of analytical research suggesting NFL teams should kick fewer field goals or attempt more 2-point conversions. While these findings have been published for years, they haven't changed the sport very much. Teams are still attempting roughly as many field goals and extra points as ever, even though offenses have improved and extra points have become more difficult. While teams refuse to implement this research, a real-life example happened in the span of one week where one team put itself in a better position by kicking less. It doesn't explain everything, but it can spotlight one reason why Philadelphia has the best record in the NFL, while Los Angeles is on the fringe of the playoffs.

Gary Patterson is the Most Hated Man in College Football

Pasted Graphic
(Courtesy: Getty Images)

It's not Nick Saban, Urban Meyer or some college football pundit who polarizes fan bases to insanity, just for that monthly paycheck.

It's TCU head coach Gary Patterson, who's led the program since 2000, including a pair of conference transitions and two New Year's Six Bowl victories. Despite few controversial issues within his program, Patterson earns this distinction because of who he is and where he works.

Who he is, is a winner. Perhaps most notable among his accomplishments, his teams are 43-5 when ranked in the Top 10. This record suggests the longevity of having played so many games near the top of the poll du jour, but also a near perfect winning percentage when expected to succeed.

Where he works is a small, private university with roughly 10,000 students. To compare, this student body is 1/4 the size of Alabama's and roughly 1/5 the size of other highly touted college football schools like Penn State and Ohio State. Also, many of these schools are flagships of their own state, meaning their fan bases extend well beyond those who actually attend the university. Not only can't TCU boast being a flagship, it operates from a state with some of the larger followings in America like Texas and Texas A&M.

Gary Patterson is a successful coach who works for a small school with a smaller fan base trying to get his team into Year 4 of the College Football Playoff. He came close during the inaugural year of the playoff, but was pushed aside for Ohio State (Baylor, another small private university, also finished ahead of TCU but was also left out). Some will argue vindication for the eventual champion Buckeyes, but how TCU would have performed in the playoff that year remains a mystery, even more shrouded given its 39-point victory over 9th-ranked Ole Miss in the Peach Bowl. The gripes only grow louder knowing TCU controlled games better than Ohio State, had a better defensive efficiency (a metric that predicts success better than offensive efficiency) and the strengths of schedule for the Frogs and Buckeyes were roughly the same.

TCU's lone loss that season was to Baylor, and committees historically rank good losses worse than mediocre defeats. The trend seems counterintuitive, but rhetorically serves as an acceptable argument within college football. Also, because the Frogs and Bears split the Big 12 Championship, despite the head-to-head result, they could have "canceled each other out", opening the door for Ohio State.

Still, the school most like TCU with a successful season in these last four years is Stanford, with an enrollment roughly 50% larger than the Frogs'. In 2015, they won the Pac-12 Championship, but two losses locked them out. The last two-loss team to win a National Championship was LSU in 2007, so opportunities for those in Stanford's position have always been limited.

Today, TCU is in a more advantageous position than three years ago. The latest College Football Playoff poll has TCU ranked 6th. They will face 5th-ranked Oklahoma and could face the Sooners again in a separate Big 12 Championship Game, something that did not exist during the TCU/Baylor controversy. The conference added this contest because their analytics suggest the game gives a Big 12 team a greater likelihood of making the Final Four. Two wins over a highly ranked Sooners squad would give the Horned Frogs an undisputed league championship, something that is a statistically significant variable for making the playoff. Their strength of schedule ranking would also increase and defensive efficiency may also rise because a win would include containing Sooner quarterback and Heisman hopeful Baker Mayfield.

Despite the lone loss, if TCU wins its remaining games, the Frogs' resume would be arguably as bulletproof as any one-loss team. The committee admits to wanting geographic diversity, but there would not be another program in that region of the country with a more attractive resume. If TCU is still left out, something should be considered amiss. Having a smaller following could be assumed as a factor for being left out. Gary Patterson would then spotlight a problem with this era of determining a National Champion: he has done virtually everything he can to put his team in a position to play for a title; and yet gets left out for a second year. A conspiracy theory, true or otherwise, that undermines the validity of the selection process, is something the sport and the committee would hate.

The Truth About 3rd Down

Pasted Graphic
Anyone paying attention to stats during an NFL broadcast has noticed 3rd down conversions being reported. It is an easy way for commentators to critique how clutch a team is and if an offense can maintain a drive when the pressure is at its peak. Obviously a team converting on 100% of its 3rd down attempts is probably winning the game, but otherwise it is not nearly as helpful a statistic as suggested.

For this exercise I took 10 seasons' worth of NFL data (2007-2016) and looked at conversion rates for 1st down, 2nd down, 3rd down and the number of regular season wins that team accumulated. Logically, it would make sense to have an increasing percentage with later downs because you often have fewer yards to go before moving the chains. The numbers reflect this trend: on 1st down, teams on average convert 20% of the time, on 2nd down it's 30.3% and on 3rd down it's 38.1%.

To make things simple, I then calculated a linear regression, treating wins as my dependent variable and keeping it continuous so as not to lose information. Here are the results:

Pasted Graphic 1

As expected, every down is significant to wins at the 99% level, because the more you convert, the greater your chances of success. The degree to which each down matters does go up, as reflected by the coefficients increasing with each successive down. And, even though later downs should be easier to convert, the coefficient is still increasing, perhaps suggesting third down conversions do matter more than first and second.

However, the R-squared and adjusted R-squared only hover around 28%. In other words, conversion rates only account for 28% of the variation in a team's win total, so a 3rd down conversion percentage by itself accounts for less than that figure (22% if 3rd down rate is the only explanatory variable). While these rates are statistically significant (especially on 3rd down) they are also noisy.
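
For readers less familiar with R-squared, here is a minimal Python sketch of a one-variable regression of wins on 3rd down conversion rate. The eight data points are hypothetical and made up purely to show the calculation; they are not the 2007-2016 data, and the R-squared they produce says nothing about the real figure.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and return
    (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical team-seasons, invented for illustration only.
third_down_rate = [0.33, 0.36, 0.38, 0.40, 0.42, 0.44, 0.35, 0.41]
wins = [5, 9, 6, 8, 11, 10, 7, 7]

slope, intercept, r_squared = simple_ols(third_down_rate, wins)
print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

An R-squared well below 1 means plenty of teams sit far from the fitted line, which is the sense in which the variable is noisy even when its coefficient is significant.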

In previous blog posts, I have outlined which factors best determine the outcome of football games (and they are detailed in my Cowboys data visualizations). One reason why I never brought up 3rd down conversion rates is because of how noisy the variable is and how it takes away from 1st and 2nd down. Many others have their own ways of determining success based upon the down, but also the distance. I would suggest, for sake of ease, promoting the discussion of 1st and 2nd down success rates, both as a pair, but also as a bridge to what is a reasonable 3rd down to convert when those plays occur.

A New Explanation of Cowboys Graphics

Pasted Graphic
For the second-straight year, after every Dallas Cowboys game, I will post a recap of the game with an analytic visualization. Once again, these metrics sum up all of the important factors that determine the outcome of a football game. Some of the metrics are the same, while others are more refined and better reflect certain concepts.

Going from the top and working down, once again I will chart turnovers, one of the more impactful statistics in the game. The numbers reflect the turnover margin and the bars reflect how many turnovers were committed.

The next box will look at how the quarterbacks performed, often looking at net yards per pass attempt. This metric is highly predictive; and while others may be more predictive, it is also far easier to calculate.

Perhaps the biggest change comes where it is labeled "Time of Possession/Rushing Yards". This metric was designed to determine who "controlled" the game. It has since been updated to look at how many rushing yards a team had per quarter. As noted in a previous blog post, the more rushing yards a team gains later in the game, the likelier they are to win. The larger the number, the better that team "controlled" the game.

Overachiever/Underachiever refers to what the Cowboys' record should be, relative to their point differential for the whole season. In baseball, this idea is referred to as the Pythagorean Expectation. In football, there is debate as to how to calculate such a record, but here, the exponent is 2.37: ((Points For^2.37) / (Points For^2.37 + Points Against^2.37)) * 16.
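
As a quick illustration of that formula, here is a minimal Python sketch. The exponent and the 16-game multiplier come from the formula above; the function name and the sample season totals are hypothetical and only there to show the calculation.

```python
def pythagorean_wins(points_for, points_against, exponent=2.37, games=16):
    """Expected wins from season point totals, using the exponent from this post."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa) * games


# Hypothetical season totals, invented for illustration only.
expected = pythagorean_wins(points_for=400, points_against=350)
actual_wins = 10  # hypothetical record

# Positive means overachieving relative to point differential, negative means underachieving.
difference = actual_wins - expected
print(round(expected, 2), round(difference, 2))
```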

Finally, scoring efficiency has been tweaked. The idea here is to see how many points teams scored, relative to the number of yards they needed. The larger the bar and the bigger the number, the more efficient the team was. Simply put, it's points divided by yards, then multiplied by 15.457886 so that the average is approximately 1. Using data from 2009-2016, we can also see if a team was overall good, average or bad in its efficiency. If the result is less than .949394, the team was inefficient. If the result is between .949395 and 1.057116, the team was average and gets a blue bar. If the result is greater than the aforementioned range, they were efficient and get a green bar.
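
Here is a minimal Python sketch of the scoring efficiency calculation and its three tiers. The scaling constant and cutoffs are the ones listed above; the function names and the sample game totals are hypothetical.

```python
SCALE = 15.457886        # multiplier from this post, sets the average near 1
LOW_CUTOFF = 0.949394    # below this: inefficient
HIGH_CUTOFF = 1.057116   # above this: efficient


def scoring_efficiency(points, yards):
    """Points per yard, rescaled so an average team lands near 1."""
    return points / yards * SCALE


def efficiency_tier(value):
    """Label a scoring efficiency value using the cutoffs from this post."""
    if value < LOW_CUTOFF:
        return "inefficient"
    if value <= HIGH_CUTOFF:
        return "average"
    return "efficient"


# Hypothetical game totals (points, yards), invented for illustration only.
for points, yards in [(20, 350), (24, 370), (28, 400)]:
    value = scoring_efficiency(points, yards)
    print(points, yards, round(value, 3), efficiency_tier(value))
```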

Again, these metrics are meant to capture nearly everything that happened in a game that pertained to the result. Some of these metrics can also be used to forecast future games, but the intent is solely inference.

No Need to Establish the Run

David Johnson

Arizona Cardinals running back David Johnson (left) may understand the importance of balancing between rushing and passing about as well as anybody. Last season, he finished with the most touches, all-purpose yards and rushing/rec touchdowns of anyone in the NFL. For an encore, his head coach says he wants Johnson to average 30 touches per game.

It's one thing to strike the right balance between how to use Johnson as a rusher and as a receiver; it's another to make these decisions relative to the time of the game. Conventional wisdom in football has always championed the idea of "establishing the run", meaning no matter how long it takes to create an effective run game, it should be a point of emphasis early in a contest. More recently, rushing plays are called less frequently, regardless of what the clock reads. Knowing this recent trend, there is a way to explain why, at least analytically, attempting to establish the run is unnecessary.

I took NFL play-by-play data from the 2010 through 2015 seasons. This information included which team won and lost. Then, using only rushing plays, I summed up the rushing yards each team had per quarter, per game (in this analysis, I am not including overtime rushing yards because of how infrequently they appeared, but also how much they swayed the results, because so many overtime rushing yards essentially end the game). Using a logit regression with "win" as a binary dependent variable and rushing yards per quarter as my explanatory variables, here is the output:

=========================================
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8447  -0.9786  -0.5544   1.0545   2.0701

Coefficients:
                 Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     -1.747385    0.105946  -16.493   < 2e-16 ***
yards.gained.1   0.006508    0.001922    3.386  0.000708 ***
yards.gained.2   0.007091    0.001953    3.632  0.000282 ***
yards.gained.3   0.015546    0.001910    8.137  4.05e-16 ***
yards.gained.4   0.035783    0.002156   16.594   < 2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 4251.8 on 3066 degrees of freedom
Residual deviance: 3711.2 on 3062 degrees of freedom
AIC: 3721.2

Number of Fisher Scoring iterations: 4
==========================================

First, all of these variables are statistically significant at the 99% level, which makes logical sense. The more yards a team has, no matter the type, the likelier they are to win. Second, there is a direct relationship between the time of the game and the magnitude of the coefficient. In other words, as the game goes on, the more important rushing yards are to the game's outcome. Having the largest coefficient for the fourth quarter makes sense because teams that are leading are trying to take time off the clock, and rushing makes that motive easier to fulfill. However, the fact that the third-quarter coefficient is larger than either first-half coefficient could suggest there is no statistical advantage to "establishing the run".

It is also important to convert these coefficients to odds ratios to know how important each rushing yard is to winning. Specifically, an extra first quarter yard increases the odds of winning by a factor of 1.0065. In the second quarter, it's 1.0071, a small difference. In the third quarter, it is 1.0157 and in the fourth, it is 1.0364.
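
The conversion is just the exponential of each coefficient. Here is a minimal Python sketch using the estimates from the regression output; the win_probability helper and the sample yardage are hypothetical additions that show how the same coefficients turn into a predicted probability.

```python
import math

# Estimates copied from the logit output in this post.
INTERCEPT = -1.747385
COEFFICIENTS = {
    "yards.gained.1": 0.006508,
    "yards.gained.2": 0.007091,
    "yards.gained.3": 0.015546,
    "yards.gained.4": 0.035783,
}

# Odds ratio for one extra rushing yard in each quarter: exp(coefficient).
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}
for name, ratio in odds_ratios.items():
    print(name, round(ratio, 4))


def win_probability(q1, q2, q3, q4):
    """Hypothetical helper: predicted win probability from rushing yards
    per quarter, using the logistic function on the fitted linear predictor."""
    yards = (q1, q2, q3, q4)
    log_odds = INTERCEPT + sum(b * y for b, y in zip(COEFFICIENTS.values(), yards))
    return 1 / (1 + math.exp(-log_odds))


# Illustrative input only: 30 rushing yards in every quarter.
print(round(win_probability(30, 30, 30, 30), 3))
```

Moving the same 20 yards from the first quarter to the fourth raises the predicted probability, which is the pattern the coefficients describe.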

There may be value in wearing down a defense by running the ball earlier in a game, but this data and regression do not capture it. It may also be possible a running back needs several carries before knowing how to dissect a defense later in a game; but again, this idea is not captured in the aggregate. Establishing the run may not be as crucial an idea as originally thought.

However, one conventional bit of wisdom that is reflected is the idea a team controls the game more effectively by running the ball later in the contest. Quantifying how a team controls a game can be captured using a study like this one. In fact, I plan to use this analysis in my weekly Cowboys postgame graphics that explain why Dallas either won or lost a particular contest. I will go over these upgraded graphics in a later blog post.

(Special thanks to Luke Stanke for providing the data and helping me with the code!)

...One More Thing About the PGA Championship

Pasted Graphic
(Courtesy: Stuart Franklin/Getty Images)

At one point, there was a five-way tie atop the leaderboard during the back nine of the final round of the 99th PGA Championship. Then Justin Thomas carded a birdie on the 13th hole and played the Green Mile with a par on 16, a birdie on 17 and an inconsequential bogey on 18. While the rest of the field struggled to finish, Thomas blazed through the toughest closing stretch at a major this year to capture his first Wanamaker Trophy.

My pick to win, Hideki Matsuyama, fared more than respectably, finishing tied for 5th. But as I watched the television coverage of the moments he struggled, one of the commentators pointed out his performance mirrored that of last year's PGA Championship, where he was the best hitter of the golf ball, but could not make any putts. That year, he finished tied for 4th.

This year, Matsuyama missed a few critical putts, but he was 12th in Strokes Gained: Putting. However, his SG: Approach the Green and SG: Around the Green were 20th and 27th, respectively. As for the champion, Thomas was tied for 15th in SG: Approach the Green, 22nd in SG: Around the Green and 4th in SG: Putting. Overall, these numbers are only slightly better, yet they added up to a commanding win.

I am reminded of a paper by Dr. George Kondraske of UT Arlington titled "General Systems Performance Theory and its Application to Understanding Complex System Performance". In it, Kondraske attempts to explain human systems through complex machines. Regressions have a number of components that are often considered additive (which is why we have a lot of "+" signs in our equations). But if one explanatory variable is largely deficient, it is not satisfactory to say the dependent variable decreases by the same amount. The output depends upon everything working together; components are so interconnected that any one piece that does not work or is largely deficient means the entire system might fail to perform.

What does this have to do with golf? If someone cannot putt at all, they will post a high score and have no chance of winning a tournament; they cannot simply overcompensate with a longer drive or a more accurate iron shot. Granted, professional golfers are at least competent in every component of a golf game, but any significant deficiency makes for a bigger setback than simply subtracting odds to win based upon a negative strokes gained metric.

This approach is intuitive to golf enthusiasts. It is why golfers work on everything instead of emphasizing only the skills at which they excel. What matters here is that when data scientists put together models for forecasting winners, it may be important to think less linearly. Maybe it has less to do with the sum of skills coming together and how they fit with a particular course, and more to do with whether every skill is adequate for the demands of a specific tournament. Justin Thomas' skills certainly were.
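To make the contrast concrete, here is a small Python sketch of the two ways of thinking. It is only an illustration: the golfers, the strokes gained figures and the "demands" thresholds are all hypothetical, and the threshold check is my own simplification of the idea rather than Kondraske's actual formulation.

```python
# Hypothetical strokes gained per round for two imaginary golfers.
balanced = {"off_tee": 0.3, "approach": 0.3, "around_green": 0.2, "putting": 0.2}
lopsided = {"off_tee": 1.2, "approach": 0.8, "around_green": 0.2, "putting": -1.2}

# Hypothetical minimum level each skill must reach at a given tournament.
demands = {"off_tee": -0.2, "approach": -0.2, "around_green": -0.3, "putting": -0.3}


def additive_score(skills):
    """Linear view: strengths and weaknesses simply add up."""
    return sum(skills.values())


def deficient_skills(skills, demands):
    """Threshold view: list every skill that falls short of the demand."""
    return [name for name, value in skills.items() if value < demands[name]]


def is_adequate(skills, demands):
    """A golfer is a realistic contender only if no skill is deficient."""
    return not deficient_skills(skills, demands)


for label, golfer in (("balanced", balanced), ("lopsided", lopsided)):
    print(label, round(additive_score(golfer), 2), deficient_skills(golfer, demands))
```

Both imaginary golfers have the same additive total, so a purely linear model would rate them equally. The threshold view flags the second golfer's putting as a deficiency that strong driving cannot offset.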

Who Will Win the 2017 PGA Championship?

Pasted Graphic
This year, the Wanamaker Trophy will be claimed at Quail Hollow Club, the same course that hosts the Wells Fargo Championship (previously the Wachovia Championship). No analysis of this year's PGA Championship would be robust without discussing Rory McIlroy's domination there.

A favorite to win the last major of the season, McIlroy has two victories and a playoff loss in seven appearances there. He also made the cut six of seven times and owns the course record, shooting a 61 in 2015. Also, as I mentioned in a previous article, McIlroy is not only successful in PGA Championships, he is one of the more dominant golfers at any specific event on Tour (even if that major is a hodgepodge of characteristics where no particular abilities stand out). Add to his resume a pair of Top 5 finishes in his last two tournaments, and McIlroy seems poised to win the PGA Championship for the third time.

However, as we have learned with other tournaments, Strokes Gained statistics have incredible predictive power. When it comes to who has won in North Carolina before, sometimes an already dominant golfer came in and continued his momentum to victory. More recently, Strokes Gained: Around-the-Green has become more crucial to success:

Pasted Graphic 3

There are two periods when a player needed to rank in the Top 40 in SG: Around-the-Green: 2005-2007 and 2014-2016. This season, the Wells Fargo Championship was played elsewhere so Quail Hollow could be redone for a major. The two important changes here are the removal of trees and the adjusting of the front nine to where the final yardage is shorter but likely more challenging. It's possible these two details make SG: Around-the-Green all the more important.

At this point, the players leading in this statistic are: Ian Poulter, Jason Day, Bill Haas, Pat Perez and Cameron Smith. McIlroy barely cracks the Top 80. Jordan Spieth, another favorite who could complete the career Grand Slam at age 24, is 18th. As for Strokes Gained: Off-the-Tee, another stat with some predictive power, the current leaders are Jon Rahm, Dustin Johnson and Sergio Garcia. In terms of skills shown this season, there are several players who are perhaps more suited to win a revamped Quail Hollow than the favorites.

Perhaps the one player who seems to have put it all together, at this point, is Hideki Matsuyama. Fresh off a win at the WGC-Bridgestone Invitational, he is one of only four players with three wins on Tour this season. He also ranks 11th in Strokes Gained: Around-the-Green and 11th in Strokes Gained: Off-the-Tee. Lastly, he finished fourth in last year's PGA Championship and has two Top 20 finishes in the last four seasons. In other words, his overwhelming momentum and overall success at this specific event outweigh statistical rankings that are slightly lower than those of the aforementioned players. While I expect solid games from the favorites, I am picking Hideki Matsuyama to capture his first major.

The Statcast Revolution

Pasted Graphic
There are more statistics about hitters than ever before. Thanks to Statcast, a baseball fan can learn how fast a ball comes off a bat from any hit, the angle the ball leaves the bat, an accurate distance the ball travels, etc.

These statistics can help characterize and differentiate hitters. A potential extension is whether these statistics can predict a hitter's success. For instance, if a hitter averages a higher exit velocity, does that mean he is generally a better hitter?

Fangraphs has kept a database with averages of these Statcast statistics for every hitter. Even though there is some missing data, Jeff Zimmerman made necessary corrections based upon the type of balls in play fielded by certain positions. Using 2016 season data, the variables include:


It makes intuitive sense for the second half of this list to be relevant to a hitter's success, but what about the first half? To answer that question, I merged this same dataset with other advanced offensive statistics for these same hitters (this data came from Baseball Reference). While it would make sense to choose offensive wins above replacement (oWAR) as my dependent variable, there is a problem. WAR is an aggregate, meaning it can add up with additional plate appearances. Because I am already using averaged statistics for hitters and want to look at the average impact each statistic has on a hitter's overall performance, I divided oWAR by plate appearances and then multiplied by 1,000, so as not to have too many zeroes after the decimal point (this variable is named oWARavg).
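As a quick illustration of that rescaling, here is a short Python sketch. The function name and the two example hitters are hypothetical; only the formula (oWAR divided by plate appearances, times 1,000) comes from the description above.

```python
def owar_avg(owar, plate_appearances):
    """Offensive WAR per 1,000 plate appearances (the oWARavg variable)."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return owar / plate_appearances * 1000


# Hypothetical hitters: an everyday player and a part-time player.
everyday = owar_avg(5.0, 650)
part_time = owar_avg(2.5, 300)

print(round(everyday, 2), round(part_time, 2))
```

The everyday player has twice the raw oWAR, but the part-time player rates slightly higher once playing time is taken out, which is the reason for averaging before running the regression.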

The next step is to determine which of the first group of variables is significant at the 95% level. I am using a backward elimination technique, where I start with a regression with all three variables, then remove any of them that are not significant. By executing this approach, the only variable eliminated was speed. In other words, the average exit velocity of a batted ball is not a significant indicator for how successful a hitter is. However, the angle of the batted ball and the distance it travels are significant:

Pasted Graphic 1

The angle has a negative coefficient, meaning batted balls not hit as steeply tend to be hits. Distance has a positive coefficient, which makes intuitive sense, because the farther a ball travels, the likelier it is to become a hit or maybe even a home run. As accurate as these findings are, the adjusted R-squared is only .1687, meaning these two variables explain only approximately 17% of the variability in average offensive WAR.

Just for fun, let's see what impact angle and distance have when the second group of variables are included in a regression. Again, using the backward elimination technique, here are the results:


Pasted Graphic 2

Once again, backward elimination took out exit velocity. It also took out the expected ratio of home runs to fly balls. While it kept the original ratio, the negative coefficient does not make intuitive sense. The logic is the more home runs hit out of fly balls, the more successful a hitter is. Instead, this model suggests the opposite. However, a positive isolated power does make logical sense and the adjusted R-squared is approximately 40%, making for a model that does a better job explaining what makes for a successful hitter.

Obviously there are a lot more advanced offensive variables that could be included in a model like this. At least there is a statistical approach for determining which variables Statcast emphasizes that explain offensive success. A similar study can be conducted when looking at baserunning, pitching, defense, etc.

Who Will Win the Dean & DeLuca Invitational?

Pasted Graphic 1
Before offering a prediction for who will wear the plaid jacket as the winner of the Dean & DeLuca Invitational, here is a quick recap of the Byron Nelson.

Sergio Garcia, my pick, did have his moments. He carded a 29 on his back nine on Saturday. But several mistakes led to an incredible unraveling in his Sunday round. Also, Billy Horschel could have been a more credible dark horse pick: his Strokes Gained: Off-the-Tee, which I concluded was the most telling statistic for the Nelson, had him in the Top 50 on the PGA Tour. He had missed the cut in his last four tournaments, but for a course that emphasizes the tee shot, it should not be as big a surprise that Horschel won, given the unpredictability of the tournament.

And now, the Tour heads to Colonial. This tournament is much easier to predict because history is a better indicator for success. Jordan Spieth finished 2nd, 14th and 7th there before winning the event last year. Eleven men have won multiple titles at Colonial, compared with the five at the Nelson.

Once again, let's look at the winners from 2004-2016, the years "Strokes Gained" statistics are readily available using ShotLink data. The most predictive component for the Dean & DeLuca is Strokes Gained: Approach-the-Green. How golfers do on tee shots on Par-3's and approach shots on Par-4's and Par-5's are most predictive. In fact, Spieth is the only player to rank outside of the Top 75 in this statistic when he won last year. He made up for it with his knowledge and previous success on the course. Strokes Gained: Off-the-Tee is also an important indicator, with most players ranking in the Top 50 before competing.

It might be shocking, but the golfer who currently ranks 2nd in Approach-the-Green is Jordan Spieth. Even though he has missed the last two cuts, his approach shots have often not let him down. The next best golfer who is in the tournament field is Webb Simpson. He has only played this event three times. Though he missed the cut his first two appearances, he finished tied for third last year. Spieth has had recent struggles, while Simpson has a couple of Top 20 finishes in two of his last three tournaments. It would not be a surprise for Spieth to repeat as champion, but my pick is Webb Simpson.

Who Will Win the Byron Nelson?

IMG_6351
Last year, Sergio Garcia became just the fifth golfer ever to win multiple titles at the Byron Nelson. Given this tournament has been around since 1944, it shows just how difficult it is to predict this tournament.

It does help the field is stronger than usual; eight of the top 20 golfers in the world will participate, including Dustin Johnson, Jason Day, Jordan Spieth, and of course Sergio. In fact, Vegas Insider is giving these highly ranked golfers the best odds to win, most notably Johnson at 5/1. On the surface, this mark makes sense, given he has already won three times this year, more than anyone else on Tour.

But as with most golf predictions I have done, I place an emphasis on strokes gained statistics. These measurements look at how well a golfer does in each phase of his game, compared with the rest of the field. For instance, strokes gained putting compares how many putts a golfer needs to complete a hole from a specific distance with the field average. If the average golfer needs 1.5 putts to complete a hole from seven feet, 10 inches, the golfer who sinks the putt gains 0.5 strokes, but a two-putt means he loses 0.5 strokes. These totals are then aggregated for the season.
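The putting example above can be written out in a few lines of Python. This is only a sketch of the arithmetic: the function names are my own, and apart from the 1.5-putt average from seven feet, 10 inches, the sample putts are hypothetical.

```python
def strokes_gained_putting(expected_putts, putts_taken):
    """Strokes gained on one hole: field-average putts minus putts actually taken."""
    return expected_putts - putts_taken


def total_strokes_gained(holes):
    """Aggregate strokes gained over (expected_putts, putts_taken) pairs."""
    return sum(strokes_gained_putting(expected, taken) for expected, taken in holes)


# From seven feet, 10 inches the average golfer needs 1.5 putts.
one_putt = strokes_gained_putting(1.5, 1)   # sinks it
two_putt = strokes_gained_putting(1.5, 2)   # needs two

# Hypothetical stretch of three holes: a make, a miss, and an average two-putt
# from a distance where the field also averages 2.0 putts.
stretch = total_strokes_gained([(1.5, 1), (1.5, 2), (2.0, 2)])

print(one_putt, two_putt, stretch)
```

A make and a miss from the same distance cancel out, which is why the season-long total rewards golfers who consistently beat the field average rather than those who hole the occasional long putt.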

ShotLink data has this information readily available since the 2004 season. Given the renovations TPC Four Seasons made to the course since that year, this time frame may be enough data for us to have a glimpse into what qualities a golfer needs to have to be successful at this particular tournament. I am using four statistics: Strokes Gained: Off-the-Tee, Approach-the-Green, Around-the-Green and Putting.

The statistic with the best ranking for success is Off-the-Tee. In other words, how well a golfer does from the tee box on all par-4's and par-5's is the best predictor for winning the Byron Nelson. Here is how golfers ranked in this statistic just before competing in the Nelson:

Screen Shot 2017-05-15 at 5.56.06 PM

Other than Steven Bowditch in 2015, every winner ranked in the Top 100, often in the Top 60. As of the end of the PLAYERS Championship, here are the top ten golfers in Strokes Gained: Off-the-Tee:

1. Sergio Garcia
2. Dustin Johnson
3. Jon Rahm
4. Tony Finau
5. Bubba Watson
6. Kyle Stanley
7. Patrick Cantlay
8. Justin Rose
9. Hideki Matsuyama
10. Hudson Swafford

Of these ten, only Garcia, Johnson, Finau and Swafford are competing. Finau and Swafford have played this event far fewer times and Swafford has never finished in the Top 30. As for the other two players, Johnson has played at the Nelson seven times and has averaged a score of 68.54, including four "Top Ten" finishes. Garcia has played the event 12 times, has averaged a score of 69.07 and has the same number of "Top Ten" finishes. The difference is, Garcia has won the Byron Nelson twice and also has a third-place finish.

The volatility of this tournament might make this exercise seem foolish, but history does show three of the five multiple winners won in back-to-back years. I am picking Sergio Garcia to become the fourth to win back-to-back Byron Nelson championships.

The Cleveland Browns Won the Draft

Pasted Graphic
You may already be thinking: "Of course the Cleveland Browns had a great draft! They had the number one pick! Myles Garrett was the obvious move! You can't screw that up!"

You haven't been keeping up with the Browns, have you?

Cleveland picked a defensive end from Texas A&M who was so respected in College Station, two assistant coaches came to his draft party in Arlington to present him with a framed jersey (Garrett is also the Aggies' first-ever number one overall pick). During the combine, as NFL Research pointed out, Garrett is:

  • Taller than Julio Jones
  • Heavier than Rob Gronkowski
  • Quicker than Devonta Freeman
  • Faster than Jarvis Landry
Cleveland could have drafted a quarterback like Mitch Trubisky or Deshaun Watson, but instead went with the pass rusher. Nothing is a guarantee when it comes to who will have the best NFL career, and the Browns have had failures with top picks in the last several years (e.g. Trent Richardson, Johnny Manziel, Justin Gilbert). What matters here is how much value the Browns acquired simply with moves they made in the draft.

NFL Draft charts have been around since Jimmy Johnson and the Dallas Cowboys popularized their own in the 1990s. As sports analytics have become more commonplace, others have come out with their own. But one that is worth noting is a chart by Michael Schuckers of St. Lawrence University. Using games started, Schuckers used a LOESS function to assign value to each pick (to read his entire paper, click here). Here is the table he came up with:

Pasted Graphic 1

What Schuckers extrapolated from his study was that teams tend to overvalue earlier picks and undervalue later ones. The Cleveland Browns seemed to believe the same thing, and stockpiled multiple draft picks in the last couple of years. Here are the trades they made and how much value they acquired, using the chart:

Pasted Graphic 2

Note: + is a second round pick to be determined
++ is a first round pick to be determined

Because two of these picks are undetermined, I used the lowest possible value and added that to the Minimum Known Value Added column, when applicable. Even by doing that, every move the Browns made added value to their draft class. Here is who the Browns drafted last year and how many games they started, in parentheses:

  • WR Corey Coleman (10)
  • DE Emmanuel Ogbah (16)
  • DE Carl Nassib (3)
  • OT Shon Coleman (10)
  • QB Cody Kessler (8)
  • LB Joe Schobert (4)
  • WR Ricardo Louis (3)
  • S Derrick Kindred (5)
  • TE Seth DeValve (2)
  • WR Jordan Payton (0)
  • OT Spencer Drango (0)
  • WR Rashard Higgins (0)
  • CB Trey Caldwell (0)
  • ILB Scooby Wright III (0)
Combined, this draft class has 61 starts. Yes, this draft class was part of a 1-15 team, bad enough to acquire the top pick in the 2017 draft, but these rookies beat out more experienced players, so it might be safe to say Cleveland did not have much talent before this approach.

The Browns drafted 10 players this year, and currently have a dozen picks for next year's draft. Myles Garrett could be a complete bust, and the Browns would still have enough insurance, in the form of younger players, to keep going. But if Garrett is as advertised, not only will the Browns have won this year's NFL Draft, they will start winning a lot more games.

Updates to the Site

Pasted Graphic
In the coming weeks you will see a few minor design changes to Inside Sports Analytics. There are a couple of things we have done already. The first is we have added a lot of new photos to the Photo Album that features my journey covering the Dallas Cowboys, NASCAR, college sports, that Browns mascot, etc.

The other change is more of a Call to Action. We are always looking to promote good analytic research. Already we are including QuantCoach, a site devoted to analytics in NFL coaching. If there is a series of white papers, a blog or anything else you would like us to include on our Resources page, please send me an email under Contact Edward or send me a tweet @EdwardEgrosFox4.

Thanks again for visiting Inside Sports Analytics! We'll return to journalism in our next post!

Which Golfers Dominate Where

Pasted Graphic 1
Jordan Spieth was bound to win the plaid jacket at Colonial Country Club. In the three previous times he played the Dean & DeLuca Invitational, he finished in the top 15 every time, including a second-place finish in 2015. Spieth mentioned how much the win meant to him because it was a course and tournament he grew up attending.

Outside of Tiger Woods’ heyday, there often seems to be some randomness at the top of the leaderboard of any event. However, like with Spieth at Colonial, some golfers dominate specific courses and tournaments because they simply know it better.

I looked at 15 of the more lucrative tournaments in the world and analyzed how the top 25 in the Official World Golf Ranking fared at each one for their entire careers (I will analyze 46-year-old Phil Mickelson later because he has played much longer than everyone else in the group). Using a top ten finish as the qualification for success, here are six of the more current dominant performances:

Pasted Graphic 3


By this ranking, the most dominant current performance at a particular course belongs to Dustin Johnson when he plays at the Genesis Open (at Riviera). Out of ten appearances, he’s had a top ten finish seven times (and won it outright this year).

What should also stand out is how frequently Rory McIlroy appears on this chart. He has become one of the more successful golfers in the world by consistently performing well at specific tournaments, including the Wells Fargo Championship, the WGC-HSBC Champions and the PGA Championship. He has also had a high rate of top tens at the U.S. Open, WGC-Dell Match Play and Bridgestone Invitational.

It is important to note this chart groups tournaments together, not necessarily the courses. It makes Jason Day’s work at the U.S. Open perhaps more impressive, considering every top ten finish for that major has happened at a different course.

As for Lefty, his favorite tournament might be Wells Fargo, where he’s had top ten finishes 69% of the time. His second-most dominant is the Masters, at 63%. While much is made of his oh-so-close finishes at the U.S. Open, he cracks the top ten there only 38% of the time.

You may be wondering why Jordan Spieth failed to make the chart. After all, he’s finished first or second in every Masters appearance. In all of the lucrative tournaments analyzed, he has far fewer starts than most everyone else. However, at many of these events, he is on pace to be as dominant at the Masters, Tour Championship and WGC-Bridgestone Invitational, as he already is at Colonial.
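The measure behind this chart is a simple rate, sketched below in Python. The function names and the minimum-starts cutoff are hypothetical; I am assuming a cutoff only to illustrate why a golfer with few appearances would be left off, not describing the exact rule used for the chart.

```python
def top_ten_rate(top_tens, starts):
    """Share of starts at one tournament that ended in a top ten finish."""
    if starts <= 0:
        raise ValueError("starts must be positive")
    if not 0 <= top_tens <= starts:
        raise ValueError("top_tens must be between 0 and starts")
    return top_tens / starts


def has_enough_starts(starts, min_starts=5):
    """Hypothetical cutoff: ignore rates built on very few appearances."""
    return starts >= min_starts


# Dustin Johnson at the Genesis Open: seven top tens in ten appearances.
johnson_rate = top_ten_rate(7, 10)

print(johnson_rate, has_enough_starts(10), has_enough_starts(3))
```

A perfect rate over three starts says less than 70% over ten, which is why a young golfer can dominate an event and still not appear in a career-based ranking.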

(Special thanks to ShotLink for providing the data)

Are We Witnessing the Best Golf Ever?

Last January, Adam Hadwin shot a 13-under 59 at the CareerBuilder Challenge in California. Though it’s a dream scorecard, sub-60 is no longer a rarity. Just in the week prior, Justin Thomas posted a 59 at the Sony Open. Last August, Jim Furyk carded 58 at the Travelers Championship. Of the nine sub-60 rounds in PGA Tour history, three of them have happened in the span of roughly six months, out of 87 years of pro golf (in more than 1.5 million rounds of play, last I counted).

Because the odds that these low rounds happened purely by chance are infinitesimally small, it is safe to say golfers are improving. Equipment, athletic ability and coaching all play a part. But with several months left in the season, can we predict, right now, we are about to witness the best golf ever played?
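One rough way to put a number on "by chance" is sketched below in Python. It assumes sub-60 rounds arrive at a constant historical rate (a Poisson process), which ignores the growth in the number of rounds played each year, so treat it as a back-of-the-envelope check rather than a formal test. The only inputs are the figures above: nine sub-60 rounds in 87 years, three of them in roughly six months.

```python
import math


def poisson_tail(lam, k):
    """P(X >= k) when X follows a Poisson distribution with mean lam."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Six sub-60 rounds occurred in the roughly 86.5 years before the recent burst.
earlier_rounds = 9 - 3
earlier_years = 87 - 0.5
window_years = 0.5

# Expected number of sub-60 rounds in a six-month window at the old rate
# (assumption: the rate was constant over those years).
expected_in_window = earlier_rounds / earlier_years * window_years

# Chance of seeing three or more in such a window if nothing had changed.
p_three_or_more = poisson_tail(expected_in_window, 3)

print(round(expected_in_window, 4), p_three_or_more)
```

Under these simplifying assumptions, the old rate would produce about 0.03 sub-60 rounds in a six-month window, and the chance of three or more comes out well below one in ten thousand.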

Let’s first consider scoring average over the last 20 years, specifically, the median scoring on Tour:

Pasted Graphic 1

We had been seeing a significant decline in scoring beginning in 2007, with some fluctuation but overall lower figures as recently as last year; so far this season, however, there has been an uptick. What makes the higher median score so interesting is how much easier the early tournaments are, compared with the rest of the schedule.

Even for individual seasons, it will be difficult for anyone to match what Tiger Woods accomplished in 2000 and 2007. In both years, he finished with the lowest scoring and adjusted scoring average, ever, with a 67.79. This year, after the CareerBuilder Challenge and all of those historically low scores, even with the 59’s, the lowest scoring average was 68.715, roughly one stroke worse than Tiger’s.

Of course, devious course designers can always stay one step ahead and adjust conditions to keep scores from approaching zero (e.g. Tiger-proofing). Other statistics could better highlight if today’s golfers are indeed the best ever. However, metrics off the tee like driving distance have remained relatively steady over the last several years, though some tournaments show professional golfers are becoming more aggressive than ever before.

Where there might be significant improvement is in the less glamorous approach shots and short game. Though the top Greens in Regulation percentages have hovered around 72% each season, this year the best is 75.69%, held by Jordan Spieth. More golfers can finish a hole with one putt. In past seasons, the best golfers one-putted roughly 44% of the time. In 2017, seven golfers have a one-putt rate above 44%. But again, it is worth noting how much easier the start to a season is; these golfers have not faced the toughest challenges like The Players, the Barclays and any major championship.

What seems to be happening is not the next coming of 2000 Tiger, but rather, more golfers improving at roughly the same time at roughly the same rate. There are still milestones yet to be reached, like someone shooting a 62 for one round at a major, or less notably, a golfer carding 254 for a 72-hole tournament. There have been more golfers flirting with breaking these records in recent years, but no one has broken through. Sub-60 rounds are happening at easier courses where scores are lower and competition is not as fierce. But because fields are becoming saturated with similarly talented players, some of the better golfers still have to find other events to play. When they do, the occasional golfer could be poised to achieve that coveted 59.

If you believe talented playing partners and deeper tournament fields naturally make an individual golfer better, then the play we will witness this season could very well be the best we have ever seen. There may not be the lone star of golf, but a hodgepodge of pros who will make 2017 something to behold.

Will Jordan Spieth Win a Major in 2017?

Pasted Graphic 4
Leave it up to the U.S. Open’s official Twitter handle to place tongue firmly in cheek when it comes to Jordan Spieth’s victory at the Australian Open being a sign of things to come: “We all know what came after @JordanSpieth’s first #AusOpenGolf win...” followed by a photo of him holding the major’s championship trophy. In other words, only in the years he won the Australian Open did he win majors.


At the time of publication, no major tournament participants have withdrawn based upon this logic.

There are sounder ways to predict if Jordan Spieth will earn his 3rd career major this year, like momentum. Perhaps surprisingly, in a few ways, Spieth performed better in 2016 than he did in 2015, despite not winning any majors last year. We can illustrate this idea using “Strokes Gained” statistics:

Pasted Graphic 3

For those new to “Strokes Gained”, it simply means how many strokes a player gained or lost, compared with the rest of the field, based upon how they played in the four areas: off the tee, approaching the green, around the green and putting. Spieth was actually a better putter in 2016; it was primarily his iron and hybrid clubs letting him down. Fortunately for Spieth, putting is a better predictor for overall success than other phases of the game, so as long as he can continue improving in close range, he has opportunities.

Next, let’s look at each individual major, beginning with the Masters. When looking at a host of variables, there is no better predictor for future performance than past success. It is why I publicly predicted Spieth to win the green jacket last year, and I would have gotten away with it had it not been for that pesky Amen Corner. Still, nobody has played better at Augusta National the last three years than Spieth, so he is in the best position to win there again.

This year’s U.S. Open will be at Erin Hills. It is listed as 7,823 yards, which would be longer than any PGA Tour event played last season. Though Spieth is not one of the longer drivers on Tour, his U.S. Open win was at Chambers Bay, almost as long as this year’s event. Spieth’s advantage was he knew how to putt on the unique fescue greens better than most everyone else. This setup might pose problems.

Royal Birkdale will host The Open, a shorter links course. Perhaps one of the more underrated qualities of Spieth’s is his ability to play links courses well, compared with other Americans. As long as the momentum is there over the summer, Spieth can also contend there.

Finally, the site of the PGA Championship is Quail Hollow Club. It has hosted the Wells Fargo Championship since 2003. Predictably, familiarity with a course has helped Spieth over the years, but he has only played that tournament once, in 2013 when he finished tied for 32nd. There may simply be too many other golfers with more knowledge of the course for Spieth to have a realistic chance.

Spieth already has a few Top 10 finishes in 2017, including a victory at Pebble Beach. In the last few months, he helped the Americans claim a Ryder Cup win, earned an Australian Open victory and is 2nd on the Tour in greens in regulation percentage (one of the areas that was in need of improvement). His Strokes Gained: Putting has not been as strong this year, ranking 37th, but a few golfers ahead of him have played more tournaments, so it remains too early in the season to suggest there might be a problem.

Because of the deep fields of majors, the odds are better “not” to predict any one golfer to win one of the big four. But for Jordan Spieth, there are enough reasons to believe he can capture another green jacket, win his first Claret Jug, or both.

Subscribers to the Aussie Open theory would agree.

2017 Sloan Sports Analytics Conference

Pasted Graphic
Another installment of the Sloan Sports Analytics Conference has come and gone. More than 3,500 people were estimated to have attended the proceedings, learning about and presenting the latest research in the sports analytics world. While football and basketball are often the most popular sports here, there seemed to be a noticeable effort to highlight the quantitative strides made in other sports.

One panel featured golf analytics, led by Golfweek's David Dusek, who highlighted the success stories of these quantitative tools. Jeff Price, Chief Commercial Officer of the PGA, offered an example of Team USA at the Ryder Cup. At Hazeltine National Golf Club, long par 5's meant emphasizing wedge play. It's this discovery that helped the Stars and Stripes to a decisive 17-11 victory.

On a more personal level, current professional golfer Jason Gore explained how to turn research into actionable results.

"When I talked to a sports psychologist, Fred Astaire would [practice and] put chalk on the floor," said Gore. "But once he grabbed Ginger's hand, he never thought about the chalk on the floor."

There is still room for growth.

"We're in the first inning of the data revolution in golf," said Arccos Golf CEO Sal Syed.

Dusek pointed out some major tournaments like the Masters and United States Open still do not provide the media with advanced statistics. 15th Club CEO Blake Wooster says the potential is there to analyze how golfers perform under pressure. Lastly, the group seemed to agree lasers should be used to measure distance more accurately. Even Gore believed lasers used by caddies could speed up pace of play.

Pasted Graphic 1

The seminal football panel of this year's conference was unabashedly endorsing the concepts of its own sport's analytic revolution. It was even subtitled "Please Stop Punting", a concept where going for it on 4th down yields more expected points and discounts a more traditional idea valuing field position.

Almost immediately, Baltimore Ravens offensive lineman John Urschel, who is pursuing a Ph.D. in mathematics from MIT, discussed a common situation he says coaches get wrong. When a team trails by 14 late in a game and scores a touchdown, he says it is better to go for two than kick the extra point. The reasoning: a successful two-point try means another touchdown and an extra point win the game outright. If the try fails, the team can go for two again after its next touchdown and still achieve a tie. Because most teams convert two-point attempts about 50% of the time, going for two gives a team a better chance at winning, with a small chance of needing a third score of some kind.
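Urschel's argument can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the trailing team scores both touchdowns, the opponent does not score again, extra points are always good, and overtime is a coin flip. The function names are made up for this example.

```python
import math


def win_prob_kick_both(p_ot: float = 0.5) -> float:
    """Kick the extra point after both touchdowns: the game is tied and goes to overtime."""
    return p_ot


def win_prob_go_for_two_first(p_two: float = 0.5, p_ot: float = 0.5) -> float:
    """Go for two after the first touchdown.

    Success (probability p_two): the next touchdown plus an extra point wins outright.
    Failure: go for two again after the next touchdown; converting only ties the game.
    """
    win_after_success = p_two
    win_after_failure = (1.0 - p_two) * p_two * p_ot
    return win_after_success + win_after_failure


def break_even_two_point_rate(p_ot: float = 0.5) -> float:
    """Smallest conversion rate at which going for two first matches kicking twice.

    Solves p + (1 - p) * p * q = q for p, where q is the overtime win probability.
    """
    q = p_ot
    return ((1.0 + q) - math.sqrt((1.0 + q) ** 2 - 4.0 * q * q)) / (2.0 * q)


if __name__ == "__main__":
    print("kick both:       ", win_prob_kick_both())
    print("go for two first:", win_prob_go_for_two_first())
    print("break-even rate: ", round(break_even_two_point_rate(), 3))
```

Under these assumptions, the two-point strategy comes out ahead whenever the conversion rate clears the break-even value, which sits well below 50%.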

Mike Lombardi, former football executive and current analyst for Fox Sports, said analytics help with time allocation throughout the week, with knowing what coaches should communicate to players, and with identifying which statistics are important in determining the outcome of a game, such as 3rd down red zone defense.

"You don't establish the run, you establish the lead," said Lombardi. "Teams with the lead at halftime frequently go on to win." He cited last year's Super Bowl champion Patriots as the top team in wins after leading at halftime, followed by the second-place team, the Super Bowl runner-up Falcons. The players on the panel, who included former Patriot Tedy Bruschi, explained how halftime is all about adjustments, but said those adjustments should take fewer than five minutes to implement.

From a front office perspective, analytics can help determine whether trading players and draft picks makes fiscal and qualitative sense.

"The toughest thing to do in sports is to know what you're trading. It's why the Patriots won't trade [backup quarterback] Jimmy Garoppolo," said Lombardi.

Football discussion was not confined to that panel. A couple of talks covered fantasy football and whether analytically minded players can gain an advantage. Here are some tips from Tauhid Zaman, the KDD Career Development Professor in Communications and Technology at MIT, and Renee Miller, a neuroscientist at the University of Rochester:

- When picking a quarterback, get one or two of his receivers as well.

- Avoid players who cancel each other out, like a defense against one of your offensive players.

- We weigh football players' performances at the start of the year too heavily. Instead, look at the bigger picture of their performance.

- Be careful of overconfidence: "The more data we have, the more confident we become in our decision making."

Pasted Graphic 2

Lastly, in basketball, while guys like Luis Scola seemed to get most of the attention from hoops fans, perhaps the most direct knowledge came from Seth Partnow, Director of Basketball Research for the Milwaukee Bucks. In his talk, "Truths and Myths of the Three Point Revolution in Basketball," Partnow offered the following bullet points:

- Defensive three-point shooting percentage is a useless stat because of the noise involved (good defenses prevent the shot).

- Long-range shots in the NBA do not lead to fast breaks; it is shots around the rim that cause these.

- Ten of the last 12 NBA champions ranked in the top ten in three-point shooting.

As robust as this research might be, it does not offer a glimpse into the future of basketball analytics. However, one panel focused solely on how the sport will evolve thanks to quantitative tools. There may still be blowback from coaches and others who approach the sport more traditionally.

"When you're working with the [NBA] Draft…you end up trying to convince coaches," said Dean Oliver, a statistician who worked in the front offices of the Sacramento Kings, Seattle SuperSonics and Denver Nuggets. "You don't expect to win 100% of these arguments and that's fine."

Using analytics, a couple of panelists offered simple suggestions for improving the game. Former NBA player and coach Vinny Del Negro wants the league to add a fourth referee because the pace of the game has gone up and it is getting tougher for officials to keep up. WNBA point guard Sue Bird wants to get rid of the shootaround, both because players need the rest and because there is no proof that shooters develop a rhythm from this routine. She also wants analytics to assist with the psychology of a team.

"If I were a general manager, I'd want to know if [players] retain information well and how they handle things under pressure," said Bird.

The flexibility of these tools, spanning different sports and perhaps different fields of expertise, may explain why this conference has lasted as long as it has.

Pasted Graphic 3

(All photos courtesy of Sloan Sports Analytics Conference).

The Art of the Comeback

Pasted Graphic
Last November, arguably five million people attended the Chicago Cubs' victory parade, celebrating the team's first World Series championship since 1908.

Last summer, Cleveland hosted hundreds of thousands of Cavaliers fans to celebrate that franchise's first title and the city's first pro championship in more than half a century.

This year in New England, they constantly win. We move on.

The common storyline among these three winners is "The Comeback". The Cubs overcame a 3-1 deficit in the World Series to claim their championship in an extra-inning Game 7. The Cavaliers also stormed back from down 3-1 in the NBA Finals. The Patriots trailed Atlanta by 25 in the second half of Super Bowl LI before winning in overtime. These comebacks were also nearly unprecedented. Only five teams had come back from down 3-1 to win the World Series before the Cubs. Cleveland became the first NBA team to overcome a 3-1 deficit in the Finals. And New England's 25-point comeback win is the largest in Super Bowl history. The second largest ever is merely ten points.

This confluence of sports drama may seem like supernatural intervention, but perhaps it can be explained in more earthly terms. In 2011, Brian Skinner published "Scoring Strategies for the Underdog: A General, Quantitative Method for Determining Optimal Sports Strategies". Skinner explained how underdogs must call riskier plays to have a chance at success. In this case, we can refer to teams significantly trailing in series and games as underdogs when their probability of winning is significantly below 50%. Calling riskier plays might mean getting shellacked, but finding specifically how much riskier a team should get might be the only way for those trailing to win.
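A toy model helps show why extra risk favors the team that is behind. The sketch below is not Skinner's actual formulation; it simply assumes the final scoring margin is normally distributed, and the function name and the numbers fed into it are illustrative.

```python
import math


def win_probability(expected_margin: float, margin_sd: float) -> float:
    """P(final margin > 0) when the margin is Normal(expected_margin, margin_sd).

    expected_margin is negative for an underdog and positive for a favorite.
    margin_sd stands in for how risky (high-variance) the play calling is.
    """
    z = expected_margin / margin_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Illustrative values only: a team expected to lose by 10 versus one expected to win by 10.
    for sd in (5.0, 10.0, 15.0):
        underdog = win_probability(-10.0, sd)
        favorite = win_probability(10.0, sd)
        print(f"sd={sd:>4}: underdog {underdog:.3f}, favorite {favorite:.3f}")
```

In this simplified setting, raising the spread of outcomes lifts the underdog's chances and lowers the favorite's, which is the intuition behind trailing teams taking more risks and leading teams taking fewer.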

Baseball closers are niche pitchers, often asked to pitch only one inning with their team holding the lead. Aroldis Chapman, the Cubs' closer, came in to pitch 2.2 innings in Game 5, 1.1 innings in Game 6 and 1.1 innings in Game 7. Chapman had one day of rest before Game 5, another day of rest before Game 6 and no days off before Game 7. While he did allow three earned runs in the last two games, Cubs manager Joe Maddon believed the risky strategy of extending his closer was the only way to overcome the 3-1 deficit, and it left other relievers fresh for longer games. Hitters were also asked to swing for home runs, not mere singles or doubles. The Cubs ranked 13th in home runs last season, but in the World Series, they hit at least one home run in each of Games 5, 6 and 7 en route to their title.

In basketball, Skinner's paper discussed two key concepts pertinent to the Cavs: how often to shoot 3's and when to stall. The logic in the first case is that, depending upon how many possessions are left in the game, a team should resort to shooting triples once it reaches a critical threshold. In the regular season, Cleveland ranked 7th in the NBA in three-point shooting percentage and 3rd in three-point attempts, but it was going up against the Golden State Warriors, who ranked first in both categories. Two of the Cavs' three highest rates of three-point shooting in that series came in Games 6 and 7, two must-win games. As for pace, while Golden State had the second most possessions per 48 minutes in the NBA, Cleveland ranked 27th out of 30 teams. However, the Cavs played at a faster pace in Games 5 and 6, resorting to a style more like the Warriors' rather than shortening the game, as is suggested for underdogs. It is worth noting there was a slower pace in Game 7, the most dramatic game of the series.

Lastly, the Patriots helped themselves and the Falcons hurt themselves because of risk-taking. Once Atlanta led 28-3, New England resorted to 40 pass plays (including sacks) and just 10 rushes. Before the deficit, the Patriots passed the ball 34 times and ran it 15 times, relying significantly more on the ground attack. Also, some of Brady's longest completions occurred in the 4th quarter during the comeback. On the other side, Matt Ryan and the Falcons leaned towards passing in the final minutes rather than sticking to the ground game, which would have taken more time off the clock. Perhaps the most egregious example came when Atlanta had the ball at the New England 22-yard line with 4:40 left in the game, leading by eight. Instead of running the ball three times and kicking a field goal for a two-possession lead, the Falcons took a sack, had a pass wiped away by offensive holding and threw an incompletion, which took them out of field goal range AND gave Tom Brady 3:30 to tie the game. Overall, even play-count disparity factored into the outcome; Brady kept the Falcons' defense on the field and Ryan could not give his teammates a break.

Teams in any sport can calculate when it is time to run riskier plays. Many recent and high-profile examples suggest comebacks are more possible than ever before, when the right tactics are implemented.

There is a postscript: win probability charts have become more popular than ever. But these games and series show that an outcome calculated to have a 0.7% probability of happening can occur. Because underdogs can increase their own variance with their playcalling, perhaps these charts need to be updated in some way. Fortunately, this discussion is ongoing.

A New NCAA Tournament

There's no doubting the increased awareness of analytics in predicting the NCAA tournament field in college basketball. Rather than just examining a team's record against the Top 50, evaluators increasingly cite its Rating Percentage Index or Ken Pomeroy ranking. It has gotten to where data scientists are actually meeting with the NCAA to determine if one metric should be used above all others to pick tournament teams.

Perhaps surprisingly, data scientists want simpler criteria for picking teams: who wins, who loses and who you have played. This is opposed to other explanatory variables used in more advanced metrics, like margin of victory and offensive/defensive efficiency. Coaches, on the other hand, would prefer more complex formulae for determining the tournament field. Logically, this approach makes more sense from their perspective, because of competition. If a coach has figured out a style of play or way to schedule opponents that increases the likelihood of making the tournament, that coach develops a competitive advantage. Data scientists want to keep it simple for fans; coaches want to figure out a competitive advantage.

Perhaps in this same spirit of transparency, the tournament selection committee released "in-season" projections for the first time ever, one month before Selection Sunday. It only has the top four seeds of every region, but it is added information for where highly ranked teams really sit. As with any analytic project, more data "usually" means more robust forecasts. Already, it is easier to make more accurate assumptions and offer a better glimpse as to what the committee is looking for.

However, these in-season projections do not include the full field of 68, and what usually causes the most consternation is simply who does and does not make the dance. While it makes sense not to include the full field because you have to assume certain conference champions in mid-major conferences, something that would include all "at large" teams would provide even more information as to the criteria for inclusion.

Nothing is easy about picking 68 teams to play in a tournament, and while analytics may be helpful in forecasting a Final Four, easy-to-understand criteria can help teams and fans quell any controversy.

Who is the NFL MVP?

Pasted Graphic
This year's NFL MVP race is uniquely interesting. Many believe New England Patriots quarterback Tom Brady deserves this honor, despite missing four games because of a controversial deflated football scandal from a few years ago. No matter your opinion as to whether Brady deserved to be suspended, it is worth noting that few MVPs have missed games during the regular season. Players like Emmitt Smith and Aaron Rodgers missed a game or two, but four games is a full quarter of the season and requires a number of assumptions as to whether Brady would have played as well as anyone during the stretch he missed.

Before going over these assumptions, let's first look at the history of the award and who else is a viable candidate this season. Since the Associated Press began handing out MVP honors, 18 of the recipients were running backs, 40 were quarterbacks and 3 played other positions. The most accomplished running back this season was Ezekiel Elliott. Not only do his 1,631 rushing yards and 15 touchdowns outshine other running backs this year, they outdo others who were proclaimed MVP. Because no one at any other position seemed to stand out, Zeke is the only "non quarterback" worth mentioning.

As for the gunslingers, if you go by passer rating, QBR (quarterback rating), yards per pass attempt (as well as net yards and adjusted net yards per pass attempt), and passing touchdown percentage, the winner is Atlanta Falcons quarterback Matt Ryan. New Orleans Saints QB Drew Brees does have an edge over Ryan in terms of total passing yards and completed passes, but efficiency metrics almost always list Ryan higher than Brees. Brees also did not "lead" his team to the playoffs, something nearly every MVP has done in the past. But this exercise is about Tom Brady and whether his numbers would have been superior to Ryan's had he played the entire season.

The simplest way to answer this question is to take proportions of Brady's stats and add them to what he did accomplish and see how they measure up:


Pasted Graphic 1

Pasted Graphic 2

Pasted Graphic 3
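The proportional projection described above is simple scaling. Here is a minimal sketch, assuming a 16-game season and that the missed games would have looked like the games actually played; the function name is made up, and the inputs in the usage lines are round placeholder numbers, not Brady's actual stat line.

```python
def prorate(stat_total: float, games_played: int, season_games: int = 16) -> float:
    """Scale a season total from the games played up to a full season."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return stat_total * season_games / games_played


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(prorate(3000, 12))  # passing yards over 12 games, scaled to 16
    print(prorate(24, 12))    # touchdowns over 12 games, scaled to 16
```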


By these proportions, Ryan would've still had more passing yards and touchdowns than Brady, though the Patriot would've still had fewer interceptions. However, this exercise assumes opponents are of equal quality, which we know is not true. What we should do is examine the opponents Brady did not play and project what his numbers would have been in those games. New England's first four opponents were Arizona, Miami, Houston and Buffalo. Their combined record was 33-30-1 (Atlanta's first four opponents were Tampa Bay, Oakland, New Orleans and Carolina, with a combined record of 34-30, just half a game better). Brady missed out on the 2nd, 4th, 6th and 15th best passing defenses in the NFL, using passing yards allowed as the barometer. Averaging their defensive numbers, that group allowed 219.5 passing yards and 1.4 touchdowns per game. To put those numbers in perspective, the dozen opponents Brady did face allowed 233.4 yards and 1.6 touchdowns.

In other words, the foursome Tom Brady did not play featured significantly better passing defenses than the dozen he did go up against. Given this logic, it is safe to lower Brady's numbers even further below what was projected, and that projection was already worse than Matt Ryan's numbers.

There are two more things to consider when comparing these two quarterbacks. First, Pro Football Reference says the Falcons' strength of schedule was significantly tougher than the Patriots' (18th vs 32nd, respectively). It also has its own way of determining the Approximate Value of each player, as an attempt to show how important they were to a team's overall success. Without getting into the specifics, Ryan led the NFL with 21 and Brady had 13, so he would have had to achieve a lot to make up that ground in the four games he missed.

Again, whether or not you believe Tom Brady was unjustly punished for Deflategate, it is unlikely he would have posted better statistics than Matt Ryan. Even though Ezekiel Elliott did have a stellar rookie campaign, his numbers were not historic for any running back. It is Matt Ryan who deserves to be this year's Most Valuable Player.

How Predictive Is Scoring Differential?

Pasted Graphic
How important is an impenetrable goalie in the NHL? How much better is it to outscore opponents throughout the season, as opposed to dominating them defensively? Overall, how important is point differential to overall success?

In an earlier blog post, I discussed playoff unpredictability when it comes to determining who will win a championship based upon how many games that team won. There, the NBA was the most predictable, followed by the NHL and the NFL, with MLB the most unpredictable (unless, of course, you are the 2016 Chicago Cubs). But how does point differential (or run differential in baseball or goal differential in hockey) translate to winning championships? And which league is most predictable when looking at that specific metric?

Once again, I am fitting logistic regressions with one explanatory variable each, and with whether that team won a championship as the dependent variable. However, this time I am fitting three models per sport: offensive output, defensive output and scoring differential. Also once again, here is what is noteworthy with our datasets:

- All data used begins with the 1989-90 season because the NFL had the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so they are still used.

Each explanatory variable has the appropriate and logical coefficient. In other words, scoring variables have a positive coefficient, defensive variables have a negative coefficient and scoring differential variables have a much larger positive coefficient. All of this equates to a better probability of winning a championship. Each variable is also statistically significant at the 95% confidence level, which is to be expected. A better offense, defense and scoring differential will obviously increase the likelihood of winning a championship. What is not clear is which of these indicators is most predictive. A goodness-of-fit measure called AIC (Akaike Information Criterion) can shed some light. The smaller this number, the better the model fits, explaining away more of the randomness of that sport.
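To make the approach concrete, here is a minimal standard-library sketch of a one-variable logistic regression scored by AIC (2k minus twice the maximized log-likelihood, with k = 2 for an intercept and one slope). It is not the code behind the charts in this post: the Newton-Raphson fitting routine, the function names and the 16 team-seasons of data are all made up for illustration.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=50):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = _sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient wrt intercept
            g1 += (y - p) * x      # gradient wrt slope
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system H * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_likelihood(xs, ys, b0, b1):
    total = 0.0
    for x, y in zip(xs, ys):
        p = _sigmoid(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def aic(xs, ys):
    """AIC for a one-variable logistic model; smaller means a better fit."""
    b0, b1 = fit_logistic(xs, ys)
    k = 2  # intercept + one slope
    return 2 * k - 2 * log_likelihood(xs, ys, b0, b1)


# Made-up data: 16 hypothetical team-seasons (1 = won the championship).
scoring_diff = [-8, -6, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
unrelated = [0, 1] * 8  # a predictor with no link to winning, for contrast
won_title = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    print("AIC, scoring differential:", round(aic(scoring_diff, won_title), 2))
    print("AIC, unrelated predictor: ", round(aic(unrelated, won_title), 2))
```

Because every model here has the same number of parameters, comparing AIC values amounts to comparing log-likelihoods, which is why the measure works as a like-for-like yardstick across predictors.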

The first chart is points (or runs or goals) modeled against championships:

Pasted Graphic 1

Before analyzing this chart, it is important to note the value of each point, goal and run, compared with the other sports. In 2016, the average MLB team scored 726 runs for the season. This number is different from the 325 points scored, on average, by an NFL team in 2015, the 8419 points scored by an NBA team last season and the 222 goals scored by an NHL team last season. Fortunately, the variation within each league is not so different that comparison becomes impossible.

In the chart, we see goals in hockey as being the best predictor for winning its championship, with football being slightly more random, then basketball, then baseball finishing as the most random. So far, these results are consistent with the previous study where MLB's postseason was the toughest to predict, based upon number of wins during the regular season. Basketball makes intuitive sense because teams play at different paces, and it is not conclusive if playing at a faster rate—which scores more points but not necessarily more points per possession—is the best way to win a title.

The next chart illustrates runs, points, and goals allowed, modeled against winning a championship:

Pasted Graphic 2

Comparatively, the trends are almost the same as they are with offensive output: Major League Baseball is the most random, followed by the NBA. However, an NFL scoring defense is now a better indicator than an NHL scoring defense, but only slightly so.

Now, let's combine these two charts into scoring differential, modeled against a championship:

Pasted Graphic 3

Here, we learn point differential is more predictive in basketball than in any other sport. Remember how different teams playing at different paces obscures the importance of points alone? Including the defensive component erases pace of play and gives a clearer predictor. It also coincides with how a win total in basketball is most predictive for winning a championship. Football and hockey are nearly equal in predictive ability and baseball is a distant fourth.

There are more trends to uncover if we combine all of these charts:

Pasted Graphic 4

In nearly every sport, scoring defense is more predictive than offense (with hockey being the lone exception). Scoring differential is predictably better for analysis than offense or defense by itself, but the degree to which it takes away the randomness is different for each sport. It is only a slight improvement in the NFL, but a drastic improvement for basketball.

Overall, these proportions could prove helpful when determining if a team is going in the right direction when devoting resources to offense and defense. Both are necessary, but perhaps more money should be proportionally allocated to the areas that best predict who will win a championship.

Go Cubs Go

Pasted Graphic
In just a few days, Wrigley Field's iconic scoreboard will showcase a World Series for the first time in more than seven decades. A franchise with questionable management and horrible luck has finally come within four wins of its first world championship in more than a century.

The Cubs have fielded formidable teams that made the postseason, but until this year they had never won the NLCS. Often postseason baseball can be so unpredictable that it is difficult to explain why the Cubs could not reach the World Series until now. But there are some trends that predict success in playoff baseball that do not have as great an impact in regular-season baseball.

While I have written a paper about this and have applied those lessons to the Texas Rangers in a previous post, I would like to look at alternative research. In the book "Baseball Between the Numbers", three qualities are listed that best determine postseason success:

  • Pitcher Strikeout Rate
  • Fielding Runs Above Average (FRAA)
  • Closer Expected Wins Over Replacement Pitcher (WXRL)

The Cubs finished 3rd in the majors in strikeout percentage and strikeouts per nine innings (the Dodgers, the team Chicago beat in the NLCS, finished first in both categories). FanGraphs uses a metric called Ultimate Zone Rating to calculate fielding, and listed the Cubs as the best fielding team this season. Lastly, the Cubs finished 19th in reliever Wins Above Replacement, but keep in mind, the team traded for Aroldis Chapman late in the season.

It is also worth noting that the Indians had high rankings in all three of these categories as well (5th, 4th and 7th, respectively). While the matchup should make for a fantastic World Series, given how the Cubs have properly built this team for a postseason run, it should not come as a surprise if they can end this 108-year streak.

A Unique Cowboys Perspective

The Dallas Cowboys are constantly watching film and studying the playbook for that added edge. Their fans also want to know anything that can help explain why their favorite team won or lost, and if there is a way to forecast how they will do and where they need to improve. Our newest data visualizations hope to do all of the above.

Before and during every Cowboys game, I will post on my various social media accounts some analytics that explain what is going on and predict what will happen. After the game, I will have one summary detailing what happened, using explanatory variables that are the best indicators for the outcome of any football game. Here is some extra information for each highlighted variable:

  • Turnovers are perhaps self-explanatory and the team with the better turnover ratio has a significant advantage.
  • Scoring efficiency goes beyond just the scoreboard. It's a ratio of (offensive yards/points). A team may have moved the ball but failed to score many points when near the end zone, so they were inefficient. Not only can each team's efficiency be compared, but each bar has a color: red for bad, blue for average and green for good. Respectively, these quality ranges are: 0-12, 12.01-18.5, 18.51-. These ranges came from the last ten years of NFL data, provided by Pro Football Reference.
  • The ratio (time of possession/rushing yards) looks at who was controlling the game effectively. Time of possession is not an effective indicator for success, but how well a team controls the ball while on offense is. The team with the better ratio earns the checkmark.
  • Overachiever/underachiever is a way to look at how well a team is doing for the season, relative to its point differential. In other words, if a team has a strong record but all of its wins are close, it is overachieving. If it has suffered a number of losses but they have been close, it is underachieving. This idea is calculated using a Pythagorean Expectation formula, here with an exponent commonly used for football: ((Points for^2.37)/(Points for^2.37 + Points against^2.37)). This winning percentage can then be multiplied by the number of games played to show where a team "should" be with its record.
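The Pythagorean Expectation in the last bullet translates directly into code. This is a small sketch of the formula as written above with the 2.37 exponent; the function names and the sample inputs are illustrative, not taken from the visualizations themselves.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 2.37) -> float:
    """Expected winning percentage from points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def expected_wins(points_for: float, points_against: float,
                  games_played: int, exponent: float = 2.37) -> float:
    """Where a team 'should' be in the win column after games_played games."""
    return pythagorean_win_pct(points_for, points_against, exponent) * games_played


def achievement_gap(actual_wins: float, points_for: float,
                    points_against: float, games_played: int) -> float:
    """Positive means overachieving, negative means underachieving."""
    return actual_wins - expected_wins(points_for, points_against, games_played)


if __name__ == "__main__":
    # Illustrative numbers only: a 6-1 team that has scored 180 and allowed 140.
    print(round(expected_wins(180, 140, 7), 2))
    print(round(achievement_gap(6, 180, 140, 7), 2))
```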

Periodically there will be additional metrics to explain why the Cowboys won or lost, such as net passing yards/attempt, which takes into account sacks and incompletions as well as how many passing yards each quarterback is able to accrue. As more metrics become readily available, this summary will include them. To see these visualizations in real time, follow me:


Special thanks to Fuzzy Red Panda for putting together these beautiful images and programs that advance sports analytics in such creative ways.


No Range for the Texas Rangers

It's hard not to catch shortstop Elvis Andrus smiling these days. His Texas Rangers go into the postseason with home-field advantage all the way through the World Series, having finished one victory shy of a franchise record for most wins in a season and boasting the most home wins in the American League. Elvis himself finished the regular season as a .302/.362/.439 hitter. And yet, a few sabermetricians have spoken out, saying not only should the Rangers not be one of the favorites to win the World Series, their success is virtually fraudulent.

It involves Pythagorean Expectation. This is the often-cited formula baseball guru Bill James invented to estimate how many wins a team "should" have based upon how many runs it scored and allowed. Since it became commonplace, the formula has worked quite well explaining why teams are thriving and struggling. Even this season, the formula explains all but a handful of wins or losses for every MLB team. The one team the formula has done the poorest job with is the Texas Rangers.

For much of the season, this team's Pythagorean W-L hovered around .500. The Rangers finished 13 games above what was expected, at 95-67. Why? The Rangers were 36-11 in one-run games (the .766 winning percentage is a record in modern baseball). They were also 18-24 in games decided by 5+ runs. In other words, the Rangers won a lot of close games and lost a lot of blowouts.

A discrepancy this large is unprecedented in the last decade for the Rangers:

Pasted Graphic

For most of the last decade, the Rangers performed roughly as expected, given their runs scored and allowed. But in the last two years this team has over-performed. It might be a coincidence that those are the two years Jeff Banister has managed the Rangers, but maybe not. Banister has a history of evaluating players and looking at skills during blowouts. He is certainly not the only manager to have this approach, but it is possible he takes it to the next level. Two years is not sufficient data to make such a conclusion, but it is a noteworthy trend to consider.

So how accurate is this formula when predicting if the Rangers will win the World Series? Not very. Since 1969, 11 teams out of 47 had the best Pythagorean Expected record and went on to win the World Series. In fact, the likelihood has decreased since the postseason expanded. Many conclude the postseason is almost impossible to predict, though there are some helpful trends to consider. Most notably, "small ball" seems to be a more successful approach in the postseason than in the regular season. Among teams in the postseason, the Rangers rank 3rd in stolen bases, 5th in sacrifice flies and 3rd in hit by pitch (they are, however, last in walks and almost last in sacrifice hits).

If you believe the Rangers will eventually regress to the mean given this disparity, it has not happened through 162 games, so statistically nothing suggests this trend will automatically change after another 19 games. In a way, the Texas Rangers have just as good a chance to win the franchise's first world championship as anybody, and that smile from Elvis Andrus will be even wider.

Who Wins the FedExCup?

The PGA Tour will award its tenth FedExCup by week's end. This event has not attracted the same fanfare as majors or even other regular tournaments. The TOUR Championship is held during football season, has only been around for a decade and has a scoring system that has changed even in that short window. Still, with $10,000,000 in bonus money on the line for the winner and the best players of the season in the field, it is worth the exercise of predicting this year's champion.

Historically, there has been little fluctuation when it comes to who wins the FedExCup, based upon his ranking the prior week:

Pasted Graphic

Seven out of nine winners were ranked 5th or better heading into the final tournament. This trend bodes well for Dustin Johnson, Patrick Reed and Adam Scott. However, seven out of nine TOUR Championship winners also went on to capture the cup, so this prediction exercise involves who will win at East Lake Golf Club just as much as it does forecasting the rankings afterwards.

The course has a par 70, uses Bermudagrass and is 7,385 yards long. Last year, it was ranked the 17th toughest course by score, out of 52 tournaments (and again, this tournament features only the top 30 ranked players in the FedExCup standings). As for more specific statistics compared with the rest of the Tour:

Driving Distance: 12th shortest (284.2)
Sand Save Percentage: 14th best (53.49%)
Greens in Regulation Percentage: 13th worst (62.1%)
Putting Average: 42nd best (1.742)

So far, nothing suggests this course has unique attributes that golfers have to make major adjustments for. The next step is looking at the strokes gained statistics for the last nine winners of the golf tournament, prior to the BMW Championship:

Pasted Graphic 1

Winning this tournament seems to require a complete game, though occasionally winners have had negative strokes gained statistics in putting or driving. This idea does not necessarily eliminate anyone's chances. However, nearly all of them have needed good to great approach games, which is good news for Adam Scott (1st in SG: Approach the Green) and Hideki Matsuyama (2nd).

Oftentimes the winner has also had a stellar World Golf Ranking, which suggests Jason Day or Dustin Johnson could win everything. Five golfers can win the TOUR Championship and hoist the FedExCup without requiring any help thanks to their point totals: Johnson, Scott, Day, Reed and Paul Casey. Given how important momentum can be for winning any golf tournament, these golfers have many reasons to feel confident about their chances.

This idea is furthered when analyzing how much of an advantage the higher-ranked players have heading into the tournament, relative to the rest of the field. Consider this: after the BMW Championship, a player's points are reset to a new number based upon his ranking (to see the updated point totals, click here). Resetting scores gives everyone a chance to win the FedExCup, even though it wipes away any commanding leads a golfer may have had leading up to the TOUR Championship. The points earned for where a player finishes at the TOUR Championship can be found here.

One way to look at the probability each golfer has for winning the FedExCup is to look at how resetting points improves or worsens each golfer's chances. The most critical assumption in this exercise is that every golfer is of the same quality and has the same abilities, so everyone has an equal opportunity to win; their probability to win the TOUR Championship is 1/30, or 3.33%. But calculating their chances of winning the FedExCup, after resetting points, requires a more rigorous approach. Using Monte Carlo simulation, I ran 5,000 tournaments and looked at how many times each golfer finished with the highest point total. Their probabilities can be found here:

Pasted Graphic

As expected, the lower the ranking, the worse the probability. Also as expected, if you were to draw a function to fit these points, it would be logarithmic (an R^2 of .9536 suggests this function captures almost all of the variation). Dustin Johnson has a significantly better probability to win than second-place golfer Patrick Reed. After Reed, the variation levels off. Still, in this exercise, golfers ranked 1st through 8th have a better probability of winning than if points were completely erased and whoever won the TOUR Championship also won the FedExCup.
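
For readers who want to try this kind of exercise themselves, here is a minimal sketch of the simulation idea in Python. It is not the code behind the graph above. The RESET and FINISH tables are placeholder numbers I made up for illustration, not the official point totals, and the function name simulate_cup is likewise just an assumption. Every golfer is treated as equally skilled, so each simulated finishing order is a random shuffle of the 30-player field.

```python
import random

def simulate_cup(reset_points, finish_points, n_sims=5000, seed=0):
    """Estimate how often each seed ends the final event with the most points.

    Every golfer is treated as equally skilled, so each simulated finishing
    order is a uniformly random shuffle of the field.
    """
    rng = random.Random(seed)
    n = len(reset_points)
    wins = [0] * n
    for _ in range(n_sims):
        order = list(range(n))      # order[place] = golfer finishing in that place
        rng.shuffle(order)
        totals = [0] * n
        for place, golfer in enumerate(order):
            totals[golfer] = reset_points[golfer] + finish_points[place]
        best = max(totals)
        leaders = [g for g, t in enumerate(totals) if t == best]
        wins[rng.choice(leaders)] += 1   # break any ties at random
    return [w / n_sims for w in wins]

# Placeholder point tables for a 30-player field (NOT the official values).
RESET = [round(2000 * 0.9 ** i) for i in range(30)]
FINISH = [round(2000 * 0.6 ** i) for i in range(30)]

cup_odds = simulate_cup(RESET, FINISH)
flat_odds = simulate_cup([0] * 30, FINISH)   # points "completely erased"

for rank in (1, 2, 8, 30):
    print(rank, cup_odds[rank - 1], flat_odds[rank - 1])
```

Swapping in the real reset and finish tables would be the obvious next step. The flat_odds run is the baseline where points are erased, so every golfer should land near 1/30.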

Whether you are computing probabilities using golfers of similar skill sets, glancing at historical results or looking at abilities using advanced quantitative measures, the lesson is clear: the top of the points list is likely where you will find this year's season-long champion.

A New Journalism Feature

Each week, I will air a segment on Good Day on Fox 4 in Dallas/Fort Worth that takes an analytic look inside college football. First, I look at a statistical trend that explains something we saw the weekend before, the challenges of predicting games and the secrets to being a more informed fan. Second, I use data and modeling to forecast games featuring some of the favorite teams from north Texas.

I will then post these segments to YouTube and share the links on the Journalism section here. You can click Journalism at the top of the page or click here.

Is Jordan Spieth Struggling?

Even before winning two majors, and nearly two more, in 2015, Jordan Spieth was one of the more popular golfers on the PGA Tour. Then, that popularity soared when the 22-year-old set many records beginning with the phrase: "Youngest golfer to…". But with enormous popularity and early success come high expectations. This year, Spieth has not won a major, only being in contention once out of three times. He also fell out of the top spot in the Official World Golf Rankings and has three fewer victories overall. Given what he did accomplish and how he's performing now, is Jordan Spieth struggling?

Spieth defended his record and, during his performance at The Open at Royal Troon Golf Club, felt any questions about struggling were "unfair". Per golflink.com:

"It's been tough given I think [2016 has] been a solid year," said Spieth. "I think if last year had not happened I'd be having a lot of positive questions and instead most of the questions I get are comparing to last year and therefore negative because it's not to the same standard…So that's almost tough to then convince myself you're having a good year when nobody else really…even if you guys think it is, the questions I get make me feel like it's not. So I think that's a bit unfair to me…"

Let's take an analytical look at whether Jordan Spieth is struggling by his standards and, if so, by how much. The simplest way is to look at Strokes Gained rankings and compare last year to this year. What makes Strokes Gained so useful is that it points specifically to the parts of the game a golfer may or may not be excelling at. The following statistics compare how well Spieth has done compared with the rest of the field:

Pasted Graphic

The numbers above the bars are his rankings on Tour. What also matters here are the following equations:

Off-the-Tee + Approach-the-Green + Around-the-Green = Tee-to-Green

Off-the-Tee + Approach-the-Green + Around-the-Green + Putting = Total

First, Spieth is actually performing better off the tee, but the rest of the field has caught up. Around the green and putting have remained steady or actually improved. The glaring statistic is his approach to the green. This measures all approach shots on par-4 and par-5 holes that are NOT within 30 yards from the edge of the green and includes tee shots on par-3 holes. Spieth has gone from .618 to -.016 (moving from 11th place to 118th). This statistic is further highlighted by looking at the breakdown of his rankings compared with the rest of the field:

  • 163rd in Greens in Regulation Percentage (62.3%)
  • T107th in Approaches from 75-100 yards (17' 10")
  • T109th in Approaches from 100-125 yards (20' 5")
  • T118th in Approaches from 125-150 yards (23' 9")

This information explains the discrepancy in SG: Tee-to-Green and SG: Total. It also explains the bigger discrepancy in tee-to-green versus total, because his skill at putting is included in the total, not tee-to-green. It is also worth noting that Spieth is playing in fewer tournaments this year than last. He played in 25 last season and is only through 16 this season, prior to the PGA Championship.

Let's now look solely at majors and highlight the discrepancy in Spieth's approach game:

Pasted Graphic 1

Spieth does not have the same driving accuracy, greens in regulation numbers or sand save percentage that he did in that record-breaking year.

Here is something else to consider. Perhaps one of Spieth's strengths is adapting to links courses. PGA Tour players do not play a lot on these types of courses, and while other golfers can drive the ball farther, this skill is not an advantage on a links course. But Spieth's skills as a putter and around the green do come in handy. In 2015, the U.S. Open was on a links course. Spieth won. This year, the only two domestic tournaments that even come close to those types of conditions are the AT&T Pebble Beach Pro-Am and the Hyundai Tournament of Champions. Spieth won the latter.

What Spieth said about his game and his year requires clarification. Strokes gained statistics have helped us highlight two important things about Jordan Spieth. First, his approach game has let him down much more than it did last year. Second, he is not struggling with any other part of his game and in some ways he has improved. While his fans hoped Spieth would have won more tournaments this year, he still has virtually as good a chance as anyone to capture the final major of the season.

Who Do You Trust in the 4th Quarter?

Since being named the starting quarterback for the Dallas Cowboys, Tony Romo has been in the NFL spotlight for ten seasons and 127 games. While he has put up some of the more prolific statistics of any quarterback during this time, many argue he is the most scrutinized veteran gunslinger in the 21st century. One reason is anti-analytical: blown opportunities to win games in the 4th quarter. While many of these games have been the most critical for his team's championship aspirations, it does bring up the bigger question of which quarterbacks have been the most reliable for winning a game in the 4th quarter.

In a later article we will apply analytics and look at what constitutes a "clutch" quarterback. But first, let's look at the raw statistics. The data features 42 quarterbacks spanning all eras of the NFL but who can be considered, at a minimum, marginally successful (e.g. Peyton Manning, Warren Moon, Roger Staubach, Colin Kaepernick, etc.). The 4th quarter variables are: comeback attempts, comeback wins, comeback rate and career blown leads by the QB's own defense.

First, here is a graph of the comeback success rates:

Pasted Graphic 1

Of the quarterbacks analyzed, Andrew Luck has the best 4th quarter comeback rate of anyone (63%). However, he also had the fewest attempts, so it is too soon to call him the most clutch we have ever seen. In second place is Joe Montana (56%), who many might be more willing to admit is the best in close games. Peyton Manning had the most attempts of anyone (94), but his rate is 47%.

Then comes the aforementioned Tony Romo. His rate is only slightly worse than Manning's. While it is below half, only five of the 42 quarterbacks studied finished better than 50%. In fact, Romo's rate is 11th best out of 42. At the other end, the worst rate among active quarterbacks belongs to Aaron Rodgers (27%). Don Meredith has the lowest success rate of anyone at 25%.

Some of these rates can be explained by analyzing blown leads by that quarterback's defense:


Pasted Graphic 2

The quarterback dealt the least clutch defense is Drew Brees, where on 31 occasions, his "D" has blown a 4th quarter lead. Fran Tarkenton ranks second with 27. Tony Romo is tied for 10th with 17. This mark is slightly above the average among the 42 quarterbacks studied. As for those who have fewer reasons to be upset with their defense, there are Kurt Warner (6) and, as expected, Andrew Luck (2).

Visually and expectedly, there is already a direct correlation between 4th quarter comeback rates and blown leads by defense. Still, it is worth discovering if there are statistics for each quarterback that can help explain why some successful quarterbacks are better than others at the end of football games. I will report my findings in a future article.

Special thanks to Mark Lane for putting this data together. You can follow him on Twitter @therealmarklane.

An Upgrade to Inside Sports Analytics

Pasted Graphic
This week we made some tweaks to the website. Some of them are literally tweaks, like adding my Instagram photos to the sidebar of the "Photo Album" pages (it's edwardegrosfox4 if you would like to follow me). My LinkedIn page is also available in the sidebar of the "About" page.

But the most exciting addition is the "Journalism" page. Occasionally I submit sports analytic reports for Fox 4 in Dallas, the TV station for which I am the Weekend Sports Anchor. These stories are available on our station's YouTube page, and now, on this website. These stories focus on athletes and teams in north Texas but can include major events and tournaments; they also use the same quantitative tools the blog and podcast do.

As always, if you would like to offer feedback or ask questions, please contact me through social media or by using the "Contact Edward" page.

Yes! Go for Two!

It's an odd feeling for football fans. After scoring a touchdown, the exhilaration must be contained just as quickly as it erupted, as this same offense, grinding down the field and travailing through the defensive puzzles presented, decides to go for two. The decision is rare: during the 2015 NFL season, 1,217 extra points were attempted, but only 94 times did a team go for two (7%). In fact, five teams never attempted a two-point conversion.

Pittsburgh Steelers quarterback Ben Roethlisberger suggested this week his team should go for two, every time. Though his team attempted more two-point tries than anyone else, fewer than one-fourth of the time did the Steeler offense return to the field after a touchdown.

Traditionally, this idea is irreverent. But analytically, this idea carries merit. Because 94% of extra points were converted last year, if a team always goes for two, they only need to convert 47% of the time to push. It is worth noting that a defense can return the football the length of the field for two points no matter what is being attempted. Though this happened only once and during an extra point, it could fractionally affect this expected value even if it is statistically insignificant. Lifetime, teams convert their two-point attempts roughly 50% of the time, almost exactly what they need for it to be a push.
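
The break-even arithmetic can be sketched in a few lines of Python. This is only an illustration using the rates quoted above; the function names are my own, and the sketch ignores the rare defensive return mentioned in the paragraph.

```python
def expected_points(success_rate, points):
    """Average points per try for a given conversion rate."""
    return success_rate * points

def break_even_two_point_rate(extra_point_rate):
    """Two-point rate p that matches the kick: solve 2 * p = 1 * extra_point_rate."""
    return extra_point_rate / 2

kick_value = expected_points(0.94, 1)            # value of kicking the extra point
needed_rate = break_even_two_point_rate(0.94)    # rate needed to push
two_point_value = expected_points(0.50, 2)       # value at the lifetime rate

print(kick_value, needed_rate, two_point_value)
```

With these inputs the kick is worth 0.94 points per try, the push rate is 47% and a 50% conversion rate is worth 1.00 points per try.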

So why always go for two if it is a push and risk injury to more valuable players? And, perhaps more importantly, would this 50% success rate hold if teams went for two more frequently? Aside from the fact there is an obvious trend that NFL offenses are improving and kickers are worsening (mainly because the extra point was moved back to the 15-yard line), the following chart illustrates two-point tries:

Pasted Graphic

As expected, the 50% success rate remains relatively consistent regardless of how many times teams go for two. However, as stated before, this is a small sample size compared with the number of times a team could have gone for two, but elected for the extra point. Usually teams go for two when almost absolutely necessary. When it is not absolutely necessary, will the success rate be the same?

It's worth finding out.

Predicting Pitching Performance

Noah Syndergaard made his Major League debut last year for the New York Mets and made an immediate impact (3.24 ERA and 9.96 K/9). While his 9-7 record may not have been overly impressive, there were signs this was only the beginning. Now, Syndergaard has multiple National League player of the week awards and is one of the more reliable hurlers in the game.

But not every pitcher lives up to predictions. How can someone better determine which pitchers will become successful the following season? One of the more intriguing presentations concerning the future of baseball predictions involved creating a pitcher projection system based upon PITCHf/x (to read the paper and/or watch the presentation, click here). The traditional ways to gauge a successful pitcher do not always perform well when forecasting how he'll do the following year. According to this research, if next season's Earned Run Average (or Runs Averaged/9 innings) is regressed onto one of these traditional metrics, here are the resulting R^2 values:

Metric R^2
K% 0.67
SIERA 0.52
xFIP 0.46
BB% 0.45
FIP 0.35
HR% 0.18
ERA 0.14
BABIP 0.04
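
To make the R^2 column concrete, here is a minimal sketch of how one such number could be computed: regress next season's ERA on a single metric with ordinary least squares and report the share of variation explained. The function name and the six data points are made-up toy values for illustration only. They are not the paper's data and will not reproduce the table.

```python
def r_squared(x, y):
    """R^2 from a one-variable least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) versus total variation in y
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Toy values only: this season's K% and next season's ERA for six pitchers.
toy_k_pct = [18.0, 20.5, 22.0, 24.5, 27.0, 30.0]
toy_next_era = [4.40, 4.10, 4.25, 3.60, 3.30, 3.05]

toy_r2 = r_squared(toy_k_pct, toy_next_era)
print(round(toy_r2, 2))
```

An R^2 near 1 means the metric accounts for most of the variation in next season's ERA, and a value near 0 means it accounts for almost none.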

Strikeout percentage is the most successful traditional metric when determining future success. Here are the top ten pitchers in K% in 2015:

  1. Clayton Kershaw (33.82%)
  2. Chris Sale (32.08%)
  3. Max Scherzer (30.7%)
  4. Carlos Carrasco (29.59%)
  5. Chris Archer (29.03%)
  6. Corey Kluber (27.65%)
  7. Jacob deGrom (27.03%)
  8. Jake Arrieta (27.13%)
  9. Madison Bumgarner (26.93%)
  10. Francisco Liriano (26.52%)

MLB is through 1/4 of the 2016 season. As it stands, here are the top ten pitchers in K% this year:

  1. Jose Fernandez (35.9%)
  2. Clayton Kershaw (33.7%)
  3. Noah Syndergaard (32.6%)
  4. Max Scherzer (31.5%)
  5. Stephen Strasburg (30.9%)
  6. Danny Salazar (30.3%)
  7. David Price (29.4%)
  8. Vincent Velasquez (28.8%)
  9. Drew Smyly (28.4%)
  10. Drew Pomeranz (28.3%)

While many on the 2015 list currently rank just outside of the top ten this year, the comparison shows two things: the difficulty of predicting pitcher success given any traditional metric, and just how consistently dominant Clayton Kershaw and Max Scherzer really are.

This paper discussed combining the aforementioned statistics with Arsenal/Zone rating. This metric uses PITCHf/x data, which tracks the speed, movement and placement of every pitch relative to the strike zone. The idea is, with more data about the specifics of each pitch a pitcher throws, the pitch sequence and which pitches are most sustainable over time, it will be easier to predict success the following season.

Data scientists should always be careful about adding too many variables because of overfitting. In other words, a model with too many variables can fit the noise in past data, watering down the prediction to where it is hard to find actual trends that are meaningful. Still, this is an intriguing paper and hopefully this Arsenal/Zone rating can become more readily available to baseball fans in an easily digestible way.

You've Drafted Your Team, Now What?

For three days, there is a frenetic pace to the NFL Draft, where specific needs are addressed (or not), value is appropriated for each pick (or not), opponents' draft boards are analyzed and combated (or not) and undrafted free agents are debated and signed. After all of these minuscule details, there comes a bigger picture question: What the heck just happened?

Pundits and perhaps those within each franchise grade these draft classes before anyone attends a rookie minicamp or any single contract gets signed. Though there are always inexplicable human elements, grading draft classes is becoming easier and more analytically sound.

Last year, the Harvard Sports Analysis Collective posted a blog about predicting success based upon combine numbers. Some positions are much easier to predict than others. For instance, cornerbacks and outside linebackers with good combine numbers tend to do well in the NFL, whereas quarterbacks and wide receivers are much harder to predict.

Value also matters. Franchises will be crippled if an average player is chosen in the first or second round, while others can be bolstered by picking stellar athletes in later rounds. Many of these problematic picks happen when a team drafts an offensive skill player. One research paper suggested 60% of running backs and receivers taken in the 3rd-7th rounds have better average career statistics than those taken in the first and second rounds. The trend with receivers makes sense, given combine numbers not having as much predictive power, but perhaps the trend with running backs is a sign of the times (i.e. the NFL evolving to a passing league). As for quarterbacks, the more highly touted ones tend to have better careers.

So who drafted well in 2016? Without offering clear answers, perhaps surprisingly, the perennial bottom feeders in the Cleveland Browns did well. They took defensive stars with good combine numbers like Emmanuel Ogbah and Carl Nassib and, for the most part, waited on skill players until later rounds, including Ricardo Louis and Jordan Payton. Another possible surprise is the Jacksonville Jaguars. Their first round pick was a cornerback with excellent combine numbers in Jalen Ramsey. They also took defensive risks like Myles Jack and Sheldon Day, suggesting they are focusing on addressing some of their bigger needs. As for the uncertainties, obviously the Rams and Eagles gave up a lot for quarterbacks who may or may not pan out, but the Cowboys, Texans and Redskins also drafted skill players in the first round, and it will be tougher to predict whether they can translate to wins.

What we're learning with the latest research is the NFL Draft is not the crapshoot it once was. It will never be perfect, but it is also becoming clearer which teams are heeding this research and which prefer considering non-analytical advice.

Playoff Unpredictability

Until recently, the Los Angeles Lakers were one of the fixtures of the NBA Playoffs, and in many seasons, the Finals. They have put together dynasties in different generations of the sport, from Magic Johnson's teams to the Shaq and Kobe era. When the Lakers were not winning titles, chances are another team was enjoying its own dynasty, like the Boston Celtics, Chicago Bulls or San Antonio Spurs. Dynasties are so commonplace in the NBA, 15 franchises in the sport's history do not have a championship (and seven of those still in existence never even made it to the Finals).

The NBA is unique in this regard: championships are won in bulk. Other leagues offer more parity, where there is a larger pool of contenders vying for a title. There may be dynasties in other sports, but there seem to be fewer of them, each is shorter in duration and there is a better chance someone unexpected can claim the sport's top prize.

Which of the four top professional sports leagues (NFL, NBA, MLB and NHL) offers the most playoff unpredictability? Is the NBA truly the most predictable? Is it significantly more predictable or marginally so?

One approach to answering these questions is by using a statistical model for each sport. Here, we will use logistic regressions, where we will look at only wins (or points in hockey) and see how well they predict whether a team won a championship that year. Here are some other notes for setting up this project:

- All data used begins with the 1989-90 season because the NFL had the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so they are still used.

At first glance, every variable representing wins is statistically significant with 99% confidence, which should be obvious because you need so many wins just to make the playoffs. What matters is how well wins alone predict championships. In statistical parlance, we will use a goodness-of-fit measure called AIC (Akaike Information Criterion) to answer this question. As this number gets smaller, the model has a better fit. The following shows how well each model performs:

Screen Shot 2016-04-17 at 7.47.11 AM
The larger the bar, the more unpredictable the league is. Again, as expected, the NBA is the most predictable, and by a considerable margin. This model also suggests Major League Baseball is the most unpredictable, with the NFL as a close second and the NHL as a close third.
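
For anyone curious how a wins-only model and its AIC fit together, here is a minimal sketch in plain Python. It is not the code behind the chart above. The function names are my own and the twelve team-seasons are toy values invented for illustration. The model is a one-variable logistic regression fit with Newton's method, and AIC is computed as 2k - 2 ln(L) with k = 2 parameters (intercept and slope).

```python
import math

def fit_wins_logit(wins, champion, iterations=25):
    """Fit P(title) = 1 / (1 + exp(-(b0 + b1 * (wins - mean)))) by Newton's method.

    Returns (b1, log_likelihood). Wins are centered only for numerical
    stability; centering does not change the slope or the quality of fit.
    """
    mean = sum(wins) / len(wins)
    xs = [w - mean for w in wins]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, champion):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p                 # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1.0 - p)           # entries of the negated Hessian
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: solve H * step = g
        b1 += (h00 * g1 - h01 * g0) / det
    log_lik = 0.0
    for x, y in zip(xs, champion):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        log_lik += math.log(p) if y else math.log(1.0 - p)
    return b1, log_lik

def aic(log_lik, n_params=2):
    """Akaike Information Criterion: smaller means a better fit."""
    return 2 * n_params - 2 * log_lik

# Toy team-seasons only: regular-season wins and whether the team won the title.
toy_wins = [41, 45, 50, 52, 55, 57, 58, 60, 62, 65, 67, 72]
toy_titles = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

slope, log_lik = fit_wins_logit(toy_wins, toy_titles)
toy_aic = aic(log_lik)
print(slope > 0, round(toy_aic, 2))
```

Running the same fit once per league and comparing the AIC values is the kind of comparison the bars summarize. A library such as statsmodels would be the more practical choice for real data; the hand-rolled fit is only meant to show the moving parts.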

There are a number of other variables that could be added to these models to help determine who will win a championship, but the simplicity of these models makes for an easier comparison across sports.

Predicting the Masters

Jordan Spieth is and should be one of the favorites to win the Masters. He's had two starts at Augusta National, finished tied for second in 2014 and won it in 2015. He also has a PGA Tour victory in 2016, the Hyundai Tournament of Champions.

But the PGA Tour's website is predicting someone different. Using an analytic formula, the site says Phil Mickelson will win the green jacket. There are three variables used: the overall rankings for driving distance, putting and scrambling. Mickelson has the best ranking when combining all three variables, and by a lot. The second-place golfer, Jason Day, is 38 "points" lower than Mickelson but only ten points better than third and fourth place (Marc Leishman and Rickie Fowler, respectively). If this formula is completely accurate, Spieth will finish 7th.

Though the simplicity of the formula can be appreciated, any Masters prediction should include past performances. This variable is highly predictive. It explains why Fred Couples finished in the Top 20 in five of the last six years, even though he has played on the Champions Tour since 2010. It might also explain why the Masters remains the only major championship Rory McIlroy has yet to win (he has finished 8th or better the last two times at Augusta National).

Even when adding this variable, it does not take away from the argument for Mickelson. After all, he has won three green jackets and finished tied for 2nd in 2015, four strokes behind Spieth. It is also worth noting, of the 48 different golfers who have won the Masters, 17 won it multiple times (35.4%). Look for Mickelson, Spieth or Adam Scott to finish atop Sunday's leaderboard.

Evaluating Your Bracket

The Law of Conservation of Mass tells us: matter is neither created nor destroyed. When you burn your horribly incorrect college basketball bracket, remember, you never destroyed it; it is in another form somewhere in the universe. So instead of ignoring your transgressions, let's embrace what still exists and see which approaches were the best when predicting who will be in the Final Four.

There's a one-seed (North Carolina), a couple of two-seeds (Villanova and Oklahoma) and a 10-seed (Syracuse). There is not as much parity with this quartet as with some tournaments in the last few years. Still, some of the favorites to win the National Championship did not survive the first two weeks of this crucible. For instance, the top three teams in the Pythagorean Rating at the end of the conference tournaments are not playing in Houston. In fact, Syracuse did not even crack the top 25, until recently. ESPN's Basketball Power Index offers these rankings: North Carolina (1), Villanova (3), Oklahoma (6) and Syracuse (39). The LRMC Basketball Rankings has three of these teams at two, three and seven, but ranks the Orange 41st.

Some computer models have resorted to predictions without relying solely on historical data. How is this possible? Microsoft's search engine, Bing, uses social media to determine which teams will survive and advance. It has already proven successful in other sporting events like the World Cup and NFL games. But how did it fare for this tournament? Sadly for Bing, it only predicted one Final Four team correctly (North Carolina). In fact, the system predicted the Orange to lose their first game.

It should be clear by now the two schools that ruined this tournament's predictiveness: Kansas and Syracuse. The Jayhawks were the top team by nearly all accounts, yet lost in the Regional Final, perhaps uncharacteristically. At the other end of the spectrum, Syracuse could be the worst team ever to make the Final Four. There have been 11-seeds to make it to the final weekend of the season, but many debated if Syracuse even deserved to make the tournament. Their RPI was 72 at the time of selection, worse than other schools that were not chosen (e.g. Valparaiso, San Diego St. and St. Bonaventure). Instead of the favorite vying for the National Championship, it's the controversial at-large two wins away from glory.

Even listening to me would not have been wise. Using my own system, I only correctly predicted one team (and it was a different school than what I said was coming out of that Region on Fox 4). My National Champion was knocked out during the Elite Eight (Kansas) and my second place team lost in the First Round (Michigan St.).

So what is the best way to fill out your bracket for the next tournament?

I don't know.

A Recap of the 2016 MIT Sloan Sports Analytics Conference

For the 10th time, sports analytics enthusiasts of all kinds came to Boston to attend the annual MIT Sloan Sports Analytics Conference. I was one of close to 4,000 attendees, though this was my first. Coaches, general managers, players, journalists, academics and just about anyone else in-between gave their takes on the industry and shared their research with the masses.
The following stream-of-consciousness features the panels I attended and some of my bigger observations.


Sloan 2

The War on Analytics

Goose Gossage isn’t the only one profanely fighting analytics. If you believe some of the speakers at the MIT Sloan Sports Analytics Conference, there exists a countermovement to the quantitative revolution.

Perhaps it was most appropriate that the 10th anniversary of this meeting began with a “Moneyball Reunion” panel, including the author of “Moneyball” Michael Lewis, the Godfather of sabermetrics Bill James and an assistant for the Oakland A’s, Paul DePodesta. That team’s general manager, Billy Beane, found a reason for using analytics when scouting players.

“Billy used to tell our scouts…’I have all of this experience’”, said DePodesta, referring to Beane’s 25 years of working in some capacity in Major League Baseball. “I can’t walk into a high school game and say ‘This guy is going to be a star.’ If I can’t do it, I don’t know how anyone can do it…we have to come up with a different way,” said DePodesta.

The team combated old school thinking by finding players who were devalued in some way by others. Sometimes it was due to their physical stature. Lewis recalled the story of the A’s considering Alabama catcher Jeremy Brown, who many considered overweight: “He’s so fat, his thighs would rub together and set his jeans on fire.”

These stories happened more than a decade ago. Just like analytics, the criticisms and concerns have evolved. The second panel of the day focused on basketball and featured former NBA forward Shane Battier. He originally resisted analytics for a more personal reason.

Teams can quantitatively gauge a player’s health when it comes to sleeping habits, nutrition, etc. On the surface, it seems franchises would only need to know this information to maximize a player’s health, thereby making him/her more effective. But Battier’s concern was that teams would find some data to devalue him and have reason to pay him less and/or offer fewer years on a contract.

“It’s called capitalism,” said Battier.

Personal reasons or otherwise, Battier does believe there is a stigma within NBA locker rooms about what he called, “the math”. Though he claims it extended his career as he aged, it’s “still not cool to be hip to the math”. He did add if a player found analytics to be useful, they might find subtle ways to learn how to improve.

The conflict between believers and non-believers rages on. Safe to say this conference preaches to the choir. When asked about Goose Gossage’s comments that baseball is now run by nerds, Bill James’s response received one of the louder ovations of the morning: “Back in 2002, you had to pay attention to those guys. Now, you can just ignore them.”

Sloan 3

Talking About Playoffs

Taking a personal tone with this blog entry, one of the more interesting panel discussions of the day involved playoff analytics. Specifically, how do we devise the best system for determining a champion for each respective sport? It’s a philosophical question as much as it is analytical because leagues could simply have one-game championships for every sport; and though it would be exciting, it would also be inherently unfair for teams that would win a series but lose the opener.

Each sport has its own set of challenges. While the NFL cannot play as many games as other professional leagues, college athletics must deal with other factors. NCAA executive Oliver Luck points to class time, money for travel and time commitments that, if abused, would be unrealistic for student-athletes.

However, at the forefront of these conversations is attracting the most loyal fans. They may not want to see a nine-game World Series (something I have argued for) because it is too long to retain interest. Nine games might be a truer way of determining the best team in a series—especially with expanded starting rotations—but in the end it is what the fans want, and that is something analytics can help with. NASCAR Vice President of Strategic Development Eric Nyquist pointed to how analytics helped his sport redo the Chase for the Sprint Cup so that a champion is not already determined by season’s end but it is not entirely haphazard as to who earns honors as the top driver.

Playoffs can also have other benefits when done correctly. Luck said the College Football Playoff has helped teams schedule more competitive non-conference games. It has also helped college basketball in spotlighting conference tournaments and conference games (though he admits non-conference games could be more popular than they are).

This panel also agreed on an underlying truth that analytics highlights: there are many more games that would have to be played in all sports to determine the best team, at least thousands. Because this notion is unachievable, the next best thing is to come up with the playoff format in the sport’s best interest. Who does it best? Neil Paine of fivethirtyeight.com says the NFL because it preserves uncertainty but the winner is often in the conversation of one of the top teams that season. The NBA, meanwhile, has too much certainty and only a handful of teams, if that, have a chance at a championship.

It would be ponderous for me to go through each sport and say whether I think it conducts its playoffs properly. I also understand why uncertainty must exist to keep fans interested, so there are fewer things that would dissuade them from following the playoffs. Still, I would hope leagues avoid caving too much to the whims of fans and perhaps provide a product that is fairer to the teams competing for championships than to those rooting for them. I have found it is in the long-term best interest of a sport to maintain an unaffected, traditional system and not make determining a champion seem so capricious.

As a postscript, I found professional bowling to be the worst in determining a champion. In the tournaments I covered, early rounds would be a matchup of two bowlers in a best-of-seven series of matches, but once you reach the final rounds—which are televised—it is one match determining who advances and who wins the whole thing. To prove my point, I would like to believe this is why the sport is not as popular as it once was. I am probably mistaken, and if you are adamantly opposed to this idea, might I suggest a winner-take-all debate.

Evolution of Sports Journalism

Of all of the panels at this conference, this was the one I was most looking forward to (surprising, isn’t it?). While it took a circuitous route to discussing sports analytics, it was a journey worth taking. For you young journalists, pay attention closely.

One of the more dominant voices on the panel belonged to Jaymee Messler, President of the Players’ Tribune. Her company describes itself as “a new media company that provides athletes with a platform to connect directly with their fans, in their own words.” Founder Derek Jeter says he hopes the site will “transform how athletes and newsmakers share information.”

“We’re not following the news cycle,” said Messler. “We complement the media really well…driving stories that are compelling and are not getting covered by the [traditional] media.”

Here’s how it works: an athlete has a message they want to deliver. The Players’ Tribune offers a platform replete with resources to make sure it is exactly what they want to say. While traditional media might lose the ability to break the story, they gain material for questions the next opportunity they have for an interview.

The criticism involves the last part of this sequence. Why would the athlete grant an interview? Why would they talk about something if they feel everything about it has already been said? If they spend less time with reporters and more with the Tribune, how do you build trust? (My thesis alluded to many of these problems.)

“The barrier to entry is zero,” said David Dusek of Golfweek. “You can, with a few clicks, get your voice out there…the players are much more controlling in that way and they have a way to react directly to fans (sometimes the media) and to have their voice heard…it’s interesting to see how it’s becoming more challenging.”

Reporters already had challenges talking to athletes before the Players’ Tribune, thanks to athletes’ social media accounts. Athletes already have a way to communicate with the public, so a reporter may seem like a middleman. Traditional media also has to compete with new media that can provide scores and highlights more quickly than it can. Lastly, clichés have become more tired than ever.

What’s a reporter to do? One solution: analytics.

“Analytics is just one avenue to get a creative solution around limited access,” said Carl Bialik of fivethirtyeight.com. “We do want to talk to people in the sports world about what we find…some of the best interviews I’ve had are with people who are rarely asked about certain things.” These things include data trends, advanced statistics and specific forecasts.

Not all reporters can (or perhaps should) research their own analytics. It may not even be the only route they can take to become more creative. What matters here are the conflicting forces that make the journalist’s job more challenging. Fortunately, there are solutions, hence the evolution.

Conclusions

Virtually every hour of this two-day event, there are six different panels and lectures to choose from. I attended as much as I could while still covering the event, and I still missed 49 different events on Friday alone. That’s not to mention the many sports science exhibits, software presentations and other technological displays I did not have time to see.

Perhaps one of the things that has attracted more than 3,000 people to this conference is the depth of sports analytics presented. Poster presentations and white papers are available for the deeply analytical. Other events like panels speak of analytics in broader, general terms. Even if a sports fan only wants to see players and coaches discuss their craft, there is a place for that person too. There is also a variety of subjects covered, from business analytics to athletic performance measurements to sports journalism and even to the future of how we will watch and listen to games.

While sports like football, hockey and soccer were covered, there were not as many baseball presentations as one might expect, given that analytics have progressed further within that sport than any other. One reason might be a national sabermetric conference happening the same week at nearly the other end of the country. It is also Spring Training, with many MLB teams preparing for the season. Still, it might be a positive development for sports analytics to stress other sports so it can branch out and attract different fans. On at least two occasions, panels discussed how the NBA and basketball have the most room to grow internationally in terms of popularity.

The conference also took on developing stories. The Steph Curry phenomenon of making so many lengthy basketball shots had its share of supporters. Away from sports, Nate Silver of fivethirtyeight.com updated his political findings of who will be the major party nominees for President. Even conversations I had with presenters and attendees involved sports stories happening in the moment.

If analytics do not whet your appetite, this conference may not change your mind. After all, the pro-analytical comments were often received with at least some fanfare, a kind of “preaching to the choir”. For anyone who does have the slightest interest in sports analytics, chances are there will be at least one lecture or exhibit that will make for an informative weekend.


(All photos courtesy of MIT Sloan Sports Analytics Conference)

Special Teams Not as Special as They Used to Be

Virtually any football fan has heard cliche after cliche about the importance of special teams.  After all, why would they be called "special" if they were anything but?  There are countless instances of momentum being seized or lost because of an impressive kickoff return, of devastating injuries on these plays affecting a team and of the excitement caused by a game-winning field goal.  However, analytics suggest this phase of the game may not be as special as it once was.

Many data scientists have put together linear regressions weighting the importance of a team's offense, defense and special teams for the outcome of a game.  These models say special teams account for less than 20% of the overall effect on the outcome of a game.  Some models suggest even less.  In his book, Mathletics, Winston (2009) put together a regression excluding any special teams variables and still had an R^2 of .8733 and an adjusted-R^2 of .8577 (p. 129).
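For anyone curious how R^2 and adjusted-R^2 relate, here is a minimal sketch in Python.  The scoring margins and fitted values are made up for illustration, and the two-predictor count is my own assumption; I do not try to reproduce Winston's numbers because his sample size and number of variables are not given here.

```python
def r_squared(actual, fitted):
    """Share of the variance in `actual` explained by `fitted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Shrink R^2 to account for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


if __name__ == "__main__":
    # Hypothetical scoring margins and model predictions, illustration only.
    actual = [3.0, -2.5, 4.0, -0.5, -1.0, 2.5, -3.0, 6.0]
    fitted = [2.4, -1.8, 4.6, -1.1, -0.2, 2.0, -2.7, 5.3]
    r2 = r_squared(actual, fitted)
    # Pretend the model used 2 predictors (say, offense and defense).
    print(f"R^2 = {r2:.4f}")
    print(f"adjusted R^2 = {adjusted_r_squared(r2, len(actual), 2):.4f}")
```

The adjustment only ever pulls the figure down, and it pulls harder as predictors are added relative to the number of observations.  A small gap between the two values, like the one Winston reports, suggests the fit is not simply a byproduct of piling on variables.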

These models have been around for years, but only recently are we starting to see NFL teams deemphasize special teams:


Figure: NFL touchdowns scored on kickoff returns (red) and punt returns (blue), by season, since 2005

This figure represents the touchdowns scored from kickoff returns (red) and punt returns (blue) in the NFL since 2005.  Especially in the last three years, there have been fewer kickoff returns for touchdowns.  Some of this downward trend can be attributed to the league moving kickoffs up to the 35-yard line to promote touchbacks.  Punt return touchdowns had a spike in 2011 and 2012, but have since leveled off and do not have a discernible trend over time, positive or negative.  That does not detract from the overall notion that fewer points are being scored from this phase of the game.

What about extra points and field goals?  This past offseason, the league moved the extra point back 13 yards.  It resulted in a reduction in successful extra point attempts, from 99.3% to 94.2%.  However, this amounts to approximately 80 missed extra point attempts over the course of an entire season for the entire league.  There are even fewer examples of this move affecting the outcome of a game, though one can make an argument with a notable example in the latest AFC Championship Game.  As for going for three, many agree it behooves teams not to kick field goals as frequently as they do.  Lately, there have been fewer field goal attempts.
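To see how small that number is on a per-game basis, here is a quick back-of-the-envelope sketch in Python.  It uses only the figures above plus the league's 32 teams and 16-game schedule, and it assumes the roughly 80 misses refer to the 256-game regular season; the helper name is my own.

```python
REGULAR_SEASON_GAMES = 256   # 32 teams x 16 games / 2 teams per game
TEAMS = 32

missed_after = 80            # approximate league-wide misses cited above
rate_before = 0.993          # success rate before the move
rate_after = 0.942           # success rate after the move


def implied_attempts(misses: float, success_rate: float) -> float:
    """Back out how many attempts produce `misses` at `success_rate`."""
    return misses / (1 - success_rate)


attempts = implied_attempts(missed_after, rate_after)
# Misses the same number of attempts would yield at the old rate.
missed_before = attempts * (1 - rate_before)

print(f"implied attempts:        {attempts:.0f}")
print(f"misses at the old rate:  {missed_before:.0f}")
print(f"misses per game now:     {missed_after / REGULAR_SEASON_GAMES:.2f}")
print(f"misses per team, season: {missed_after / TEAMS:.1f}")
```

Under those assumptions, the rule change works out to roughly one missed extra point every three games, or between two and three per team over a full season.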

Again, most of the theoretical research here has been around for a few years, but many successful NFL teams have now heeded the findings and do not invest as much in special teams as they once did.  While many will still pay for top-notch kickers and punt returners and have important reasons for doing so, we are seeing the NFL evolving to a more analytically based approach to the not-as-special special teams.

Greetings and Welcome!

Hello and welcome to the blog portion of my website.  Here, I will write about sports analytics findings I have researched, analyze others' approaches to these quantitative tools and discuss the future of this field.

Though we are seeing players, coaches and the media become more comfortable discussing analytics openly, the conversation still seems to be confined to specific areas like gambling and fantasy sports.  This blog will dig deeper into these areas by means of forecasting, but it will also infer how and why things happened in noteworthy games.  Models, data visualizations and other analytic tools can communicate these ideas.

One goal for this website is to bridge the gap between those who embrace analytics and those who shun the tools.  I have never been comfortable operating with the belief there are two distinct camps.  I believe analytics should be a part of a toolbox for fans and those who work in sports.  If a tool makes the job more efficient, then it should be used; if not, then find another tool or do not use any.  Attaching personal feelings one way or another does not (and should not) serve anyone's purposes.

I also hope this blog will be a call to action for those who read.  If you would like to comment, please do so.  If you would like to reach out, please click "Contact Me" at the top of this page.  Thank you for visiting and I hope you enjoy what this site has to offer.