By: Edward Egros

Analytics

Who Wins the FedExCup?

The PGA Tour will award its tenth FedExCup by week's end. This event has not attracted the same fanfare as majors or even other regular tournaments. The TOUR Championship is held during football season, has only been around for a decade and has a scoring system that has changed even in that short window. Still, with $10,000,000 in bonus money on the line for the winner and the best players of the season in the field, it is worth the exercise of predicting this year's champion.

Historically, there has been little fluctuation when it comes to who wins the FedExCup, based upon his ranking the prior week:

Pasted Graphic

Seven out of nine winners were ranked 5th or better heading into the final tournament. This trend bodes well for Dustin Johnson, Patrick Reed and Adam Scott. However, seven out of nine won the tournament and went on to capture the cup, so much of this prediction exercise involves who will win at East Lake Golf Club just as much as it does forecasting the rankings afterwards.

The course has a par 70, uses Bermudagrass and is 7,385 yards long. Last year, it was ranked the 17th toughest course by score, out of 52 tournaments (and again, this tournament features only the top 30 ranked players in the FedExCup standings). As for more specific statistics compared with the rest of the Tour:

Driving Distance: 12th shortest (284.2)
Sand Save Percentage: 14th best (53.49%)
Greens in Regulation Percentage: 13th worst (62.1%)
Putting Average: 42nd best (1.742)

So far, nothing suggests this course has unique attributes that golfers have to make major adjustments for. The next step is looking at the strokes gained statistics for the last nine winners of the golf tournament, prior to the BMW Championship:

Pasted Graphic 1

Winning this tournament seems to require a complete game, though occasionally winners have had negative strokes gained statistics in putting or driving. This idea does not necessarily eliminate anyone's chances. However, nearly all of them have needed good to great approach games, which is good news for Adam Scott (1st in SG: Approach the Green) and Hideki Matsuyama (2nd).

Oftentimes the winner has also had a stellar World Golf Ranking, which suggests Jason Day or Dustin Johnson could win everything. Five golfers can win the TOUR Championship and hoist the FedExCup without requiring any help thanks to their point totals: Johnson, Scott, Day, Reed and Paul Casey. Given how important momentum can be for winning any golf tournament, these golfers have many reasons to feel confident about their chances.

This idea is furthered when analyzing how much of an advantage the higher-ranked players have heading into the tournament, relative to the rest of the field. Consider this: after the BMW Championship, a player's points are reset to a new number based upon his ranking (to see the updated point totals, click here). Resetting scores gives everyone a chance to win the FedExCup, even though it wipes away any commanding leads a golfer may have had leading up to the TOUR Championship. The points earned for where a player finishes at the TOUR Championship can be found here.

One way to look at the probability each golfer has for winning the FedExCup is to look at how resetting points improves or worsens each golfer's chances. The most critical assumption in this exercise is that every golfer is of the same quality and has the same abilities, so everyone has an equal opportunity to win; each golfer's probability of winning the TOUR Championship is 1/30, or 3.33%. But calculating their chances of winning the FedExCup, after resetting points, requires a more rigorous approach. Using Monte Carlo simulation, I ran 5,000 tournaments and looked at how many times each golfer finished with the highest point total. Their probabilities can be found here:

Pasted Graphic

As expected, the lower the ranking, the worse the probability. Also as expected, if you were to draw a function to fit these points, it would be logarithmic (an R^2 of .9536 suggests this function captures almost all of the variation). Dustin Johnson has a significantly better probability to win than second-place golfer Patrick Reed. After Reed, the variation levels off. Still, in this exercise, golfers ranked 1st through 8th have a better probability of winning than if points were completely erased and whoever won the TOUR Championship also won the FedExCup.
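The simulation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for this article: the function name `simulate_fedexcup` and both points tables are hypothetical placeholders (the real reset and finishing points are on the linked pages), every golfer is assumed equally likely to finish in any position, and ties are broken in favor of the higher seed.

```python
import random

def simulate_fedexcup(reset_points, finish_points, n_sims=5000, seed=2016):
    """Estimate each seed's chance of finishing with the most points.

    Assumes every golfer is equally likely to finish in any position.
    """
    rng = random.Random(seed)
    n = len(reset_points)
    titles = [0] * n
    for _ in range(n_sims):
        order = list(range(n))
        rng.shuffle(order)  # order[k] is the seed index of the golfer finishing in place k + 1
        totals = list(reset_points)
        for place, golfer in enumerate(order):
            totals[golfer] += finish_points[place]
        # max() returns the first maximum, so ties go to the higher seed
        champion = max(range(n), key=lambda g: totals[g])
        titles[champion] += 1
    return [t / n_sims for t in titles]

# Hypothetical tables for 30 golfers, NOT the real FedExCup values.
reset_points = [2000 - 60 * i for i in range(30)]
finish_points = [round(2000 * 0.8 ** k) for k in range(30)]

probs = simulate_fedexcup(reset_points, finish_points)
for rank, p in enumerate(probs[:5], start=1):
    print(f"Seed {rank}: {p:.1%}")
```

Swapping in the actual reset and finishing tables is the only change needed to approximate the exercise; the probabilities will differ from run to run unless the seed is fixed.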

Whether you are computing probabilities using golfers of similar skill, glancing at historical results or looking at abilities using advanced quantitative measures, the lesson is clear: the top of the points list is likely where you will find this year's season-long champion.

A New Journalism Feature

Each week, I will air a segment on Good Day on Fox 4 in Dallas/Fort Worth that takes an analytic look inside college football. First, I look at a statistical trend inferring something we saw from the weekend before, the challenges predicting games and the secrets to being a more informed fan. Second, I use data and modeling to forecast games featuring some of the favorite teams from north Texas.

I will then post these segments to YouTube and share the links on the Journalism section here. You can click Journalism at the top of the page or click here.

Is Jordan Spieth Struggling?

Even before winning two majors, and nearly two more, in 2015, Jordan Spieth was one of the more popular golfers on the PGA Tour. Then, that popularity soared when the 22-year-old set many records beginning with the phrase: "Youngest golfer to…". But with enormous popularity and early success come high expectations. This year, Spieth has not won a major, only being in contention once out of three times. He also fell out of the top spot in the Official World Golf Rankings and has three fewer victories overall. Given what he did accomplish and how he's performing now, is Jordan Spieth struggling?

Spieth defended his record and, during his performance at The Open at Royal Troon Golf Club, felt any questions about struggling were "unfair". Per golflink.com:

"It's been tough given I think [2016 has] been a solid year," said Spieth. "I think if last year had not happened I'd be having a lot of positive questions and instead most of the questions I get are comparing to last year and therefore negative because it's not to the same standard…So that's almost tough to then convince myself you're having a good year when nobody else really…even if you guys think it is, the questions I get make me feel like it's not. So I think that's a bit unfair to me…"

Let's take an analytical look at whether Jordan Spieth is struggling by his standards and, if so, by how much. The simplest way is to look at Strokes Gained rankings and compare last year to this year. What makes Strokes Gained so useful is that it points specifically to the parts of the game a golfer may or may not be excelling at. The following statistics compare how well Spieth has done compared with the rest of the field:

Pasted Graphic

The numbers above the bars are his rankings on Tour. What also matters here are the following equations:

Off-the-Tee + Approach-the-Green + Around-the-Green = Tee-to-Green

Off-the-Tee + Approach-the-Green + Around-the-Green + Putting = Total

First, Spieth is actually performing better off the tee, but the rest of the field has caught up. Around the green and putting have remained steady or actually improved. The glaring statistic is his approach to the green. This measures all approach shots on par-4 and par-5 holes that are NOT within 30 yards from the edge of the green and includes tee shots on par-3 holes. Spieth has gone from .618 to -.016 (moving from 11th place to 118th). This statistic is further highlighted by looking at the breakdown of his rankings compared with the rest of the field:

  • 163rd in Greens in Regulation Percentage (62.3%)
  • T107th in Approaches from 75-100 yards (17' 10")
  • T109th in Approaches from 100-125 yards (20' 5")
  • T118th in Approaches from 125-150 yards (23' 9")

This information explains the discrepancy in SG: Tee-to-Green and SG: Total. It also explains the bigger discrepancy in tee-to-green versus total, because his skill at putting is included in the total, not tee-to-green. It is also worth noting, Spieth is playing in fewer tournaments this year than last. He played in 25 last season and is only through 16 this season, prior to the PGA Championship.

Let's now look solely at majors and highlight the discrepancy in Spieth's approach game:

Pasted Graphic 1

Spieth does not have the same driving accuracy, greens in regulation numbers or sand save percentage that he did in that record-breaking year.

Here is something else to consider. Perhaps one of Spieth's strengths is adapting to links courses. PGA Tour players do not play a lot on these types of courses, and while other golfers can drive the ball farther, this skill is not an advantage on a links course. But Spieth's skills as a putter and around the green do come in handy. In 2015, the U.S. Open was on a links course. Spieth won. This year, the only two domestic tournaments that even come close to those types of conditions are the AT&T Pebble Beach Pro-Am and the Hyundai Tournament of Champions. Spieth won the latter.

What Spieth said about his game and his year requires clarification. Strokes gained statistics have helped us highlight two important things about Jordan Spieth. First, his approach game has let him down much more so than last year. Second, he is not struggling with any other part of his game and in some ways he has improved. While his fans hope Spieth would have won more tournaments this year, he still has virtually as good a chance as any to capture the final major of the season.

Who Do You Trust in the 4th Quarter?

Since being named the starting quarterback for the Dallas Cowboys, Tony Romo has been in the NFL spotlight for ten seasons and 127 games. While he has put up some of the more prolific statistics of any quarterback during this time, many argue he is the most scrutinized veteran gunslinger in the 21st century. One reason is anti-analytical: blown opportunities to win games in the 4th quarter. While many of these games have been the most critical for his team's championship aspirations, it does bring up the bigger question of which quarterbacks have been the most reliable for winning a game in the 4th quarter.

In a later article we will apply analytics and look at what constitutes a "clutch" quarterback. But first, let's look at the raw statistics. The data features 42 quarterbacks spanning all eras of the NFL but who can be considered, at a minimum, marginally successful (e.g. Peyton Manning, Warren Moon, Roger Staubach, Colin Kaepernick, etc.). The 4th quarter variables are: comeback attempts, comeback wins, comeback rate and career blown leads by the QB's own defense.

First, here is a graph of the comeback success rates:

Pasted Graphic 1

Of the quarterbacks analyzed, Andrew Luck has the best 4th quarter comeback rate of anyone (63%). However, he also had the fewest attempts, so it is too soon to call him the most clutch we have ever seen. In second place is Joe Montana (56%), who many might be more willing to admit is the best in close games. Peyton Manning had the most attempts of anyone (94), but his rate is 47%.
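One way to see why a small number of attempts calls for caution is to put an interval around each rate. The sketch below uses the Wilson score interval, a technique chosen for this illustration rather than anything taken from the data set. Manning's 94 attempts and 47% rate come from the numbers above; the 20 attempts paired with a 63% rate is a hypothetical count, since the actual number of attempts is not quoted here.

```python
import math

def wilson_interval(rate, attempts, z=1.96):
    """Approximate 95% Wilson score interval for a success rate."""
    denom = 1.0 + z * z / attempts
    center = (rate + z * z / (2 * attempts)) / denom
    half = z * math.sqrt(rate * (1 - rate) / attempts + z * z / (4 * attempts ** 2)) / denom
    return center - half, center + half

many = wilson_interval(0.47, 94)  # rate and attempts quoted above
few = wilson_interval(0.63, 20)   # 20 attempts is a hypothetical count

print(f"94 attempts: {many[0]:.0%} to {many[1]:.0%}")
print(f"20 attempts: {few[0]:.0%} to {few[1]:.0%}")
```

The fewer the attempts, the wider the interval, which is the formal version of "too soon to call."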

Then comes the aforementioned Tony Romo. His rate is only slightly worse than Manning's. While it is below half, only five of the 42 quarterbacks studied finished better than 50%. In fact, Romo's rate is 11th best out of 42. At the other end, the worst rate among active quarterbacks belongs to Aaron Rodgers (27%). Don Meredith has the lowest success rate of anyone at 25%.

Some of these rates can be explained by analyzing blown leads by that quarterback's defense:


Pasted Graphic 2

The quarterback dealt the least clutch defense is Drew Brees, whose "D" has blown a 4th quarter lead on 31 occasions. Fran Tarkenton ranks second with 27. Tony Romo is tied for 10th with 17. This mark is slightly above the average among the 42 quarterbacks studied. As for those who have fewer reasons to be upset with their defense, there are Kurt Warner (6) and, as expected, Andrew Luck (2).

Visually and expectedly, there is already a direct correlation between 4th quarter comeback rates and blown leads by defense. Still, it is worth discovering if there are statistics for each quarterback that can help explain why some successful quarterbacks are better than others at the end of football games. I will report my findings in a future article.

Special thanks to Mark Lane for putting this data together. You can follow him on Twitter @therealmarklane.

An Upgrade to Inside Sports Analytics

Pasted Graphic
This week we made some tweaks to the website. Some of them are literally tweaks, like adding my Instagram photos to the sidebar of the "Photo Album" pages (it's edwardegrosfox4 if you would like to follow me). My LinkedIn page is also available in the sidebar of the "About" page.

But the most exciting addition is the "Journalism" page. Occasionally I submit sports analytic reports for Fox 4 in Dallas, the TV station for which I am the Weekend Sports Anchor. These stories are available on our station's YouTube page, and now, on this website. These stories focus on athletes and teams in north Texas but can include major events and tournaments; they also use the same quantitative tools the blog and podcast do.

As always, if you would like to offer feedback or ask questions, please contact me through social media or by using the "Contact Edward" page.

Yes! Go for Two!

It's an odd feeling for football fans. After scoring a touchdown, the exhilaration must be contained just as quickly as it erupted, as this same offense, grinding down the field and travailing through the defensive puzzles presented, decides to go for two. The decision is rare: during the 2015 NFL season, 1,217 extra points were attempted, but only 94 times did a team go for two (7%). In fact, five teams never attempted a two-point conversion.

Pittsburgh Steelers quarterback Ben Roethlisberger suggested this week his team should go for two, every time. Though his team attempted more two-point tries than anyone else, fewer than one-fourth of the time did the Steeler offense return to the field after a touchdown.

Traditionally, this idea is irreverent. But analytically, this idea carries merit. Because 94% of extra points were converted last year, if a team always goes for two, they only need to convert 47% of the time to push. It is worth noting, a defense can return the football the length of the field for two points no matter what is being attempted. Though this happened only once and during an extra point, it could fractionally affect this expected value even if it is statistically insignificant. Lifetime, teams convert their two-point attempts roughly 50% of the time, almost exactly what they need for it to be a push.
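The break-even arithmetic can be written out explicitly. This is a minimal sketch using only the 94% and roughly 50% rates quoted above; the function names are hypothetical, and defensive returns are ignored as negligible.

```python
def expected_points(success_rate, points):
    """Expected points from one try, ignoring defensive returns."""
    return success_rate * points

def break_even_two_point_rate(extra_point_rate):
    """Two-point success rate at which both choices have equal expected value."""
    return extra_point_rate * 1 / 2

kick = expected_points(0.94, 1)  # 94% extra-point rate from the text
go = expected_points(0.50, 2)    # roughly 50% lifetime two-point rate

print(f"Kick: {kick:.2f} expected points, go for two: {go:.2f} expected points")
print(f"Break-even two-point rate: {break_even_two_point_rate(0.94):.0%}")
```

Any two-point success rate above the break-even value makes going for two the better choice on average, before accounting for game situation.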

So why always go for two if it is a push and risk injury to more valuable players? And, perhaps more importantly, would this 50% success rate hold if teams went for two more frequently? Aside from the fact there is an obvious trend that NFL offenses are improving and kickers are worsening (mainly because the extra point was moved back to the 15-yard line), the following chart illustrates two-point tries:

Pasted Graphic

As expected, the 50% success rate remains relatively consistent regardless of how many times teams go for two. However, as stated before, this is a small sample size compared with the number of times a team could have gone for two, but elected for the extra point. Usually teams go for two when almost absolutely necessary. When it is not absolutely necessary, will the success rate be the same?

It's worth finding out.

Predicting Pitching Performance

Noah Syndergaard made his Major League debut last year for the New York Mets and made an immediate impact (3.24 ERA and 9.96 K/9). While his 9-7 record may not have been overly impressive, there were signs this was only the beginning. Now, Syndergaard has multiple National League player of the week awards and is one of the more reliable hurlers in the game.

But not every pitcher lives up to predictions. How can someone better determine which pitchers will become successful the following season? One of the more intriguing presentations concerning the future of baseball predictions involved creating a pitcher projection system based upon Pitch F/X (to read the paper and/or watch the presentation, click here). The traditional ways to gauge a successful pitcher do not always perform well when forecasting how he'll do the following year. According to this research, if next season's Earned Run Average (or Runs Averaged/9 innings) is regressed onto one of these traditional metrics, here are the resulting R^2 values:

Metric R^2
K% 0.67
SIERA 0.52
xFIP 0.46
BB% 0.45
FIP 0.35
HR% 0.18
ERA 0.14
BABIP 0.04
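For readers who want to see how an R^2 like those above is obtained, here is a minimal sketch of a one-variable regression. It is not the paper's code, and the pitcher values are made up purely for illustration.

```python
def r_squared(x, y):
    """R^2 from regressing y on x with ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical pitchers: this season's K% and next season's ERA (made-up values).
k_pct = [33.8, 30.7, 27.0, 24.5, 21.0, 18.2, 16.5]
next_era = [2.10, 2.90, 3.10, 3.60, 3.50, 4.40, 4.20]

print(f"R^2 = {r_squared(k_pct, next_era):.2f}")
```

An R^2 near 1 means the metric explains most of the variation in next season's ERA; a value near 0, as with BABIP above, means it explains almost none.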

Strikeout percentage is the most successful traditional metric when determining future success. Here are the top ten pitchers in K% in 2015:

  • Clayton Kershaw (33.82%)
  • Chris Sale (32.08%)
  • Max Scherzer (30.7%)
  • Carlos Carrasco (29.59%)
  • Chris Archer (29.03%)
  • Corey Kluber (27.65%)
  • Jacob deGrom (27.03%)
  • Jake Arrieta (27.13%)
  • Madison Bumgarner (26.93%)
  • Francisco Liriano (26.52%)

MLB is through 1/4 of the 2016 season. As it stands, here are the top ten pitchers in K% this year:

  • Jose Fernandez (35.9%)
  • Clayton Kershaw (33.7%)
  • Noah Syndergaard (32.6%)
  • Max Scherzer (31.5%)
  • Stephen Strasburg (30.9%)
  • Danny Salazar (30.3%)
  • David Price (29.4%)
  • Vincent Velasquez (28.8%)
  • Drew Smyly (28.4%)
  • Drew Pomeranz (28.3%)

While many on the 2015 list currently rank just outside of the top ten this year, it shows two things: the difficulty of predicting pitcher success given any traditional metric and just how consistently dominant Clayton Kershaw and Max Scherzer really are.

This paper discussed combining the aforementioned statistics with Arsenal/Zone rating. This metric uses PitchF/X data which tracks the speed, movement and placement of every pitch relative to the strike zone. The idea is, with more data about the specifics of each pitch a pitcher throws, the pitch sequence and which pitches are most sustainable over time, it will be easier to predict success the following season.

Data scientists should always be careful about using too many variables because of overfitting. In other words, a model with too many inputs can end up fitting noise, to the point where it is hard to find trends that are actually meaningful. Still, this is an intriguing paper and hopefully this Arsenal/Zone rating can become more readily available to baseball fans in an easily digestible way.

You've Drafted Your Team, Now What?

For three days, there is a frenetic pace to the NFL Draft, where specific needs are addressed (or not), value is appropriated for each pick (or not), opponents' draft boards are analyzed and combated (or not) and undrafted free agents are debated and signed. After all of these minuscule details, there comes a bigger picture question: What the heck just happened?

Pundits and perhaps those within each franchise grade these draft classes before anyone attends a rookie minicamp or any single contract gets signed. Though there's always the inexplicable human elements, grading draft classes is becoming easier and more analytically sound.

Last year, the Harvard Sports Analysis Collective posted a blog about predicting success based upon combine numbers. Some positions are much easier to predict than others. For instance, cornerbacks and outside linebackers with good combine numbers tend to do well in the NFL, whereas quarterbacks and wide receivers are much harder to predict.

Value also matters. Franchises will be crippled if an average player is chosen in the first or second round, while others can be bolstered by picking stellar athletes in later rounds. Many of these problematic picks happen when a team drafts an offensive skill player. One research paper suggested 60% of running backs and receivers taken in the 3rd-7th rounds have better average career statistics than those taken in the first and second rounds. The trend with receivers makes sense, given combine numbers not having as much predictive power, but perhaps the trend with running backs is a sign of the times (i.e. the NFL evolving to a passing league). As for quarterbacks, the more highly touted ones tend to have better careers.

So who drafted well in 2016? Without offering clear answers, perhaps surprisingly, the perennial bottom feeders in the Cleveland Browns did well. They took defensive stars with good combine numbers like Emmanuel Ogbah and Carl Nassib and, for the most part, waited on skill players in later rounds, including Ricardo Louis and Jordan Payton. Another possible surprise is the Jacksonville Jaguars. Their first round pick was a cornerback with excellent combine numbers in Jalen Ramsey. They also took defensive risks like Myles Jack and Sheldon Day, also suggesting they are focusing on addressing some of their bigger needs. As for the uncertainties, obviously the Rams and Eagles gave up a lot for quarterbacks who may or may not pan out, but the Cowboys, Texans and Redskins also drafted skill players in the first round whose impact on wins will be tougher to predict.

What we're learning with the latest research is the NFL Draft is not the crapshoot it once was. It will never be perfect, but it is also becoming clearer which teams are heeding this research and who prefers considering non-analytical advice.

Playoff Unpredictability

Until recently, the Los Angeles Lakers were one of the fixtures of the NBA Playoffs, and in many seasons, the Finals. They have put together dynasties in different generations of the sport, from Magic Johnson's teams to the Shaq and Kobe era. When the Lakers were not winning titles, chances are another team was enjoying its own dynasty, like the Boston Celtics, Chicago Bulls or San Antonio Spurs. Dynasties are so commonplace in the NBA, 15 franchises in the sport's history do not have a championship (and seven of those still in existence never even made it to the Finals).

The NBA is unique in this regard: championships are won in bulk. Other leagues offer more parity, where there is a larger pool of contenders vying for a title. There may be dynasties in other sports, but there seem to be fewer of them, each is shorter in duration and there is a better chance someone unexpected can claim the sport's top prize.

Which of the four top professional sports leagues (NFL, NBA, MLB and NHL) offers the most playoff unpredictability? Is the NBA truly the most predictable? Is it significantly more predictable or marginally so?

One approach to answering these questions is by using a statistical model for each sport. Here, we will use logistic regressions, where we will look at only wins (or points in hockey) and see how well they predict whether a team won a championship that year. Here are some other notes for setting up this project:

- All data used begins with the 1989-90 season because the NFL had the biggest change to its playoff format at the turn of the new decade.

- Any season in any sport where a lockout shortened the number of games played considerably was removed (e.g., the 1998-99 NBA season, the 2012-13 NHL season, etc.)

- Though the NHL played 80 and 84 games in a few of these seasons, these numbers are not significantly different from the 82 played in the rest of the dataset, so they are still used.

At first glance, every variable representing wins is statistically significant with 99% confidence, which should be obvious because you need so many wins just to make the playoffs. What matters is how well wins alone predict championships. In statistical parlance, we will use a goodness-of-fit measure called AIC (Akaike Information Criterion) to answer this question. As this number gets smaller, the model has a better fit. The following shows how well each model performs:

Screen Shot 2016-04-17 at 7.47.11 AM
The larger the bar, the more unpredictable the league is. Again, as expected, the NBA is the most predictable, and by a considerable margin. This model also suggests Major League Baseball is the most unpredictable, with the NFL as a close second and the NHL as a close third.
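The comparison above can be sketched without any statistics package. The block below is a minimal illustration, not the model behind the chart: it fits a one-variable logistic regression by Newton's method with step halving, and the function name `wins_only_aic`, the win totals and the champion flags are all hypothetical.

```python
import math

def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _log_likelihood(b0, b1, x, y):
    return sum(yi * (b0 + b1 * xi) - _softplus(b0 + b1 * xi) for xi, yi in zip(x, y))

def wins_only_aic(wins, champion, max_iter=100):
    """AIC of a logistic regression of champion (0 or 1) on wins alone."""
    n = len(wins)
    mean = sum(wins) / n
    sd = math.sqrt(sum((w - mean) ** 2 for w in wins) / n)
    x = [(w - mean) / sd for w in wins]  # rescaling wins leaves the AIC unchanged
    rate = sum(champion) / n
    b0, b1 = math.log(rate / (1.0 - rate)), 0.0  # start from the intercept-only fit
    ll = _log_likelihood(b0, b1, x, champion)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, champion):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton direction for the intercept
        d1 = (h00 * g1 - h01 * g0) / det  # Newton direction for the slope
        step = 1.0
        new_ll = _log_likelihood(b0 + d0, b1 + d1, x, champion)
        while new_ll < ll and step > 1e-8:  # halve the step until the fit improves
            step /= 2.0
            new_ll = _log_likelihood(b0 + step * d0, b1 + step * d1, x, champion)
        if new_ll <= ll + 1e-12:
            break  # converged, or no improving step was found
        b0, b1, ll = b0 + step * d0, b1 + step * d1, new_ll
    return 2 * 2 - 2 * ll  # AIC = 2k - 2 ln(L) with k = 2 parameters

# Two hypothetical leagues with identical (made-up) win totals over two seasons.
wins = [62, 57, 50, 45, 38, 30, 60, 55, 49, 44, 37, 29]
dynasty = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # champions are top teams
parity = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]   # champions are mid-table teams

print(f"Dynasty league AIC: {wins_only_aic(wins, dynasty):.2f}")
print(f"Parity league AIC:  {wins_only_aic(wins, parity):.2f}")
```

With identical win totals, the league whose champions come from the top of the standings produces the smaller AIC, which is the sense in which a smaller bar means a more predictable league.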

There are a number of other variables that could be added to these models to help determine who will win a championship, but the simplicity of these models makes for an easier comparison across sports.

Predicting the Masters

Jordan Spieth is and should be one of the favorites to win the Masters. He's had two starts at Augusta National, finished tied for second in 2014 and won it in 2015. He also has a PGA Tour victory in 2016, the Hyundai Tournament of Champions.

But the PGA Tour's website is predicting someone different. Using an analytic formula, the site says Phil Mickelson will win the green jacket. There are three variables used: the overall rankings for driving distance, putting and scrambling. Mickelson has the best ranking when combining all three variables, and by a lot. The second-place golfer, Jason Day, is 38 "points" lower than Mickelson but only ten points better than third and fourth place (Marc Leishman and Rickie Fowler, respectively). If this formula is completely accurate, Spieth will finish 7th.

Though the simplicity of the formula can be appreciated, any Masters prediction should include past performances. This variable is highly predictive. It explains why Fred Couples finished in the Top 20 in five of the last six years, even though he has played on the Champions Tour since 2010. It might also explain why the Masters remains the only major championship Rory McIlroy has yet to win (he has finished 8th or better the last two times at Augusta National).

Even when adding this variable, it does not take away from the argument for Mickelson. After all, he has won a pair of green jackets and finished tied for 2nd in 2015, four strokes behind Spieth. It is also worth noting, of the 48 different golfers who have won the Masters, 17 won it multiple times (35.4%). Look for Mickelson, Spieth or Adam Scott to finish atop Sunday's leaderboard.