By: Edward Egros

NCAA

A New NCAA Tournament

There's no doubting the increased awareness of analytics in predicting the NCAA tournament field in college basketball. Instead of just a team's record against the Top 50, metrics like its Rating Percentage Index (RPI) or Ken Pomeroy ranking are becoming more commonplace. It has gotten to where data scientists are actually meeting with the NCAA to determine whether one metric should be used above all others to pick tournament teams.

Perhaps surprisingly, data scientists want simpler criteria for picking teams: who wins, who loses and whom you have played. This is opposed to other explanatory variables used in more advanced metrics, like margin of victory and offensive/defensive efficiency. Coaches, on the other hand, would prefer more complex formulae for determining the tournament field. Logically, this approach makes more sense from their perspective, because of competition. If a coach has figured out a style of play or a way to schedule opponents that increases the likelihood of making the tournament, that coach develops a competitive advantage. Data scientists want to keep it simple for fans; coaches want to figure out a competitive advantage.
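To make "who wins, who loses and whom you have played" concrete, here is a minimal Python sketch of a rating built from only those three ingredients. It assumes the commonly cited basic RPI weights (25% a team's own winning percentage, 50% its opponents' winning percentage, 25% its opponents' opponents' winning percentage) and leaves out any adjustment for game location. The function name simple_rpi and the three-team schedule are made up for illustration; this is not the NCAA's or any data scientist's actual formula.

```python
from collections import defaultdict


def simple_rpi(games):
    """Rate teams from (winner, loser) pairs using only wins, losses and schedule.

    Hypothetical helper: uses the basic 25/50/25 RPI weighting and ignores
    home/road adjustments and margin of victory.
    """
    # team -> list of (opponent, did_team_win)
    results = defaultdict(list)
    for winner, loser in games:
        results[winner].append((loser, True))
        results[loser].append((winner, False))

    def win_pct(team, exclude=None):
        # Winning percentage, optionally ignoring games against `exclude`.
        outcomes = [won for opp, won in results[team] if opp != exclude]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def opp_win_pct(team):
        # Average winning percentage of a team's opponents,
        # not counting their games against the team itself.
        opps = [opp for opp, _ in results[team]]
        return sum(win_pct(o, exclude=team) for o in opps) / len(opps)

    ratings = {}
    for team in list(results):
        opps = [opp for opp, _ in results[team]]
        wp = win_pct(team)
        owp = opp_win_pct(team)
        oowp = sum(opp_win_pct(o) for o in opps) / len(opps)
        ratings[team] = 0.25 * wp + 0.50 * owp + 0.25 * oowp
    return ratings


# Made-up round robin: A beats B and C, B beats C.
ratings = simple_rpi([("A", "B"), ("A", "C"), ("B", "C")])
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(team, round(rating, 3))
```

Nothing here depends on how a team won, only on whether it won and against whom, which is the kind of transparency a simpler criterion would offer fans.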

Perhaps in this same spirit of transparency, the tournament selection committee released "in-season" projections for the first time ever, one month before Selection Sunday. It only has the top four seeds of every region, but it is added information for where highly ranked teams really sit. As with any analytic project, more data "usually" means more robust forecasts. Already, it is easier to make more accurate assumptions and offer a better glimpse as to what the committee is looking for.

However, these in-season projections do not include the full field of 68, and what usually causes the most consternation is simply who does and does not make the dance. While it makes sense not to include the full field because you have to assume certain conference champions in mid-major conferences, something that would include all "at large" teams would provide even more information as to the criteria for inclusion.

Nothing is easy about picking 68 teams to play in a tournament, and while analytics may be helpful in forecasting a Final Four, easy-to-understand criteria can help teams and fans quell any controversy.

Evaluating Your Bracket

The Law of Conservation of Mass tells us that matter is neither created nor destroyed. When you burn your horribly incorrect college basketball bracket, remember: you never destroyed it; it is in another form somewhere in the universe. So instead of ignoring your transgressions, let's embrace what still exists and see which approaches were the best at predicting who would be in the Final Four.

There's a one-seed (North Carolina), a couple of two-seeds (Villanova and Oklahoma) and a 10-seed (Syracuse). There is not as much parity with this quartet as with some tournaments in the last few years. Still, some of the favorites to win the National Championship did not survive the first two weeks of this crucible. For instance, the top three teams in the Pythagorean Rating at the end of the conference tournaments are not playing in Houston. In fact, Syracuse did not even crack the top 25 until recently. ESPN's Basketball Power Index offers these rankings: North Carolina (1), Villanova (3), Oklahoma (6) and Syracuse (39). The LRMC Basketball Rankings still have their second-, third- and seventh-ranked teams alive, but rank the Orange 41st.

Some computer models make predictions without relying solely on historical data. How is this possible? Microsoft's search engine, Bing, uses social media to determine which teams will survive and advance. It has already proven successful in other sporting events like the World Cup and NFL games. But how did it fare for this tournament? Sadly for Bing, it only predicted one Final Four team correctly (North Carolina). In fact, the system predicted the Orange to lose their first game.

It should be clear by now which two schools ruined this tournament's predictability: Kansas and Syracuse. The Jayhawks were the top team by nearly all accounts, yet lost in the Regional Final, perhaps uncharacteristically. At the other end of the spectrum, Syracuse could be the worst team ever to make the Final Four. There have been 11-seeds to make it to the final weekend of the season, but many debated whether Syracuse even deserved to make the tournament. Their RPI was 72 at the time of selection, worse than other schools that were not chosen (e.g., Valparaiso, San Diego St. and St. Bonaventure). Instead of the favorite vying for the National Championship, it's the controversial at-large team that is two wins away from glory.

Even listening to me would not have been wise. Using my own system, I only correctly predicted one team (and it was a different school than the one I said was coming out of that Region on Fox 4). My National Champion (Kansas) was knocked out in the Elite Eight, and my second-place team (Michigan St.) lost in the First Round.

So what is the best way to fill out your bracket for the next tournament?

I don't know.