“Man, Team X is playing really well. They’re gonna be a dangerous 4 seed in the tournament.”
“What do you mean, Team X? Did you see them a few weeks ago against Team Y? They’re going to lose in round one to Random-13-Seed.”
“OK, Team X might not win the whole thing, but come on, they’re not going to lose to Random-13-Seed.”
“So not even a chance of that happening? Why not?”
“I don’t know. I can’t put my finger on it, but I just don’t see it.”
- Typical conversation before Sabermetrics
This week marks two significant milestones in the new and exciting world of data-driven analysis of real-world events.
The first is the launch of Nate Silver’s new Grantland-esque website, Five Thirty Eight. The comparisons to Bill Simmons’ online domicile are striking, with typical articles carrying titles like “How Statisticians Could Help Find That Missing Plane” and “Three Rules to Make Sure Economic Data Aren’t Bunk.” Heck, even the graphic layout looks the same (one is left wondering whether Silver can do a baseball podcast with his Yankee fan friend, Black-O).
But in all fairness, Silver’s website is a great contribution to those
of us who are sick of glib pundits giving their opinions based solely on their
own half-assed assumptions. The rest of
you can turn on Fox News or Chris Matthews and go back to the 1970s; here in
2014, it has become imperative that machines help derive data that can support
or repudiate our assumptions about the way we live, work, and function.
The
second event is the 2014 NCAA Basketball Tournament. This means everyone and their mother is
filling out their own bracket, gleefully informing their friends that they
picked Dayton to make the Sweet 16 or have Kentucky eliminated in the first round. This is typically followed by a conversation
much like the one at the beginning of this article, with little factual
evidence supporting which teams do and don’t advance. What’s left is mostly inarticulate
speculation and awkward attempts to save face and change the subject after
Dayton loses by 20 the first Thursday of the tournament.
This
is OK for two main reasons. The first is
that no one can possibly watch every team in the NCAA play every game for an
entire season. In fact, most of the talking heads who are supposed to know this stuff offer, in reality, something closer to a layman’s interpretation of events than an “expert” one. Think about it: The people who are supposed to be most knowledgeable about NCAA Basketball should (at least, in theory) be the members of the tournament committee, but the fact that three 15 seeds have advanced in the last two seasons is an indicator of gross misinterpretation. But this relates to the second reason, which
is that March Madness is fun because of all the upsets and
unpredictability. A perfect committee
would mean that no 9 seed or below would ever advance. This would mean that George Mason would have
been eliminated by Michigan State in the first round of the 2006 tournament,
Butler’s near-miracle 2010 run would have ended in the Sweet 16, and Florida
Gulf Coast would still be a school no one ever heard of.
But
I’m still uncomfortable with the notion that sports predictions (not only
limited to college basketball) for the most part tend to be apocryphal. We still have idiots like Lou Holtz
predicting that Notre Dame will win the national championship. Now does anyone take Lou Holtz seriously? Maybe not, but he has a job at ESPN and you
and I do not. Even my NFL playoff prediction articles tend to be riddled with unproven gut feelings instead of statistically-based indicators. Seattle
won this year’s Super Bowl, but because I don’t like them and because I
remember many years of the playoffs where they were one-and-done, I failed to
account for statistics that positioned them as clearly the NFL’s best
team. Human prejudices and oversights
are just as capable of producing off-target predictions as they are forecasting
the George Masons of the world. But
without data-based statistics, there’s no road map, just impressionistic gut
feelings.
This
week, I did something I’ve wanted to do for a long time: My own statistical
analysis of the NCAA Tournament. I call
my method the “Championship Range” (CR).
I used ten categories from Kenpom:
Pythagorean Score
Adjusted Offense
Adjusted Defense
Adjusted Tempo
Luck
Opponents Strength of Schedule – Total
Opponents Strength of Schedule – Offense
Opponents Strength of Schedule – Defense
Non-Conference Strength of Schedule
Total Losses
Then, I looked at teams that won the NCAA championship going back to 2003. I derived a range in each of these categories that each championship team fell into (for example, no team had an Adjusted Defense score under 86.4 or above 92.9). I call this the “championship range.” Then I found which teams in this year’s tournament fit into each of those ranges.
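The range-building step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the champion values and the three "teams" below are made up (chosen so the extremes match the 86.4 to 92.9 Adjusted Defense range quoted above), and the function names are hypothetical, not anything taken from Kenpom.

```python
# Hypothetical Adjusted Defense values for past champions (illustrative only;
# the extremes are chosen to match the 86.4-92.9 range quoted in the text).
champion_adj_defense = [86.4, 88.1, 90.3, 91.7, 92.9]


def championship_range(values):
    """Return the (low, high) bounds spanned by the past champions."""
    return min(values), max(values)


def in_range(value, bounds):
    """True if a team's value falls inside the championship range."""
    low, high = bounds
    return low <= value <= high


adj_d_range = championship_range(champion_adj_defense)

# Hypothetical 2014 teams and their Adjusted Defense numbers (not real data).
teams = {"Team A": 89.5, "Team B": 95.2, "Team C": 86.4}
fits = [name for name, value in teams.items() if in_range(value, adj_d_range)]

print(adj_d_range)  # (86.4, 92.9)
print(fits)         # ['Team A', 'Team C']
```

The same two functions would be applied once per category, giving ten ranges and ten lists of teams that fit.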
Not all statistical measures are equal, so I scored each of Pomeroy’s ten categories from 1 to 10. The category whose championship range contained the fewest teams in this year’s tournament (Opponents’ D SOS) got a 10. The category whose range contained the most teams (Non-conference SOS) got a 1. Then I marked each time a 2014 team fell into a championship range in one of the ten categories, applied the score I gave that category, and added up the scores. Based on where the fewest 2014 tournament teams fit within the championship range, here is how I scored Pomeroy’s categories:
Opponents Strength of Schedule – Defense (two 2014 teams): 10 points
Pythagorean Score (five teams): 9 points
Adjusted Defense (10 teams): 8 points
Adjusted Offense (23 teams): 7 points
Opponents Strength of Schedule – Offense (26 teams): 6 points
Opponents Strength of Schedule – Total (28 teams): 5 points
Adjusted Tempo (40 teams): 4 points
Total losses (41 teams): 3 points
Luck (45 teams): 2 points
Non-Conference SOS (55 teams): 1 point
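The point values above follow mechanically from the team counts: sort the categories from most exclusive to least exclusive and hand out 10 down to 1. Here is a minimal Python sketch of that ranking. The counts come from the list above; the function name is hypothetical, and the sketch assumes no two categories tie on count (true here).

```python
# Number of 2014 tournament teams inside each championship range (from the list above).
teams_in_range = {
    "Opponents SOS - Defense": 2,
    "Pythagorean Score": 5,
    "Adjusted Defense": 10,
    "Adjusted Offense": 23,
    "Opponents SOS - Offense": 26,
    "Opponents SOS - Total": 28,
    "Adjusted Tempo": 40,
    "Total Losses": 41,
    "Luck": 45,
    "Non-Conference SOS": 55,
}


def assign_weights(counts):
    """Give the most exclusive category the most points (assumes no tied counts)."""
    ordered = sorted(counts, key=counts.get)  # fewest teams first
    top = len(ordered)
    return {category: top - rank for rank, category in enumerate(ordered)}


weights = assign_weights(teams_in_range)
for category, points in weights.items():
    print(f"{category}: {points}")
```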
So,
for example, let’s take my beloved Kansas Jayhawks. They fit into four championship ranges
(Adjusted Offense, Adjusted Tempo, Opponents’ Strength of Schedule – Total,
Total Losses). They could have fit into
the other Opponents SOS categories, but actually exceeded the ranges of both
(this could be considered a flaw in my method).
When you add the four championship ranges (7+4+5+3), you get a Championship
Range Score (CRS) of 19. This is higher
than Eastern Kentucky, who gets a score of 10 for fitting into the championship
ranges of Adjusted Tempo, Total Losses, Luck, and Nonconference SOS
(4+3+2+1). Because 19 is higher than 10,
I project Kansas to beat Eastern Kentucky.
However, because Eastern Kentucky has the same CRS as 9-seeded Colorado,
the committee may have seeded one or both improperly.
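The Kansas and Eastern Kentucky arithmetic can be reproduced with a short Python sketch. The weights are the point values listed earlier and the category memberships are the ones named in the paragraph above; the `crs` helper is a hypothetical name used only for illustration.

```python
# Point value of each championship range, as scored above.
WEIGHTS = {
    "Opponents SOS - Defense": 10,
    "Pythagorean Score": 9,
    "Adjusted Defense": 8,
    "Adjusted Offense": 7,
    "Opponents SOS - Offense": 6,
    "Opponents SOS - Total": 5,
    "Adjusted Tempo": 4,
    "Total Losses": 3,
    "Luck": 2,
    "Non-Conference SOS": 1,
}


def crs(ranges_hit):
    """Championship Range Score: sum the weights of every range a team fits."""
    return sum(WEIGHTS[category] for category in ranges_hit)


kansas = crs(["Adjusted Offense", "Adjusted Tempo",
              "Opponents SOS - Total", "Total Losses"])
eastern_kentucky = crs(["Adjusted Tempo", "Total Losses",
                        "Luck", "Non-Conference SOS"])

print(kansas)            # 19
print(eastern_kentucky)  # 10
```

A team that fit all ten ranges would score 55, which puts Florida's 41 in context.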
There
are obviously flaws in my method. One
could be: Why consider these particular ten metrics when other predictors may
be more accurate? Well, the answer is
that those are the indicators that are available to a non-subscriber on
Kenpom. Another question would be whether the wider championship ranges are less meaningful than the narrower ones (for example, the range for Adjusted Tempo was broad). That’s why I structured my analysis around the idea that the more inclusive the range, the less valuable the metric. Only two teams fit within the top range, Opponents SOS – Defense (Nebraska and Wisconsin). Because of this exclusivity, this metric was considered the most valuable.
Other
metrics tell interesting stories. Consider Luck, for example. Most teams which have won the title have had
very little oscillation in luck, either positive or negative. In other words, consistent teams play well in
the tournament. Too much luck may mean
an early exit, while too little luck may doom you to a low seed. Another interesting metric is Adjusted
Tempo. I tend to think that of the ten
categories, this is the most useless, since the last ten champions vary greatly
and tempo doesn’t necessarily reflect quality.
But in reality, the most useless metric is Non-Conference SOS. So much for those who criticize major schools
for not scheduling enough Davids.
When all is said and done, here are the final tallies based on my analysis (I’ve included each team’s actual seeding).
Team | Champ. Range Score (CRS) | Should Be Seeded | Actual Seeding
Florida | 41 | 1 | 1
Louisville | 38 | 1 | 4
Wichita State | 36 | 1 | 1
Virginia | 35 | 1 | 1
Arizona | 28 | 2 | 1
Nebraska | 28 | 2 | 11
Wisconsin | 28 | 2 | 2
Saint Louis | 24 | 2 | 5
Iowa State | 22 | 3 | 3
Michigan State | 22 | 3 | 4
Oklahoma | 22 | 3 | 5
Pittsburgh | 22 | 3 | 9
Ohio State | 21 | 4 | 6
Oregon | 21 | 4 | 7
Cincinnati | 20 | 4-5 | 5
San Diego State | 20 | 4-5 | 4
UConn | 20 | 4-5 | 7
UMass | 20 | 4-5 | 6
Villanova | 20 | 4-5 | 2
Kansas | 19 | 5-6 | 2
NC State | 19 | 5-6 | 12
Tennessee | 19 | 5-6 | 11
Iowa | 18 | 6-7 | 11
Michigan | 18 | 6-7 | 2
Creighton | 17 | 6-7 | 3
Oklahoma State | 17 | 6-7 | 9
UCLA | 17 | 6-7 | 4
G. Washington | 16 | 7-8 | 9
Kentucky | 16 | 7-8 | 8
Manhattan | 16 | 7-8 | 13
Memphis | 16 | 7-8 | 8
New Mexico | 16 | 7-8 | 7
Baylor | 15 | 9 | 6
Delaware | 15 | 9 | 13
North Carolina | 15 | 9 | 6
Providence | 15 | 9 | 11
St. Joseph’s | 14 | 10 | 10
VCU | 14 | 10 | 5
Arizona State | 13 | 10-11 | 10
Brigham Young | 13 | 10-11 | 10
Duke | 13 | 10-11 | 3
N.D. State | 13 | 10-11 | 12
Kansas State | 12 | 11 | 9
Stanford | 12 | 11 | 10
Syracuse | 11 | 11 | 3
Colorado | 10 | 12 | 9
Eastern Kentucky | 10 | 12 | 15
Gonzaga | 10 | 12 | 8
Texas | 10 | 12 | 7
Dayton | 9 | 12 | 11
NM State | 8 | 13 | 13
W. Michigan | 8 | 13 | 14
Coastal Carolina | 7 | 13 | 16
Texas Southern | 7 | 13 | 16
Cal Poly | 6 | 14-15 | 16
Harvard | 6 | 14-15 | 12
La-Lafayette | 6 | 14-15 | 14
NC Central | 6 | 14-15 | 14
Steph. F. Austin | 6 | 14-15 | 12
Mercer | 5 | 15 | 14
Milwaukee | 5 | 15 | 15
Tulsa | 5 | 15 | 13
Albany | 3 | 16 | 16
Weber State | 2 | 16 | 16
Wofford | 2 | 16 | 15
American | 1 | 16 | 15
Mt. St. Mary’s | 0 | 16 | 16
Xavier | 0 | 16 | 12
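One way to read the table is to measure the gap between the actual seed and the seed CRS suggests: a large positive gap means underseeded, a large negative gap means overseeded. The Python sketch below does this for a handful of rows copied from the table. Treating a range like "5-6" as its midpoint is my own assumption for illustration, as are the function and variable names.

```python
def projected_seed(text):
    """Turn a 'Should Be Seeded' entry such as '3' or '5-6' into a number.

    Ranges are collapsed to their midpoint (an assumption for this sketch).
    """
    parts = [int(piece) for piece in text.split("-")]
    return sum(parts) / len(parts)


# (team, should be seeded, actual seeding) for a few rows of the table above.
rows = [
    ("Nebraska", "2", 11),
    ("Pittsburgh", "3", 9),
    ("NC State", "5-6", 12),
    ("Florida", "1", 1),
    ("Duke", "10-11", 3),
    ("Syracuse", "11", 3),
]

# Positive gap: seeded worse than CRS suggests (underseeded).
# Negative gap: seeded better than CRS suggests (overseeded).
gaps = {team: actual - projected_seed(should) for team, should, actual in rows}
ranked = sorted(gaps, key=gaps.get, reverse=True)

for team in ranked:
    print(f"{team}: {gaps[team]:+.1f}")
```

Running the same calculation over all 68 rows would give a full underseeded-to-overseeded ordering.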
Obviously, statistics tell an incomplete story. Syracuse fans would be livid to see their team, ranked first in the country one month ago, reduced to an 11 seed. It would be difficult to explain why teams like UCLA and Baylor didn’t see a favorable bounce from the committee after their successful conference tournament runs. The human factor is absent here. This is based on raw, empirical data. But from this data, we can draw a few interesting observations about how the 2014 NCAA Tournament might shape up:
1. The most
underseeded teams – A.K.A. potential Cinderella teams – are (in rough order)
Nebraska, Pittsburgh, North Carolina State, Tennessee, and Manhattan.
It may be surprising to see Nebraska ranked so high according to CRS. But the Huskers played a difficult schedule and were one of the two beneficiaries of that much-desired Opponents SOS – Defense championship range. Of the five teams listed above, I trust them the most to advance, and I have them losing narrowly to Arizona in the Elite Eight. Since the Cornhuskers and Wildcats each have a CRS of 28, I had to look closer at the individual matchup and break the tie by considering which of the championship ranges each team fit into were more valuable. This may be the Sabermetric equivalent of "overtime." In other words, if Arizona-Nebraska is an Elite Eight matchup, expect it to be a very close, low-scoring game.
As for the other four, Pitt has the
great misfortune of a potential round-two matchup with Florida, who I project
will win the tournament without a great deal of difficulty. The CRS does suggest that the Pitt-Colorado
8-9 matchup will actually be one of the most one-sided of the first round, so
expect a Panther blowout. North Carolina
State, Tennessee and Manhattan also play teams with a higher CRS in the round
of 64 (why couldn’t they have faced Duke or Syracuse?). Of those three, Tennessee may have the best shot, since their CRS of 19 is only one below UMass’ 20. Whoever wins that game (I project UMass) should reach the Elite Eight, where they would lose to Louisville. I would love to see Manhattan advance though.
2. The most
overvalued teams – A.K.A. vulnerable to early upsets – are (in rough order) Syracuse,
Duke, VCU, Texas and Michigan.
Syracuse fit in only the three lowest championship ranges and Opponents SOS – Total, which held the Orange's CRS down pretty dramatically. 14-seeded Western Michigan has a CRS of 8, which isn't great but is only three below Syracuse's 11. Watch for the upset. But even if the Orange advance, a tough matchup with Ohio State awaits in the round of 32. The Buckeyes have a CRS of 21, meaning they should beat Syracuse without great difficulty. In fact, CRS projects the Buckeyes to beat even the slightly overvalued Jayhawks to reach the Elite Eight.
The only one of the remaining four
on serious upset alert is Texas, which is projected to lose to Arizona State
based on CRS. I can believe it; Texas is
a boring team that can’t shoot, played poorly on the road, and has a hard time
slowing down teams. Arizona State is
seeded properly, and should advance.
Fortunately for the 99 percent of us who are Duke haters, the Blue
Devils shouldn’t move past the round of 32, as they have a lower CRS than
UMass, Tennessee and Iowa. It’s interesting that for all of the talk about the “incredible” strength of the Midwest bracket, three of its teams are overvalued, and all of them are on the same half of the bracket. This should be enticing
to Louisville and Wichita State.
It’s also noteworthy that three of
the four 3 seeds are overvalued according to CRS (Syracuse, Duke and
Creighton). Only Iowa State is worthy of
its seeding, and I project the Cyclones to reach the Elite Eight.
3. Stop getting
excited about Harvard, North Dakota State, Stephen F. Austin and Mercer.
According to CRS, those upsets aren't happening. Sorry. North Dakota State has a respectable CRS of 13, but Oklahoma is also undervalued. Stephen F. Austin may have the coolest name in the tournament and is playing an overrated VCU team, but winning 30 games against teams like Elmhurst, Houston Baptist, and Incarnate Word does not make you a legit Cinderella contender. Duke is overrated, yes, but Mercer has a lower CRS than two 16 seeds.
As for Harvard, yes, they are a
trendy pick because (A) they won as a 14 seed last year, (B) they only lost
four games this season, two of which were competitive road games against
Colorado and UConn, and (C) Cincinnati always seems like a team that loses
early. But CRS was not kind to Harvard,
as its score of 6 was derived in the three weakest categories. Cincinnati and its west coast doppelganger,
San Diego State, each have a CRS of 20, which is good enough to win their first
games, but probably not good enough to make it to next weekend. Interestingly, they were two of the most
accurately seeded teams in the tournament, according to CRS.
4. What you should be getting excited about are these super-close games in the Round of 64:
Memphis-George Washington, North Carolina-Providence, and UMass-Tennessee. In the first two matchups, the CR scores are identical, meaning that the games are virtually crapshoots. In the case of UMass-Tennessee, whichever team wins stands a very good chance of advancing to the Elite Eight.
It’s interesting that only one of
these matchups is an 8-9, and the other two are 6-11. What are other matchups which according to seeding
should be good, but according to CRS will be one-sided? 7-seeded Oregon (CRS: 21) should take care of
10-seeded BYU (CRS: 13) and Oklahoma State as a 9 seed (CRS: 17) shouldn’t have
too many problems with Gonzaga (CRS: 10).
The biggest mismatch of Round One that does not involve a 1 seed? North Carolina State (CRS: 19) vs. Xavier
(CRS: 0). Yes, amazingly Xavier did not
get any points, and their 15-point loss on Tuesday night was an early
demonstration of CRS’s accuracy (Mount Saint Mary’s, Tuesday night’s other
casualty, was the only other team without a single CRS point).
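The matchup logic used throughout this section boils down to "higher CRS wins, equal CRS is a toss-up." A minimal Python sketch of that rule, using CRS values copied from the table, might look like this. The `project` function and its return format are hypothetical choices for illustration, and the rule says nothing about how to break ties.

```python
# CRS values copied from the table for the matchups discussed above.
CRS = {
    "Memphis": 16, "G. Washington": 16,
    "North Carolina": 15, "Providence": 15,
    "UMass": 20, "Tennessee": 19,
    "Oregon": 21, "Brigham Young": 13,
    "NC State": 19, "Xavier": 0,
}


def project(team_a, team_b):
    """Return (projected winner, CRS margin); the winner is None for a toss-up."""
    a, b = CRS[team_a], CRS[team_b]
    if a == b:
        return None, 0
    winner = team_a if a > b else team_b
    return winner, abs(a - b)


for matchup in [("Memphis", "G. Washington"), ("UMass", "Tennessee"),
                ("Oregon", "Brigham Young"), ("NC State", "Xavier")]:
    winner, margin = project(*matchup)
    label = winner if winner else "toss-up"
    print(f"{matchup[0]} vs. {matchup[1]}: {label} (margin {margin})")
```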
5. It won't happen, but the 16 seed with the best chance of an upset is:
Weber State. No, their CRS of 2 won’t impress many, but they did beat North Dakota three times, played UCLA and BYU, and allowed the fewest 3 pointers in the country (not that Arizona will make many). They have a pair of 6-10 centers and have only had 51 blocked shots against them, the fewest in the nation.
If we’re looking at realistic
double-digit seeds which could advance, I would nominate Western Michigan, Delaware,
Mercer (sort of), Eastern Kentucky, and North Dakota State (this doesn’t include Nebraska
because none of these teams are expected to win). Delaware is particularly intriguing because
everyone is overvaluing Michigan State getting healthy again. Not to indulge in the pre-Sabermetric
mentality, but doesn’t it seem like every year there is the one team that
everyone feels proud to pick but gets
stunningly upset on the first day?
I call this the “Sorrentine Effect.” And, as has been noted in
about 100 percent of text messages I’ve sent since Sunday, Eastern Kentucky
made over 300 three-pointers on the season, and is far and away the best 15
seed according to CRS.
6. The high seeded mid-majors will perform ________ in the tournament.
This specifically refers to Wichita State, Creighton, San Diego State, New Mexico, UMass and Gonzaga, all of whom frequently make the tournament, but none of whom have made it very far, save the Shockers’ Final Four run last year. The only teams which CRS projects to move on
to the Sweet 16 are Wichita State and UMass, but both are expected to
eventually fall to Louisville in the Midwest.
In other words, this may not be a great year for mid-majors.
But if I were to reject the CRS data
and purely go off the “eye” test, the team that would jump out at me most is
San Diego State. Quite simply, they play
the best defense I’ve seen of any team all season. They’ve been to the tournament before (seeded
2 in 2011), beat Kansas in Lawrence, nearly beat Arizona, and only had two
games allowing over 70 points. Xavier
Thames is an electric offensive force, and I don’t think anyone doubts Steve
Fisher’s coaching credentials. Oklahoma
and Arizona seem offensively vulnerable at times, and like Wichita State last
year, the Aztecs appear like the sneaky mid-major few people are seriously
paying attention to. They lost some CRS
points because they aren’t offensively great and their schedule was weak, but
there’s little doubt that on any given day, they could upset anyone in the
country. Miles and Jack would agree.
7. _____ and ______ could play each other, and it would be a phenomenal game.
Clearly you can see from the chart above which tournament teams are the most evenly matched with one another. But since it is pretty unlikely that the championship game will feature New Mexico vs. Manhattan, we need to consider which potential matchups are likely within particular regions, and which matchups live up to (or fail to live up to) their hype.
The best potential for great games
comes in the West region, believe it or not.
Arizona, Wisconsin, and Nebraska have identical CRS scores of 28. Oregon, Oklahoma and San Diego State are all at 20 or above, and Oklahoma State and Creighton aren’t far behind. Arizona-Nebraska, Arizona-Wisconsin, or Nebraska-Wisconsin would be epic.
Meanwhile, Dana Altman squaring off against his old team would be
must-see TV, and who knows what Marcus Smart is capable of doing on any given
night. Arizona has the weakest CRS of
any 1 seed, meaning that the West is arguably the most wide-open bracket. I still like the Wildcats, but barely.
Florida should run away with the
South, but Ohio State-Kansas, separated by two CRS points, could be a great
rematch of their memorable Final Four game in 2012 (many people here in
Lawrence still have a disdain for Aaron Craft).
Same goes for Virginia in the East; in fact, the Cavaliers are 13 CRS points
above the second-best teams in the region, Iowa State and Michigan State. UConn and Villanova would be a great (former)
Big East matchup in round two, and Cincinnati-Michigan State are only separated
by a couple of CRS points.
We’ve already demystified the “epic”
Midwest, but it is worth noting that two of the three best teams in the country
according to CRS are in that region.
Their potential Sweet 16 matchup would probably be the best game of the
entire tournament (CBS better schedule that game for prime time if they don’t
want to see some serious complaints).
The winner should have the inside track to the championship game. The Shockers’ CRS did fall slightly due to
their poor schedule, but they did fit in the championship range for Opponents
SOS- Offense. Contrary to what many “experts”
believe, Wichita State has as good a shot as anyone to win the tournament, and
the NCAA committee scheduling them in the same bracket as Louisville is a
shortsighted move, in my opinion. Whoever loses that Sweet 16 game will not be
remembered as favorably as they should be, which is a disservice to each team,
the entire tournament and all fans.
8. At the end of it all . . .
CRS projects Florida over Louisville in the National Championship game (Virginia and Arizona round out the Final Four). However, Louisville played the 110th toughest schedule in the country, and outside of schizophrenic Kentucky and Tennessee, Florida’s SEC opponents weren’t exactly exceptional. Nevertheless, both fit within the championship range for Opponents SOS – Offense, and Florida met Opponents SOS – Total, which ultimately pushes the Gators’ CRS over the Cardinals’.
Florida’s toughest competition on its way to the
championship comes in the form of Pitt (CRS: 22), Ohio State (CRS: 21), and
Virginia (CRS: 35). In other words,
should they be champions at the end of it all, they will have earned it in mostly
fair fashion, playing four of the top 14 teams in the tournament (contrast this
with UConn’s 2011 championship run, where they didn’t have to square off
against a single 1 seed; according to Pomeroy, they are unquestionably the
weakest national champion of the last decade).
They also really resemble a lot of the previous NCAA champions: Not the
most talented group necessarily, but the most tight-knit and defensively sound,
with excellent senior leadership, experience in past tournaments, and solid
coaching. One of their losses was by a single
point, another was a 6-point loss in Madison on November 12. This is one of those rare years where it
appears that humans and computers agree.
As for the future of CRS, we will
have to wait and see how this year’s tournament pans out. If it is an accurate model, then I’ll look
like a genius. If not, then I’ll look
like someone who doesn’t understand the proper analytical frameworks of
Sabermetrics; in other words, I’ll look like most everyone else.
Thoughts? Disagreements? Think statistical analysis is stupid because
I didn’t predict Duke to win it all? Let
me know below.