Wednesday, March 19, 2014

2014 NCAA Tournament Predictions: A Statistical Analysis using Championship Range Score (CRS)



http://www.elhstalon.net/wp-content/uploads/2013/04/NCAA_Final_Four_Michigan_Louisville_Basketball_07942-8013.jpg

            “Man, Team X is playing really well.  They’re gonna be a dangerous 4 seed in the tournament.”
            “What do you mean, Team X?  Did you see them a few weeks ago against Team Y?  They’re going to lose in round one to Random-13-Seed.”
            “OK, Team X might not win the whole thing, but come on, they’re not going to lose to Random-13-Seed.”
            “So not even a chance of that happening?  Why not?”
            “I don’t know.  I can’t put my finger on it, but I just don’t see it.”

- Typical conversation before Sabermetrics   

This week marks two significant milestones in the new and exciting world of data-driven analysis of real-world events.  The first is the launch of Nate Silver’s new Grantland-esque website, Five Thirty Eight.  The comparisons to Bill Simmons’ online domicile are striking, with typical articles carrying titles like “How Statisticians Could Help Find That Missing Plane” and “Three Rules to Make Sure Economic Data Aren’t Bunk.”  Heck, even the graphic layout looks the same (one is left wondering whether Silver can do a baseball podcast with his Yankee fan friend, Black-O).  But in all fairness, Silver’s website is a great contribution to those of us who are sick of glib pundits giving their opinions based solely on their own half-assed assumptions.  The rest of you can turn on Fox News or Chris Matthews and go back to the 1970s; here in 2014, it has become imperative that machines help derive data that can support or repudiate our assumptions about the way we live, work, and function.
The second event is the 2014 NCAA Basketball Tournament.  This means everyone and their mother is filling out their own bracket, gleefully informing their friends that they picked Dayton to make the Sweet 16 or have Kentucky eliminated in the first round.  This is typically followed by a conversation much like the one at the beginning of this article, with little factual evidence supporting which teams do and don’t advance.  What’s left is mostly inarticulate speculation and awkward attempts to save face and change the subject after Dayton loses by 20 the first Thursday of the tournament. 
This is OK for two main reasons.  The first is that no one can possibly watch every team in the NCAA play every game for an entire season.  In fact, most of the talking heads who are supposed to know this stuff offer, in reality, something closer to a layman’s interpretation of events than an expert’s.  Think about it: The people who are supposed to be most knowledgeable about NCAA Basketball should (at least, in theory) be the members of the tournament committee, but the fact that three 15 seeds have advanced in the last two seasons is an indicator of gross misinterpretation.  But this relates to the second reason, which is that March Madness is fun because of all the upsets and unpredictability.  A perfect committee would mean that no 9 seed or below would ever advance.  This would mean that George Mason would have been eliminated by Michigan State in the first round of the 2006 tournament, Butler’s near-miracle 2010 run would have ended in the Sweet 16, and Florida Gulf Coast would still be a school no one ever heard of.
But I’m still uncomfortable with the notion that sports predictions (not only limited to college basketball) for the most part tend to be apocryphal.  We still have idiots like Lou Holtz predicting that Notre Dame will win the national championship.  Now does anyone take Lou Holtz seriously?  Maybe not, but he has a job at ESPN and you and I do not.  Even my NFL playoff prediction articles tend to be riddled with unproven gut feelings instead of statistically-based indicators.  Seattle won this year’s Super Bowl, but because I don’t like them and because I remember many years of the playoffs where they were one-and-done, I failed to account for statistics that positioned them as clearly the NFL’s best team.  Human prejudices and oversights are just as capable of producing off-target predictions as they are of forecasting the George Masons of the world.  But without data-based statistics, there’s no road map, just impressionistic gut feelings.
This week, I did something I’ve wanted to do for a long time: My own statistical analysis of the NCAA Tournament.  I call my method the “Championship Range” (CR).  I used ten categories from Kenpom:

Pythagorean Score
Adjusted Offense
Adjusted Defense
Adjusted Tempo
Luck
Opponents Strength of Schedule – Total
Opponents Strength of Schedule – Offense
Opponents Strength of Schedule – Defense
Non-Conference Strength of Schedule
Total Losses

Then, I looked at teams that won the NCAA championship going back to 2003.  I derived a range in each of these categories that each championship team fell into (for example, no team had an Adjusted Defense score under 86.4 or above 92.9).  I call this the “championship range.”  Then I found which teams in this year’s tournament fit into each of those ranges.
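The range-building step can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the helper names `championship_range` and `in_range` are hypothetical, and the list of champion values is a placeholder (only the 86.4 and 92.9 Adjusted Defense endpoints come from the text above).

```python
def championship_range(champion_values):
    """Return the (low, high) span covered by past champions in one category."""
    return min(champion_values), max(champion_values)


def in_range(value, bounds):
    """True if a team's value falls inside the championship range (inclusive)."""
    low, high = bounds
    return low <= value <= high


# Placeholder Adjusted Defense values for past champions. Only the two
# endpoints (86.4 and 92.9) are taken from the article; the rest are made up.
past_champion_adj_defense = [86.4, 88.0, 90.1, 92.9]

adj_defense_range = championship_range(past_champion_adj_defense)
print(adj_defense_range)                  # (86.4, 92.9)
print(in_range(89.5, adj_defense_range))  # True
print(in_range(95.0, adj_defense_range))  # False
```

Repeating this for every category gives one range per metric, and a 2014 team is then checked against each range in turn.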
Not all statistical measures are equal, so knowing this, I scored each of Pomeroy’s ten categories from 1-10.  The category whose championship range contained the fewest 2014 tournament teams (Opponents’ SOS – Defense) got a 10.  The category whose range contained the most teams (Non-Conference SOS) got a 1.  Then I marked each time a 2014 team fell into a championship range in one of the ten categories and applied the score I gave each category.  Then I added up the scores.  Based on where the fewest 2014 tournament teams fit within the championship range, here is how I scored Pomeroy’s categories:

Opponents Strength of Schedule – Defense (two 2014 teams): 10 points
Pythagorean Score (five teams): 9 points
Adjusted Defense (10 teams): 8 points
Adjusted Offense (23 teams): 7 points
Opponents Strength of Schedule – Offense (26 teams): 6 points
Opponents Strength of Schedule – Total (28 teams): 5 points
Adjusted Tempo (40 teams): 4 points
Total losses (41 teams): 3 points
Luck (45 teams): 2 points
Non-Conference SOS (55 teams): 1 point
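Here is a minimal sketch of how those point values fall out of the team counts.  The counts are copied from the list above; the function name `category_points` is hypothetical, and the sketch assumes no two categories tie on team count (none do here).

```python
# Number of 2014 tournament teams inside each championship range (from the list above).
teams_in_range = {
    "Opponents SOS - Defense": 2,
    "Pythagorean Score": 5,
    "Adjusted Defense": 10,
    "Adjusted Offense": 23,
    "Opponents SOS - Offense": 26,
    "Opponents SOS - Total": 28,
    "Adjusted Tempo": 40,
    "Total Losses": 41,
    "Luck": 45,
    "Non-Conference SOS": 55,
}


def category_points(counts):
    """Rank categories from most to least exclusive and award N, N-1, ..., 1 points."""
    ordered = sorted(counts, key=counts.get)  # fewest teams in range first
    top = len(ordered)
    return {category: top - rank for rank, category in enumerate(ordered)}


points = category_points(teams_in_range)
print(points["Opponents SOS - Defense"])  # 10
print(points["Non-Conference SOS"])       # 1
```

The weighting only uses the rank order of the counts, so a category that admits 40 teams is worth just one point more than one that admits 41.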

So, for example, let’s take my beloved Kansas Jayhawks.  They fit into four championship ranges (Adjusted Offense, Adjusted Tempo, Opponents’ Strength of Schedule – Total, Total Losses).  They could have fit into the other Opponents SOS categories, but actually exceeded the ranges of both (this could be considered a flaw in my method).  When you add the four championship ranges (7+4+5+3), you get a Championship Range Score (CRS) of 19.  This is higher than Eastern Kentucky, who gets a score of 10 for fitting into the championship ranges of Adjusted Tempo, Total Losses, Luck, and Nonconference SOS (4+3+2+1).  Because 19 is higher than 10, I project Kansas to beat Eastern Kentucky.  However, because Eastern Kentucky has the same CRS as 9-seeded Colorado, the committee may have seeded one or both improperly.
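The Kansas and Eastern Kentucky arithmetic can be reproduced with a short sketch.  The point values and the ranges each team fit into come from the text above; the names `POINTS`, `crs`, and `projected_winner` are hypothetical, and returning `None` on a tie is an assumption, since a tie needs a closer look at the matchup.

```python
# Points per category, as listed in the article.
POINTS = {
    "Opponents SOS - Defense": 10,
    "Pythagorean Score": 9,
    "Adjusted Defense": 8,
    "Adjusted Offense": 7,
    "Opponents SOS - Offense": 6,
    "Opponents SOS - Total": 5,
    "Adjusted Tempo": 4,
    "Total Losses": 3,
    "Luck": 2,
    "Non-Conference SOS": 1,
}


def crs(categories_hit):
    """Sum the points for every championship range a team fits into."""
    return sum(POINTS[category] for category in categories_hit)


def projected_winner(team_a, score_a, team_b, score_b):
    """Pick the team with the higher CRS; None means the scores are tied."""
    if score_a == score_b:
        return None
    return team_a if score_a > score_b else team_b


kansas = crs(["Adjusted Offense", "Adjusted Tempo", "Opponents SOS - Total", "Total Losses"])
eastern_kentucky = crs(["Adjusted Tempo", "Total Losses", "Luck", "Non-Conference SOS"])

print(kansas, eastern_kentucky)  # 19 10
print(projected_winner("Kansas", kansas, "Eastern Kentucky", eastern_kentucky))  # Kansas
```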
There are obviously flaws in my method.  One could be: Why consider these particular ten metrics when other predictors may be more accurate?  Well, the answer is that those are the indicators that are available to a non-subscriber on Kenpom.  Another question would be if the wider championship ranges are less accurate than others (for example, the range for Adjusted Tempo was broad).  That’s why I structured my analysis around the idea that the more inclusive the range, the less valuable the metric.  Only two teams fit within the top range, Opponents SOS – Defense (Nebraska and Wisconsin).  Because of this exclusivity, this metric was considered the most valuable.
Other metrics tell interesting stories. Consider Luck, for example.  Most teams which have won the title have had very little oscillation in luck, either positive or negative.  In other words, consistent teams play well in the tournament.  Too much luck may mean an early exit, while too little luck may doom you to a low seed.  Another interesting metric is Adjusted Tempo.  I tend to think that of the ten categories, this is the most useless, since the last ten champions vary greatly and tempo doesn’t necessarily reflect quality.  But in reality, the most useless metric is Non-Conference SOS.  So much for those who criticize major schools for not scheduling enough Davids.
When all is said and done, here are the final tallies based on my analysis (I’ve listed each team’s actual seeding next to the seed its CRS suggests).

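| Team | Champ. Range Score (CRS) | Should Be Seeded | Actual Seeding |
|---|---|---|---|
| Florida | 41 | 1 | 1 |
| Louisville | 38 | 1 | 4 |
| Wichita State | 36 | 1 | 1 |
| Virginia | 35 | 1 | 1 |
| Arizona | 28 | 2 | 1 |
| Nebraska | 28 | 2 | 11 |
| Wisconsin | 28 | 2 | 2 |
| Saint Louis | 24 | 2 | 5 |
| Iowa State | 22 | 3 | 3 |
| Michigan State | 22 | 3 | 4 |
| Oklahoma | 22 | 3 | 5 |
| Pittsburgh | 22 | 3 | 9 |
| Ohio State | 21 | 4 | 6 |
| Oregon | 21 | 4 | 7 |
| Cincinnati | 20 | 4-5 | 5 |
| San Diego State | 20 | 4-5 | 4 |
| UConn | 20 | 4-5 | 7 |
| UMass | 20 | 4-5 | 6 |
| Villanova | 20 | 4-5 | 2 |
| Kansas | 19 | 5-6 | 2 |
| NC State | 19 | 5-6 | 12 |
| Tennessee | 19 | 5-6 | 11 |
| Iowa | 18 | 6-7 | 11 |
| Michigan | 18 | 6-7 | 2 |
| Creighton | 17 | 6-7 | 3 |
| Oklahoma State | 17 | 6-7 | 9 |
| UCLA | 17 | 6-7 | 4 |
| G. Washington | 16 | 7-8 | 9 |
| Kentucky | 16 | 7-8 | 8 |
| Manhattan | 16 | 7-8 | 13 |
| Memphis | 16 | 7-8 | 8 |
| New Mexico | 16 | 7-8 | 7 |
| Baylor | 15 | 9 | 6 |
| Delaware | 15 | 9 | 13 |
| North Carolina | 15 | 9 | 6 |
| Providence | 15 | 9 | 11 |
| St. Joseph’s | 14 | 10 | 10 |
| VCU | 14 | 10 | 5 |
| Arizona State | 13 | 10-11 | 10 |
| Brigham Young | 13 | 10-11 | 10 |
| Duke | 13 | 10-11 | 3 |
| N.D. State | 13 | 10-11 | 12 |
| Kansas State | 12 | 11 | 9 |
| Stanford | 12 | 11 | 10 |
| Syracuse | 11 | 11 | 3 |
| Colorado | 10 | 12 | 9 |
| Eastern Kentucky | 10 | 12 | 15 |
| Gonzaga | 10 | 12 | 8 |
| Texas | 10 | 12 | 7 |
| Dayton | 9 | 12 | 11 |
| NM State | 8 | 13 | 13 |
| W. Michigan | 8 | 13 | 14 |
| Coastal Carolina | 7 | 13 | 16 |
| Texas Southern | 7 | 13 | 16 |
| Cal Poly | 6 | 14-15 | 16 |
| Harvard | 6 | 14-15 | 12 |
| La-Lafayette | 6 | 14-15 | 14 |
| NC Central | 6 | 14-15 | 14 |
| Steph. F. Austin | 6 | 14-15 | 12 |
| Mercer | 5 | 15 | 14 |
| Milwaukee | 5 | 15 | 15 |
| Tulsa | 5 | 15 | 13 |
| Albany | 3 | 16 | 16 |
| Weber State | 2 | 16 | 16 |
| Wofford | 2 | 16 | 15 |
| American | 1 | 16 | 15 |
| Mt. St. Mary’s | 0 | 16 | 16 |
| Xavier | 0 | 16 | 12 |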

            Obviously, statistics tell an incomplete story.  Syracuse fans would be livid to see their team, ranked first in the country one month ago, reduced to an 11 seed.  It would be difficult to explain why teams like UCLA and Baylor didn’t see a favorable bounce from the committee after their successful conference tournament runs.  The human factor is absent here.  This is based on raw, empirical data.  But from this data, we can draw a few interesting observations about how the 2014 NCAA Tournament might shake out:

1. The most underseeded teams – A.K.A. potential Cinderella teams – are (in rough order) Nebraska, Pittsburgh, North Carolina State, Tennessee, and Manhattan.  

It may be surprising to see Nebraska ranked so high according to CRS.  But the Huskers played a difficult schedule and were one of the two beneficiaries of that much-desired Opponent SOS- Defense championship range.  Of the five teams listed above, I trust them the most to advance, and I have them losing narrowly to Arizona in the Elite Eight.  Since the Cornhuskers and Wildcats each have a CRS of 28, I had to look closer at the individual matchup and settle on a tiebreaker by weighing which of the championship ranges each team fit into were more valuable.  This may be the Sabermetric equivalent of "overtime."  In other words, if Arizona-Nebraska is an Elite Eight matchup, expect it to be a very close, low scoring game.
            As for the other four, Pitt has the great misfortune of a potential round-two matchup with Florida, who I project will win the tournament without a great deal of difficulty.  The CRS does suggest that the Pitt-Colorado 8-9 matchup will actually be one of the most one-sided of the first round, so expect a Panther blowout.  North Carolina State, Tennessee and Manhattan also play teams with a higher CRS in the round of 64 (why couldn’t they have faced Duke or Syracuse?)  Of those three, Tennessee may have the best shot, since their CRS of 19 is only one below UMass’ 20.  Whoever wins that game (which I project UMass to win) should reach the Elite Eight, where they would lose to Louisville.  I would love to see Manhattan advance though.
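One way to quantify "underseeded" is the gap between a team's actual seed and the seed its CRS suggests.  The sketch below is an assumption about how to measure that gap, not the exact procedure behind the rough order above: it treats a range such as "5-6" as its midpoint, the helper `seed_midpoint` is hypothetical, and the rows are a handful copied from the table rather than the full field.

```python
def seed_midpoint(seed_text):
    """Turn '3' into 3.0 and a range like '5-6' into its midpoint, 5.5."""
    parts = [int(part) for part in seed_text.split("-")]
    return sum(parts) / len(parts)


# (team, should-be seed, actual seed): a few rows copied from the table.
rows = [
    ("Nebraska", "2", 11),
    ("Pittsburgh", "3", 9),
    ("NC State", "5-6", 12),
    ("Tennessee", "5-6", 11),
    ("Manhattan", "7-8", 13),
    ("Duke", "10-11", 3),
    ("Florida", "1", 1),
]

# Positive gap: seeded worse than CRS suggests (underseeded).
# Negative gap: seeded better than CRS suggests (overvalued).
gaps = {team: actual - seed_midpoint(should) for team, should, actual in rows}

for team, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {gap:+.1f}")
```

With the midpoint convention, NC State's gap (6.5) edges out Pittsburgh's (6.0), so the ordering depends on how the seed ranges are read.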

http://l2.yimg.com/bt/api/res/1.2/UeyrIB9.eLPb2ZCZV1VB.Q--/YXBwaWQ9eW5ld3M7cT04NTt3PTYzMA--/http://media.zenfs.com/en/blogs/sptusncaabexperts/USATSI_7117226_221257_lowres.jpg

2. The most overvalued teams – A.K.A. vulnerable to early upsets – are (in rough order) Syracuse, Duke, VCU, Texas and Michigan.

Syracuse fit in the three lowest championship ranges and the Opponents SOS- Total, which negatively affected the Orange's CRS pretty dramatically.  14-seeded Western Michigan has a CRS of 8, which isn't great but is only three below Syracuse's 11.  Watch for the upset. But even if the Orange advance, a tough matchup with Ohio State awaits them in the round of 32.  The Buckeyes have a CRS of 21, meaning they should beat Syracuse without great difficulty.  In fact, CRS projects the Buckeyes to even beat the slightly overvalued Jayhawks to reach the Elite Eight.
            The only one of the remaining four on serious upset alert is Texas, which is projected to lose to Arizona State based on CRS.  I can believe it; Texas is a boring team that can’t shoot, played poorly on the road, and has a hard time slowing down teams.  Arizona State is seeded properly, and should advance.  Fortunately for the 99 percent of us who are Duke haters, the Blue Devils shouldn’t move past the round of 32, as they have a lower CRS than UMass, Tennessee and Iowa.  It’s interesting that for all of the talk about the “incredible” strength of the Midwest bracket, three of its teams are overvalued, and all of them are in the same half of the bracket.  This should be enticing to Louisville and Wichita State.
            It’s also noteworthy that three of the four 3 seeds are overvalued according to CRS (Syracuse, Duke and Creighton).  Only Iowa State is worthy of its seeding, and I project the Cyclones to reach the Elite Eight. 

 http://www.gq.com/images/sports/2012/03/duke/duke-628.jpg

3. Stop getting excited about Harvard, North Dakota State, Stephen F. Austin and Mercer.

According to CRS, those upsets aren't happening. Sorry. North Dakota State has a respectable CRS of 13, but Oklahoma is also undervalued. Stephen F. Austin may have the coolest name in the tournament and is playing an overrated VCU team, but winning 30 games against teams like Elmhurst, Houston Baptist, and Incarnate Word does not make you a legit Cinderella contender.  Duke is overrated, yes, but Mercer has a lower CRS than two 16 seeds.
            As for Harvard, yes, they are a trendy pick because (A) they won as a 14 seed last year, (B) they only lost four games this season, two of which were competitive road games against Colorado and UConn, and (C) Cincinnati always seems like a team that loses early.  But CRS was not kind to Harvard, as its score of 6 was derived in the three weakest categories.  Cincinnati and its west coast doppelganger, San Diego State, each have a CRS of 20, which is good enough to win their first games, but probably not good enough to make it to next weekend.  Interestingly, they were two of the most accurately seeded teams in the tournament, according to CRS. 

http://static5.businessinsider.com/image/4f58db4deab8ea3c6e000030/harvards-best-basketball-player-is-leaving-school-as-a-result-of-the-academic-cheating-scandal.jpg

4.  What you should be getting excited about are these super-close games in the Round of 64

Memphis-George Washington, North Carolina-Providence, and UMass-Tennessee.  In the cases of the first two matchups, the CR scores are identical, meaning that the games are virtually crapshoots.  In the case of UMass-Tennessee, both stand a very good chance of advancing to the Elite Eight.
            It’s interesting that only one of these matchups is an 8-9, and the other two are 6-11.  What are other matchups which according to seeding should be good, but according to CRS will be one-sided?  7-seeded Oregon (CRS: 21) should take care of 10-seeded BYU (CRS: 13) and Oklahoma State as a 9 seed (CRS: 17) shouldn’t have too many problems with Gonzaga (CRS: 10).  The biggest mismatch of Round One that does not involve a 1 seed?  North Carolina State (CRS: 19) vs. Xavier (CRS: 0).  Yes, amazingly Xavier did not get any points, and their 15-point loss on Tuesday night was an early demonstration of CRS’s accuracy (Mount Saint Mary’s, Tuesday night’s other casualty, was the only other team without a single CRS point).

http://thesportsquotient.com/wp-content/uploads/2013/01/usp-ncaa-basketball_-duke-at-north-carolina-state-4_3_r536_c534.jpg

5. It won't happen, but the 16 seed with the best chance of an upset is:

Weber State.  No, their CRS of 2 won’t impress many, but they did beat North Dakota three times, played UCLA and BYU, and allowed the fewest 3 pointers in the country (not that Arizona will make many).  They have a pair of 6-10 centers and have only had 51 blocked shots against them, the fewest in the nation.
            If we’re looking at realistic double-digit seeds which could advance, I would nominate Western Michigan, Delaware, Mercer (sort of), Eastern Kentucky, and North Dakota State (this doesn’t include Nebraska because none of these teams are expected to win).  Delaware is particularly intriguing because everyone is overvaluing Michigan State getting healthy again.  Not to indulge in the pre-Sabermetric mentality, but doesn’t it seem like every year there is the one team that everyone feels proud to pick but gets  stunningly upset on the first day?  I call this the “Sorrentine Effect.”  And, as has been noted in about 100 percent of text messages I’ve sent since Sunday, Eastern Kentucky made over 300 three-pointers on the season, and is far and away the best 15 seed according to CRS. 

http://image.cdnllnwnl.xosnetwork.com/pics33/800/TT/TTVCIQNZDTUHQIF.20140107172125.jpg

6. The high seeded mid-majors will perform ________ in the tournament.



This specifically refers to Wichita State, Creighton, San Diego State, New Mexico, UMass and Gonzaga, all of whom frequently make the tournament, but none of whom have made it very far, save the Shockers’ Final Four run last year.  The only teams which CRS projects to move on to the Sweet 16 are Wichita State and UMass, but both are expected to eventually fall to Louisville in the Midwest.  In other words, this may not be a great year for mid-majors.
            But if I were to reject the CRS data and purely go off the “eye” test, the team that would jump out at me most is San Diego State.  Quite simply, they play the best defense I’ve seen of any team all season.  They’ve been to the tournament before (seeded 2 in 2011), beat Kansas in Lawrence, nearly beat Arizona, and only had two games allowing over 70 points.  Xavier Thames is an electric offensive force, and I don’t think anyone doubts Steve Fisher’s coaching credentials.  Oklahoma and Arizona seem offensively vulnerable at times, and like Wichita State last year, the Aztecs appear like the sneaky mid-major few people are seriously paying attention to.  They lost some CRS points because they aren’t offensively great and their schedule was weak, but there’s little doubt that on any given day, they could upset anyone in the country.  Miles and Jack would agree.

http://media.utsandiego.com/img/photos/2014/01/08/smhhoops334589x0018_r728x492.jpg?653b3ef4c4323d1b2fd2e016b347dd065c404ac0

7. _____ and ______ could play each other, and it would be a phenomenal game.

Clearly you can see from the chart above which tournament teams are the most evenly matched with one another.  But seeing as it is pretty unlikely that the championship game will feature New Mexico vs. Manhattan, we need to consider which potential matchups are likely within particular regions, and which matchups live up to (or fail to live up to) their hype.
            The best potential for great games comes in the West region, believe it or not.  Arizona, Wisconsin, and Nebraska have identical CRS scores of 28.  Oregon, Oklahoma and San Diego State are all at 20 or above, and Oklahoma State and Creighton aren’t far behind.  Arizona-Nebraska, Arizona-Wisconsin, or Nebraska-Wisconsin would be epic.  Meanwhile, Dana Altman squaring off against his old team would be must-see TV, and who knows what Marcus Smart is capable of doing on any given night.  Arizona has the weakest CRS of any 1 seed, meaning that the West is arguably the most wide-open bracket.  I still like the Wildcats, but barely.
            Florida should run away with the South, but Ohio State-Kansas, separated by two CRS points, could be a great rematch of their memorable Final Four game in 2012 (many people here in Lawrence still have a disdain for Aaron Craft).  Same goes for Virginia in the East; in fact, the Cavaliers are 13 CRS points above the second-best teams in the region, Iowa State and Michigan State.  UConn and Villanova would be a great (former) Big East matchup in round two, and Cincinnati-Michigan State are only separated by a couple of CRS points.
            We’ve already demystified the “epic” Midwest, but it is worth noting that two of the three best teams in the country according to CRS are in that region.  Their potential Sweet 16 matchup would probably be the best game of the entire tournament (CBS better schedule that game for prime time if they don’t want to see some serious complaints).  The winner should have the inside track to the championship game.  The Shockers’ CRS did fall slightly due to their poor schedule, but they did fit in the championship range for Opponents SOS- Offense.  Contrary to what many “experts” believe, Wichita State has as good a shot as anyone to win the tournament, and the NCAA committee scheduling them in the same bracket as Louisville is a shortsighted move, in my opinion.  Whoever loses that Sweet 16 game will not be remembered as favorably as they should be, which is a disservice to each team, the entire tournament and all fans.

 http://images.wjla.com/sports/louisville_wichita_state_ncaa_ap_606.jpg

8. At the end of it all . . .

CRS projects Florida over Louisville in the National Championship game (Virginia and Arizona round out the Final Four).  However, Louisville played the 110th toughest schedule in the country, and outside of schizophrenic Kentucky and Tennessee, Florida’s SEC opponents weren’t exactly exceptional.  Nevertheless, both fit within the championship range for Opponents SOS- Offense, and Florida met Opponents SOS- Total, which ultimately pushes the Gators’ CRS over the Cardinals’.
             Florida’s toughest competition on its way to the championship comes in the form of Pitt (CRS: 22), Ohio State (CRS: 21), and Virginia (CRS: 35).  In other words, should they be champions at the end of it all, they will have earned it in mostly fair fashion, playing four of the top 14 teams in the tournament (contrast this with UConn’s 2011 championship run, where they didn’t have to square off against a single 1 seed; according to Pomeroy, they are unquestionably the weakest national champion of the last decade).  They also really resemble a lot of the previous NCAA champions: Not the most talented group necessarily, but the most tight-knit and defensively sound, with excellent senior leadership, experience in past tournaments, and solid coaching.  One of their losses was by a single point, another was a 6-point loss in Madison on November 12.  This is one of those rare years where it appears that humans and computers agree.
            As for the future of CRS, we will have to wait and see how this year’s tournament pans out.  If it is an accurate model, then I’ll look like a genius.  If not, then I’ll look like someone who doesn’t understand the proper analytical frameworks of Sabermetrics; in other words, I’ll look like most everyone else. 

http://www.sportstalkflorida.com/wp-content/uploads/2014/02/FL-Gators-MBB.jpg

            Thoughts?  Disagreements?  Think statistical analysis is stupid because I didn’t predict Duke to win it all?  Let me know below.
