Sunday, October 18, 2009

The value of ad hoc statistics

More questionable baseball statistics, this time about the Yankees:

The math certainly bodes well for them. Since Major League Baseball adopted a best-of-seven format for the ALCS in 1985, the team that won Game 2 has advanced to the World Series 17 of 23 times.

I'm always wary of this sort of statistic because it can be misleading. For any Game n in a series, you should expect the eventual series winner to have won that game more often than the loser, if only because the winner is guaranteed to have 4 wins spread over those 4-7 games, while the loser has only zero to three wins spread over the same games. And of course, by definition, whenever a Game 7 is played, the series winner wins it 100% of the time.

So what are the a priori probabilities here? Well, I ran a simulation of 10,000 seven-game series in which each team has a 50-50 chance of winning each game, and these were the results:

Team A wins 4922 out of 10000 series
Series winner wins
Game 1: 6541/10000 (65%)
Game 2: 6565/10000 (66%)
Game 3: 6579/10000 (66%)
Game 4: 6643/10000 (66%)
Game 5: 5896/8732 (68%)
Game 6: 4692/6220 (75%)
Game 7: 3084/3084 (100%)
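For anyone who wants to reproduce numbers like these, here is a minimal sketch of the kind of simulation described above. It is not the original script: the function names (`simulate_series`, `tally`), the seed, and the output format are made up for illustration, and the only modelling assumption is that every game is an independent coin flip with a fixed chance `p_a` for Team A.

```python
import random

def simulate_series(p_a, rng):
    """Play one best-of-seven series; return the list of per-game winners."""
    wins = {"A": 0, "B": 0}
    games = []
    while max(wins.values()) < 4:          # stop as soon as someone has 4 wins
        winner = "A" if rng.random() < p_a else "B"
        wins[winner] += 1
        games.append(winner)
    return games

def tally(n_series=10_000, p_a=0.5, seed=0):
    """Count, for each game slot, how often it was played and how often
    the eventual series winner won it."""
    rng = random.Random(seed)
    played = [0] * 7
    won_by_series_winner = [0] * 7
    team_a_series = 0
    for _ in range(n_series):
        games = simulate_series(p_a, rng)
        series_winner = games[-1]          # the clinching game is always the last one
        team_a_series += series_winner == "A"
        for i, game_winner in enumerate(games):
            played[i] += 1
            won_by_series_winner[i] += game_winner == series_winner
    return team_a_series, played, won_by_series_winner

if __name__ == "__main__":
    a_wins, played, won = tally()
    print(f"Team A wins {a_wins} out of {played[0]} series")
    for i in range(7):
        print(f"Game {i + 1}: {won[i]}/{played[i]} ({won[i] / played[i]:.0%})")
```

The exact counts depend on the random seed, so a rerun will not match the table digit for digit, but the percentages should land in the same neighborhood.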

So a priori, assuming evenly matched teams, we would expect the series winner to win Game 2 about 66% of the time. The statistic in the article said that the series winner in the ALCS has won Game 2 17 out of 23 times, or about 74% of the time. Considering that the sample size is very small--23 series--this difference of about 8 percentage points doesn't seem to be terribly significant: with only 23 series, the standard error on that percentage is itself roughly 10 points.
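One rough way to put a number on "not terribly significant" is a one-sided binomial tail: if the series winner takes Game 2 with the even-match probability, how often would we see 17 or more out of 23 by luck alone? The sketch below assumes each series is an independent trial and uses 42/64 (about 65.6%) as the baseline, which is the exact even-match value: a team that wins Game 2 needs at least 3 of the other 6 games, and 42 of the 64 equally likely outcomes for those games qualify. The helper name `binom_tail` is just for illustration.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Assumption: evenly matched teams, so the series winner took Game 2
# with probability 42/64 in each of the 23 independent series.
p_null = 42 / 64
tail = binom_tail(17, 23, p_null)
print(f"P(at least 17 of 23) = {tail:.3f}")
```

The tail probability comes out well above the usual 5% cutoff, which fits the reading that 17 of 23 is about what you would expect from chance.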

(Bonus section: in the above example, I'm actually being conservative, because I'm assuming that the teams are evenly matched. But of course, in real life the teams are sometimes not evenly matched, in which case we should say that one team has a (e.g.) 60-40 or 70-30 chance of winning. If we run the 10,000 series simulation with 60-40 odds, we get this:

Team A wins 7140 out of 10000 series
Series winner wins
Game 1: 6766/10000 (68%)
Game 2: 6690/10000 (67%)
Game 3: 6784/10000 (68%)
Game 4: 6724/10000 (67%)
Game 5: 5853/8436 (69%)
Game 6: 4456/5748 (78%)
Game 7: 2727/2727 (100%)

If we run it with 70-30 odds, we get this:

Team A wins 8729 out of 10000 series
Series winner wins
Game 1: 7318/10000 (73%)
Game 2: 7286/10000 (73%)
Game 3: 7274/10000 (73%)
Game 4: 7305/10000 (73%)
Game 5: 5558/7532 (74%)
Game 6: 3474/4363 (80%)
Game 7: 1785/1785 (100%)

So what we see here is that, when we account for the fact that teams are not always evenly matched--that sometimes one team is the odds-on favorite in each game--the probability that the series winner will have won Game 2 only gets nudged upwards. Which makes my case a little bit stronger....

...although we should note that the per-game probability probably never swings too far away from 50-50. Remember that, in the regular season, the best team in the league rarely wins more than about 65% of its 162 games against all the other teams in the league (which includes a lot of crappy and mediocre teams). When you consider that in the ALCS the best teams are playing against each other, I imagine that the favored team's chance of winning any given game doesn't go much beyond 60%, if that.)
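The Game 2 percentages in the bonus section can also be worked out exactly instead of simulated. The sketch below leans on a standard trick: pretend all seven games are played no matter what, so the series winner is simply whoever wins at least four of them. Since Game 2 is always played, the team that wins it needs at least 3 of the other 6. The function names are hypothetical, and the model is the same as before (independent games, fixed per-game probability `p` for the favorite).

```python
from math import comb

def p_at_least(k, n, p):
    """P(at least k successes in n independent trials with success chance p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def p_series_winner_won_game2(p):
    """Chance that whoever wins the series also won Game 2.

    Valid for any of Games 1-4, since those are always played.
    """
    q = 1 - p
    favorite_case = p * p_at_least(3, 6, p)   # favorite wins Game 2 and the series
    underdog_case = q * p_at_least(3, 6, q)   # underdog wins Game 2 and the series
    return favorite_case + underdog_case

for p in (0.5, 0.6, 0.7):
    print(f"{p:.0%} favorite: {p_series_winner_won_game2(p):.3f}")
# 50% favorite: 0.656
# 60% favorite: 0.675
# 70% favorite: 0.727
```

These line up with the simulated 66%, 67%, and 73% for Game 2, which suggests the simulation noise is small next to the effect being measured.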

Anyway, to conclude: the statistic cited in the article is not particularly meaningful. Moreover, it's odd to focus on Game 2 in isolation from the fact that the Yankees also won Game 1. It seems like, if anything, the statistic we should be getting is: what are the a priori odds that the Angels will come back from 0-2 to win the series, assuming they're evenly matched with the Yankees (a good assumption, I think)? Well, assigning the Angels to "Team A":

Team A wins 1893 out of 10000 series

Doesn't look too good for the Angels.
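The comeback number can be checked exactly as well. Down 0-2, the Angels need 4 wins before the Yankees get 2 more, which is the same as winning at least 4 of the next 5 games if all 5 were played. A minimal sketch, assuming independent 50-50 games (the function name and parameters are illustrative):

```python
from math import comb

def series_win_probability(p, need, opp_need):
    """Chance of reaching `need` more wins before the opponent reaches
    `opp_need` more wins, with per-game win probability p."""
    n = need + opp_need - 1                # games that settle it if all are played
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(need, n + 1))

print(series_win_probability(0.5, need=4, opp_need=2))  # 0.1875
```

That is 6 chances in 32, or just under 19%, right in line with the 1893 out of 10000 from the simulation.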


2 comments:

zedzure said...

I don't need fancy math to tell me my team is playing rather poorly, losing each game due mostly to errors. Ugh.

David Morris said...

Yeah, yesterday must have been rough.

My theory though is that the cold was mostly responsible for all the bad defense--I mean, the Yankees had 3 errors, too. So I'm kind of nervous about the Dodgers in Philadelphia tonight--it's probably going to be a pretty sloppy game...