Maris, Mantle and the trouble with Statistics

One of the folks I watch on YouTube from time to time is a genial guy named Chris, aka Strat-O-Matic Delaware (you can check him out here). He is currently immersed in a project he calls Project 61: Chase for the Babe. Fortunately, Chris is a baseball fan and not a 61-year-old, so this isn’t the sort of “project” that will be used as evidence in a later sexual harassment trial. The chase, in this case, is for Babe Ruth’s 1927 home run record, and the 61 refers to both the year the record was broken and the number of home runs it took to break it.

You will, of course, recall it was Roger Maris who broke Ruth’s long-standing record and that his accomplishment was met with equal parts delight and derision. There have been countless books and articles written about the chase and even a Hollywood movie.

Chris’ replay is not centered around the human drama surrounding the chase. As he describes it, it is a “no frills race to the top.” As was true in 1961, Chris is focusing on both Maris and Mantle, who, in real life, entered September on nearly even terms with 51 and 48 home runs, respectively.

Chris’ project caught my attention about the same time I was monitoring another Strat-O-Matic replay by another affable fellow named Eric from Higher Ground Gaming (check him out here). He recently finished a computer replay of the 1978 Boston Red Sox and I have to say the results were impressive. The home run king in his replay was Jim Rice, who belted 46 round-trippers for Boston that year. In the replay, Rice hit 53.

The accuracy of his results made me think of the project Chris is engaged in. It made me wonder: What are the chances that Chris finishes the replay and neither Maris nor Mantle breaks Babe’s record? What are the chances they both break it? I wondered, too, what the chances were that either simply obliterated Ruth’s mark.

What are the odds?

So I sat down in front of my computer, opened Excel and did what I often do in these situations: I decided I was suddenly tired, walked over to the couch and fell asleep. (Obviously I eventually woke up and returned to the task at hand).

I calculated that Maris stands about a 52% chance of breaking the record while Mantle’s chances are below the Mendoza Line at 18%. The probability of both players failing to exceed 60 home runs is about 40%. The results didn’t surprise me, but they also didn’t prepare me for the surprises to come. I’ll get to those later. First, the methodology I used to calculate these numbers.

Note: I feel I owe it to folks to show how I arrived at these numbers, on the off chance they care. Feel free to skip to the end of this article, where I discuss why I found these numbers interesting in the first place.

A bunch of math…

The probabilities above and those we’ll use for this discussion will be calculated in two ways. The first is purely mathematical. I will treat both Maris’ and Mantle’s plate appearances as a series of Bernoulli (or binomial) trials. (If you need to leave your computer or phone to grab a quick nap, go for it! I understand!) Without going into detail, this approach assumes that both Maris and Mantle will hit home runs at a fixed rate or probability. I determine these probabilities by dividing the number of home runs each player hit by his number of plate appearances. For example, since Maris hit 61 home runs in 698 plate appearances, his home run probability is calculated as 61 ÷ 698 ≈ 0.0874 ≈ 8.74%. In a similar fashion, I calculate Mantle’s home run probability to be 54 ÷ 646 ≈ 0.0836 ≈ 8.36%. (Note: You could just as easily use at bats; it makes little difference.)

Looking at these numbers, it might surprise you that Maris is so much more likely to break Ruth’s record than Mantle, but don’t forget Maris had 52 more plate appearances. Each player’s probabilities were calculated using his own number of plate appearances, so Maris simply gets more chances. Focusing on just Maris for the time being, I used Excel to calculate the probability he will hit 61 or more home runs as follows:

= 1 - BINOM.DIST(60,698,61/698,TRUE)
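For readers without Excel, the same calculation can be sketched in Python. This is only an illustration of the binomial model described above, not the spreadsheet I actually used. The helper name prob_at_least is my own invention, and the sketch assumes, as the model does, a fixed home run probability on every plate appearance.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Mirrors the Excel expression 1 - BINOM.DIST(k - 1, n, p, TRUE).
    """
    # Add up the probabilities of 0, 1, ..., k - 1 successes, then subtract from 1.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below

# Fixed per-plate-appearance rates taken from the 1961 totals quoted above.
maris = prob_at_least(61, 698, 61 / 698)
mantle = prob_at_least(61, 646, 54 / 646)

print(f"Maris 61 or more:  {maris:.1%}")
print(f"Mantle 61 or more: {mantle:.1%}")
```

If the sketch is right, the two printed values should land close to the 52% and 18% quoted earlier.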

You may already have noticed a major problem with my approach: it assumes Maris’ probability of hitting a home run doesn’t change from at bat to at bat, much less from game to game, which simply isn’t true†. While I won’t try to model each at bat, we can use Maris’ actual game-by-game results to calculate these probabilities as well. This is the second method.

In an earlier article about the 50/50 model used by Strat-O-Matic and other games, I relied exclusively on the results of simulations to make my point when, strictly speaking, it wasn’t necessary. None of the underlying mathematics was very complicated. Simulations can be invaluable, however, when the underlying probability distributions get even remotely complicated, as is the case here. For this reason, I’ll treat Maris’ game-by-game results as the population data, then create a number of “bootstrap” samples to use in our analysis. Put another way, I’ll simulate 1 million 1961 seasons and see how his home run totals vary.

To do this, I will consider his home run probability on a game-by-game basis. I will essentially ignore the games in which he didn’t hit a home run, since the probability is zero for those cases. For example, Maris did not hit a home run until the eleventh game of the 1961 season, a 13-11 victory over Detroit. Since he had 5 plate appearances and 1 home run in this game, I will simulate this game by giving him a 1-in-5 chance to hit a home run for each plate appearance, then record the number of home runs he hit (which can be any integer from 0 to 5).
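The game-by-game procedure just described can be sketched in a few lines of Python. Treat this as a rough illustration under stated assumptions rather than the exact code behind my numbers. The function name simulate_season and the toy game log are made up for this example; only the first entry (5 plate appearances, 1 home run) comes from the game described above, and the real simulation would use Maris’ full 1961 game log and a million trials.

```python
import random

def simulate_season(game_log, rng):
    """Simulate one season's home run total.

    game_log is a list of (plate_appearances, home_runs) pairs, one per game.
    Each plate appearance in a game is a home run with probability
    home_runs / plate_appearances for that game.
    """
    total = 0
    for pa, hr in game_log:
        if hr == 0:
            continue  # the probability is zero for this game, so skip it
        p = hr / pa
        total += sum(rng.random() < p for _ in range(pa))
    return total

# Hypothetical toy log, NOT Maris' actual game log.
toy_log = [(5, 1), (4, 0), (4, 2), (5, 0), (3, 1)]

rng = random.Random(1961)  # fixed seed so the run is repeatable
totals = [simulate_season(toy_log, rng) for _ in range(10_000)]

share = sum(t >= 5 for t in totals) / len(totals)
print(f"Toy seasons with 5 or more home runs: {share:.1%}")
```

The same loop run over a full season, with the threshold set to 61, would give the share of simulated seasons in which the record falls.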

These results are compared to the simple model using the binomial distribution in the table below.

Home runs      Binomial Model    1,000,000 Trials
50 or less      7.7%              5.5%
60 or less     48.1%             47.4%
61 or more     51.9%             52.6%
65 or more     31.4%             29.6%
70 or more     12.8%             10.2%
74 or more      5.0%              3.2%

Not surprisingly, the two approaches produced similar results. Here are the results for Mantle.

Home runs      Binomial Model    1,000,000 Trials
50 or less     31.5%             28.8%
60 or less     82.3%             85.3%
61 or more     17.7%             14.7%
65 or more      7.1%              4.7%
70 or more      1.6%              0.7%
74 or more      0.4%              0.1%

Finally the stuff I want to talk about!

I have taken us on a very long and circuitous route to the facts I find most interesting. Let’s start with Maris. There is a roughly 12% chance that Maris will hit 70 or more home runs, which is about the same probability as rolling a sum of 9 on two dice in Strat, not exactly a rare occurrence. And yet, I wonder how many people would look askance at those sorts of results. How many people would find them unrealistic?

Even more extreme, there is about a 5% chance that Maris will wind up with 74 or more home runs, breaking Bonds’ future mark of 73. A 5% chance is like rolling a 3 or, put another way, the difference between Mario Mendoza and an average big league hitter. Roll a 4, and that’s about the same chance both players have of breaking Ruth’s record.
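For anyone who wants to check the dice comparisons, here is a small Python sketch. It enumerates the 36 outcomes of two six-sided dice and sets them beside the binomial figures from the tables above. The combined probabilities assume the two players’ seasons are independent, which is my simplifying assumption, and the names roll_chance, p_maris and p_mantle exist only for this example.

```python
from collections import Counter
from itertools import product

# Count how often each sum occurs across the 36 outcomes of two six-sided dice.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

def roll_chance(total):
    """Probability of rolling the given sum on two dice."""
    return sums[total] / 36

# Chances of 61 or more home runs, from the binomial columns of the tables above.
p_maris, p_mantle = 0.519, 0.177

# ASSUMPTION: the two players' totals are treated as independent.
both_break = p_maris * p_mantle
neither_breaks = (1 - p_maris) * (1 - p_mantle)

print(f"Roll a 9: {roll_chance(9):.1%}")
print(f"Roll a 3: {roll_chance(3):.1%}")
print(f"Roll a 4: {roll_chance(4):.1%}")
print(f"Both break the record: {both_break:.1%}")
print(f"Neither breaks it:     {neither_breaks:.1%}")
```

The last line is where the “about 40%” figure for both players falling short comes from.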

These thoughts crossed my mind when Eric (see above) was summarizing the results of his 1978 Boston Red Sox replay. Overall I thought the results looked great. But Rice hit 53 home runs when he should have hit 46 and I remember thinking I wish he’d hit 49.

It will be interesting to see how Chris’ project turns out. I don’t think Maris will reach 70 home runs. But I’ll know not to be too surprised if he does.

† Note: Neither model is as complex as Strat-O-Matic, which not only models every at bat against a wide variety of pitchers but also accounts for lefty/righty and righty/lefty match-ups. In addition, there is no guarantee that the number of plate appearances will be the same in Strat-O-Matic as in real life. Indeed, we should expect Strat results to be less predictable, with more variance, than the numbers here.
