Ninth Inning Rally Chart

I rarely use this space to discuss rules for the game I created, Baseball Trivia Challenge, but figured I’d make an exception. I sent the game to several folks to playtest and have received some excellent feedback. The particular bit of feedback I want to discuss here concerns teams that enter the ninth inning trailing by a large number of runs. This is a problem because, regardless of their ability, players in Baseball Trivia Challenge cannot score more than 4 runs per at bat.

Both Earle Shamblin from Tabletop Baseball+ and Nick Hawes came up with the same excellent house rule to deal with these situations. Their solution is to let the ninth batter continue the at bat as long as runs are scored. In other words, the at bat will continue unless one of three things happens: 1) the d20 die roll indicates no runs are possible; 2) the batting team’s manager answers the trivia question incorrectly; 3) the pitching team’s manager answers the trivia question correctly. It’s an innovative solution to a problem I didn’t recognize existed.

If you’ve read many of my posts you may have noticed that I’m a little over-the-top when it comes to mathematics and realism. So while I love their solution, I feel I must point out that the realism of the game will be affected if you implement this rule. Simply put, teams will score more runs than they should and relief pitcher ERAs could skyrocket. If that isn’t a problem for you, I think it’s the best solution to this problem because it puts the most pressure on both “managers.”

Another option that solves the realism issue is to use the Ninth Inning Rally Chart shown below and included with the game. I created it only after Earle and Nick pointed out the need for it, so they deserve most of the credit for it, too. You may use this chart whenever your team is down by no more than 9 runs in the ninth inning.* Simply state your intention to do so and roll the dice. Note the rating of the player indicated (batter, pitcher, fielder), then locate the number in the cell corresponding to that rating and the number of runs behind. If the d20 is less than or equal to the number shown, read the next trivia question to whoever’s turn it is to answer. If you answer correctly or your opponent gets it wrong, you score the runs required to tie the game.

That’s all there is to it! Feel free to use either rule! Both approaches add a much-needed element of excitement to games that might otherwise be decided!

*Note that, in theory, you could use this chart to play an entire game where you selected how many runs you want to “try for” each at bat.

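RTG (first number in each row) by RUNS BEHIND (remaining numbers, left to right). Rows with fewer than nine entries are blank for the larger deficits.
RTG 1 2 3 4 5 6 7 8 9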
0
1 1 1
2 2 1 1 1
3 3 2 1 1 1 1
4 4 2 1 1 1 1 1 1
5 5 3 2 1 1 1 1 1 1
6 6 3 2 2 1 1 1 1 1
7 7 4 2 2 1 1 1 1 1
8 8 4 3 2 2 1 1 1 1
9 9 5 3 2 2 2 1 1 1
10 10 5 3 3 2 2 1 1 1
11 11 6 4 3 2 2 2 1 1
12 12 6 4 3 2 2 2 2 1
13 13 7 4 3 3 2 2 2 1
14 14 7 5 4 3 2 2 2 2
15 15 8 5 4 3 3 2 2 2
16 16 8 5 4 3 3 2 2 2
17 17 9 6 4 3 3 2 2 2
18 18 9 6 5 4 3 3 2 2
19 19 10 6 5 4 3 3 2 2
20 20 10 7 5 4 3 3 3 2
21 20 11 7 5 4 4 3 3 2
22 20 11 7 6 4 4 3 3 2
23 20 12 8 6 5 4 3 3 3
24 20 12 8 6 5 4 3 3 3
25 20 13 8 6 5 4 4 3 3
26 20 13 9 7 5 4 4 3 3
27 20 14 9 7 5 5 4 3 3
28 20 14 9 7 6 5 4 4 3
29 20 15 10 7 6 5 4 4 3
30 20 15 10 8 6 5 4 4 3
31 20 16 10 8 6 5 4 4 3
32 20 16 11 8 6 5 5 4 4
33 20 17 11 8 7 6 5 4 4
34 20 17 11 9 7 6 5 4 4
35 20 18 12 9 7 6 5 4 4
36 20 18 12 9 7 6 5 5 4
37 20 19 12 9 7 6 5 5 4
38 20 19 13 10 8 6 5 5 4
39 20 20 13 10 8 7 6 5 4
40 20 20 13 10 8 7 6 5 4
41 20 20 14 10 8 7 6 5 5
42 20 20 14 11 8 7 6 5 5
43 20 20 14 11 9 7 6 5 5
44 20 20 15 11 9 7 6 6 5
45 20 20 15 11 9 8 6 6 5
46 20 20 15 12 9 8 7 6 5
47 20 20 16 12 9 8 7 6 5
48 20 20 16 12 10 8 7 6 5
49 20 20 16 12 10 8 7 6 5
50 20 20 17 13 10 8 7 6 6
51 20 20 17 13 10 9 7 6 6
52 20 20 17 13 10 9 7 7 6
53 20 20 18 13 11 9 8 7 6
54 20 20 18 14 11 9 8 7 6
55 20 20 18 14 11 9 8 7 6
56 20 20 19 14 11 9 8 7 6
57 20 20 19 14 11 10 8 7 6
58 20 20 19 15 12 10 8 7 6
59 20 20 20 15 12 10 8 7 7
60 20 20 20 15 12 10 9 8 7
61 20 20 20 15 12 10 9 8 7
62 20 20 20 16 12 10 9 8 7
63 20 20 20 16 13 11 9 8 7
64 20 20 20 16 13 11 9 8 7
65 20 20 20 16 13 11 9 8 7
66 20 20 20 17 13 11 9 8 7
67 20 20 20 17 13 11 10 8 7
68 20 20 20 17 14 11 10 9 8
69 20 20 20 17 14 12 10 9 8
70 20 20 20 18 14 12 10 9 8
71 20 20 20 18 14 12 10 9 8
72 20 20 20 18 14 12 10 9 8
73 20 20 20 18 15 12 10 9 8
74 20 20 20 19 15 12 11 9 8
75 20 20 20 19 15 13 11 9 8
76 20 20 20 19 15 13 11 10 8
77 20 20 20 19 15 13 11 10 9
78 20 20 20 20 16 13 11 10 9
79 20 20 20 20 16 13 11 10 9
80 20 20 20 20 16 13 11 10 9
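For anyone who would rather compute a cell than look it up, here is a minimal sketch in Python. It rests on an observation, not on anything stated about how the chart was built: the printed values appear to equal the rating divided by the runs behind, rounded half up and capped at 20, with a result of 0 corresponding to a blank cell. The function names `rally_target` and `roll_succeeds` are hypothetical.

```python
import random

MAX_TARGET = 20  # a d20 cannot roll higher than 20


def rally_target(rating: int, runs_behind: int) -> int:
    """Return the d20 target for a rating and deficit; 0 stands for a blank cell.

    Assumption: target = rating / runs_behind, rounded half up, capped at 20.
    This pattern was inferred from the printed chart, not taken from the rules.
    """
    if not 1 <= runs_behind <= 9:
        raise ValueError("the chart only covers deficits of 1 to 9 runs")
    # Integer arithmetic for round-half-up avoids floating-point surprises.
    rounded = (2 * rating + runs_behind) // (2 * runs_behind)
    return min(MAX_TARGET, rounded)


def roll_succeeds(rating: int, runs_behind: int, rng: random.Random) -> bool:
    """Roll a d20 and report whether the trivia question gets asked."""
    return rng.randint(1, 20) <= rally_target(rating, runs_behind)


if __name__ == "__main__":
    print(rally_target(21, 2))  # rating 21, down by 2
    print(roll_succeeds(40, 5, random.Random()))
```

A target of 0 can never be met on a d20, so a blank cell simply means the roll always fails.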

When Winning Really Matters…

A while back, I was involved in a fun discussion on the Tabletop Baseball+ Facebook group page about baseball game mechanics when a friend of mine posted the following: “When I’ve got a player who will ultimately hit .220, I will use him as much as I have to but certainly not in clutch situations, even though he was used that way during the season.”

It’s an intriguing point. As a kid playing replays in the late seventies and early eighties, I didn’t much care if the lineups for the teams I was playing were optimized to score the maximum number of runs, just as I didn’t care if the relief pitcher I called in from the bullpen was particularly skilled at his craft. My goal was not to win— I was managing both teams, after all— but rather to faithfully recreate games from that season. Indeed, without cable television (much less the Internet and Baseball Reference or Retrosheet to guide me), trying to settle on a lineup was hard enough!

That all changes when you’re playing to win. All of a sudden that .220 hitter who was relied on to perform in the clutch and didn’t (he did, after all, bat just .220) is no longer a viable option in those situations. Indeed, unless his defense is extraordinary, he is probably not the best choice to start, either.

In this article, I’ll consider both position players who started and put up rotten numbers and position players who did not start but put up extraordinary numbers, and discuss ways to handle each.

For the most part, this discussion will be limited to dealing with these situations as they apply to one-offs or a short series of games (like you might encounter when playing in a tournament, for example), not an entire season.

We’ll start with the player whose playing time was limited but nonetheless put up monster numbers, a situation I like to refer to as “The Oscar Gamble Problem” because it first came to my attention when I was a kid playing Strat-O-Matic baseball with the 1979 card set.

The Oscar Gamble Problem: The Hot Hitting Reserve

For those too young to remember, Oscar Gamble was a journeyman outfielder who played for seven different teams during a 17-year major league career. By most accounts, Gamble, a lifetime .265 hitter, enjoyed his best year in the majors playing for the Chicago White Sox in 1977, when he finished 29th in the voting for American League MVP. But I can assure you his best season actually occurred two years later when he split time between Texas and New York and didn’t receive a single MVP vote.

Don’t take my word for it. Check out the stats. In 64 games for Texas, Gamble managed to hit a career-best .335 (his previous best was .297 during the aforementioned 1977 season) and compiled enough extra-base hits to record a .522 slugging percentage. But even these gaudy statistics paled in comparison to those he compiled during his brief stint in The Big Apple later that same year, where he batted an incredible .389 and belted 11 home runs in just 113 at bats*. His slugging percentage was .735 and his OPS a dazzling 1.187, stratospheric numbers shared by players named Ruth, Williams and Bonds.

For players of tabletop games at the time, this presented a real problem. Gamble wasn’t just a good backup outfielder for New York, he was easily their best player. If Jackson was, as he once claimed, “the straw that stirred the drink” in New York, Gamble was the drink. His statistics were clearly superior to Jackson’s and anyone else on the team for that matter. The decision to play him in left field over Piniella was a no-brainer, despite Piniella’s solid stats: .297 BA, 137 hits, and 69 RBI (third best on the team behind Jackson and Nettles).

In my experience, there are a couple of ways to deal with situations like this. One technique is to divide position players into three groups: starters, bench players, and emergency players and refer to the table below.

Pct. Games Played* Description
60% to 100% Starters. No restrictions on how they are used.
40% to less than 60% Bench Players. Can be used as pinch-hitters and defensive replacements.
Less than 40% Emergency Players. Should be relegated to emergency situations only!

* You should feel free to adjust these percentages as you see fit.

Since Gamble played only 36 games for New York (22%), he will only be available in emergency situations (e.g., if and when Piniella is injured, etc.). Of course, you can play around with these percentages as you see fit but it’s a good idea to place at least some restrictions on these sorts of players. While less effective for Texas, in the game I created, Baseball Trivia Challenge, Gamble is worth more than 0.5 runs per game to the New York lineup, a figure that could result in as many as 10 additional wins over the course of a 162 game season or a critical victory in a 7-game series.
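The three-group rule is easy to express in code. The sketch below is only an illustration: the function name `usage_tier` is hypothetical, the cutoffs are the 60% and 40% figures from the table, and the 162-game default is an assumption that you should replace with the number of games the team actually played.

```python
def usage_tier(games_played: int, team_games: int = 162,
               starter_cutoff: float = 0.60, bench_cutoff: float = 0.40) -> str:
    """Classify a position player by his share of team games played.

    Cutoffs follow the table above and are meant to be adjusted to taste.
    """
    share = games_played / team_games
    if share >= starter_cutoff:
        return "starter"      # no restrictions
    if share >= bench_cutoff:
        return "bench"        # pinch-hitter or defensive replacement
    return "emergency"        # emergency situations only


if __name__ == "__main__":
    # Gamble's 36 games for New York work out to about 22% of a 162-game season.
    print(usage_tier(36))
```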

A second approach is to simply divide the number of games the player played by the number of games his team played during the season, convert that number to a d20 dice roll, and roll to see if the player is available before each game. For example, using this approach, Gamble would need to roll a 4 or less to be eligible to play.
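Here is a rough sketch of that availability roll. The article does not say how to round the conversion, so rounding to the nearest whole number is an assumption (Gamble lands on 4 either way), and the names `availability_target` and `is_available` are hypothetical.

```python
import random


def availability_target(games_played: int, team_games: int = 162) -> int:
    """Highest d20 roll that makes the player available for a game.

    Assumption: the share of games played is scaled to 20 and rounded to
    the nearest whole number.
    """
    return int(20 * games_played / team_games + 0.5)


def is_available(games_played: int, rng: random.Random,
                 team_games: int = 162) -> bool:
    """Roll a d20 before the game to see whether the player can be used."""
    return rng.randint(1, 20) <= availability_target(games_played, team_games)


if __name__ == "__main__":
    print(availability_target(36))          # Gamble with New York
    print(is_available(36, random.Random()))
```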

While this approach will generally work to ensure each player appears in the appropriate number of games, it isn’t particularly realistic, as players aren’t often shuffled in and out of the lineup every other game for no apparent reason. Also, in the case of Oscar Gamble, the main reason he missed so many games is that he wasn’t with the team until August. As we’ll see throughout this discussion, there are no perfect solutions.

Next, we’ll consider the starter who compiled unimpressive numbers at the plate. For this example, we may as well consider the standard-bearer in this regard.

The Mendoza Line: What to do about Poor Hitting Starters

Mario Mendoza was a slick-fielding shortstop who struggled at the plate, compiling a lifetime .215 batting average. The term “Mendoza Line” was coined by his Mariner teammates, Tom Paciorek and Bruce Bochte, and was intended as a harmless joke. Today, it is frequently used to define the threshold of mediocre hitting, which is generally defined as a .200 average. If you’re below the Mendoza Line, you’re not long for “The Bigs”, or so the saying goes.

This is a harder case. The rules we discussed earlier to deal with hot-hitting reserves ensure they are unlikely to be available to start in place of a weak-hitting starter, but do not prevent other players who played at least 60% of their team’s games from starting in his stead. For example, Larry Milbourne, who played 65 games at shortstop and was a better hitter, could be used to replace Mendoza for the ’79 M’s.

One approach to stop this from happening is to assign starters and backups at each position based on games played.

In the case of Mendoza and Milbourne, this approach would work well, since Mendoza played in 148 games and Milbourne only 65; however, in cases when the difference between the starter and backup is much smaller, it may not work as well, if it works at all. For example, Leon Roberts played 67 games in left field for Seattle in 1979, but Tom Paciorek (47 games), John Hale (34), Dan Meyer (31) and Joe Simpson (27) all saw significant time at the position.

A second approach, creating d20 ranges for each player and rolling to see who plays that game, has all the drawbacks mentioned earlier. In addition, it is not only labor-intensive but also takes control away from the manager.

A third approach would be to utilize the d20 ranges described for “hot” players but instead of rolling to see who will play that game, roll to see who is “available” to play during the series or tournament. You could eliminate those players deemed “unavailable” entirely. Thus, it would be quite likely that Gamble would not only be unavailable to start, he would be unavailable period!

This situation is precisely the case described by my friend and is very difficult to deal with. In cases when the player is clearly in the lineup due to his defensive prowess, it is not unreasonable to conclude he is more valuable than other players at that position as a result. It isn’t unreasonable, but it may not be true. Indeed, I don’t believe it to be true in the game I created, where Mendoza is a much better fielder than Milbourne but fielding plays are nonetheless relatively rare.

Part of the problem gets to the heart of the issue my friend described. Managers in a tabletop baseball game not only know the statistics each player compiled that year, but exactly how good they are according to the ratings in the game. For example, tabletop baseball game managers know that George Brett, who led the American League with a .329 average in 1990, batted just .255 the following year, but managers John Wathan and Hal McRae, who played with Brett and witnessed his brilliance, would have had no idea. Even if they assumed Brett wasn’t the .305 hitter he was over his lifetime, they might at least assume he was a .290 hitter or, at worst, a .280 hitter. But .255? Never!

If anyone has a great way of handling situations like this, post your ideas below!

* Starting in about 1973, Gamble consistently showed good power, leading to above-average slugging percentages. Near the end of his career in 1984, Gamble still managed to slug 10 home runs in just 125 at bats despite hitting just .184.

 

Pitcher Fatigue In Strat-O-Matic

When I was a kid, I played a lot of sports board games, from Strat-O-Matic (MLB baseball, NFL football, NBA basketball, NHL hockey and even a card and dice NCAA football game), to APBA (NBA basketball and Saddle Racing), to Statis-Pro (MLB baseball, NBA basketball). I played Speed Circuit (an under-rated game in my opinion), Avalon Hill’s USAC Auto Racing, Win, Place & Show, Paydirt… it’s a long list. In addition to all of that, I once played a World War II war game by Metagaming Concepts called “Hitler’s War.” I bought it because it was a “pocket game” and therefore not very expensive.

I am a huge fan of replays (even though I never made it past a few games when I played regularly as a kid) so usually when I played any of the sports games I owned I played solitaire and didn’t really care who won. I was much more focused on the stats, which I assiduously scribbled down on a scoresheet or, more often, a sheet of notebook paper.

Metagaming

Hitler’s War was different. I played against my uncle and brother and I really wanted to win. Indeed, I wanted to win so much I spent the night before preparing. And I don’t mean merely thinking through scenarios, I mean calculating the dice probabilities associated with various strategies. I was a “metagamer” before I even understood what that meant.

These days, I view metagamers and metagaming in a pretty negative light, even if I understand the desire to do it. (For the record, the game dragged on through the night and into the early morning, my early success rolling through Poland was halted and I grew irritable as my losses increased. So much for metagaming!)

Interestingly enough, a number of game companies seem to encourage it, including Strat-O-Matic, which publishes a baseball ratings guide.

This article is all about metagaming. In it, I explore Strat-O-Matic’s pitcher fatigue system and compare the performance of a pitcher who has reached his point of weakness to his nominal performance when not fatigued. While I will utilize an equation or two here, this is not intended to be anything more than a crude estimate.

This discussion will make more sense with an example so I will be referring to Randy Johnson’s 1995 Strat card throughout. Since I don’t have permission to include his card here, I’ll be sure to discuss only the relevant details.

Against right-handed batters, Johnson’s card includes two strikeouts that are changed to singles once he reaches his point of weakness. These occur in columns 4 and 6 and correspond to dice rolls of 7 and 9, respectively. The dice rolls are the same against lefties but occur in columns 5 and 6 respectively.

Now, I could spend a fair amount of time “reverse-engineering” Johnson’s card to show the probabilities associated with each result on his card and from these determine the probability he allows a walk, single, double, etc. I did a lot of that in preparation for this article to make sure the numbers I was calculating seemed reasonable. They did.

The mathematics of fatigue

Fortunately, trying to assess the effect fatigue has on his performance isn’t so complicated; we need only focus on the outs that get transformed to singles. To do so, we’ll need to consider how the probability of a right-handed batter getting a hit off Johnson increases when he is fatigued. This isn’t hard to do. The number of singles Johnson allows when fatigued that he would not have allowed otherwise is 6 + 4 = 10. This is due to the fact there are 6 chances of rolling a 7 and 4 chances of rolling a 9. This is from a total of 36 + 36 + 36 – 7 = 101 chances. Note that I am subtracting 7 in this case because Johnson allows a walk when a 7 is rolled in column 6 and walks don’t count as at bats and are thus ignored when calculating batting averages. At first glance, this would seem to indicate that Johnson allows opposing RH batters to bat 10/101 = .099 ≈ 100 points higher when fatigued. But this isn’t quite right.

Strat-O-Matic results are obtained from the pitcher’s card only half the time. In general, the formula looks like this:

BA_nominal = (BA_b + BA_p) / 2

BA_b is the batter’s batting average and BA_p is the batting average allowed by the pitcher. (Note: These are not raw averages. They have been adjusted to ensure players will duplicate their real-life statistics when facing the same level of competition.)

Here is how things look when Johnson has reached his point of weakness and is said to be fatigued.

BA_fatigued = (BA_b + BA_p + 0.099) / 2 = (BA_b + BA_p) / 2 + 0.099 / 2 = BA_nominal + 0.099 / 2

The last bit is important. It shows that Johnson will allow opposing batters to hit approximately 50 points higher when fatigued. I might have just said that a few paragraphs ago (it seems intuitive), but I feel it’s important to be deliberate in these cases since our intuition is sometimes wrong. (Note: The same holds true for left-handed batters, whom Johnson faced far fewer times.)
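The arithmetic in this section can be checked with a few lines of Python. This is only a sketch that takes the figures quoted above at face value (strikeouts on rolls of 7 and 9 turning into singles, and 101 at-bat chances against right-handed batters); the function name `fatigue_shift` is hypothetical.

```python
from fractions import Fraction

# Ways to roll each total on two six-sided dice: 6 ways for a 7, 4 for a 9, etc.
WAYS = {total: 6 - abs(7 - total) for total in range(2, 13)}


def fatigue_shift(flipped_rolls, at_bat_chances):
    """Return (shift on the pitcher's card, shift in overall batting average).

    flipped_rolls: dice totals whose strikeouts become singles when fatigued.
    at_bat_chances: chances on the pitcher's card that count as at bats
    (101 is the count used in the article).
    """
    extra_singles = sum(WAYS[roll] for roll in flipped_rolls)
    card_shift = Fraction(extra_singles, at_bat_chances)
    # Only half of all results are read from the pitcher's card.
    return card_shift, card_shift / 2


if __name__ == "__main__":
    card, overall = fatigue_shift([7, 9], 101)
    print(f"card: {float(card):.3f}, overall: {float(overall):.4f}")
```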

Did Strat-O-Matic get it right?

Does Randy Johnson allow opposing hitters to bat 50 points higher when he is tired? We don’t really know. According to the data (which includes both left- and right-handed batters), Johnson’s performance drop-off occurs during innings 4 thru 6, not innings 7 thru 9, but we don’t know if he reached his point of weakness in any of the 23 games he lasted into the seventh inning or further; we only know that his Strat endurance factor inning is the seventh inning. Of course, with the statistics available today, it would be possible (though tedious) to calculate how many innings he pitched in real life when he would have been considered fatigued by Strat-O-Matic rules, but I won’t bother to do that here since the point isn’t to show whether the point of weakness rule is realistic. I happen to like it and suspect a big reason for it is to encourage realistic usage, which I support.

Table 1: Randy Johnson’s 1995 Performance By Inning

Split G IP ERA PA AB H BB SO SO/W BA OBP SLG OPS
Innings 1-3 30 90 1.80 357 328 60 26 130 5.0 .183 .245 .265 .510
Innings 4-6 30 87.1 2.68 359 323 69 33 112 3.4 .214 .290 .337 .627
Innings 7-9 23 37 3.65 150 141 30 6 52 8.7 .213 .260 .312 .572

I did want to further explore how we might analyze the effects of fatigue on Johnson’s performance. We know that he will allow opposing batters to hit for a higher average but haven’t yet considered what that implies. I would like to know how many more runs he might be expected to allow under those circumstances. How much higher will his ERA be when fatigued?

The effect of fatigue on run-scoring

The table above provides an answer but it isn’t one I’d entirely trust. Most of Johnson’s stats are better in innings 7 thru 9 than they are in innings 4 thru 6 so it isn’t clear to me whether he was just extremely lucky to have allowed so few runs during those middle innings or simply unlucky in that regard in later innings. I’ll approach this question by tossing up a graph and a few equations that will make it seem like I’m doing real science when really all I’m doing is making a crude guess.

The graph below shows how a pitcher’s WHIP is related to runs scored during the 1995 American League season. Without delving into the details, I eliminated players with fewer than approximately 152 at bats and estimated runs scored using sabermetrics. Next, I plotted the relationship between WHIP and runs scored (see below). It’s far from perfect but it provides at least a crude estimate. The linear regression equation in the white box on the right shows how changes in WHIP lead to more or fewer runs. For example, a 1.0 change in WHIP leads to 5.33 more or fewer runs.

Let’s try applying this equation to Johnson. Here are his statistics against right-handed batters in 1995:

Split PA AB IP† R H BB BA WHIP R/G Estimated R/G
vs RHB 754 688 186.2 54* 145 59 .211 1.093 2.60 3.14

* The number of runs allowed vs RHB (54) and the number of runs allowed vs LHB (7) don’t add up to the 65 runs Johnson allowed. Thus, the R/G could be between 2.60 and 2.80.
† The number of innings pitched were estimated by multiplying the proportion of plate appearances versus RHB by the total number of innings pitched, 214.1.

They suggest that the linear regression equation, applied to Johnson, overestimates the number of runs he should allow by anywhere from about a third to a half of a run per game. While this isn’t the purpose of the equation, it’s important to keep in mind we’re pouring various quantities into beakers and test tubes, not performing real science.
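The table entries and the size of the overestimate can be reproduced from the raw counts. The sketch below assumes that 186.2 is standard innings notation for 186 and two-thirds innings; the helper name `innings` is hypothetical, and the 3.14 estimate is simply copied from the table since the regression intercept is not given here.

```python
def innings(ip_notation: str) -> float:
    """Convert baseball innings notation ('186.2' = 186 and 2/3) to a number."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


ip = innings("186.2")
hits, walks = 145, 59
runs_vs_rhb, runs_vs_lhb, runs_total = 54, 7, 65

whip = (hits + walks) / ip
# The 4 runs not assigned to either split set the upper bound on R/G.
unassigned = runs_total - runs_vs_rhb - runs_vs_lhb
rg_low = runs_vs_rhb * 9 / ip
rg_high = (runs_vs_rhb + unassigned) * 9 / ip

estimated_rg = 3.14  # regression estimate, taken from the table
gap_low = estimated_rg - rg_high
gap_high = estimated_rg - rg_low

if __name__ == "__main__":
    print(f"WHIP {whip:.3f}, R/G {rg_low:.2f} to {rg_high:.2f}")
    print(f"overestimate {gap_low:.2f} to {gap_high:.2f} runs per game")
```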

So how do things change when Randy Johnson is fatigued? Since the only thing that changes is the number of hits (specifically singles) Johnson allows, that is all we’ll calculate: (.099 ÷ 2 × 688) ≈ 34.1 additional hits allowed. Dividing by his 186.2 innings pitched produces a marginal WHIP of approximately 0.182. Since 0.182 × 5.3302 ≈ 0.97 runs per game, we conclude that Johnson’s ERA will be nearly a run higher when fatigued, an estimate that compares favorably with the actual data, which suggests a 0.97 rise in ERA from innings 4-6 to innings 7-9 (refer to Table 1 above).
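The chain from extra hits to extra runs is short enough to sketch in code. The slope (5.3302) and the inputs are the figures quoted in this article; the function name `fatigue_run_penalty` is hypothetical, and 186.2 innings is read as 186 and two-thirds. With these inputs the result comes out to roughly 34 extra hits and a little under one run per game.

```python
WHIP_SLOPE = 5.3302  # runs per game per 1.0 of WHIP, from the regression


def fatigue_run_penalty(card_shift: float, at_bats: int,
                        innings_pitched: float, slope: float = WHIP_SLOPE):
    """Return (extra hits, marginal WHIP, extra runs per game) when fatigued.

    card_shift is the batting-average increase on the pitcher's card; it is
    halved because only half of all results come from that card.
    """
    extra_hits = card_shift / 2 * at_bats
    marginal_whip = extra_hits / innings_pitched
    return extra_hits, marginal_whip, marginal_whip * slope


if __name__ == "__main__":
    hits, whip, runs = fatigue_run_penalty(0.099, 688, 186 + 2 / 3)
    print(f"{hits:.1f} extra hits, WHIP +{whip:.3f}, {runs:.2f} runs per game")
    # Observed rise in ERA from innings 4-6 to innings 7-9 in Table 1.
    print(f"observed: {3.65 - 2.68:.2f}")
```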

Much Ado about nothing?

It obviously isn’t necessary to expend this sort of effort to calculate a pitcher’s estimated performance when fatigued, but I wanted to provide a rough assessment of how fatigue affects even a pitcher of Johnson’s caliber and, as we observed, the effects are considerable. Beyond that, knowing how a pitcher performs when tired is really only useful when comparing him to relief pitchers who may replace him, and for this, it is probably enough to glance at their respective cards, noting that Johnson’s strikeouts will count as hits.

 

Just your typical 40-game World Series

A while ago I replayed the 1986 World Series between New York and Boston, made famous by Bill Buckner’s infamous error in Game 6, a game Boston needed to win to take the series in six. Looking back at the box score, I was surprised to learn Buckner’s blunder wasn’t the only error committed by Boston that night: catcher Rich Gedman and right fielder Dwight Evans were both charged with errors, as well. Evans’ throwing error on a Mookie Wilson single in the fifth inning allowed Ray Knight to advance to third base; he later scored on a ground ball double play. New York also made two errors.

In the deciding game, Boston led 3-0 after five innings before New York exploded for 8 runs over the course of the next three innings and won handily, 8-5.

In my replay, I decided a 7-game series just wouldn’t do. I decided to play 40 games!

Not surprisingly, the series was close, with New York winning 21 games to 19. After New York was badly outplayed in the early games, its victory was made possible by an improbable run of 10 wins in 12 games, which began with a narrow 3-2 win in game sixteen.

Interestingly, if the series had been just seven games, Boston would have dominated, winning in 5 games: 5-0, 6-3, 3-4, 4-2, and 4-1.

Series Stats

1986 New York (N) Batting

Name G AB RN R/G
Wally Backman 38 39 26 .667
Keith Hernandez 40 44 26 .591
Ray Knight 40 41 25 .610
Lenny Dykstra 39 33 21 .636
Gary Carter 36 37 20 .541
Rafael Santana 40 41 14 .341
Howard Johnson 20 21 12 .571
Darryl Strawberry 36 35 10 .286
Kevin Mitchell 14 15 8 .533
pitcher 20 20 3 .150
Mookie Wilson 19 19 3 .158
Lee Mazzilli 4 5 2 .400
Ed Hearn 4 4 1 .250
George Foster 8 8 0 .000
Tim Teufel 2 2 0 .000
TOTALS 360 364 171 .470

1986 New York (N) Pitching

Name G GS CG W L Sv IP RN ER ERA
Bruce Berenyi 1 0 0 0 0 0 1 0 0 0.00
Roger McDowell 16 0 0 1 1 7 25 3 3 1.08
Bob Ojeda 7 7 1 5 0 0 48 13 9 1.69
John Mitchell 1 1 0 0 0 0 3 1 1 3.00
Randy Myers 3 0 0 1 0 0 3 1 1 3.00
Dwight Gooden 8 8 3 4 1 0 59 21 20 3.05
Rick Aguilera 8 8 0 3 2 0 51 26 22 3.88
Ron Darling 8 8 3 2 4 0 54 26 26 4.33
Sid Fernandez 8 8 1 2 4 0 51 31 28 4.94
Doug Sisk 12 0 0 3 1 1 18 11 11 5.50
Randy Niemann 7 0 0 0 2 0 15 10 10 6.00
Rick Anderson 7 0 0 0 0 0 13 11 11 7.62
Jesse Orosco 15 0 0 0 4 2 25 23 22 7.92
TOTALS 101 40 8 21 19 10 366 177 164 4.03

1986 Boston Batting

Name G AB RN R/G
Wade Boggs 39 43 43 1.000
Jim Rice 40 38 25 .658
Tony Armas 40 41 24 .585
Dwight Evans 39 41 21 .512
Bill Buckner 39 40 19 .475
Marty Barrett 40 42 15 .357
Rich Gedman 39 39 11 .282
Ed Romero 11 11 6 .545
Don Baylor 20 16 4 .250
Rey Quinones 20 20 4 .200
Spike Owen 9 10 3 .300
pitcher 20 20 2 .100
Marc Sullivan 2 2 0 .000
Steve Lyons 1 1 0 .000
Dave Stapleton 1 1 0 .000
Mike Greenwell 1 1 0 .000
TOTALS 361 366 177 .484

1986 Boston Pitching

Name G GS CG W L Sv IP RN ER ERA
Calvin Schiraldi 8 0 0 1 0 0 9 0 0 0.00
Sammy Stewart 9 0 0 1 1 0 14 1 1 0.64
Mike Brown 6 0 0 2 0 1 12 2 2 1.50
Joe Sambito 8 0 0 0 0 0 12 3 3 2.25
Oil Can Boyd 8 8 5 2 3 0 64 26 22 3.09
Bruce Hurst 7 7 3 1 4 0 53 20 20 3.40
Roger Clemens 9 9 4 6 3 0 70 29 27 3.47
Steve Crawford 8 0 0 1 0 0 11 7 5 4.09
Tom Seaver 5 5 2 1 3 0 35 17 17 4.37
Tim Lollar 2 0 0 0 1 0 2 1 1 4.50
Al Nipper 6 6 1 4 1 0 39 24 22 5.08
Jeff Sellers 5 5 0 0 3 0 30 29 24 7.20
Bob Stanley 8 0 0 0 2 0 13 12 12 8.31
TOTALS 89 40 15 19 21 1 364 171 156 3.86

 

 

Maris, Mantle and the trouble with Statistics

One of the folks I watch on YouTube from time to time is a genial guy named Chris, aka Strat-O-Matic Delaware (you can check him out here). He is currently immersed in a project he calls Project 61: Chase for the Babe. Fortunately, Chris is not 61 years old and is a fan of baseball, so this isn’t the sort of “project” that will be used as evidence in a later sexual harassment trial. The chase, in this case, is for Babe Ruth’s 1927 home run record and the 61 refers to both the year the record was broken and the number of home runs it took to break it.

You will, of course, recall it was Roger Maris who broke Ruth’s long-standing record and that his accomplishment was met with equal parts delight and derision. There have been countless books and articles written about the chase and even a Hollywood movie.

Chris’ replay is not centered around the human drama surrounding the chase. As he describes it, it is a “no frills race to the top.” As was true in 1961, Chris is focusing on both Maris and Mantle, who, in real-life, entered September on nearly even terms with 51 and 48 home runs, respectively.

Chris’ project caught my attention about the same time I was monitoring another Strat-O-Matic replay by another affable fellow named Eric from Higher Ground Gaming (check him out here). He recently finished a computer replay of the 1978 Boston Red Sox and I have to say the results were impressive. The home run king in his replay was Jim Rice, who belted 46 round-trippers for Boston that year. In the replay, Rice hit 53.

The accuracy of his results made me think of the project Chris is engaged with. It made me wonder: What are the chances that Chris finishes the replay and neither Maris nor Mantle breaks Babe’s record? What are the chances they both break it? I wondered, too, what the chances were that either simply obliterated Ruth’s mark.

What are the odds?

So I sat down in front of my computer, opened Excel and did what I often do in these situations: I decided I was suddenly tired, walked over to the couch and fell asleep. (Obviously I eventually woke up and returned to the task at hand).

I calculated that Maris stands about a 52% chance of breaking the record while Mantle’s chances are below the Mendoza Line at 18%. The probability of both players failing to exceed 60 home runs is about 40%. The results didn’t surprise me, but they also didn’t prepare me for other surprises to come. I’ll get to those later. First, I promised to discuss the methodology I used to calculate these numbers.

Note: I feel I owe it to folks to show how I arrived at these numbers, on the off chance they care. Feel free to skip to the end of this article where I discuss why I found these numbers interesting in the first place.

A bunch of math…

The probabilities above and those we’ll utilize for this discussion will be calculated in two ways. The first is purely mathematical. I will treat both Maris’ and Mantle’s plate appearances as a series of Bernoulli (or binomial) trials. (If you need to leave your computer or phone to grab a quick nap, go for it! I understand!) Without going into detail, this approach assumes that both Maris and Mantle will hit home runs at a fixed rate or probability. I determine these probabilities by dividing the number of home runs each player hit by their number of plate appearances. For example, since Maris hit 61 home runs in 698 plate appearances, his home run probability is calculated as 61 ÷ 698 ≈ 0.0874 ≈ 8.74%. In a similar fashion, I calculate Mantle’s home run probability to be 54 ÷ 646 ≈ 0.0836 ≈ 8.36%. (Note: You could just as easily use at bats; it makes little difference).

Looking at these numbers it might surprise you that Maris is so much more likely to break Ruth’s record than Mantle, but don’t forget Maris had 52 more plate appearances. When I calculated each player’s home run totals it was based on their number of plate appearances. Focusing on just Maris for the time being, I used Excel to calculate the probability he will hit 61 or more home runs as follows:

= 1 - BINOM.DIST(60,698,61/698,TRUE)
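For readers without Excel, the same binomial tail can be computed in a few lines of Python. This is a sketch of the calculation described above, not the spreadsheet itself; the function name `prob_at_least` is hypothetical, and the "neither" and "both" figures additionally assume the two players' totals are independent.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); mirrors 1 - BINOM.DIST(k-1, n, p, TRUE)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below


# Home run rate = home runs divided by plate appearances, as in the text.
maris = prob_at_least(61, 698, 61 / 698)
mantle = prob_at_least(61, 646, 54 / 646)

# Assumption: the two totals are treated as independent of each other.
neither = (1 - maris) * (1 - mantle)
both = maris * mantle

if __name__ == "__main__":
    print(f"Maris 61+: {maris:.1%}  Mantle 61+: {mantle:.1%}")
    print(f"neither: {neither:.1%}  both: {both:.1%}")
```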

You may already have noticed a major problem with my approach: it assumes Maris’ probability of hitting a home run doesn’t change from at bat-to-at bat, much less game-to-game, which simply isn’t true†. While I won’t try to model each at bat, we can use Maris’ actual game-by-game results to calculate these probabilities, as well. This is the second method.

In an earlier article about the 50/50 model utilized by Strat-O-Matic and other games, I relied exclusively on the results of simulations to make my point when, strictly speaking, it wasn’t necessary. None of the underlying mathematics was very complicated. Simulations can be invaluable in cases when the underlying probability distributions get even remotely complicated, as is the case here. For this reason, I’ll treat Maris’ game-by-game results as the population data then create a number of “bootstrap” samples to use in our analysis.  Put another way, I’ll simulate 1 million 1961 seasons and see how his home run totals vary.

To do this, I will consider his home run probability on a game-by-game basis. I will essentially ignore those games in which he didn’t hit a home run since the probability is zero for those cases. For example, Maris did not hit a home run until the eleventh game of the 1961 season, a 13-11 victory over Detroit. Since he had 5 plate appearances and 1 home run in this game, I will simulate this game by giving him a 1-in-5 chance to hit a home run for each plate appearance, then record the number of home runs he hit (which can be any integer from 0 to 5).
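
The game-by-game approach described above might be sketched in Python like this. Only the first entry of the game log (5 plate appearances, 1 home run) comes from the example in the text. The other entries are made-up placeholders, not Maris’ real 1961 data, and the function names and seed are my own.

```python
import random

# (plate appearances, home runs) for games with at least one home run.
# Only the first entry is from the article; the rest are hypothetical.
HOMER_GAMES = [(5, 1), (4, 1), (4, 2), (5, 1), (3, 1)]

def simulate_season(games, rng):
    """Replay every game, giving each plate appearance an hr/pa chance."""
    total = 0
    for pa, hr in games:
        chance = hr / pa
        total += sum(rng.random() < chance for _ in range(pa))
    return total

def simulate_many(games, n_seasons, seed=1961):
    """Simulate n_seasons seasons and return the home run total of each."""
    rng = random.Random(seed)
    return [simulate_season(games, rng) for _ in range(n_seasons)]

totals = simulate_many(HOMER_GAMES, 20_000)
actual = sum(hr for _, hr in HOMER_GAMES)
share = sum(t >= actual for t in totals) / len(totals)
print(f"actual total in this toy log: {actual}")
print(f"simulated seasons matching or beating it: {share:.1%}")
```

With a full game log in place of the placeholder list, the same loop would produce the kind of distribution summarized in the "1,000,000 Trials" column.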

These results are compared to the simple model using the binomial distribution in the table below.

Home runs Binomial Model  1,000,000 Trials
50 or less 7.7% 5.5%
60 or less 48.1% 47.4%
61 or more 51.9% 52.6%
65 or more 31.4% 29.6%
70 or more 12.8% 10.2%
74 or more 5.0% 3.2%

Not surprisingly, the two approaches produced similar results. Here are the results for Mantle.

Home runs Binomial Model  1,000,000 Trials
50 or less 31.5% 28.8%
60 or less 82.3% 85.3%
61 or more 17.7% 14.7%
65 or more 7.1% 4.7%
70 or more 1.6% 0.7%
74 or more 0.4% 0.1%

Finally, the stuff I want to talk about!

I have taken us on a very long and circuitous route to the facts I find most interesting. Let’s start with Maris. There is a roughly 12% chance that Maris will hit 70 or more home runs, which is about the same probability as rolling a sum of 9 on two dice in Strat, not exactly a rare occurrence. And yet, I wonder how many people would look askance at those sorts of results. How many people would find them unrealistic?

Even more extreme, there is about a 5% chance that Maris will wind up with 74 or more home runs, breaking Bonds’ future mark of 73. A 5% chance is like rolling a 3 or, put another way, the difference between Mario Mendoza and an average big league hitter. Roll a 4, and that’s about the same chance both players have of breaking Ruth’s record.
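
For anyone who wants to check the dice comparisons, here is a short sketch that enumerates all 36 outcomes. It assumes two fair six-sided dice added together, and the function name is mine.

```python
from fractions import Fraction
from itertools import product

def two_dice_probability(total):
    """Chance that two fair six-sided dice add up to the given total."""
    ways = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == total)
    return Fraction(ways, 36)

for total in (9, 4, 3):
    p = two_dice_probability(total)
    print(f"sum of {total}: {p} = {float(p):.1%}")
# sum of 9: 1/9 = 11.1%
# sum of 4: 1/12 = 8.3%
# sum of 3: 1/18 = 5.6%
```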

These thoughts crossed my mind when Eric (see above) was summarizing the results of his 1978 Boston Red Sox replay. Overall I thought the results looked great. But Rice hit 53 home runs when he should have hit 46 and I remember thinking I wish he’d hit 49.

It will be interesting to see how Chris’ project turns out. I don’t think Maris will reach 70 home runs. But I’ll know not to be too surprised if he does.

† Note: Neither model is as complex as Strat-O-Matic which not only models every at bat against a wide variety of pitchers but also accounts for lefty/righty and righty/lefty match-ups. In addition, there is no guarantee that the number of plate appearances will be the same in Strat-O-Matic as in real life. Indeed, we should expect Strat results that are less predictable with more variance than the numbers here.

Remembering J.R. Richard

I mentioned earlier I first got into baseball in 1979, about the same time I bought my first Strat-O-Matic baseball game that included the 1978 season. One of the teams I enjoyed playing with was the Houston Astros. One reason for that was a pitcher I’d never heard of before: J.R. Richard. Although I’ve long since parted with my 1978 (and 1979) card sets, I recall those 303 strikeouts were featured prominently on his 1978 card. I suspect the 141 walks were there, too, but I don’t really remember them. 

He was even better in 1979, fanning 313 batters while decreasing his number of walks to under 100. Houston finished second in the NL West that year, after finishing fifth the year before. Things were looking up for the team, and Richard was at the heart of it.

He was in the midst of his best season ever in 1980— and the team was, too— when disaster struck.

On July 30, 1980, Richard was warming up for a game against Philadelphia when he suddenly collapsed, the victim of a massive stroke. Although he would try, he never returned to major league baseball.

His life after baseball took a few more unexpected twists— including a period of time when he was homeless on the streets of Houston— but as was true of his pitching career, he found his groove again. I hadn’t been aware of his struggles. I’m pleased he persevered.

In 2015 he wrote a book called Still Throwing Heat chronicling his life from his time in baseball to the present day. I’ve included a picture of the cover. You can buy it on Amazon if you’re interested.

In part due to Richard (and, I must admit, also due to their unique uniforms), I decided to rate two of the teams he played on: the 1978 team and the 1980 team that lost the NLCS to the eventual World Series Champion Philadelphia Phillies. The same Phillies team Richard was due to face when his major league career came to a halt.

As fate would have it, Houston faced the Phillies in the NLCS later that year. The NLCS was only five games in 1980 (it wouldn’t be expanded to seven games until 1985). Houston jumped to a 2-1 lead then lost consecutive extra-inning affairs, allowing Philadelphia to advance to the World Series against Kansas City. No less an authority than The Sports Encyclopedia: Baseball called the 1980 NLCS “the most exciting playoff series ever staged.” Richard, of course, was no longer with the team.

I thought it might be fun to pit them against Philadelphia but with Richard on the roster to see if it changed things. It did. You can read all about it below.

Game 1: Richard outduels Carlton!

Joe Morgan scored 2 runs in the eighth inning to give the Astros a 4-3 lead and closer Joe Sambito shut down slugger Mike Schmidt in the ninth to give Houston a 4-3 win and a 1-0 lead in the best-of-seven series.

The game featured two of the best pitchers in the game. J.R. Richard pitched 8 innings and held Philadelphia to just 3 runs, all unearned. Phillies ace, Steve Carlton, pitched well in defeat.

1 2 3 4 5 6 7 8 9 F
Houston 0 2 0 0 0 0 0 2 0 4
Philadelphia 0 0 0 3 0 0 0 0 0 3
W – J.R. Richard (1-0) L – Steve Carlton (0-1) Sv – Joe Sambito (1)

Game 2: Late rally leads Philadelphia past Houston, 5-3!

Bake McBride and Mike Schmidt led a late rally that pushed Philadelphia past Houston to tie the series at one game apiece.

Joe Niekro started for Houston and pitched 8 quality innings, allowing just 3 runs. Reliever Joaquin Andujar gave up the game-winning runs to Schmidt in the ninth inning and was tagged with the loss.

1 2 3 4 5 6 7 8 9 F
Houston 0 0 1 0 0 2 0 0 0 3
Philadelphia 0 0 0 0 0 1 0 2 2 5
W – Kevin Saucier (1-0) L – Joaquin Andujar (0-1)

Game 3: Ryan and bullpen shut out Phillies!

Nolan Ryan pitched 5 scoreless innings before giving way to a strong performance from the Houston bullpen that included a surprise save from 23-year-old Gordie Pladson, who finished the regular season 0-4 with a 4.35 ERA.

Meanwhile, Bob Walk (11-7 with a 4.57 ERA during the regular season) was the tough luck loser despite pitching 6 innings and allowing just 1 run.

Jose Cruz scored the game-winner for Houston.

1 2 3 4 5 6 7 8 9 F
Philadelphia 0 0 0 0 0 0 0 0 0 0
Houston 1 0 0 0 0 0 0 0 X 1
W – Nolan Ryan (1-0) L – Bob Walk (0-1) Sv – Gordie Pladson (1)

Game 4: Forsch, Houston shut out Phillies … again!

Ken Forsch pitched a complete game gem, shutting out the dangerous Philadelphia lineup for a second straight game and even contributing on offense by scoring the game-winning run in the fifth inning. Rafael Landestoy and Joe Morgan added one run each in the seventh and eighth inning to account for the final margin.

1 2 3 4 5 6 7 8 9 F
Philadelphia 0 0 0 0 0 0 0 0 0 0
Houston 0 0 0 0 1 0 1 1 X 3
W – Ken Forsch (1-0) L – Larry Christenson (0-1)

Game 5: It’s a wrap! Richard propels Houston to 4-1 series win!

J.R. Richard and Joaquin Andujar combined to shut out the slumping Phillies for the third game in a row and Houston won 5-0 to win the NLCS four games to one. Philadelphia could do nothing right and Houston little wrong in handing Steve Carlton his second loss of the series.

Over the course of 5 games, the Phillies tallied just 8 runs to Houston’s 16.

1 2 3 4 5 6 7 8 9 F
Houston 2 0 0 0 0 3 0 0 0 5
Philadelphia 0 0 0 0 0 0 0 0 0 0
W – J.R. Richard (2-0) L – Steve Carlton (0-2)

Recap

J.R. Richard had a fantastic series, pitching 15 innings while allowing no earned runs. Even so, it would be hard to give him all the credit for the victory over Philadelphia. Simply put, the Phillies didn’t play well, averaging just 1.6 runs per game and going 27 straight innings without scoring a run.

Except for game two, Mike Schmidt, who slugged 48 home runs in the first of what would be two consecutive MVP seasons, failed to produce in critical situations and no one else on Philadelphia did either.

Houston, hardly an offensive juggernaut, received solid contributions from catcher Alan Ashby (5 runs), Jose Cruz and Joe Morgan (3 runs each) but, of course, it was the pitching staff that shined, compiling a remarkable 1.00 ERA for the series. Take away Richard, and their ERA was still just 1.50 (their ERA was 3.49 during the actual series).

It’s worth noting that both Houston and Los Angeles, who finished the season tied with 92-70 records, finished slightly ahead of Philadelphia, who tallied 91 wins against 71 losses. Indeed, the Astros entered their final series against LA with a 92-67 record, lost three straight, then needed an extra game to settle the tie.

The series may not have proved that Richard would have been enough to lead them to victory, but it did confirm that Houston was a good team in 1980 and may well have been the Phillies’ equal.

The 50/50 Conundrum

One of the complaints I hear most often about baseball board games like Strat-O-Matic is that they utilize a 50/50 model. This is a game mechanic where the result of the at bat is randomly read from the batter’s card half the time and the pitcher’s card the other half. Thus, the classic batter vs. pitcher duel isn’t a duel at all; it’s only ever influenced by one player or the other (and, of course, the occasional fielder).

Growing up, this bothered me, too. In fact, it bothered me most whenever I played Statis Pro Baseball and the pitcher on the mound had a large PB range (e.g., 2-9). I would look longingly at all the doubles and home runs available on the batter’s card, knowing they were all for naught the moment the next fast action card sent the result to the pitcher’s card.

I noticed it less in Strat-O-Matic because it utilized the 50/50 model and at least that seemed fair. Nevertheless, in critical situations my favorite pitcher or batter was reduced to being a mere spectator.

These days I don’t play board games very often. But out of a sense of nostalgia, driven in part by the work I’ve put into my own game, Baseball Trivia Challenge, I’ve been watching a lot of YouTube videos produced by folks who do. A few of them complain about it, too, or at least tout games that include batter-pitcher interactions for each at bat as vaguely superior in this regard.

It got me thinking: does it really matter? The first thing I thought was that if you take a Strat-O-Matic batter and pitcher card and tape them together, you essentially have a single card where the pitcher and batter both influence the results of the at bat. Certainly no one would argue it wouldn’t produce the same results the two cards produced separately before they were taped together.

Of course, some might argue the 50/50 model is still in place in this example, since the batter columns are still 1 thru 3 and the pitcher columns 4 thru 6.

If we wanted to, we could cut out every dice roll result from the pitcher and batter cards and randomly assign them to a column (1 thru 6) on a new card. Indeed, we could physically create this sort of card with a pair of scissors, glue and the grotesque indifference required to defile Strat-O-Matic cards! Such a Frankenstein-like card, when constructed, would also produce the same results. None of the probabilities would have changed.

This would seem to suggest that the influence the pitcher has on the batter and vice-versa is still present in a 50/50 model and that our belief it isn’t is simply a misconception on our part.

If you aren’t buying any of this, that’s OK. Being skeptical is a good thing. So let’s explore things further. We’ll start with a definition: When are two sets of results the same?

I would posit they are the same if they both produce the “same” distribution, including measures of central tendency and streaks. To see what I mean, let’s consider a simple example: a batter’s hits. A hit, of course, is anything from a single to a home run. I’ll assign a 1 to every hit and a 0 to every out. We’ll consider the following two sets representing 10 at bats:

Set 1 = {1, 0, 0, 1, 0, 0, 0, 1, 0, 1}

Set 2 = {1, 1, 1, 0, 0, 0, 0, 0, 1, 0}

Obviously a set of 10 outcomes is on the small side but don’t worry, I’m not making a scientific point here. Right now, I’m just trying to settle on a definition of “same.”

Note that both sets have the same mean and variance. They have the same mode and median. But they don’t quite look the same. Set 2 appears more “streaky”. When I think of the 50/50 model, I am convinced the mean is the same as the mean obtained from samples involving the combined contributions of the batter and pitcher, but I wonder about streaks.

To explore this further, imagine a .400 hitter facing a pitcher who allowed batters to hit just .150. (Though slightly more extreme, this is a little like pitting Ted Williams from 1941 against Pedro Martinez from 2000 when Ted batted .406 and Pedro allowed opposing hitters to bat just .167). Let’s assume the league average is .250. Thus, for our .400 hitter to bat his average in a 50/50 model, his card would need to represent a .550 average (since .550 × ½ + .250 × ½ = .400). Likewise, our superstar pitcher would allow just a .050 average on his card (.050 × ½ + .250 × ½ = .150). When facing each other, we expect our mythical batter to hit .300, since .550 × ½ + .050 × ½ = .300.
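
The card arithmetic above can be written as a couple of one-line functions. This is only a sketch of the simplified model used in this example (a single league-average card on the other side), and the function names are my own.

```python
def card_average(target, league_average):
    """Card value needed so that half card, half league average hits target."""
    return 2 * target - league_average

def matchup_average(batter_card, pitcher_card):
    """Expected average when each card is read half the time."""
    return 0.5 * batter_card + 0.5 * pitcher_card

LEAGUE = 0.250
batter_card = card_average(0.400, LEAGUE)    # card for the .400 hitter
pitcher_card = card_average(0.150, LEAGUE)   # card for the .150 pitcher
expected = matchup_average(batter_card, pitcher_card)

print(f"batter card {batter_card:.3f}, pitcher card {pitcher_card:.3f}, "
      f"head-to-head {expected:.3f}")
# batter card 0.550, pitcher card 0.050, head-to-head 0.300
```

Note that a target below half the league average would require a negative card value, which is one reason the example stops at a .150 pitcher.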

I chose to use these fictional players for this example precisely because their cards are so different. I think intuitively we might be tempted to believe our .400 batter will be less streaky in a 50/50 model where his probability of getting a hit goes from .550 to .050 depending on whose card is read.

To test this theory, I’ll run a simulation where our .400 batter faces off against our superstar pitcher over a series of 100 million at bats. We will record the number of hits our batter accumulates over those at bats and calculate a few statistics. Note the combined results referred to below are those compiled by a .300 hitter while the split results are those obtained by randomly reading each result from the batter or pitcher card (i.e., the .550 batter card and the .050 pitcher card).

Combined Results Statistic Split Results
.3001 Mean .3001
6,303,213 2 Hit Streaks 6,303,685
1,894,404 3 Hit Streaks 1,892,426
568,379 4 Hit Streaks 566,823
170,349 5 Hit Streaks 169,673
51,232 6 Hit Streaks 51,198
15,470 7 Hit Streaks 15,332
4,610 8 Hit Streaks 4,611
1,373 9 Hit Streaks 1,355
457 10 Hit Streaks 405

* Note: Streaks are defined as consisting of exactly the number of hits shown. For example, every 3 hit streak includes a 2 hit streak but these were not counted as 2 hit streaks. Put another way, all streaks are bracketed by outs.
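
A scaled-down version of this experiment can be sketched in Python as follows. This is not the code behind the table: it runs 1 million at bats instead of 100 million, and the function names and seed are my own choices.

```python
import random
from collections import Counter

def count_streaks(outcomes):
    """Count hit streaks by exact length. A streak is recorded when an out
    ends it, so a run still open after the last at bat is not counted."""
    streaks = Counter()
    run = 0
    for hit in outcomes:
        if hit:
            run += 1
        else:
            if run:
                streaks[run] += 1
            run = 0
    return streaks

def combined_model(n, rng, average=0.300):
    """One .300 hitter: every at bat is a hit with the same probability."""
    return [rng.random() < average for _ in range(n)]

def split_model(n, rng, batter_card=0.550, pitcher_card=0.050):
    """50/50 model: pick a card at random, then read the result from it."""
    outcomes = []
    for _ in range(n):
        card = batter_card if rng.random() < 0.5 else pitcher_card
        outcomes.append(rng.random() < card)
    return outcomes

N = 1_000_000
rng = random.Random(50)
combined = combined_model(N, rng)
split = split_model(N, rng)

print(f"mean: combined {sum(combined) / N:.4f}, split {sum(split) / N:.4f}")
combined_streaks = count_streaks(combined)
split_streaks = count_streaks(split)
for length in range(2, 8):
    print(f"{length} hit streaks: {combined_streaks[length]:>7,} "
          f"vs {split_streaks[length]:>7,}")
```

Changing the seed changes which column comes out slightly ahead at each streak length.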

As expected given the size of the sample, the means are the same rounded to 4 decimals and very close to the .300 average we predicted. Even more encouraging are the streaks, which are very similar between the two sets.

If you’re the meticulous type and looked closely at the numbers in the table, you may have noticed the combined results seem to include more streaks, even though the sums are close. There are a couple of items worth noting. First, this was not true of other data sets I ran. Second, remember we are viewing the raw totals—not the percentages— and they tend to accentuate differences between the two sets.

The Law of Large Numbers governing probabilities does not imply differences will vanish as the sample size increases, only that the average (or mean) will tend to get closer to the expected value as the sample size increases. The next two graphs provide evidence of this fact. The first graph shows that the difference between the expected number of hits and the number of hits observed appears to be growing while the second graph shows how the difference between the calculated batting average and the expected batting average (.300) seems to be shrinking.
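
The same two quantities can be tracked in a few lines of Python. This is a sketch with a seed and checkpoints I picked for illustration. The typical gap in raw hits grows roughly like the square root of the number of at bats, while the gap in batting average shrinks at the same rate, though any single run can wander and need not move in a straight line.

```python
import random

def track_differences(checkpoints, average=0.300, seed=300):
    """Simulate a hitter with a fixed average and report, at each checkpoint,
    the gap in raw hits and the gap in batting average."""
    rng = random.Random(seed)
    targets = set(checkpoints)
    hits = 0
    rows = []
    for n in range(1, max(checkpoints) + 1):
        hits += rng.random() < average
        if n in targets:
            count_gap = abs(hits - average * n)      # expected vs observed hits
            average_gap = abs(hits / n - average)    # expected vs observed average
            rows.append((n, count_gap, average_gap))
    return rows

rows = track_differences([100, 10_000, 1_000_000])
for n, count_gap, average_gap in rows:
    print(f"{n:>9,} at bats: off by {count_gap:8.1f} hits, "
          f"{average_gap:.5f} in average")
```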

So where does this leave us? Well, for one thing, it seems to indicate that on an at bat-by-at bat basis the results from a 50/50 model are indistinguishable from those produced by models that simultaneously account for the pitcher and batter. This is a powerful statement and should be enough to end the argument.

And yet…

Perhaps, like me, you’re having a hard time getting over the fact that individual at bats are controlled by a single player. If so, I urge you to read the above paragraph again and let it sink in. We aren’t talking about averages or long term trends. If you take any set of results from a 50/50 model, including sets with just one, two or three at bats, you won’t be able to tell which model produced it.

The truth is, the batter-pitcher interactions do exist for every at bat in the 50/50 model and I can prove it. It’s the little white die you roll in Strat-O-Matic, for example, to determine which card to read. It may not look or feel like the sort of interaction you get with, say, Payoff Pitch Baseball or Replay Baseball, for example, but it’s there and produces the same outcomes.

I’ve identified the interactive mechanism but I said I’d prove it and I haven’t yet. I’ll do so by running another simulation featuring our two ballplayers, only this time I won’t roll a little white die to determine whose card to read, I’ll simply alternate between the two cards. In other words, I will use a “deterministic” 50/50 model.

Looking at the results, it is clear at a glance that the deterministic model does not produce the same results. Notice the means are the same but nothing else. It produces the kind of results critics of a 50/50 model would be correct to criticize. But, as we’ve seen, these aren’t the results 50/50 models produce.

Combined Results Statistic Split Results
.3000 Mean .3000
6,298,221 2 Hit Streaks 1,924,607
1,889,300 3 Hit Streaks 748,722
567,063 4 Hit Streaks 52,790
169,950 5 Hit Streaks 20,544
51,095 6 Hit Streaks 1,436
15,234 7 Hit Streaks 578
4,493 8 Hit Streaks 31
1,358 9 Hit Streaks 12
419 10 Hit Streaks 0
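
Here is a sketch of the comparison between reading a random card and strictly alternating cards, again scaled down to 1 million at bats and using my own function names and seed rather than the code behind the table.

```python
import random
from collections import Counter

def count_streaks(outcomes):
    """Count hit streaks by exact length; a streak is recorded when an out ends it."""
    streaks = Counter()
    run = 0
    for hit in outcomes:
        if hit:
            run += 1
        else:
            if run:
                streaks[run] += 1
            run = 0
    return streaks

CARDS = (0.550, 0.050)  # batter card, pitcher card

def random_split(n, rng):
    """The white die: each at bat is read from a randomly chosen card."""
    return [rng.random() < rng.choice(CARDS) for _ in range(n)]

def alternating_split(n, rng):
    """Deterministic 50/50: batter card, pitcher card, batter card, ..."""
    return [rng.random() < CARDS[i % 2] for i in range(n)]

N = 1_000_000
rng = random.Random(5050)
by_die = random_split(N, rng)
by_turn = alternating_split(N, rng)

print(f"mean: random {sum(by_die) / N:.4f}, alternating {sum(by_turn) / N:.4f}")
die_streaks = count_streaks(by_die)
turn_streaks = count_streaks(by_turn)
for length in range(2, 7):
    print(f"{length} hit streaks: {die_streaks[length]:>7,} "
          f"vs {turn_streaks[length]:>7,}")
```

The gap has a simple explanation. With alternating cards, hits on two consecutive at bats require one hit from each card, a .550 × .050 = .0275 chance, whereas with the die every at bat is a .300 chance on its own, so two in a row happen .300 × .300 = .09 of the time.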

Still not convinced? How about we roll for it!