30 May 2008

Context and baseball statistics

From this week's MLB Power Rankings on ESPN.com:
In May, Hideki Matsui has more multi-hit games (nine) than he has games in which he hasn't gotten a hit (five)

We're supposed to take this as to mean that Hideki Matsui is doing great this month. But what does it actually mean? How many hitters would we expect to have more multi-hit games than no-hit games? 1%? 10%? Stats like this out of context drive me crazy. It's pretty easy to figure it out at least for simple cases.

The probability of not getting a hit in any at bat is (1 - Batting Average), so the probability of going 4 at bats without a hit is (1-BA) to the 4th power. Here's a little program (in Python) for figuring this out:

def noHit(BA):
return (1-BA)**4

>>> noHit(.250): 0.32
>>> noHit(.300): 0.24
>>> noHit(.400): 0.13

So we can see that for even an average player, a no hit game happens only 1 in 3 games, and for a good batter, or a good batter on a real roll, these things happen rather seldom. What about Multi-Hit Games? If you have four at bats per game, there are 11 different ways of getting two or more hits (one way of getting four hits, 4 of 3, and 6 of 2). In the chart below, x = hit, and o = not hit:

xxxx = four hits

oxxx = three hits
xoxx
xxox
xxxo

xxoo = two hits
xoxo
xoox
oxxo
oxox
ooxx

xooo = one hit
oxoo
ooxo
ooox

oooo = no hits

(For those interested, the number of ways of getting 4, 3, 2, 1, 0 hits, that is, 1, 4, 6, 4, 1 is the fourth row of Pascal's triangle). So one way of calculating the probability of multi-hit game is to find the probability of a single hit game (4*BA*(1-BA)^3) add to it the probability of a no-hit game, and subtract it from 1:

def multiHit(BA):
return 1-(noHit(BA) + 4*(BA*(1-BA)**3))

>>>multiHit(.250): 0.26
>>>multiHit(.300): 0.35
>>>multiHit(.400): 0.52

So as long as you're getting 4 ABs per game, you don't really need to be a great hitter to expect to get more multi-hit games than no-hit games. In fact, a BA of just .267 will do it for you. Returning to the original post, we see that Matsui is doing better than 1:1 in May, getting a 9:5 Multi:No ratio. How good do you have to be to get that? A .320 BA will suffice. Matsui's BA has actually been slightly better than that in May, .337, but he's also been getting just under 4 ABs a game (3.83).

4 ABs a game (or even 3.8) is really only manageable if you're not having many plate appearances that don't count for at bats -- in other words, if you're not walking much. In April, Matsui was averaging only 3.1 ABs per game. This makes it much harder to have so many multi-hit games: if you have 3ABs per game, you need to be batting above .348 to get more more multi-hit games than no-hits, and have a whopping .411 to have Matsui's ratio.

Looking closely at the numbers, there's much less to cheer about for fans of Godzilla: the rise in multi-hit games came almost entirely from a drop in walks (from 12 to 7), resulting in a OBP 40 points lower in May than in April. He did not make up the difference in SLG either, dropping 50 points there. So upon closer inspection, the numbers tell an entirely different story: in everything but BA Matsui had a May that was worse than April and below his career levels.

No comments: