Saturday, March 17, 2007

Explanation of Sabermetrics - Power Hitting

(NOTE: I realized there was a small error in my queries, so some of the numbers I had posted were incorrect. I fixed the error and corrected the numbers, but there was little difference in the results.)

Ask almost any sabermetrician what the most difficult facet of baseball to predict is. You will hear one answer from almost every single one: season to season pitcher performance. While I will certainly agree that this is a very difficult thing to predict, I have found trouble in predicting something else, something from the other side of home plate: power hitting.

While contact hitting remains fairly constant over the years, a hitter's ability to hit for power is always fluctuating. Younger guys are developing power, older guys are losing power. Contact hitting is much more basic. Either you are a good contact hitter or you are not. Sure, you can change your swing or learn more about it as you mature, but overall the ability to hit for contact is fairly stable. A poor contact hitter one year isn't going to become a good one the next. But a poor power hitter can become a good one the next year, which is what makes power hitting such an interesting - and difficult to study - topic.

So how do you predict power, and why is it so difficult? The most basic way is to use Doubles, Triples, or a combination of the two. The thinking behind this is that as a player matures and gets stronger, these Extra Base Hits will have enough power behind them to clear the fence instead of staying in the yard for a multi-bagger. I tested this theory among players with at least 200 At-Bats in each of 2004, 2005, and 2006. 73 of these players increased their AB/2B from 2004 to 2005 by more than 1.5 points. From 2005 to 2006, just 27 of those 73 players increased their AB/HR. That’s just 37%. If 2Bs are supposed to predict power, why were they only successful 37% of the time?
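The test described above can be sketched roughly as follows. This is only an illustration, not the queries actually used for this post: the `Season` record, the `doubles_to_power_rate` function and the sample players are all hypothetical, and the sketch assumes that "increased" means the rate improved, i.e. the player needed fewer at-bats per double (or per home run) than the year before.

```python
from dataclasses import dataclass

MIN_AB = 200        # qualifying at-bats per season, from the post
MIN_2B_GAIN = 1.5   # required improvement in AB/2B, from the post


@dataclass(frozen=True)
class Season:
    ab: int        # at-bats
    doubles: int
    hr: int


def ab_per(ab: int, events: int) -> float:
    """At-bats per event; infinite when the event never happened."""
    return ab / events if events else float("inf")


def doubles_to_power_rate(players: dict[str, dict[int, Season]]) -> tuple[int, int]:
    """Return (players whose doubles rate improved 2004->2005,
    how many of those also improved their HR rate 2005->2006).

    ASSUMPTION: "improved" means fewer at-bats per event.
    """
    improved_2b = 0
    improved_hr = 0
    for seasons in players.values():
        # Keep only players with enough at-bats in all three seasons.
        if any(y not in seasons or seasons[y].ab < MIN_AB for y in (2004, 2005, 2006)):
            continue
        s04, s05, s06 = seasons[2004], seasons[2005], seasons[2006]
        gain_2b = ab_per(s04.ab, s04.doubles) - ab_per(s05.ab, s05.doubles)
        if not gain_2b > MIN_2B_GAIN:
            continue
        improved_2b += 1
        if ab_per(s06.ab, s06.hr) < ab_per(s05.ab, s05.hr):
            improved_hr += 1
    return improved_2b, improved_hr


# Made-up players, purely to exercise the function.
sample = {
    "Player A": {2004: Season(500, 25, 15), 2005: Season(500, 40, 15), 2006: Season(500, 35, 25)},
    "Player B": {2004: Season(400, 20, 10), 2005: Season(400, 32, 12), 2006: Season(400, 30, 10)},
    "Player C": {2004: Season(450, 30, 20), 2005: Season(450, 30, 22), 2006: Season(450, 28, 30)},
    "Player D": {2004: Season(150, 5, 2), 2005: Season(500, 30, 20), 2006: Season(500, 30, 25)},
}
n_2b, n_hr = doubles_to_power_rate(sample)
print(f"{n_hr} of {n_2b} doubles improvers also improved their HR rate")
```

With real season lines in place of the sample, the two counts returned would correspond to the 73 and 27 quoted above, provided the same reading of "increased" is used.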

Doubles and Triples, I believe, are not the best source for power prediction. Approximately 14% of all doubles are groundballs, and as we know groundballs can never be converted into a home run. Doubles are especially tricky when looking at hitters who have some speed. They may hit a line drive or a bloop into the shallow part of the outfield that would be a single for slow batters, but because of their speed are able to turn it into a double. This is nice to be able to do, but it has nothing to do with potential future home run power. Or, these same players may actually hit a ball well, but because of their speed turn their double into a triple.

This raises two questions. One, what if we exclude fast players from our sample? I took out any player who had 20 or more steals in any of the three years, and the percentage actually decreased to 35%.

The second question is why don’t we include triples? Well, because triples are even more flawed than doubles are. Triples are almost the sole property of fast players. Slower players will very rarely hit a triple and will almost never hit more than several in a season. Again though, why not include them? They just seem to be an extension of doubles for fast players. The problem is that 42% of triples are groundballs, usually hit down the first base line and into the corner in right field so that the fast runner has a chance to make it to third. This has nothing to do with the type of power we are talking about.

If we do include 3Bs and exclude fast players, we get 59 players increasing their AB/(2B+3B) from 2004 to 2005 and 19 increasing their AB/HR from 2005 to 2006. That's about 32%.
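The success rates quoted so far are simple ratios of the two counts, which can be checked directly. The sketch below uses only the counts given in this post; the helper name `success_rate` is mine, and the 35% figure is left out because its underlying counts are not listed.

```python
def success_rate(improved_hr: int, improved_xbh: int) -> float:
    """Percentage of extra-base-hit improvers who also improved their HR rate."""
    return 100 * improved_hr / improved_xbh


doubles_only = success_rate(27, 73)               # all qualifying players, 2B only
doubles_triples_no_speed = success_rate(19, 59)   # 2B+3B, fast players excluded

print(f"Doubles only: {doubles_only:.1f}%")                       # 37.0%
print(f"Doubles + triples, no speed: {doubles_triples_no_speed:.1f}%")  # 32.2%
```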

If the flaws with doubles and triples aren't enough to persuade you, there is a whole other category of hits that prove troublesome that you won't be able to find in any box score or stat sheet. Thus far I have ignored the most important condition of a double... that it is a ball in play. Balls in play are never to be entirely trusted because once they are put in play, they are out of the control of the pitcher and the batter and put into the hands of luck and defense, a sabermetrician's biggest adversary. There are plenty of well-hit balls that do not fall for hits because of bad luck or amazing defense. On the flip-side of the coin, there are plenty of balls that fall for hits that don't deserve to because of that slow, injury-ridden left fielder that happened to be playing at the time the ball was put into play. Essentially, because doubles and triples are balls in play, they will never be capable of being fully trusted.

So basically, a plethora of stats that were otherwise thought to be useful, some even considered "sabermetric" in nature, turn out to be either flawed or meaningless for predicting power. Here is a list of a few common ones:

  • Doubles
  • Triples
  • Total Bases
  • Slugging Percentage
  • Isolated Power
  • On-Base Plus Slugging Percentage (OPS)
  • Secondary Average

Now I realize I have done a good job bashing doubles and triples, but the fact remains: they are the only guide we have for predicting power. So how should we go about predicting power if 2Bs and 3Bs aren’t very accurate? The answer to that, right now, is that we can't - at least with the primitive stats that we have to work with. With access to batted ball data for every player, we would be able to set parameters for what is considered a well-hit ball, add up all of these for each player, and defining power could conceivably be quite easy from there. This seems to be the only sure-fire way to see which players have a true chance of developing - or losing - power.
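If that kind of batted ball data were available, the counting step might look something like the sketch below. Everything in it is hypothetical: the `BattedBall` record, the "GB"/"LD"/"FB" labels, and especially the 350-foot cutoff, which is just a placeholder for whatever parameters end up defining a well-hit ball.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class BattedBall(NamedTuple):
    batter: str
    kind: str            # "GB", "LD" or "FB" (hypothetical labels)
    distance_ft: float   # how far the ball traveled


# ASSUMPTION: illustrative parameters, not derived from any real study.
WELL_HIT_MIN_FT = 350.0
AIR_KINDS = {"LD", "FB"}   # groundballs can never become home runs


def well_hit_counts(balls: Iterable[BattedBall]) -> Counter:
    """Count, per batter, the air balls that meet the well-hit cutoff."""
    counts: Counter = Counter()
    for ball in balls:
        if ball.kind in AIR_KINDS and ball.distance_ft >= WELL_HIT_MIN_FT:
            counts[ball.batter] += 1
    return counts


# Made-up batted balls, purely for illustration.
balls = [
    BattedBall("Player A", "FB", 395.0),
    BattedBall("Player A", "GB", 120.0),
    BattedBall("Player A", "LD", 360.0),
    BattedBall("Player B", "FB", 310.0),
    BattedBall("Player B", "FB", 352.0),
]
print(well_hit_counts(balls))
```

Dividing each count by the player's at-bats would give a rate that could be tracked from year to year the same way AB/2B was above.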

Hit Tracker is a service that seems to have realized exactly what I am talking about. Last year they tracked every home run hit by every player and, using physical science, determined each Home Run’s “True Distance,” in addition to a number of other bits of information about the hit. We use this data to determine our “True Home Run” stat. We take each home run’s true distance and the location where it went over the fence and place it into a theoretical, Average MLB Ballpark. From there, we can determine how many would have actually gone out in this ballpark and which wouldn’t have.
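To give a rough idea of what "placing a home run into an average ballpark" means, here is a minimal sketch. It is not the actual True Home Run calculation: the fence profile is made up for illustration, the location where the ball left the park is reduced to a single spray angle, and wall height is ignored entirely.

```python
from bisect import bisect_right

# ASSUMPTION: made-up fence distances (feet) for a theoretical average park.
# Angle is measured in degrees from dead center: -45 is the left field line,
# +45 is the right field line.
FENCE = [(-45.0, 330.0), (-22.5, 375.0), (0.0, 405.0), (22.5, 375.0), (45.0, 330.0)]


def fence_distance(angle_deg: float) -> float:
    """Fence distance at a spray angle, linearly interpolated between FENCE points."""
    angles = [a for a, _ in FENCE]
    if not angles[0] <= angle_deg <= angles[-1]:
        raise ValueError("angle is outside fair territory")
    i = min(bisect_right(angles, angle_deg), len(FENCE) - 1)
    (a0, d0), (a1, d1) = FENCE[i - 1], FENCE[i]
    return d0 + (d1 - d0) * (angle_deg - a0) / (a1 - a0)


def true_home_runs(homers: list[tuple[float, float]]) -> int:
    """Count homers whose true distance beats the average park's fence.

    Each homer is a (true_distance_ft, spray_angle_deg) pair.
    """
    return sum(1 for dist, angle in homers if dist > fence_distance(angle))


# Made-up home runs: (true distance in feet, spray angle in degrees).
homers = [(420.0, 0.0), (335.0, -40.0), (380.0, 10.0), (360.0, 44.0)]
print(true_home_runs(homers), "of", len(homers), "would leave the average park")
```

In this made-up example the 335-footer down the left field line and the 380-footer to right-center come up short, which is the kind of cheap home run the stat is meant to filter out.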

The problem with this is that only Home Runs are included by Hit Tracker. To get a true sense of a hitter’s power, we’d need this data on all airballs, not just the ones to clear the fence. What about the ones that should have but didn’t? Hit Tracker will be correcting this in 2007 by tracking all airballs, but unless they do this for past years, it may be a while before we can develop a very accurate power prediction system.


Unknown said...

Hey, just found your blog, and it's very interesting so far...nice to see someone willing to go out on a limb and against the grain some.

I always thought that flyball % was a big indicator of power potential (or it might have been flyball/groundball ratio), and I agree that the whole "doubles will turn into home runs eventually" theory has some flaws, but also must have some merit once the other noise is filtered out.

I haven't gone through the archives yet, but seeing how you're expecting big things from Bush and Vasquez, I was wondering if you ever have (or would be interested) in looking at pitchers who consistently over/underperform what their peripherals indicate. Zito is the guy that comes to mind for outperforming his numbers (I'm convinced the curveball and pop-ups have something to do with it), and Vasquez and Bush are the guys that spring to mind as always pitching better than their numbers (W/L and ERA) indicate. Just a thought.

Anonymous said...

Why don't you test this theory with young players, since they are the ones that are generally in the maturation process?

What are the results if it only includes players under, say, the age of 27 or 28?

Derek Carty said...

Great idea. I'll try this out shortly.