Too Many Stats
Manny Machado might have been right about the number of advanced metrics surrounding the game of baseball.
I agree with Manny Machado.
A bit.
In recent comments to the San Diego media, Machado expressed his lack of appreciation for the new advanced metrics that have taken over the game of baseball during the last decade-plus.
“I just wish we can get the analytics out of the way,” Machado told reporters. “I think there’s too many stats out there.”
Machado didn’t know or understand some of the metrics, including some that have become common enough to be placed on a scoreboard. Two examples brought up were wOBA and FIP. Machado didn’t understand either one, and he didn’t immediately grasp the concepts when they were explained to him on the spot. His comments on analytics begin at the 7:45 mark.
Machado’s comments come from a player who doesn’t understand what the stats mean, not from someone who knows the stats and considers them worthless. To be fair, Machado’s job isn’t to understand FIP or wOBA. His job is to play baseball. The stats evaluate how well he’s playing baseball.
Once you go down the rabbit hole of stats as an evaluation tool, you reach a crossroads. Eventually, you’ll get two stats trying to tell the same story in different ways. At that point, you need to decide which stat is legitimate, and which stat is missing the mark.
That’s when stats can become a problem.
RE24 on Pirates Prospects
I took Saturday off from baseball.
As usual, my day was spent thinking about baseball.
I gave myself credit for staying off the computer: I wasn’t working on PiratesProspects.com or checking scores throughout the day.
My thoughts centered on the best way to evaluate past player performance. I was working on adding a scoring system to Pirates Prospects to determine the best player on a given day, week, month, or year.
What I ended up settling on was RE24.
Honestly, I almost shared the cluelessness of Machado in the video above.
I’m not embarrassed to say that, without a computer all day, I couldn’t recall what RE24 was. I had heard of the stat before, but I didn’t remember it by heart.
My idea was that the best way to evaluate past results would be to use the Run Expectancy matrix as a baseline for expectations in every situation, then use the result of every play to calculate how much positive or negative expected value the player added.
When I went looking for an existing stat that matched that evaluation method, RE24 was easy to find, and I felt stupid for a moment for not remembering it existed.
Created by Tom Tango in 2009, RE24 measures a player’s offensive or pitching value by calculating the change in expected runs from the start of each plate appearance or play to the end, plus any runs that score on the play.
I used Ben Clemens’ Run Expectancy matrix, updated for the 2021-2025 seasons, for my own calculations.
After about eight hours of work on Sunday, I had incorporated RE24 into every box score and play-by-play, and I had used it to calculate the Player of the Day, Player of the Week, and Player of the Month for every level in the Pirates’ minor league system.
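Run expectancy matrices like Clemens’ are organized by base-out state: the eight ways the bases can be occupied, times zero, one, or two outs, gives the 24 states behind the name RE24. A quick Python sketch enumerating them (the dash-and-digit labels are a hypothetical convention chosen for illustration, not the layout of Clemens’ table):

```python
from itertools import product

# Eight ways the bases can be occupied, written as first/second/third.
# A dash marks an empty base; e.g. "1-3" means runners on first and third.
# This labeling is an illustrative convention, not any published format.
BASE_STATES = ["".join(bases) for bases in product("-1", "-2", "-3")]

# Zero, one, or two outs. Three outs ends the inning, so it is not a state.
OUT_COUNTS = [0, 1, 2]

# Every (bases, outs) combination a plate appearance can start from.
BASE_OUT_STATES = [(bases, outs) for outs in OUT_COUNTS for bases in BASE_STATES]

print(len(BASE_STATES))      # 8
print(len(BASE_OUT_STATES))  # 24
```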
You can find the Player of the Day every night in the Box Score archive.
The Player of the Week and Player of the Month are stored on the Stats page.
Problems with RE24
If you want to evaluate a player’s results, there are plenty of advanced metrics you can use.
WAR is the most popular value metric. FIP for pitchers and wOBA or wRC+ for hitters are also popular.
Many of the more popular advanced metrics are designed to be context-neutral.
They’re meant to show whether a batter would have value regardless of whether he plays at Coors Field or Petco Park. They’re meant to show the value a pitcher has, regardless of park factors or the infield defense behind him.
For the most part, advanced metrics have grown up as tools for front office decisions.
RE24 isn’t a forward-looking stat. It relies on historical averages. If a batter comes to the plate with one out and the bases loaded, the run expectancy is 1.61, based on the historical averages of similar situations. That expectancy doesn’t factor in the unique situation the batter finds himself in, or the batter’s own history.
In that situation, if the batter strikes out, pops out in the infield, or grounds into a fielder’s choice that gets the runner at home, the result is the same. Each outcome leaves the bases loaded with two outs, which drops the RE to 0.96. The batter’s RE24 on that play would be -0.65.
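The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind Pirates Prospects: the lookup table holds only the three bases-loaded values quoted in this article (2.69, 1.61, and 0.96 for zero, one, and two outs), and the function names are hypothetical. It assumes the standard RE24 definition, in which runs scored on the play are added to the change in run expectancy.

```python
# Partial run expectancy (RE) table with only the bases-loaded values quoted
# in this article. A full matrix covers all 24 base-out states.
# Keys are (bases, outs); "123" means runners on first, second, and third.
RE_TABLE = {
    ("123", 0): 2.69,
    ("123", 1): 1.61,
    ("123", 2): 0.96,
}


def run_expectancy(bases: str, outs: int) -> float:
    """Expected runs for the rest of the inning from a base-out state."""
    if outs >= 3:
        return 0.0  # the inning is over, so no more runs are expected
    return RE_TABLE[(bases, outs)]


def re24(start: tuple[str, int], end: tuple[str, int], runs_scored: int = 0) -> float:
    """Batter RE24 for one play: change in run expectancy plus runs scored.

    Under the standard definition, the pitcher's RE24 for the same play
    is the negative of this value.
    """
    return run_expectancy(*end) - run_expectancy(*start) + runs_scored


# Bases loaded, one out: a strikeout, infield pop-up, or fielder's choice at
# home all leave the bases loaded with two outs, so they score identically.
print(round(re24(("123", 1), ("123", 2)), 2))  # -0.65

# Bases loaded, no outs: a force out at home leaves the bases loaded, one out.
print(round(re24(("123", 0), ("123", 1)), 2))  # -1.08

# Bases loaded, two outs: an inning-ending out erases all 0.96 expected runs.
print(round(re24(("123", 2), ("---", 3)), 2))  # -0.96
```

The last two results match the force out charged to De Los Santos and the inning-ending outs charged to Gray in the 6/7 breakdown.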
There are some flaws with this approach. For one, it doesn’t factor in the game situation. Bases loaded in the ninth inning while down by two runs is a much different situation than bases loaded in the sixth inning while up by eight. If you want to evaluate the value of a player, you would want to factor in how that player performs in clutch situations.
Flaws exist with all value-based metrics. The previously mentioned FIP (Fielding Independent Pitching) doesn’t factor in the runs a pitcher actually gives up. Instead, it estimates his run prevention from the home runs, walks, hit batters, and strikeouts he records, plus a league constant.
No individual stat is going to be both perfect and universally accepted.
That’s why I’m going to spend the rest of this article explaining why we all need to go back to pitcher wins and batter RBIs as the gold standard for evaluating baseball players.
RE24 Standouts
No, I’m not actually going to do that.
I used to forbid contributors from using RBIs in any article on Pirates Prospects. I’ve softened my stance, and that metric shows up throughout the new site. But I do believe advanced metrics are superior to the old counting stats. In some cases, they replace the eye test, at least for the initial filtering of players.
But there are flaws with every stat, and there are bound to be disagreements.
After spending all day implementing the RE24 metric on Pirates Prospects, I noticed two standout situations immediately.
6/7 Player of the Day
First came the Player of the Day results for the 6/7 games, which was the very first set of games under the new system.
Greensboro had a monster game in the hitter-friendly confines of Asheville, winning 16-14. Yordany De Los Santos picked up four hits, including two home runs. Murf Gray added three hits, two of which were home runs. Both batters drove in four runs, accounting for half of Greensboro’s runs on the day.
You’d think one of them would have been the Player of the Day.
You’d be wrong.
The Player of the Day ended up being their teammate, Camden Janik, who had zero RBIs. Janik did go 3-for-4 with a walk and a hit by pitch, but he had a quiet day compared to the two multi-homer performances.
How did this happen? Let’s take a look at the plays for each player.
Yordany De Los Santos:
Solo Home Run: +1.00
Three Run Homer, Runners at 2nd and 3rd, One Out: +1.84
Single, One Out, None On: +0.23
Single, One Out, Runner Moves From 1st to 3rd: +0.83
Strikeout, One Out, Runners at 1st and 2nd: -0.54
Force Out at Home, Bases Loaded, No Outs: -1.08
Strikeout, Two Outs, Runners at 1st and 2nd: -0.42
The three-run shot was worth only +1.84, because the RE for that situation (runners on second and third, one out) was already 1.41. De Los Santos plated three runs with his homer. He got credit for all three, but was essentially penalized for batting in a favorable situation, one where the averages already expected more than one run to score.
De Los Santos executed twice on the home runs, but he also had other high-expectancy situations where the outs he made lowered the run expectancy enough to wipe one of the home runs off the board.
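That penalty can be worked out backward. Assuming the site uses the standard formula (change in RE plus runs scored), the +1.84 credit implies a bases-empty, one-out RE of about 0.25 in the matrix used. That 0.25 is derived here from the article’s own numbers, not read from Clemens’ table, and the variable names are illustrative.

```python
# RE for runners on second and third with one out, as quoted above.
RE_START = 1.41
RUNS_SCORED = 3        # both runners plus the batter
RE24_CREDITED = 1.84   # the value De Los Santos received for the homer

# RE24 = RE_end - RE_start + runs, so the bases-empty, one-out RE
# implied by the credited value is:
implied_re_end = RE24_CREDITED - RUNS_SCORED + RE_START
print(round(implied_re_end, 2))  # 0.25

# The gap between runs scored and RE24 credit equals the run expectancy
# the batter "used up" from the favorable base-out state he inherited.
penalty = RUNS_SCORED - RE24_CREDITED
print(round(penalty, 2))  # 1.16
```

A solo homer, by contrast, starts and ends in the same base-out state, which is why it is always worth exactly +1.00.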
Murf Gray:
Two Run Homer, One Out: +1.77
Single, None On, Two Outs: +0.11
Two Run Homer, Two Outs: +1.79
Strikeout, Runner on Third, Two Outs: -0.32
Groundout, Bases Loaded, Two Outs: -0.96
Strikeout, Bases Loaded, Two Outs: -0.96
Gray had two homers that added value. He also came up empty on two inning-ending plays with the bases loaded. Combined, those two plays wiped out the value of one of his home runs.
Camden Janik:
Single, No Outs, Runner Goes From 1st to 3rd: +0.96
Hit By Pitch, Two Outs, Runners at 1st and 2nd: +0.29
Single, None On, No Outs: +0.39
Fly Out, Runner at 1st, Two Outs: -0.21
Walk, Runners at 1st and 2nd, No Outs: +1.14
Single, Runner at 1st, One Out: +0.48
The biggest play from Janik came in the eighth, when he walked to load the bases with no outs, increasing the run expectancy by 1.14. While De Los Santos and Gray both had multiple bigger plays, Janik stood out for producing almost exclusively positive value, only failing in a low-upside situation. The other two brought runs across the plate, but Janik kept the train moving on the bases all day.
At the end of the day, Janik had 3.0 RE24. Gray had 1.43. De Los Santos had 1.86. The closest competitor to Janik was Connor Wietgrefe with a 2.4 in Altoona, as a result of five shutout innings with one hit and no walks. That performance helped to make Wietgrefe the Pitcher of the Week.
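Selecting the Player of the Day amounts to summing each player’s per-play RE24 and taking the highest total. The sketch below does that with the values listed above for the three Greensboro hitters. The function names are hypothetical, and extending it to a week or month by summing over more games is an assumption about the approach, not a description of the site’s code. The listed De Los Santos and Gray plays reproduce the 1.86 and 1.43 totals exactly, while Janik’s listed plays sum to 3.05 (reported above as 3.0).

```python
# Per-play RE24 values for the 6/7 Greensboro hitters, as listed above.
PLAYS = {
    "Yordany De Los Santos": [1.00, 1.84, 0.23, 0.83, -0.54, -1.08, -0.42],
    "Murf Gray": [1.77, 0.11, 1.79, -0.32, -0.96, -0.96],
    "Camden Janik": [0.96, 0.29, 0.39, -0.21, 1.14, 0.48],
}


def total_re24(plays: dict[str, list[float]]) -> dict[str, float]:
    """Sum each player's per-play RE24, rounded to two decimals."""
    return {player: round(sum(values), 2) for player, values in plays.items()}


def top_player(plays: dict[str, list[float]]) -> str:
    """Return the player with the highest summed RE24."""
    totals = total_re24(plays)
    return max(totals, key=totals.get)


totals = total_re24(PLAYS)
for player, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{player}: {total:+.2f}")
print("Player of the Day:", top_player(PLAYS))
```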
By most other value metrics, De Los Santos, Gray, and even Wietgrefe would likely have ranked ahead of Janik.
Some stats would discount Janik’s day by calculating how improbable it is for him to keep collecting three singles per game, ignoring the game situations in which those singles came. Likewise, some stats would play up the multiple home runs from De Los Santos and Gray, or the six strikeouts from Wietgrefe.
I picked RE24 because it accounts for the actual game situations, instead of removing the results from the game situations and placing them in a neutral environment of evaluation.
5/31 Pitcher of the Week
The other situation that stood out to me was the 5/31 Pitcher of the Week.
Carlson Reed got a start in Greensboro that week, throwing seven no-hit innings with seven strikeouts as part of a combined effort no-hitter.
However, the Pitcher of the Week went to Adolfo Oviedo, who made two relief appearances for Bradenton, recording two scoreless innings each.
The kicker was the 5/31 appearance. Oviedo entered the game in the fourth inning with no outs and the bases loaded. He recorded a strikeout (+0.63) then ended the inning with a double play (+1.41). In two plate appearances, he sent a 2.69 RE crashing down to zero. Then, he pitched another scoreless frame, adding to his two scoreless innings earlier in the week.
Reed stood out for a rare night in which he was dominant across seven innings. Oviedo stood out for shutting down the toughest situation in the run expectancy matrix. Comparing the two for the week, Reed had three more innings, one fewer walk, one fewer hit, and five more strikeouts.
Many advanced metrics would favor Reed’s strikeouts, but in RE24 a strikeout is no different from a groundout, as long as the runners don’t advance into a more favorable situation that offsets the out.
Some metrics would also project a better future for Reed based on his results.
Oviedo got credit for the game situation he was thrown into, and for shutting down a massive scoring threat. Reed was almost penalized for never allowing a scoring threat to materialize. A perfect inning is worth only 0.48 RE. That means Reed needed more than four perfect innings (4 × 0.48 = 1.92) to match the value of Oviedo’s shutdown fourth inning on 5/31.
The Real Problem With Advanced Stats
I outlined those cases to show how different advanced stats can view the same performance in a different light.
Oviedo vs Reed comes down to two arguments. One is that Reed put up a rare, dominant performance that speaks to his abilities in a projectable way going forward. The other is that Oviedo produced a shutdown result in a situation that historically is very difficult to escape without giving up runs.
De Los Santos/Gray vs Janik offers a comparison of consistent value versus big-play ability. The big plays are more projectable, and they rate even better when you discount singles as hard to repeat at the rate Janik produced them. Meanwhile, there’s an argument for the classic “not trying to do too much” line, where a player simply executes in almost every situation he’s in, consistently improving the chances for runs. Janik didn’t have a big RBI day, or even a big day crossing the plate, but he rarely provided negative value.
There are so many advanced metrics that tell different stories and evaluate different outlooks. The problem comes when you use one for everything, or worse, when you use whatever metric suits your argument.
I still prefer RE24 for a review of past game results when it comes to evaluating the player of the game, week, month, or year. It’s not a great predictive stat, but I was looking for a metric to evaluate what actually happened during the game — not what those plays say about what could happen in the future.
I have yet to decide how I will evaluate players going forward.
Part of me wants to create a new metric.
Part of me thinks I’ll get through that process and realize I’ve only recreated an existing metric. That result would at least have value as an affirmation.
Part of me thinks there are already too many stats.
The biggest problem with the advanced metrics doesn’t come from the analysis outside the game, but from the decision making inside the game.
With every organization using its own proprietary evaluation metrics, it can be easy for any front office to justify any move by pointing to the mystery numbers on its spreadsheet. This absolves decision makers by putting their decisions on the stats the organization created or curated.
You can’t question a Manager for a bad outcome, because he was only following the numbers.
You can’t question a General Manager for a bad acquisition, because the numbers saw something positive about the player.
At the end of the day, the people in charge choose the metrics that best align with their own ideas of value. Anything the metrics say about the game is only an extension of how the person in charge thinks about the game.
The challenge comes when the person disagrees with the metrics in a situation.
I’ll be honest. If I were picking the 6/7 Player of the Day manually, I probably would have picked De Los Santos for his four hits and two home runs. If I were picking the 5/31 Pitcher of the Week, I would have gone with Reed. My choice of RE24 didn’t reflect how I’d evaluate those two individual situations.
The more important thing is that I agree with the vast majority of the daily, weekly, and monthly decisions that RE24 has made.
No advanced stat is going to get it right 100% of the time. At best, you’re looking for a high correlation to automate your thinking process, so that you only have to spend time digging into the outliers.
Stats should never be the end result. They should merely be a time saver that replicates your thinking process and saves a lot of unnecessary work.
The problem comes when too many bad results emerge, despite the stat verifying the approach was correct. At that point, whether you’re a website owner or a team employee, the answer isn’t to double down that the stat is always correct. It’s to evaluate the thought process that led to thinking the stat would have high accuracy in automating decisions.
That can be a problem when there are so many stats to choose from, each representing a different view of value and combining different inputs to produce a result. It can be easy to pick the wrong stat for a given situation.
Ultimately, advanced stats will never replace the analytical power of the human brain.
At best, if you pick the right stat, it just saves brain power.
Speaking of brain power,
Until the next time I go live…
-Tim Williams


