Total Access Baseball

Cy Young Award in the Sabermetrics Era: A Study of Who Will Win in 2011 Part 1

This is a two-part series in which I will analyze a current Cy Young Predictor formula, offer a replacement formula that accounts for the change in Cy Young voters’ philosophy as new-age statistics (sabermetrics) gain influence, and use this new formula to project the Cy Young race in 2011 and beyond.

Part 1 will look into a widely accepted Cy Young Predictor formula and explain its flaws. As voters consider sabermetric statistics more and more, the formula needs to adapt to the new ways of thinking and voting. Part 1 will also break down a new formula and check it against past data for accuracy.

Part 2 will look to the 2011 Cy Young race in the National League. First an analysis will be done to determine how the top pitchers will fare in the 2011 season. Next, a projection will be done to determine how these pitchers will end up placing on the Cy Young ballot. A similar analysis will be done for the American League at a later date.

PART 1: A TILT TOWARDS ADVANCED STATISTICS AND WHAT IT MEANS TO THE CY YOUNG AWARD

INTRODUCTION

Baseball fans are getting smarter.

There’s been a change in the way we watch, discuss and analyze baseball in the past few years. A lot of that is due to fantasy geeks around the world and their constant striving to find an edge in the game. We have Daniel Okrent and the guys from the original Rotisserie League to thank for that. They brought the game into our lives, and now millions play it day in and day out.

Fantasy baseball is really a derivative of what Bill James was trying to do with sabermetrics. James began his work in the ’70s to try to find a better way of assigning value to players on the field and at the plate. His work won the approval of many, including Okrent. Without James developing the theory and Okrent bringing a “silly little game” to our lives, in all likelihood, baseball wouldn’t be nearly as popular as it is today, and it certainly wouldn’t be dissected as much.

James knew back in the ’70s that many highly regarded baseball statistics weren’t telling the whole story. One of them was the win/loss record. Pitchers can only do so much to win games, so a pitcher without a decent offense behind him will have fewer wins and more losses than a pitcher with the same arm on a team with a great offense. That seemed obvious to him, yet the mainstream media and baseball gurus around the league had been using the same barometers to separate good pitchers from bad ones for years, so it took a long time for James’ incredible work to be truly recognized.

The tide has shifted lately though, and many are finally coming around to the advanced metrics James and others had been writing about for years. This is especially evident in the 2010 Cy Young voting. Felix Hernandez won the A.L. award with a startling 21 of 28 first place votes from the Baseball Writers’ Association of America even though he only won 13 games. If this had happened in the ’70s he wouldn’t have even made the ballot.

THE EXISTING CY YOUNG PREDICTOR

Before this shift in voting, James wrote a formula with Rob Neyer of ESPN to project the Cy Young winner. Their formula is as follows:

Cy Young Points (CYP) = ((5*IP/9)-ER) + (K’s/12) + (SV*2.5) + Shutouts + ((W*6)-(L*2)) + VB

(where VB is a Victory Bonus of 12 points awarded for leading your team to the division championship.)
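For readers who want to check the arithmetic, here is a minimal Python sketch of the formula above. The function and parameter names are illustrative shorthand, not anything published by James or Neyer, and the stat line is Adam Wainwright’s 2009 season as listed in Table 1a later in this article.

```python
def cy_young_points(ip, er, k, sv, sho, w, l, division_champ):
    """Cy Young Points under the original formula quoted above.

    division_champ is a hypothetical flag: True if the pitcher's team
    won its division, which earns the 12-point Victory Bonus.
    """
    victory_bonus = 12 if division_champ else 0
    return (
        (5 * ip / 9 - er)      # innings vs. earned runs
        + k / 12               # strikeouts
        + sv * 2.5             # saves
        + sho                  # shutouts
        + (w * 6 - l * 2)      # wins and losses
        + victory_bonus
    )


# Adam Wainwright, 2009: 233 IP, 68 ER, 212 K, 19-8, division winner
wainwright = cy_young_points(ip=233, er=68, k=212, sv=0, sho=0,
                             w=19, l=8, division_champ=True)
print(round(wainwright, 2))  # 189.11, matching CYP(exist) in Table 1a
```

Note that the win/loss term alone contributes 98 of those 189 points, which is the heart of the problem discussed below.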

Why he even bothered to write the formula is questionable in itself, as the Cy Young almost always went to the pitcher with the most wins prior to 2009. Period. But that’s beside the point. James’ formula worked great up until the past few years. But then a noticeable shift in voting occurred.

In 2008, James’ formula correctly selects Cliff Lee and Tim Lincecum.

In 2009, however, James’ formula selects Felix Hernandez and Adam Wainwright. Zack Greinke won in the A.L. and was ranked 2nd by James’ formula. Tim Lincecum won in the N.L. and was only ranked 4th.

In 2010, again we see the shift in voting. James’ formula selects Roy Halladay and CC Sabathia, but King Felix ended up taking home the A.L. award. Felix was ranked 6th by James’ formula! Above him were CC, Price, Lester, Soriano, and Buchholz.

If the trend in voting continues down this path, it’s clear that James’ original formula needs to be modified to fit this new-age thinking. In the following study I’ll explain the flaws in the old formula and provide a new formula to account for the shift in Cy Young voting.

THE ADJUSTED CY YOUNG PREDICTOR

If we take apart James’ formula and break it into variables and constants we have this:

Cy Young Points (CYP) = ((A*IP/9)-ER) + (K’s/B) + (SV*C) + (Shutouts*D) + ((W*E)-(L*F)) + (VB*G)

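(where the constants are A through G, the variables are each pitcher’s individual stats, and VB is now 1 if the pitcher’s team won its division and 0 otherwise, so that G carries the size of the bonus)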

As Cy Young voters become more and more accepting of sabermetric statistics, this formula seems to be leaving out some key data that voters look at. While I could make the case that voters should look at advanced statistics such as WAR, CERA, or DIPS, that isn’t yet a reality. Maybe in the coming years these advanced statistics will be looked at, but that time is not now. There is, however, one glaring piece of information left out of James’ formula that voters are clearly looking at now: WHIP, or (walks + hits) / IP.

WHIP. It even sounds cool. It’s simple enough for anyone to understand, yet very telling of a pitcher’s dominance on the mound. With a quick glance at WHIP you can get a snapshot of the pitcher and understand how much luck was involved with his ERA and overall record. With Greinke and Hernandez winning the A.L. Cy Young the last two years yet not dominating the Win category, it’s clear the voters are looking into a category that those pitchers did well in. WHIP.

Greinke had a 1.073 WHIP in 2009 (good for 2nd in the A.L.) to go with his 2.16 ERA and 242 K’s, and Hernandez had a 1.06 WHIP in 2010 (also 2nd in the A.L.) to go along with his 2.27 ERA and 232 K’s. Neither pitcher was in the top 5 in Wins in the A.L., and in fact Hernandez only amassed 13 throughout the entire season.

If we incorporate WHIP into James’ equation and modify the constants we can find an equation much more suitable for the present day. The easiest way to explain how and why I made the changes is to show both the EXISTING equation and ADJUSTED equation, and then provide an explanation and commentary below. The basic equation, including WHIP, is:

Cy Young Points (CYP) = ((A*IP/9)-ER) + (K’s/B) + (SV*C) + (Shutouts*D) + ((W*E)-(L*F)) + (VB*G) + ((H*IP)-(IP*WHIP/J))
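To see what the added WHIP term does on its own, here is a small Python sketch. It assumes H = 0.5 and J = 3, the ADJUSTED values from the constants table in this article; the function name and the 1.50-WHIP comparison pitcher are made up for illustration.

```python
def whip_term(ip, whip, h=0.5, j=3):
    """The added term (H*IP) - (IP*WHIP/J).

    Defaults are the ADJUSTED constants H = 0.5 and J = 3.
    Algebraically this is IP * (H - WHIP/J), so it rewards innings
    pitched at a low WHIP and drops to zero when WHIP = H * J.
    """
    return h * ip - ip * whip / j


# Felix Hernandez, 2010 (Table 1e): 249.7 IP, 1.06 WHIP
felix = whip_term(249.7, 1.06)

# Hypothetical pitcher: same innings, but a 1.50 WHIP (the break-even point)
break_even = whip_term(249.7, 1.50)

print(round(felix, 2))        # about 36.62 points
print(round(break_even, 2))   # essentially zero
```

With these constants, any WHIP below 1.50 adds points and the bonus scales with innings, so a workhorse starter gains far more from a good WHIP than a closer does.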

And the constants used in both James’ (EXIST) and my (ADJUSTED) study are as follows:

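Constant  EXIST  ADJUSTED
A         5      5
B         12     5
C         2.5    1.5
D         1      2
E         6      3
F         2      2
G         12     5
H         0      0.5
J         0      3

(Under EXIST the WHIP term is simply absent. H and J are listed as 0 only to mark that; they are not meant to be plugged in, since dividing by J = 0 is undefined.)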
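The two columns of constants can be run through one function. The following is a rough Python sketch, not the spreadsheet actually used for this study: the dictionary layout and the name `cyp` are assumptions, and J is set to `None` under EXIST so the WHIP term is skipped rather than divided by zero.

```python
# Constants from the table above. J is None under EXIST because the
# original formula has no WHIP term (assumption: skip it entirely).
EXIST = {"A": 5, "B": 12, "C": 2.5, "D": 1, "E": 6, "F": 2, "G": 12, "H": 0, "J": None}
ADJUSTED = {"A": 5, "B": 5, "C": 1.5, "D": 2, "E": 3, "F": 2, "G": 5, "H": 0.5, "J": 3}


def cyp(s, c):
    """Cy Young Points for stat line `s` under constant set `c`."""
    whip_term = 0.0
    if c["J"]:
        whip_term = c["H"] * s["IP"] - s["IP"] * s["WHIP"] / c["J"]
    return (
        (c["A"] * s["IP"] / 9 - s["ER"])
        + s["K"] / c["B"]
        + s["SV"] * c["C"]
        + s["SHO"] * c["D"]
        + (s["W"] * c["E"] - s["L"] * c["F"])
        + s["DC"] * c["G"]          # DC: 1 if division champion, else 0
        + whip_term
    )


# Tim Lincecum, 2009 (stat line from Table 1a)
lincecum = {"IP": 225.3, "ER": 62, "K": 261, "SV": 0, "SHO": 2,
            "W": 15, "L": 7, "DC": 0, "WHIP": 1.05}

print(round(cyp(lincecum, EXIST), 2))     # 162.92
print(round(cyp(lincecum, ADJUSTED), 2))  # 184.16
```

Both printed values match the CYP(exist) and CYP(adjusted) columns for Lincecum in Table 1a.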

COMMENTARY

I came to these adjusted constants by analyzing how much weight each one contributes to the overall total. To do this I analyzed the 2009 N.L. Cy Young race. Using James’ existing equation, the top 10 finishers should have been the pitchers listed in Table 1a, in the order shown (with stats included). CYP(exist) is the value calculated with the EXISTING equation and CYP(adjusted) is the value calculated with the ADJUSTED equation. In the tables, SHO is shutouts and DC is 1 if the pitcher’s team won its division (earning the Victory Bonus) and 0 otherwise.

TABLE 1a 

2009 N.L.                              
RK PLAYER TEAM G GS IP ER K SV SHO W L ERA DC WHIP CYP(exist) CYP(adjusted)
1 Adam Wainwright STL 34 34 233 68 212 0 0 19 8 2.63 1 1.21 189.11 172.37
2 Chris Carpenter STL 28 28 192.7 48 144 0 1 17 4 2.24 1 1.01 178.06 169.33
3 Jonathan Broxton LA 73 0 76 22 114 36 0 7 2 2.61 1 0.96 169.72 132.70
4 Tim Lincecum SF 32 32 225.3 62 261 0 2 15 7 2.48 0 1.05 162.92 184.16
5 Heath Bell SD 68 0 69.7 21 79 42 0 6 4 2.71 0 1.12 157.31 115.35
6 Ryan Franklin STL 62 0 61 13 44 38 0 4 3 1.92 1 1.2 149.56 103.79
7 Javier Vazquez ATL 32 32 219.3 70 238 0 0 15 10 2.87 0 1.03 141.67 158.79
8 Brian Wilson SF 68 0 72.3 22 83 38 0 5 6 2.74 0 1.2 138.08 102.00
9 Josh Johnson FLA 33 33 209 75 191 0 0 15 5 3.23 0 1.16 137.03 138.00
10 Jair Jurrjens ATL 34 34 215 62 152 0 0 14 10 2.6 0 1.21 134.11 130.63

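Each pitcher’s CYP(adjusted) was then split into its components, expressed as percentages of the total, to show why a player received a certain score, i.e. to answer the question, “what did they do well in?” Table 1b summarizes those findings. The ERA column is the share from the ((5*IP/9)-ER) term, and each other column is the share from the matching term of the equation.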

TABLE 1b 

RK PLAYER TEAM ERA K SV SHO W/L DC WHIP
1 Adam Wainwright STL 35.65% 24.60% 0.00% 0.00% 23.79% 2.90% 13.07%
2 Chris Carpenter STL 34.88% 17.01% 0.00% 1.18% 25.39% 2.95% 18.59%
3 Jonathan Broxton LA 15.24% 17.18% 40.69% 0.00% 12.81% 3.77% 10.31%
4 Tim Lincecum SF 34.30% 28.34% 0.00% 2.17% 16.83% 0.00% 18.35%
5 Heath Bell SD 15.36% 13.70% 54.62% 0.00% 8.67% 0.00% 7.65%
6 Ryan Franklin STL 20.13% 8.48% 54.92% 0.00% 5.78% 4.82% 5.88%
7 Javier Vazquez ATL 32.64% 29.98% 0.00% 0.00% 15.74% 0.00% 21.64%
8 Brian Wilson SF 17.81% 16.28% 55.88% 0.00% 2.94% 0.00% 7.09%
9 Josh Johnson FLA 29.79% 27.68% 0.00% 0.00% 25.36% 0.00% 17.16%
10 Jair Jurrjens ATL 43.98% 23.27% 0.00% 0.00% 16.84% 0.00% 15.91%
AVERAGES:
SP 35.21% 25.15% 0.00% 0.56% 20.66% 0.98% 17.45%
RP 17.13% 13.91% 51.53% 0.00% 7.55% 2.15% 7.73%
SP(exist) 30-40% 9-12% 0.00% 0-1% 45-55% 2-5% 0.00%
RP(exist) 10-15% 3-7% 60-70% 0.00% 13-17% 2-5% 0.00%
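The percentages in Table 1b can be reproduced by splitting the ADJUSTED score into its seven terms and dividing each by the total. The sketch below does that for Adam Wainwright’s 2009 line from Table 1a; the function names are illustrative, and the constants are hard-coded to the ADJUSTED column.

```python
def adjusted_components(ip, er, k, sv, sho, w, l, dc, whip):
    """Each term of the ADJUSTED equation, keyed by its Table 1b column."""
    return {
        "ERA": 5 * ip / 9 - er,
        "K": k / 5,
        "SV": sv * 1.5,
        "SHO": sho * 2,
        "W/L": w * 3 - l * 2,
        "DC": dc * 5,
        "WHIP": 0.5 * ip - ip * whip / 3,
    }


def shares(components):
    """Convert component points into percentages of the total score."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


# Adam Wainwright, 2009 (Table 1a)
wainwright = shares(adjusted_components(233, 68, 212, 0, 0, 19, 8, 1, 1.21))
for name, pct in wainwright.items():
    print(f"{name:5s} {pct:6.2f}%")
```

The output lines up with Wainwright’s row in Table 1b: about 35.65% from the ERA term, 24.60% from strikeouts, 23.79% from wins and losses, 2.90% from the division bonus and 13.07% from WHIP.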

Look at the averages to make sense of it all.

With the ADJUSTED equation, roughly 50% of a starting pitcher’s (SP) score comes from ERA + WHIP, whereas in the EXISTING equation the ERA term alone (there is no WHIP term) accounts for roughly 30-40%. Another big change is the dependence on Wins. In the EXISTING equation, the win/loss term accounted for roughly 45-55% of a starter’s total score, whereas in the ADJUSTED equation it accounts for much less (an average of 20.66% in 2009). Strikeouts are also valued higher in the ADJUSTED equation (about 25% of a starter’s score, up from 9-12%), as the voters seem to value them more now too.

To verify that this ADJUSTED equation works for more than just one race, it was tested on the past two years’ Cy Young races in both the N.L. and the A.L. The data is shown below:

 

TABLE 1c

2009 N.L.: identical to Table 1a above.

TABLE 1d

2009 A.L.
RK PLAYER TEAM G GS IP ER K SV SHO W L ERA DC WHIP CYP(exist) CYP(adjusted)
1 Felix Hernandez SEA 34 34 238.7 66 217 0 1 19 5 2.49 0 1.14 189.69 187.66
2 Zack Greinke KC 33 33 229.3 55 242 0 3 16 8 2.16 0 1.07 175.56 191.66
3 CC Sabathia NYY 34 34 230 86 197 0 1 19 8 3.37 1 1.15 169.19 156.01
4 Mariano Rivera NYY 66 0 66.3 13 72 44 0 3 3 1.76 1 0.90 163.83 125.49
5 Roy Halladay TOR 32 32 239 74 208 0 4 17 10 2.79 0 1.13 162.11 168.85
6 Justin Verlander DET 35 35 240 92 269 0 1 19 9 3.45 0 1.18 160.75 161.73
7 Joe Nathan MIN 70 0 68.7 16 89 47 0 2 2 2.1 0 0.93 155.08 125.52
8 Brian Fuentes LAA 65 0 55 24 46 48 0 2 2 3.93 1 1.40 150.39 96.59
9 Jered Weaver LAA 33 33 211 88 174 0 2 16 8 3.75 1 1.24 137.72 123.31
10 Josh Beckett BOS 32 32 212.1 91 199 0 2 17 6 3.86 0 1.19 135.42 131.55

TABLE 1e

2010 A.L.
RK PLAYER TEAM G GS IP ER K SV SHO W L ERA DC WHIP CYP(exist) CYP(adjusted)
1 CC Sabathia NYY 34 34 237.7 84 197 0 0 21 7 3.18 0 1.19 176.47 161.02
2 David Price TB 32 31 208.7 63 188 0 1 19 6 2.72 0 1.19 171.61 159.11
3 Jon Lester BOS 32 32 208 75 225 0 0 19 9 3.25 0 1.2 155.31 145.36
4 Rafael Soriano TB 64 0 62.3 12 57 45 0 3 2 1.73 0 0.8 153.86 121.05
5 Clay Buchholz BOS 28 28 173.7 45 120 0 1 17 7 2.33 0 1.2 150.50 131.87
6 Felix Hernandez SEA 34 34 249.7 63 232 0 1 13 12 2.27 0 1.06 150.06 175.74
7 Justin Verlander DET 33 33 224.3 84 219 0 0 18 9 3.37 0 1.16 148.86 145.83
8 Trevor Cahill OAK 30 30 196.7 65 118 0 1 18 8 2.97 0 1.11 147.11 133.45
9 Neftali Feliz TEX 70 0 69.3 21 71 40 0 4 3 2.73 0 0.88 141.42 112.02
10 Joakim Soria KC 66 0 65.7 13 71 43 0 1 2 1.78 0 1.05 138.92 111.06

TABLE 1f

2010 N.L.
RK PLAYER TEAM G GS IP ER K SV SHO W L ERA DC WHIP CYP(exist) CYP(adjusted)
1 Roy Halladay PHI 33 33 250.7 68 219 0 4 21 10 2.44 1 1.04 211.53 209.52
2 Adam Wainwright STL 33 33 230.3 62 213 0 2 20 11 2.42 0 1.05 183.69 185.09
3 Heath Bell SD 67 0 70 15 86 47 0 6 1 1.93 0 1.2 182.56 134.59
4 Ubaldo Jimenez COL 33 33 221.7 71 214 0 2 19 8 2.88 0 1.15 170.00 165.83
5 Billy Wagner ATL 71 0 69.3 11 104 37 0 7 2 1.43 0 0.87 166.67 135.35
6 Brian Wilson SF 70 0 74.7 15 93 48 0 3 3 1.81 0 1.18 166.25 128.07
7 Tim Hudson ATL 34 34 228.7 72 139 0 0 17 9 2.83 0 1.15 150.64 142.54
8 Francisco Cordero CIN 75 0 72.7 31 59 40 0 6 5 3.84 0 1.43 140.31 90.89
9 Chris Carpenter STL 35 35 235 84 179 0 0 16 9 3.22 0 1.18 139.47 137.42
10 Carlos Marmol CHC 77 0 77.7 22 138 38 0 2 3 2.55 0 1.18 133.67 114.05

 

CONCLUSION

Using the adjusted constants, we have a new Cy Young Predictor formula as shown below:

Cy Young Points (CYP) = ((5*IP/9)-ER) + (K’s/5) + (SV*1.5) + (Shutouts*2) + ((W*3)-(L*2)) + (VB*5) + ((0.5*IP)-(IP*WHIP/3))

The results in Tables 1c-1f suggest that this formula tracks Cy Young voting in the sabermetrics age more closely than the original does.

In each table the Cy Young winner for that year and league was marked in red: Tim Lincecum (2009 N.L.), Zack Greinke (2009 A.L.), Felix Hernandez (2010 A.L.) and Roy Halladay (2010 N.L.). As you can see, in all four races the winner has the highest CYP(adjusted), so the ADJUSTED equation correctly chooses the Cy Young winner. Unfortunately, we still have a relatively small sample size with this new trend of voting, so the ADJUSTED equation is deliberately conservative: it doesn’t overemphasize the importance of WHIP or completely disregard the value in Wins.
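As a quick check of that claim on one race, the sketch below scores three of the 2010 A.L. candidates from Table 1e under both sets of constants and picks the top scorer. It is only an illustration: the helper names are made up, just three of the ten pitchers are included, and J is `None` under EXIST so the WHIP term is skipped.

```python
EXIST = {"A": 5, "B": 12, "C": 2.5, "D": 1, "E": 6, "F": 2, "G": 12, "H": 0, "J": None}
ADJUSTED = {"A": 5, "B": 5, "C": 1.5, "D": 2, "E": 3, "F": 2, "G": 5, "H": 0.5, "J": 3}

# (IP, ER, K, SV, SHO, W, L, DC, WHIP), copied from Table 1e
AL_2010 = {
    "CC Sabathia": (237.7, 84, 197, 0, 0, 21, 7, 0, 1.19),
    "David Price": (208.7, 63, 188, 0, 1, 19, 6, 0, 1.19),
    "Felix Hernandez": (249.7, 63, 232, 0, 1, 13, 12, 0, 1.06),
}


def cyp(line, c):
    """Cy Young Points for one stat line under constant set `c`."""
    ip, er, k, sv, sho, w, l, dc, whip = line
    whip_term = c["H"] * ip - ip * whip / c["J"] if c["J"] else 0.0
    return ((c["A"] * ip / 9 - er) + k / c["B"] + sv * c["C"] + sho * c["D"]
            + (w * c["E"] - l * c["F"]) + dc * c["G"] + whip_term)


def predicted_winner(field, c):
    """Name of the pitcher with the highest score in `field`."""
    return max(field, key=lambda name: cyp(field[name], c))


print(predicted_winner(AL_2010, EXIST))     # CC Sabathia
print(predicted_winner(AL_2010, ADJUSTED))  # Felix Hernandez
```

The same loop could be pointed at the full tables for the other three races to repeat the comparison.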

As years pass, even this equation will most likely need to be updated to account for new trends in Cy Young voting. There may be an even greater importance placed on sabermetrics in the future. Only time will tell. For now, this ADJUSTED equation seems fairly accurate at predicting, from a given season’s data, who will win the Cy Young award.

In part 2, I will examine the 2011 N.L. Cy Young race with this ADJUSTED Cy Young Predictor and find each pitcher’s probability of winning the Cy Young. Will a Phillies pitcher take it home, or do the odds rest with another N.L. starter? 

Written By Todd Drager
Republished With Permission From 7thAndPattison.com

Follow Him on Twitter @7thandpattison
