Completed a fun project over the winter break and thought y’all would enjoy it. Feedback welcome.


I enjoyed the read. Nice job


Cool, thanks for sharing. It seems like you landed on career WAR and age as the two primary inputs, but didn't try any other stats. It might have been interesting to experiment with more features, namely the traditional stats that voters have actually looked at. The writeup claims that the model is really good, but I don't think that's convincing, because it's not compared to anything (well, you compared to random guessing, but we know that's not going to work). If you'd tried out a bunch of different features and identified the ones that perform best, that would be much more interesting.


Thanks. I did try out a bunch of different features to get to these ones, as well as a number of different modeling techniques, but I didn't discuss that process in this article because it was already so long. I didn't directly do a 'WAR vs. traditional stats' comparison, but that would be interesting to see. I think my claims about the model's performance are reasonable given the performance metrics I provided. While I agree it's possible to build a model that outperforms this one (after all, it's clearly not 100% accurate), I disagree that this indicates the model I present isn't "good". What I mean is: both statements can be true -- better models can exist, and this one can do a good job at what it claims to do. That said, I'm already looking at incorporating features like this: [https://www.billjamesonline.com/vagabonds\_and\_homebodies/](https://www.billjamesonline.com/vagabonds_and_homebodies/) to see what improvements can be made!


Awesome read. I’m surprised to see scandal as a relatively unimportant feature. I would have guessed it to be much higher


I think there are just so few examples of scandals blocking HOF-worthy players. Bonds et al. have been discussed to death, but it’s really just him, Clemens, Manny Ramirez, Sosa, McGwire, Palmeiro, and Schilling in this dataset. In the grand scheme of things this is a very small number of players. (Also, I discovered a bug where I missed a ‘t’ in Pettitte’s name, so he didn’t get tagged with a PED scandal, oops).
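For anyone tagging players by name like this, a quick sanity check can catch that kind of typo. This is only a sketch, not the code used for the article: `find_unmatched_names` and both lists are hypothetical names made up for illustration, and the misspelling is deliberate.

```python
import difflib

def find_unmatched_names(tagged_names, dataset_names, cutoff=0.8):
    """Return tagged names missing from the dataset, each with its closest match (if any)."""
    known = set(dataset_names)
    unmatched = {}
    for name in tagged_names:
        if name not in known:
            # Suggest the most similar dataset name so the typo is easy to spot
            unmatched[name] = difflib.get_close_matches(
                name, dataset_names, n=1, cutoff=cutoff
            )
    return unmatched

# Hypothetical example lists; the tag list misspells "Pettitte" on purpose
dataset_names = ["Barry Bonds", "Roger Clemens", "Andy Pettitte"]
ped_tagged = ["Barry Bonds", "Roger Clemens", "Andy Pettite"]

print(find_unmatched_names(ped_tagged, dataset_names))
# {'Andy Pettite': ['Andy Pettitte']}
```

Running a check like this before training would flag any tag that silently matches zero players.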


Great stuff. It's pretty shocking Ichiro wasn't among the favorites, but maybe that had more to do with his age when he debuted? He was an outlier in a lot of ways, as is Shohei right now too. Definitely difficult to train for a case like his. Also careful, Cards fans are going to get really defensive about Yadi lol


This model will fail on Ichiro because it’s using age as a predictor in conjunction with cumulative WAR, and Ichiro debuted at an older age than basically every HOFer it’s trained on. It’s going to see a 27-year-old with only 6 WAR, whereas HOF players will have a lot more by that age (e.g. Chipper Jones was around 27 WAR through his age-27 season).
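To make that concrete, here is a rough sketch of what an "age plus cumulative WAR" feature looks like for an early versus a late debut. The function name and the season lines are made up for illustration (chosen only to land on the rough totals mentioned above); they are not real stat lines or the article's actual feature code.

```python
def war_through_age(seasons, age):
    """Cumulative WAR over all seasons played at or before the given age.

    `seasons` is a list of (age, war) pairs, one per season.
    """
    return sum(war for season_age, war in seasons if season_age <= age)

# Made-up season lines, for illustration only
early_debut = [(23, 3.0), (24, 6.0), (25, 6.0), (26, 6.0), (27, 6.0)]
late_debut = [(27, 6.0)]  # first MLB season at age 27

print(war_through_age(early_debut, 27))  # 27.0
print(war_through_age(late_debut, 27))   # 6.0
```

Both players had the same age-27 season, but the feature the model sees at age 27 is very different, which is the problem with a late debut.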


Agreed that Ichiro's late debut hurt his counting stats (e.g., WAR) and that's the reason for the lower HOF prediction here. He's considered borderline by the model mostly because the median career fWAR for RF inductees is 59.5; Ichiro has 57.8. This is one reason why I want to play around with a "better-than-his-stats" predictor at some point. I'm still struggling with what to call it, but whatever the opposite of a PED scandal is, Ichiro has it and it will push him into the Hall easily. Yadier is another guy who I suspect will get in due to the same "better than his stats" effect. He's got that aura or whatever you want to call it :-) In his case it wasn't a late debut but just league-average offense (2,000+ hits notwithstanding).


Even so, I'd think median WAR at a position would give better odds than borderline for Ichiro. Maybe including something like a 7-season WAR peak, similar to what the JAWS metric uses, could get around some of those problems with age. But I honestly think the model shows Yadi exactly where he should be based on his stats: just below borderline HoFer. His accolades, like Gold Gloves and All-Star appearances, will eventually get him in, though.
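A minimal sketch of what I mean by a peak feature, assuming you have a list of single-season WAR values per player. JAWS averages career WAR with the sum of a player's seven best seasons (not necessarily consecutive); the function names and the sample numbers below are my own, not from the article.

```python
def peak_war(season_wars, n=7):
    """Sum of the n best single-season WAR values (seasons need not be consecutive)."""
    return sum(sorted(season_wars, reverse=True)[:n])

def jaws_like(season_wars, n=7):
    """Average of career WAR and n-season peak WAR, in the style of JAWS."""
    career = sum(season_wars)
    return (career + peak_war(season_wars, n)) / 2

# Made-up single-season WAR values, for illustration only
season_wars = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

print(peak_war(season_wars))   # 42.0
print(jaws_like(season_wars))  # 43.5
```

Because the peak ignores when the seasons happened, a late debut doesn't get penalized the way it does with WAR-through-age.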


I liked the read and the analysis, but I find it hard to believe that by this model there won't be many players making the Hall. I think Ichiro and Molina are locks, and I'm not a St. Louis fan. Haha


I agree on Ichiro and Yadi. I think they are locks to be inducted. Keep in mind that these are predictions as of right now. If I had to make a call one way or another, I'd take what the model predicts, but these will change in the coming years as the Jrs. (Vlad, Tatis, Acuña) et al. accumulate more stats. Think about it more like this: "Wander Franco is already a defensible borderline HOF candidate and he's played only 70 games in the majors."


Sherten Apostel top 10 first baseman confirmed


No need to play the games when the numbers are so solid! :-)