Predicting Injuries in MLB Pitchers

I’ve made it midway through bootcamp and finished my third and favorite project so far! Over the past few weeks we’ve been learning about SQL databases, classification models such as Logistic Regression and Support Vector Machines, and visualization tools such as Tableau, Bokeh, and Flask. I put these new skills to use over the past 2 weeks in my project to classify injured pitchers. This post will outline my process and analysis for this project. All of my code and project presentation slides can be found on my Github and my Flask app for this project can be found at


For this project, my challenge was to predict MLB pitcher injuries using binary classification. To do this, I gathered data from several sources: sites with pitching stats by season, a site with Disabled List data per season, and Kaggle for 2015–2018 pitch-by-pitch data. My goal was to use aggregated data from previous seasons to predict whether a pitcher would be injured in the following season. The requirements for this project were to store our data in a PostgreSQL database, to use classification models, and to visualize our data in a Flask app or create graphs in Tableau, Bokeh, or Plotly.
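To make the prediction setup concrete, here is a minimal sketch of how per-season rows could be turned into training examples, where stats aggregated over earlier seasons are paired with the injury flag from the following season. The row layout, the field names, and the choice of aggregates (mean innings, count of prior injuries) are assumptions for illustration, not the project's actual schema or feature set.

```python
from collections import defaultdict

# Hypothetical per-season rows: (pitcher_id, season, innings_pitched, injured_that_season)
season_rows = [
    ("p1", 2015, 180.0, False),
    ("p1", 2016, 200.0, True),
    ("p1", 2017, 90.0, False),
    ("p2", 2016, 60.0, False),
    ("p2", 2017, 65.0, False),
]


def build_examples(rows):
    """Pair each pitcher's aggregated history with the next season's injury flag."""
    by_pitcher = defaultdict(list)
    for pitcher_id, season, innings, injured in rows:
        by_pitcher[pitcher_id].append((season, innings, injured))

    examples = []
    for pitcher_id, seasons in by_pitcher.items():
        seasons.sort()  # chronological order
        for i in range(1, len(seasons)):
            target_season, _, injured_next = seasons[i]
            if target_season != seasons[i - 1][0] + 1:
                continue  # skip gaps: the label must come from the very next season
            history = seasons[:i]  # only seasons before the target season
            examples.append({
                "pitcher_id": pitcher_id,
                "target_season": target_season,
                "mean_innings": sum(s[1] for s in history) / len(history),
                "prior_injuries": sum(1 for s in history if s[2]),
                "injured_next_season": injured_next,  # binary classification target
            })
    return examples


examples = build_examples(season_rows)
```

Building the features only from seasons before the target season keeps information from the predicted year out of the inputs.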

Data Exploration:

I gathered data from the 2013–2018 seasons for over 1500 Major League Baseball pitchers. To get a feel for my data, I started by looking at features that were most intuitively predictive of injury and compared them in subsets of injured and healthy pitchers as follows:

I first looked at age, and while the mean age in both injured and healthy players was around 27, the data was skewed a bit differently in the two groups. The most common age in injured players was 29, while healthy players had a much lower mode at 25. Similarly, average pitching speed in injured players was higher than in healthy players, as expected. The next feature I considered was Tommy John surgery. This is a very common surgery in pitchers where a ligament in the arm gets torn and is replaced with a healthy tendon extracted from the arm or leg. I was assuming that pitchers with past surgeries were more likely to get injured again, and the data confirmed this idea. A significant 30% of injured pitchers had a past Tommy John surgery, while healthy pitchers were at about 17%.
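A comparison like the one above can be sketched as a small group summary: split the pitchers by injury status, then compute the mean age, the modal age, and the share with a past Tommy John surgery in each group. The rows below are made-up values chosen only to show the mechanics; they are not the project's data.

```python
from statistics import mean, mode

# Hypothetical rows: (age, had_tommy_john, injured)
pitchers = [
    (29, True, True),
    (29, False, True),
    (24, False, True),
    (25, False, False),
    (25, True, False),
    (31, False, False),
    (27, False, False),
]


def summarize(rows):
    """Return mean age, modal age, and Tommy John rate for a group of pitchers."""
    ages = [age for age, _, _ in rows]
    return {
        "mean_age": mean(ages),
        "mode_age": mode(ages),
        # True counts as 1, so the sum is the number of pitchers with past surgery
        "tommy_john_rate": sum(tj for _, tj, _ in rows) / len(rows),
    }


injured = summarize([r for r in pitchers if r[2]])
healthy = summarize([r for r in pitchers if not r[2]])
```

Looking at the mode next to the mean is what exposes the difference in skew: two groups can share nearly the same mean age while their most common ages differ.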

I then looked at the average win-loss record in the two groups, which surprisingly was the feature with the highest correlation to injury in my dataset. The subset of injured pitchers was winning an average of 43% of games compared to 36% for healthy players. It makes sense that pitchers with more wins will get more playing time, which can lead to more injuries, as shown in the higher average innings pitched per game in injured players.
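One common way to measure how strongly a numeric feature tracks a binary label is the Pearson correlation with the label coded as 0/1. The sketch below assumes that approach (the post does not say which correlation measure was used), and the win percentages and injury flags are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values: win percentage per pitcher and whether he was injured (1) or not (0)
win_pct = [0.50, 0.45, 0.40, 0.38, 0.30, 0.35]
injured = [1, 1, 0, 1, 0, 0]

r = pearson(win_pct, injured)  # positive when injured pitchers tend to win more
```

A positive value here only says that the two move together; as noted above, playing time is a plausible link between winning and getting hurt.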

The feature I was most excited about exploring for this project was a pitcher’s repertoire and whether certain pitches are more predictive of injury. Looking at feature correlations, I found that Sinker and Cutter pitches had the highest positive correlation to injury. I decided to explore these pitches in more depth and looked at the percentage of combined Sinker and Cutter pitches thrown by individual pitchers each year. I noticed a pattern of injuries occurring in years where the sinker/cutter pitch percentages were at their highest. Below is a sample plot of four leading MLB pitchers with recent injuries. The red points on the plots represent years in which the players were injured. You can see that they often correspond with years in which the sinker/cutter percentages were at a peak for each of the pitchers.
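The yearly sinker/cutter percentage can be sketched from pitch-by-pitch records as follows. The record layout and the pitch-type codes "SI" (sinker) and "FC" (cutter) are assumptions for illustration; the actual column names and codes in the Kaggle data should be checked before reusing this.

```python
from collections import Counter

# Hypothetical pitch log: (pitcher_id, season, pitch_type)
pitches = [
    ("p1", 2015, "FF"), ("p1", 2015, "SI"), ("p1", 2015, "SL"), ("p1", 2015, "FF"),
    ("p1", 2016, "SI"), ("p1", 2016, "FC"), ("p1", 2016, "FF"), ("p1", 2016, "SI"),
    ("p1", 2017, "FF"), ("p1", 2017, "CH"), ("p1", 2017, "FC"), ("p1", 2017, "FF"),
]

# Assumed codes for sinker and cutter
SINKER_CUTTER = {"SI", "FC"}


def sinker_cutter_share(log):
    """Fraction of each pitcher-season's pitches that were sinkers or cutters."""
    totals = Counter()
    targets = Counter()
    for pitcher_id, season, pitch_type in log:
        totals[(pitcher_id, season)] += 1
        if pitch_type in SINKER_CUTTER:
            targets[(pitcher_id, season)] += 1
    return {key: targets[key] / totals[key] for key in totals}


share = sinker_cutter_share(pitches)

# Season in which the hypothetical pitcher "p1" leaned most on these two pitches
peak_season = max(
    (season for pitcher_id, season in share if pitcher_id == "p1"),
    key=lambda season: share[("p1", season)],
)
```

Comparing each pitcher's peak season against the seasons in which he was injured is the check that the plots described above perform visually.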