Analysis Idea: Projecting Baseball Injuries

Analysis Idea

Build a classifier to predict injuries among baseball pitchers. Start with a random forest, then try deep learning, and compare the two approaches. This will likely span multiple posts and multiple working sessions.
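To make the comparison concrete, here is a minimal sketch of what the harness might look like. It assumes scikit-learn is available and uses synthetic data from `make_classification` as a stand-in for real pitcher features. The small `MLPClassifier` is only a lightweight placeholder for the deep learning model, and ROC AUC is one possible metric, not a settled choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (NOT real pitcher records): roughly 20% of
# rows carry the positive "injured" label to mimic class imbalance.
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two candidate models; the MLP is a placeholder for "deep learning".
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "small_mlp": MLPClassifier(
        hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0
    ),
}

# Fit each model on the same split and score it with the same metric.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

Keeping both models behind the same split and metric is the point of the sketch: whatever the real features turn out to be, the comparison stays apples to apples.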

Possibly cool explorations, time permitting

  • main objective: predict disabled list (DL) days from year to year
  • secondary: predict Tommy John surgery (TJS)
  • can I build a model of a healthy pitcher from PITCHf/x data, then flag deviations from it to identify an injury in-season?
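One simple way to read the "healthy pitcher" idea is as a per-feature baseline with z-score deviations. The sketch below assumes that reading. The feature names (`velo_mph`, `release_height_ft`), the sample values, and the threshold of 2.0 are all hypothetical placeholders, not real PITCHf/x fields or tuned settings.

```python
from statistics import mean, stdev


def fit_baseline(healthy_starts):
    """Return {feature: (mean, sample std dev)} from starts assumed healthy."""
    features = list(healthy_starts[0].keys())
    baseline = {}
    for f in features:
        values = [s[f] for s in healthy_starts]
        baseline[f] = (mean(values), stdev(values))
    return baseline


def deviation_scores(baseline, start):
    """Z-score of each feature in one start relative to the baseline."""
    return {
        f: (start[f] - mu) / sd if sd > 0 else 0.0
        for f, (mu, sd) in baseline.items()
    }


def flag_start(baseline, start, threshold=2.0):
    """Names of features whose absolute z-score meets the threshold."""
    scores = deviation_scores(baseline, start)
    return sorted(f for f, z in scores.items() if abs(z) >= threshold)


# Made-up per-start averages for one pitcher (illustrative only).
healthy = [
    {"velo_mph": 93.1, "release_height_ft": 6.02},
    {"velo_mph": 92.8, "release_height_ft": 5.98},
    {"velo_mph": 93.4, "release_height_ft": 6.05},
    {"velo_mph": 93.0, "release_height_ft": 6.00},
    {"velo_mph": 92.7, "release_height_ft": 5.95},
]
baseline = fit_baseline(healthy)

suspect = {"velo_mph": 91.6, "release_height_ft": 6.01}
typical = {"velo_mph": 93.2, "release_height_ft": 5.99}

print(flag_start(baseline, suspect))  # velocity drop is flagged
print(flag_start(baseline, typical))  # nothing flagged
```

A per-pitcher baseline like this ignores fatigue, weather, and pitch mix, so it is best treated as a starting point for the exploration rather than a detector.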


Data Sources

  • injury databases (several available; probably focus on 2010-2016, with an emphasis on TJS)
  • MLB offers PITCHf/x data
  • Baseball-Reference (b-ref) for precalculated/historical data
  • Lahman database for raw data, queryable with SQL
  • Retrosheet for raw play-by-play data

Validation and Training Sets

  • predict year-to-year injury rates
  • predict in-season injury rates
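For the year-to-year case, one reasonable way to build the sets is to split by season so the model never trains on seasons later than the ones it is validated on. The sketch below assumes one row per pitcher-season. The row layout, the `split_by_season` helper, the placeholder `dl_days` values, and the 2014 cutoff are all hypothetical.

```python
def split_by_season(rows, last_train_season):
    """Train on seasons up to the cutoff, validate on later seasons."""
    train = [r for r in rows if r["season"] <= last_train_season]
    valid = [r for r in rows if r["season"] > last_train_season]
    return train, valid


# Placeholder pitcher-season rows covering 2010-2016 (not real data;
# dl_days is set to 0 everywhere purely to fill the column).
rows = [
    {"pitcher_id": pid, "season": season, "dl_days": 0}
    for season in range(2010, 2017)
    for pid in ("p1", "p2")
]

train, valid = split_by_season(rows, last_train_season=2014)
print(len(train), len(valid))
```

A random shuffle would leak future seasons into training, which is why the split here is chronological. The in-season case would need a similar cutoff by date within a season.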

