Risk stratification helps guide appropriate clinical care. Our goal was to develop and validate a broad suite of predictive tools based on International Classification of Diseases, Tenth Revision, diagnostic and procedural codes for predicting adverse events and care utilization outcomes for hospitalized patients.


Endpoints included unplanned hospital admissions, discharge status, excess length of stay, in-hospital and 90-day mortality, acute kidney injury, sepsis, pneumonia, respiratory failure, and a composite of major cardiac complications. Patient demographic and coding history in the year before admission provided features used to predict utilization and adverse events through 90 days after admission. Models were trained and refined on 2017 to 2018 Medicare admissions data using an 80 to 20 learn to test split sample. Models were then prospectively tested on 2019 out-of-sample Medicare admissions. Predictions based on logistic regression were compared with those from five commonly used machine learning methods using a limited dataset.


The 2017 to 2018 development set included 9,085,968 patients who had 18,899,224 inpatient admissions, and there were 5,336,265 patients who had 9,205,835 inpatient admissions in the 2019 validation dataset. Model performance on the validation set had an average area under the curve of 0.76 (range, 0.70 to 0.82). Model calibration was strong with an average R2 for the 99% of patients at lowest risk of 1.00. Excess length of stay had a root-mean-square error of 0.19 and R 2 of 0.99. The mean sensitivity for the highest 5% risk population was 19.2% (range, 11.6 to 30.1); for positive predictive value, it was 37.2% (14.6 to 87.7); and for lift (enrichment ratio), it was 3.8 (2.3 to 6.1). Predictive accuracies from regression and machine learning techniques were generally similar.


Predictive analytical modeling based on administrative claims history can provide individualized risk profiles at hospital admission that may help guide patient management. Similar results from six different modeling approaches suggest that we have identified both the value and ceiling for predictive information derived from medical claims history.

Editor’s Perspective
What We Already Know about This Topic
  • Uses for risk stratification tools include setting baselines for health service evaluations, identifying patients who may need higher levels of care, and allocating hospital resources
What This Article Tells Us That Is New
  • From a dataset of more than 9 million patients, a risk score based on administrative claims history was developed to provide individualized risk profiles at hospital admission that may help guide patient management