Risk stratification helps guide appropriate clinical care. Our goal was to develop and validate a broad suite of tools, based on ICD-10 diagnostic and procedural codes, for predicting adverse events and care utilization outcomes in hospitalized patients.


Endpoints included unplanned hospital admissions, discharge status, excess length of stay, in-hospital and 90-day mortality, acute kidney injury, sepsis, pneumonia, respiratory failure, and a composite of major cardiac complications. Patient demographics and coding history from the year before admission provided the features used to predict utilization and adverse events through 90 days post-admission. Models were trained and refined on 2017-18 Medicare admissions data using an 80/20 train/test split. Models were then prospectively tested on out-of-sample 2019 Medicare admissions. Predictions from logistic regression were compared with those from five commonly used machine learning methods on a limited dataset.
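
The evaluation design described above (a random 80/20 split within the development years, plus a later year held out entirely) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function name `split_admissions`, the record field `year`, the fixed seed, and the choice to split at the admission level rather than the patient level are all assumptions made for the example.

```python
import random


def split_admissions(admissions, dev_years=(2017, 2018), holdout_year=2019,
                     train_fraction=0.8, seed=0):
    """Return (train, test, validation) lists of admission records.

    Assumes each record is a dict with a hypothetical "year" key.
    Splitting is done per admission; a patient-level split would
    need grouping by patient first.
    """
    # Development data: admissions from the earlier years.
    dev = [a for a in admissions if a["year"] in dev_years]
    # Out-of-sample data: admissions from the later held-out year.
    validation = [a for a in admissions if a["year"] == holdout_year]

    # Shuffle a copy of the development data reproducibly, then cut 80/20.
    shuffled = dev[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:], validation


# Toy data: 30 admissions spread evenly over 2017, 2018, and 2019.
admissions = [{"id": i, "year": 2017 + i % 3} for i in range(30)]
train, test, validation = split_admissions(admissions)
print(len(train), len(test), len(validation))
```

Holding out a later calendar year, rather than only a random subset, checks whether performance carries over to admissions that occur after the models were built.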


The 2017-18 development set included 9,085,968 patients who had 18,899,224 inpatient admissions, and the 2019 validation set included 5,336,265 patients who had 9,205,835 inpatient admissions. On the validation set, the models had an average area under the curve of 0.76 (range: 0.70, 0.82). Model calibration was strong, with an average R² of 1.00. Excess length of stay had a root-mean-square error of 0.19 and R² of 0.99. For the 5% of the population at highest predicted risk, mean sensitivity was 19.2% (range: 11.6, 30.1), mean positive predictive value was 37.2% (range: 14.6, 87.7), and mean lift (enrichment ratio) was 3.8 (range: 2.3, 6.1). Predictive accuracies from regression and machine learning techniques were generally similar.
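
To make the top-5% metrics concrete, the sketch below computes sensitivity, positive predictive value, and lift for the highest-risk fraction of a scored population. It is an illustrative calculation under standard definitions, not the study's code; the function name `top_fraction_metrics` and the toy data are assumptions.

```python
def top_fraction_metrics(risks, outcomes, fraction=0.05):
    """Sensitivity, PPV, and lift for the highest-risk `fraction` of cases.

    risks:    predicted probabilities, one per admission
    outcomes: 0/1 observed events, same order as `risks`
    """
    n = len(risks)
    total_pos = sum(outcomes)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one case and one observed event")

    # Flag the k admissions with the highest predicted risk.
    k = max(1, int(round(fraction * n)))
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    true_pos = sum(outcomes[i] for i in order[:k])

    return {
        # Share of all events that fall in the flagged group.
        "sensitivity": true_pos / total_pos,
        # Event rate within the flagged group.
        "ppv": true_pos / k,
        # Flagged-group event rate divided by the overall event rate.
        "lift": (true_pos * n) / (k * total_pos),
    }


# Toy data: 100 admissions, 10 events, 3 of them among the 5 highest risks.
risks = [i / 100 for i in range(100)]
events = {99, 98, 97, 50, 40, 30, 20, 10, 5, 1}
outcomes = [1 if i in events else 0 for i in range(100)]
metrics = top_fraction_metrics(risks, outcomes)
print(metrics)
```

Because lift equals sensitivity divided by the flagged fraction, a mean sensitivity of 19.2% in the top 5% corresponds to a lift of about 3.8, which matches the reported figure.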


Predictive analytical modeling based on administrative claims history can provide individualized risk profiles at hospital admission that may help guide patient management. Similar results from six different modeling approaches suggest that we have identified both the value and the ceiling of the predictive information that can be derived from medical claims history.