Using deep neural network (DNN) models to predict in-hospital mortality and identify high-risk surgical patients produces results similar to or better than currently published risk scores, a new study suggests. This application of deep learning could lead to more effective care for high-risk patients and better allocation of hospital resources, according to the researchers.
“Patients undergoing surgery are often at higher risk of instability during surgery as well as poor postoperative outcomes such as in-hospital mortality,” said Christine Lee, MS, study author and PhD candidate in the Department of Biomedical Engineering at the University of California, Irvine. “Our study shows that DNN models can be calculated on all patients, leveraging the complexity of both preoperative and intraoperative data to improve the classification of in-hospital mortality in surgical patients.”
There are more than 230 million major surgical procedures performed globally each year (Lancet 2008;372:139-144). Although the overall mortality rate from these procedures is less than 2%, the overwhelming majority of postoperative deaths are attributable to high-risk patients (Crit Care 2006;10:R81; Lancet 2012;380:1059-1065), Ms. Lee reported. While current risk scores such as the American Society of Anesthesiologists physical status classification system (ASA status), Preoperative Score to Predict Postoperative Mortality (POSPOM) and Risk Quantification Index (RQI) have shown success in identifying patients at risk for in-hospital mortality, they are limited to preoperative information (Ann R Coll Surg Engl 2014;96:352-358). In addition, the ASA status is considered relatively subjective, depending on the judgment of the physician assigning the score.
Features of the Model
As part of her doctoral thesis, Ms. Lee and her colleagues used surgical data from 59,985 patients at the UCLA Medical Center to build DNN models, a class of algorithms designed to recognize patterns in data. The data set included all surgical procedures performed with general anesthesia since March 1, 2013; cases not performed with general anesthesia were excluded, as were patients older than 89 or younger than 18 years. The researchers also calculated or extracted 87 intraoperative features, including descriptive intraoperative vital signs, interventions and anesthesia descriptions.
“We had a pretty wide variety of surgery types, including 1,498 unique CPT [Current Procedural Terminology] codes and 167 unique HCUP [Healthcare Cost and Utilization Project] codes,” Ms. Lee said.
Of the approximately 60,000 cases included in the study, about 20% were used as test data.
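In rough terms, that hold-out approach could be sketched as follows. This is a minimal illustration only; the study did not describe how the test set was partitioned, so the random shuffle here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cases = 59_985   # cohort size reported in the study
test_frac = 0.20   # roughly 20% held out as test data

# Shuffle case indices, then carve off the test set.
indices = rng.permutation(n_cases)
n_test = int(n_cases * test_frac)
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))
```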
Ms. Lee and her team developed four “deep” feed-forward neural networks, which had at least three hidden layers between the input and output layers—where each successive layer computes increasingly complex information. The input layer received the patient data, and the output layer produced the probability of in-hospital mortality.
“The objective of any neural network or model is to optimize a loss function, which is used to update all the neurons in the hidden layers of the network,” Ms. Lee explained.
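The forward pass and loss function described above can be sketched with a toy feed-forward network. This is an illustrative example, not the study’s implementation: the layer widths, weight initialization, ReLU/sigmoid activations and the synthetic input are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions: 46 input features, three hidden layers, one output neuron.
sizes = [46, 300, 300, 300, 1]
weights = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Feed-forward pass: each hidden layer transforms the previous one;
    the sigmoid output is the predicted probability of in-hospital mortality."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ w + b)
    return sigmoid(x @ weights[-1] + biases[-1])

def bce_loss(p, y):
    """Binary cross-entropy: the kind of loss function a network like this
    is trained to minimize."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

x = rng.normal(size=46)   # one synthetic feature vector
p = forward(x)
print(float(p), float(bce_loss(p, y=0)))
```

During training, the gradient of this loss with respect to every weight and bias is what updates the neurons in the hidden layers.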
The first DNN model used all 87 original features. The second used 46, excluding features such as average, median and standard deviations, and anything from the last 10 minutes of a surgical case. The third model included the 87 original features and added ASA status as a feature. The fourth model used 46 features plus ASA status as a feature.
Model performance was measured using area under the receiver operating characteristic curve (AUC). For comparison, the AUCs of ASA status, surgical Apgar score, RQI and POSPOM also were calculated. Ms. Lee reported that RQI could not be calculated for many patients due to lack of RQI score weights for their CPT codes.
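The AUC has a simple interpretation: it is the probability that a randomly chosen patient who died in hospital receives a higher predicted risk than a randomly chosen patient who survived. The short sketch below computes it directly from that definition on made-up scores; it is not tied to the study’s data.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    is scored above a randomly chosen negative case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: a perfect ranker scores 1.0; uninformative scores give 0.5.
labels = [0, 0, 0, 1, 1]
print(auc([0.1, 0.2, 0.3, 0.8, 0.9], labels))  # 1.0
print(auc([0.5, 0.5, 0.5, 0.5, 0.5], labels))  # 0.5
```

This rank-based view is why an AUC of 0.58, like the surgical Apgar score’s, is only slightly better than chance.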
ASA Status Score Counts
As expected, surgical Apgar score had a very low AUC (0.58), Ms. Lee reported. ASA status did “pretty well” with an AUC of 0.84, while RQI was even more reliable with an AUC of 0.91.
The best DNN architecture consisted of four hidden layers with 300 neurons in each layer. Incorporating ASA status score as a feature improved the DNN AUC performance for both feature sets, but the reduced feature set performed slightly better than the full one.
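The scale of such an architecture is easy to estimate. The count below assumes an input width of 47 (the 46 reduced features plus ASA status) feeding four hidden layers of 300 neurons and a single output neuron; the input width is an assumption for illustration, since the article does not give it explicitly.

```python
# Fully connected layers: each contributes (inputs x outputs) weights
# plus one bias per output neuron.
layer_sizes = [47, 300, 300, 300, 300, 1]

params = sum(m * n + n
             for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
print(params)  # 285601
```

Several hundred thousand trainable parameters is what makes such models powerful but also hard to interpret, a point raised in the discussion below.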
“Overall, the DNN with the reduced feature set and ASA status as a feature performed the best,” Ms. Lee said. “With an AUC of 0.91, it was comparable to the RQI score, but it should be noted that RQI could not be calculated on more than 50% of the data due to lack of score weights for CPT codes.”
She added, “Our DNN model can be used on all surgical patients and leverages both preoperative ASA assessment as well as intraoperative events.”
Kirk Shelley, MD, PhD, professor of anesthesiology and chief of the Ambulatory Division at Yale University School of Medicine, in New Haven, Conn., called the research thought-provoking but questioned the inherent opaqueness of DNNs.
“It seems like data go in, magic occurs and the answers come out,” Dr. Shelley said. “Are you able to go back into the model and figure out how this brain works and make new observations?”
“Interpretability of DNNs is a formal area of theoretical research,” Ms. Lee said. “Our model has four hidden layers with 300 neurons each, so exponentially that’s a lot of parameters, but people are working on methods to make it more interpretable.”
“If interpretation is possible,” Dr. Shelley noted, “it would be a very interesting hypothesis-generating engine.”