We thank Burton et al. and Zapf et al. for their thoughtful comments on our research on personalized surgical transfusion risk prediction. Both letters raise important considerations regarding variable selection for predictive models in health care, which are worth discussing in further detail.

As Burton et al. note, race and ethnicity were not included as input variables in our machine learning model for surgical transfusion risk; we would like to clarify that this omission was intentional, for several reasons that we explain here. First, the inclusion of race in predictive models has been well described to contribute to inequity. One major limitation of machine learning is that a model can only learn from its training examples, that is, from real-world clinician behaviors. If such behaviors, or the societal factors contextualizing them, are biased, the model will also be biased. The citation provided by Burton et al. is a perfect example of this: in this study, researchers evaluated a model trained to predict healthcare utilization after hospital discharge, with the intention to allocate additional resources to patients predicted to have high utilization. Unfortunately, Black patients had low utilization because they lacked access to care, a pattern that the model learned and perpetuated. Inclusion of race as an input variable in model development encourages machine learning models to explicitly encode such latent biases, and consequently the recommendations of such models will propagate systemic inequities in care.

Second, although race is a frequently collected variable in many datasets, it serves as a proxy for often unmeasured variables such as socioeconomic status, access to care, illness severity (due to poor access to care and delayed presentation), and other social determinants of health. Thus, although the inclusion of race as a variable may improve model discrimination, it may do so for the wrong reasons. Given two individuals who are identical except for their skin color, it seems unjust for one to receive a “better” prediction based on the population averages of their racial group, which may be driven by unmeasured variables that do not apply to that specific individual.

Third, to the best of our knowledge, there is little evidence that race itself contributes to risk for allogeneic blood transfusion after adjustment for disease burden, socioeconomic status, and other clinical variables that are known to contribute (e.g., hematocrit). We thank Burton et al. for bringing attention to the potential pitfalls of racial adjustment and the critical importance of fairness in predictive modeling. As machine learning is increasingly used for clinical decision support, model developers must be vigilant for potential sources of bias, which can be introduced at every step of model development and implementation. As a research community, we share a responsibility to ensure that the decision support tools we create do not exacerbate, and ideally help to reduce, the health disparities that are currently present in modern medicine.

Zapf et al. raise important points about the benefits and limitations of model development using large registry datasets versus institution-specific datasets. We agree that inclusion of surgeon and anesthesiologist identifiers may further improve predictive performance. Variation in transfusion risk can occur due to differences in surgeon technique or case complexity, and it would be appropriate to adjust for these; however, it can also occur due to differences in preference for discretionary transfusion, which may be less appropriate to adjust for. By training our models on a large national database, we captured the average transfusion behavior of U.S. physicians, which we believe to be appropriate on the whole. Further customizing model predictions based on individual behavior patterns risks encoding undesirable physician practice patterns into the model; nonetheless, we acknowledge that such adjustment might be necessary for widespread adoption. Our transfer learning approach (i.e., a hospital-specific, procedure-specific transfusion rate) could easily accommodate the addition of a surgeon- or anesthesiologist-specific adjustment, and it would be interesting to investigate such modifications in future work.