Artificial intelligence (AI) is generating widespread enthusiasm due to its potential to transform the way we work and live. All of healthcare’s major stakeholders, from funders to those delivering care, can appreciate AI’s tremendous opportunity. For better or for worse, AI will be embedded in our electronic health records, our medical devices, and our hospital systems with the goal of optimizing cost, production, and quality. However, this “field of dreams” will quickly become a “fire swamp” if AI is not designed and implemented with a human-centered approach.

In this issue of Anesthesiology, Han et al. outline the promises and challenges of integrating AI into anesthesia practice. If AI is inevitable, how can we best proceed to optimize its implementation and use? To complement Han et al., we highlight issues that we feel are most critical to designing, deploying, and maintaining AI in perioperative medicine. The key considerations are summarized in table 1.

Table 1. Key Considerations for the Implementation and Use of AI in Perioperative Care

AI, at its core, is a tool that creates and presents statistical relationships derived from historical data and knowledge. As your investment advisor warns, it can be dangerous to make predictions from historical data, even with the most sophisticated machine-learning algorithm. Like healthcare AI, many investment decisions are based, at least in part, on historical performance and are thus brittle to unanticipated events (e.g., COVID-19, the war in Ukraine). Where AI may fail most spectacularly is in anomalous situations. Such “black swan events” are more frequent than commonly believed, especially in healthcare. Moreover, to the extent that AI is used to standardize, streamline, and integrate care, it can unduly increase system complexity and reduce resilience, so the harm caused by inevitable anomalous events can increase exponentially. Emerging AI techniques (e.g., causal AI, zero-shot learning) may strengthen AI capabilities in these cases and could eventually lead AI to outperform clinicians in anomalous situations, but additional research is needed before this can be achieved. Furthermore, because historical data contain embedded biases, there is a real potential for AI to exacerbate health disparities. Numerous efforts are underway to address and mitigate these issues.
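To make this brittleness concrete, the minimal Python sketch below trains a simple classifier on simulated “historical” data and then evaluates it on data whose case mix and feature–outcome relationships have drifted, as might occur during an anomalous event. Everything here, including the variables, effect sizes, and magnitudes of drift, is an illustrative assumption rather than a model of any real clinical system.

```python
# Minimal sketch (all data simulated): a model fit on "historical" data
# degrades as the underlying relationships drift, e.g., during a pandemic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, drift=0.0):
    """Simulate patients; `drift` shifts the case mix and alters the
    true effect of the first feature (concept drift)."""
    X = rng.normal(loc=drift, scale=1.0, size=(n, 2))
    w0 = 1.5 - drift                      # the true effect drifts over time
    logits = w0 * X[:, 0] - 1.0 * X[:, 1]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return X, y

X_hist, y_hist = make_cohort(5000)        # pre-deployment ("historical") data
model = LogisticRegression().fit(X_hist, y_hist)

for drift in (0.0, 1.0, 2.0):             # increasing departure from history
    X_new, y_new = make_cohort(2000, drift)
    auc = roc_auc_score(y_new, model.predict_proba(X_new)[:, 1])
    print(f"drift={drift:.1f}  AUC={auc:.3f}")
```

As the drift grows, discrimination collapses even though nothing about the model itself has changed; the same dynamic underlies the “black swan” failures described above.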

AI tools must be implemented within the larger system of perioperative medicine. This complex sociotechnical system consists of many people with different roles and priorities and numerous interacting tools and technologies, all operating within a physical environment and a complex organizational culture. The ultimate success (or failure) of AI will depend on how well the technology integrates within this larger sociotechnical system. For instance, an AI tool to support diabetic retinopathy screening failed due to poor lighting (which limited image quality) and insufficient bandwidth to upload images. One could imagine similar challenges arising with the introduction of AI for ultrasound-guided regional anesthesia or for guiding endotracheal intubation. To avoid the false starts, patient harm, and clinician frustration seen with the previous implementation of other advanced medical technologies (e.g., electronic health records), the context of use and the end users of AI cannot be ignored.

We want to emphasize the key role of human factors engineering and human-centered design in the ultimate success of AI in anesthesia. Human factors engineering is a scientific discipline focused on understanding the complex interactions between humans and other system elements and then designing solutions to promote overall system performance and human well-being. Anesthesiology was one of the first medical disciplines to engage human factors engineers in the pursuit of enhanced safety and quality. Human-centered design, which includes iterative cycles of design and end-user evaluation, will be essential to the safe and effective deployment of AI. Although there is hype (and concern) that AI will replace the work of some clinicians (e.g., pathologists), it is much more likely that AI will instead alter the nature of human work. Clinicians and human factors engineers will need to collaboratively design the human–AI interface to ensure that AI supports, rather than hinders, clinical work. Figure 1 depicts the five steps of the human-centered design process and corresponding considerations for AI development across each step.

Fig. 1. Human-centered design process and corresponding artificial intelligence (AI) considerations across each step.

There are numerous opportunities for AI to improve perioperative safety, including predicting difficult airways, broadening differential diagnoses, preventing adverse events, and supporting procedures such as intubation and ultrasound guidance. AI will invariably increase task automation and standardization. This means that clinicians will need to monitor a robust and dependable system that will only rarely fail, at which time they will be expected to intervene. Human–automation interaction is rife with examples of failed automation leading to poor outcomes because the human was inadequately informed, unaware of the state of the automation, or ill prepared to intervene. The use of AI in anesthesia could pose safety challenges similar to those that have occurred with automation in aviation and ground transportation. One concern is automation-induced complacency, in which clinicians over-rely on AI and follow the AI’s advice even when it is wrong. Overdependence can also lead to skill or knowledge degradation from lack of use. If AI-based automation routinely performs a clinical task, will the anesthesiologist be able to recognize an AI failure (likely in the most challenging cases), maintain adequate situation awareness of the current system state, and then have the skill to regain “manual control” and rescue the patient?

When the anesthesiologist–AI system fails and there is patient harm, who will be liable? Current legal doctrine would place the “blame” on the clinician if they fail to follow the AI’s recommendations or actions when those are correct. Anesthesiologists may also be at fault if they fail to intervene when the AI is wrong. These and other complex, important topics have become the focus of numerous federal agencies (e.g., the Office of the National Coordinator for Health Information Technology [Washington, D.C.], the National Institute of Standards and Technology [Gaithersburg, Maryland], the Food and Drug Administration [Silver Spring, Maryland]) and nonprofit entities (e.g., the Coalition for Health AI).

As Han et al. describe, strategies are needed to establish trust in AI to foster adoption. For safe and effective use, it also will be essential to develop “appropriate trust,” in which clinicians calibrate their trust based on the context of use and the system’s performance within that context. This will require users to understand the capabilities and limitations of AI in different settings and for different patients. Because AI can be challenging to understand, the transparency and explainability of AI tools are important. Thus, educational initiatives will be important to develop anesthesiologist competencies in safe and effective AI use.
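To illustrate what “appropriate trust” can mean in practice, the short sketch below (using simulated data and a hypothetical two-context setting) checks whether a model’s predicted risks match observed outcomes separately in each context. A model may be well calibrated in one setting and systematically overconfident in another, which is exactly why trust must be calibrated to the context of use.

```python
# Illustrative sketch (simulated data): checking calibration per context,
# because good performance in one setting does not guarantee another.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(1)

n = 4000
context = rng.integers(0, 2, size=n)       # 0 and 1: hypothetical care settings
p_pred = rng.uniform(0.01, 0.99, size=n)   # the model's predicted risks
# Simulate outcomes: calibrated in context 0, overconfident in context 1.
p_true = np.where(context == 0, p_pred, 0.5 * p_pred)
y = rng.binomial(1, p_true)

for c in (0, 1):
    mask = context == c
    frac_pos, mean_pred = calibration_curve(y[mask], p_pred[mask], n_bins=10)
    print(f"context {c}: Brier score = {brier_score_loss(y[mask], p_pred[mask]):.3f}")
    # A trustworthy model has frac_pos close to mean_pred in every bin.
```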

Without proper testing before and after implementation, AI can pose serious risks to patient safety, cybersecurity, and care quality (including care disparities). The application of Red Teaming, borrowed from Cold War simulation gaming and already used in cybersecurity risk analysis, has been promoted to enhance AI safety. Before AI system deployment, a Red Team conducts a thorough risk analysis and then actively tries to elicit the failure modes and pathways that could cause harm (especially to patients). Each hospital will need to establish such a process.
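As a sketch of what one slice of red-team testing could look like in code, the hypothetical harness below probes a stand-in scoring function (`predict_risk`, invented for this example) with deliberately anomalous inputs and records whether the system fails loudly, propagates an undefined value, or silently returns a confident score, which is the dangerous failure mode.

```python
# Hypothetical red-team harness; `predict_risk` is a stand-in for a
# deployed model's scoring function, and the probe cases are invented.
import math

def predict_risk(age, heart_rate, spo2):
    """Placeholder model: returns a risk in [0, 1] with no input checks."""
    score = 0.01 * age + 0.005 * heart_rate - 0.02 * spo2 + 1.5
    return 1.0 / (1.0 + math.exp(-score))

# Deliberately anomalous inputs a red team might probe with.
probes = [
    ("missing SpO2",       60,  80,  float("nan")),
    ("implausible vitals", 45,  400, 150),
    ("extreme age",        130, 70,  98),
]

for name, age, hr, spo2 in probes:
    try:
        risk = predict_risk(age, hr, spo2)
        if math.isnan(risk):
            print(f"{name}: NaN propagated (should have been rejected)")
        else:
            print(f"{name}: silently returned risk={risk:.2f} (failure mode)")
    except Exception as exc:
        print(f"{name}: raised {type(exc).__name__} (failed loudly)")
```

The point of such a harness is not the toy model but the catalog of probes: each silent, confident output on nonsense input is a documented failure pathway for the Red Team’s risk analysis.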

Even after deployment, AI algorithms can drift or change over time as they are embedded and used within the healthcare system, leading to new risks and suboptimal outcomes. “Algorithmovigilance” is an emerging concept focused on the evaluation and monitoring of AI use over time. Generative AI will require even closer monitoring because the tool’s outputs continuously change based on new data and the prompts used. Beyond monitoring AI performance, there is a need to continuously engage front-line clinicians from AI design through ongoing use. AI tool developers and implementers must have processes in place to continuously collect end-user feedback on the usability and safety of AI tools over time.
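One concrete (and deliberately simplified) form algorithmovigilance could take is rolling-window performance monitoring. In the sketch below, the window size and alert threshold are illustrative local policy assumptions, and the data are simulated so that the model’s discrimination decays over time.

```python
# Illustrative algorithmovigilance sketch: rolling-window AUC monitoring.
# WINDOW and ALERT_AUC are assumed local policy choices, not standards.
import numpy as np
from sklearn.metrics import roc_auc_score

WINDOW = 500        # cases per monitoring window (assumption)
ALERT_AUC = 0.75    # minimum acceptable discrimination (assumption)

def monitor(y_true, y_pred):
    """Scan predictions in time order; flag windows where AUC degrades."""
    for start in range(0, len(y_true) - WINDOW + 1, WINDOW):
        y_w, p_w = y_true[start:start + WINDOW], y_pred[start:start + WINDOW]
        if len(np.unique(y_w)) < 2:      # AUC is undefined with one class
            continue
        auc = roc_auc_score(y_w, p_w)
        status = "ALERT: review model" if auc < ALERT_AUC else "ok"
        print(f"cases {start}-{start + WINDOW}: AUC={auc:.3f}  {status}")

# Simulated stream in which the model's signal fades over time.
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=2000)
strength = np.linspace(1.0, 0.0, 2000)   # drifting signal strength
p = strength * y + (1.0 - strength) * rng.uniform(size=2000)
monitor(y, p)
```

A production monitor would also track input distributions and subgroup performance, and its alerts would feed back into the same governance process that approved the model.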

For any substantive innovation in healthcare, the initial enthusiasm about the huge promise of the “bright shiny ball” is often replaced over time by disillusionment. Ultimately, use becomes more measured, based on recognition of the advantages and indications versus the disadvantages and limitations. Despite the current hype and rush to commercialize, there still needs to be appreciable research, social discourse, and policy making to clarify the optimal path to AI’s successful integration throughout healthcare. Serious issues under investigation and national discussion include model bias and drift; trust, transparency, and explainability; AI-induced errors; systems interactions; risk and medicolegal liability; and myriad ethical (e.g., privacy and surveillance) and public health (e.g., justice, fairness, and accountability) issues.

To realize the potential of AI and prevent safety disasters, we need to consider the contexts of use and the overall sociotechnical system, leverage human factors engineering expertise, prioritize safety, support trust and clinical competencies, and ensure continuous monitoring. Unless AI tools are designed and implemented wisely and deliberately, we worry that more patients than necessary will be harmed, clinician well-being will be degraded, and the full benefits of AI will take much longer to achieve.