
Artificial intelligence (AI) is moving into every aspect of our lives: selecting the online advertisements we see, recognizing our voices, showing us the best route home, and protecting us from credit card fraud. Machine learning (ML) refers to programs that use complex mathematical functions to give computers the ability to perform human-like actions such as problem-solving, object and word recognition, decision-making, and generating predictions about future events. ML commonly analyzes thousands of columns and millions of rows to identify complex nonlinear relationships between multiple variables. It offers the potential for deeper insight than traditional biostatistics and could allow treatment to be personalized, with recommendations tailored to individual patients.

In anesthesia, AI has potential applications in decision support, patient monitoring, drug delivery, ultrasound-guided regional anesthesia, and training. Algorithms can analyze individual patient data in real time to predict critical events, anticipate the need for treatment modifications, and support decision-making. They can identify patterns, predict outcomes, and recommend optimal treatment strategies to the perioperative team. AI is not intended to replace the expertise and judgment of physicians in our specialty but aims to deliver on the five rights of decision support: providing the right information, to the right person, in the right format, through the right channel, at the right time in the workflow (J AHIMA 2013;84:42-7).

A great deal of hype currently surrounds the development of AI tools, fueled by the introduction of the large language model ChatGPT and by headlines warning of the future societal dangers of advanced AI (Anesth Analg 2020;130:1111-3). In reality, few anesthesiologists currently encounter AI-based tools in their practice, despite experiencing the benefits of AI daily in their lives outside of work. So why is it such a challenge to get AI projects to the bedside to help our patients?

Let’s start with data. Medicine generates data in high volume and at high velocity, and those data are collected into the electronic medical record (EMR) primarily for documentation and medicolegal purposes. The EMR is rarely optimized for data-driven research, because such optimization does not happen by accident: we need to design the way we collect data with future use by AI in mind. Contemporary medical practice has become so safe and effective that many adverse outcomes are now rare. While this is admirable, it also means that few records are available to train ML models to recognize events in which patient safety was threatened. High-performing, clinically meaningful models often require large data sets, which in turn necessitates data sharing across institutions. Such data sets quickly become unwieldy without initiatives like the Multicenter Perioperative Outcomes Group (MPOG) to improve data collection, storage, and validation. Privacy and data security are major concerns, as AI systems rely on vast amounts of sensitive patient data for training and analysis. Proper safeguards and adherence to strict data protection regulations are essential to maintain patient confidentiality and trust.
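To make the rare-outcome problem concrete, the sketch below trains a classifier on a synthetic dataset with roughly a 1% event rate using scikit-learn's class weighting. The synthetic data, the choice of logistic regression, and the 1% rate are illustrative assumptions, not a recommendation for any particular clinical model.

```python
# A minimal sketch of training on a rare outcome (~1% event rate) without
# letting the model succeed by always predicting "no event." Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.99, 0.01], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" up-weights the rare positive class during fitting.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Average precision is more informative than accuracy when events are rare.
scores = model.predict_proba(X_test)[:, 1]
print(f"Average precision: {average_precision_score(y_test, scores):.3f}")
```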

While expanding data availability helps in creating ML models, the variety of data sources is formidable: incompatible formats, divergent data structures, and messy data are all challenges. There are no universally agreed-upon data collection standards to support what is known as “interoperability” between data sources. For an AI model to achieve wide adoption, it must be able to interface with the data held within different EMRs. More EMRs are embracing FHIR (Fast Healthcare Interoperability Resources) as a standard for health care data exchange, but this is far from a perfect solution.
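To make interoperability concrete, here is a minimal sketch of pulling vital-sign observations from an EMR that exposes a FHIR REST endpoint. The base URL and patient identifier are hypothetical, and authentication (e.g., a SMART on FHIR token) is omitted for brevity; any real integration would require both.

```python
# A minimal sketch of reading vital-sign Observations over a FHIR REST API.
# The endpoint and patient ID are hypothetical; authentication is omitted.
import requests

FHIR_BASE = "https://example-hospital.org/fhir"   # hypothetical endpoint
PATIENT_ID = "12345"                              # hypothetical patient

resp = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"patient": PATIENT_ID, "category": "vital-signs", "_count": 50},
    headers={"Accept": "application/fhir+json"},
    timeout=30,
)
resp.raise_for_status()
bundle = resp.json()

# FHIR search results arrive as a Bundle; each entry holds one Observation.
for entry in bundle.get("entry", []):
    obs = entry["resource"]
    code = obs["code"]["coding"][0].get("display", "unknown")
    value = obs.get("valueQuantity", {})
    print(code, value.get("value"), value.get("unit"))
```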

For some potentially wonderful AI tools, we do not yet capture the necessary data at all. ML-targeted drug delivery may improve on current pharmacokinetic and pharmacodynamic models, but it requires highly granular information on dose timing and the patient’s physiologic response. Even at that level of granularity, what we capture and analyze is still, at best, observational data, with all of the inherent limitations of observational study design. Ideally, machine learning needs data that are high quality, clinically relevant, nearly free of missing values, and closely matched to the population we wish to study. That does not happen organically; we have work to do before we can let our AI algorithms start to learn. This preprocessing, or data cleaning, is time- and labor-intensive and therefore costly.
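The sketch below illustrates the flavor of that preprocessing work using pandas. The file name, column names, and plausibility thresholds are illustrative assumptions, not a validated cleaning protocol.

```python
# A minimal sketch of data cleaning before model training. File and column
# names (case_id, dose_time, map_mmHg, hr_bpm) are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("intraop_records.csv", parse_dates=["dose_time"])

# Treat physiologically implausible values as missing rather than trusting them.
df.loc[(df["map_mmHg"] < 20) | (df["map_mmHg"] > 200), "map_mmHg"] = np.nan
df.loc[(df["hr_bpm"] < 20) | (df["hr_bpm"] > 250), "hr_bpm"] = np.nan

# Rows without a dose timestamp cannot be aligned to physiologic effect.
df = df.dropna(subset=["dose_time"]).sort_values(["case_id", "dose_time"])

# Carry the last observed vital sign forward within each case to fill short gaps.
df[["map_mmHg", "hr_bpm"]] = df.groupby("case_id")[["map_mmHg", "hr_bpm"]].ffill()
```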

We also need to be sure to use only features that would be available at the moment the model’s prediction is needed in clinical care. This is why AI development must be guided by subject matter expert physicians working closely with data scientists to select appropriate data fields and to examine the plausibility of the values fed into the model. Few clinicians can translate between the computational aspects of model building and the clinical insights around the problem to be solved, and then integrate the model into clinical workflow; this leaves a skills gap between model development and implementation. Compounding the gap, there is huge market demand for data scientists across multiple industries.
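One simple guardrail is to keep an explicit, clinician-curated list of features that are knowable before the prediction is made and to drop anything that only appears afterward. The feature lists and column names in the sketch below are hypothetical.

```python
# A minimal sketch of enforcing "only use what is known at prediction time."
# Feature lists are hypothetical and would be curated with clinician input.
import pandas as pd

PREOP_FEATURES = ["age", "asa_class", "baseline_creatinine", "planned_procedure"]
POSTOP_ONLY = ["icu_length_of_stay", "postop_troponin", "discharge_disposition"]

def select_training_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns a clinician could actually know before induction."""
    leaked = [c for c in POSTOP_ONLY if c in df.columns]
    if leaked:
        print(f"Dropping columns unavailable at prediction time: {leaked}")
    return df[[c for c in PREOP_FEATURES if c in df.columns]]
```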

When developing our projects, we need to choose the right question. This might be in an area of high clinical or monetary value and should represent clinical or operational areas where there is provider consensus on appropriate management, where the necessary data are already available, and where the application of AI can reduce cognitive burden and provide useful insight for the provider. What bedside clinicians value most may not match what the nonclinician purchasers and funders of ML tools value, nor what is easiest for developers working with existing datasets to produce. ML systems are well positioned to recommend evidence-based clinical actions where data exist, drawing on a broader perspective than any individual clinician. However, AI lacks the ability to contextualize a clinical decision within the wider care of an individual patient. AI systems are therefore better deployed in support of clinician knowledge rather than as clinician replacement, and they are more likely to be trusted and accepted in this role. The term artificial intelligence is therefore potentially misleading. The near future may be one of augmented intelligence, in which computers become indispensable in helping us care for our patients and allow physicians and other health care providers to devote more time to patient care.

Recent high-profile AI safety disasters, such as the Boeing 737 Max and Tesla Model S crashes, have been attributed to user error stemming from unfamiliarity with automated piloting systems and from use outside their intended design. It is therefore important for clinical users to understand a model’s limitations and appropriate application. Trust in any new technology or technique is a massive concern; without it, we simply will not change our practice. Yet machine learning captures relationships between variables in a fashion that is unreadable to the human eye. DARPA, the Defense Advanced Research Projects Agency, is investigating what we as humans require to understand and trust AI systems, and the FDA is currently designing procedures to guide premarket review of proposed clinical ML applications.
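One widely used way to make those relationships at least partially readable is to ask how much each input contributes to a model’s predictions. The sketch below uses scikit-learn’s permutation importance on synthetic data; it is one illustrative technique, not the approach DARPA or the FDA prescribes.

```python
# A minimal sketch of inspecting a trained model with permutation importance:
# shuffle one feature at a time and measure how much performance degrades.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature_{i}: importance = {result.importances_mean[i]:.3f}")
```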

Researchers and developers must consider the full ethical, bias, and safety implications of deploying these new technologies. As with the introduction of any new treatment, the need for careful appraisal, validation, and monitoring of AI tools does not stop at implementation. The proposed FDA framework includes a form of “Phase IV” post-marketing surveillance that will be essential for assessing model performance over time. Many models are trained on real-world datasets, reflecting the treatment actually delivered to patients by physicians, rather than on a known gold standard of care. There is extensive evidence that health care outcomes for minority groups fall below what would otherwise be expected because of systematic bias. Although AI aims to add no additional bias, if we do not adjust for bias already present in the training data, that bias becomes baked into the model.
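In practice, part of that surveillance can be as simple as routinely reporting model discrimination separately for each patient subgroup, as in the sketch below. The column names, grouping variable, and choice of AUROC as the metric are illustrative assumptions.

```python
# A minimal sketch of subgroup performance monitoring. Column names
# (risk_score, outcome, race_ethnicity) are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

def auroc_by_group(df: pd.DataFrame,
                   score_col: str = "risk_score",
                   label_col: str = "outcome",
                   group_col: str = "race_ethnicity") -> None:
    """Print AUROC per subgroup to surface performance gaps after deployment."""
    for group, subset in df.groupby(group_col):
        if subset[label_col].nunique() < 2:
            print(f"{group}: too few events to evaluate")
            continue
        auc = roc_auc_score(subset[label_col], subset[score_col])
        print(f"{group}: AUROC = {auc:.3f} (n = {len(subset)})")
```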

As we develop models, we must ask not what we can build with these progressive new technologies, but what we should build, guided by sound ethical principles. This will help establish and maintain public and clinician trust, which is essential for acceptance of AI as it matures into widespread clinical use. We are still in a stage of discovery, with efforts focused on model development, and there remains a divide between the promise of ML applications and their implementation. Progress is hindered by the need for multiple complementary skill sets that may not all be present within a given institution or team. We’re getting there, but progress is slow as we continue to tackle these challenges.