An anesthesiology resident was caring for a patient scheduled for removal of a baclofen pump because of suspected infection. During the preoperative period, the neurosurgery resident assisting on the case asked the anesthesiology resident to consider an appropriate oral dose to cover the patient until the pump was working again. The patient was receiving 100 mcg/day by continuous intrathecal infusion. The resident, not having performed this conversion before, queried ChatGPT. The first response was to consult a health care provider, and when the resident replied that they were indeed a health care provider, the application suggested that 1 mcg of intrathecal baclofen is equivalent to 50-100 mg of oral baclofen. In this patient, that would have been 5,000-10,000 mg divided QID, or, at the low end, more than 1,000 mg every 6 hours. Because doses of this magnitude were nowhere to be found in the electronic health record (EHR), the resident queried ChatGPT about the error, and it stated, “Generally, conversions from intrathecal to oral baclofen start with a much lower dose of oral baclofen than the intrathecal dose.” Reading this, the resident wisely stopped using ChatGPT for clinical queries and suggested the surgical resident check with their attending.
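
To make the scale of the error concrete, the arithmetic implied by ChatGPT’s suggested conversion factor is worked through in the short sketch below. It simply reproduces the numbers from the case and is purely illustrative; it is not a dosing reference.

```python
# Illustrative arithmetic only: this reproduces the erroneous conversion suggested
# in the case and is not a dosing reference.

intrathecal_mcg_per_day = 100            # patient's continuous intrathecal infusion
claimed_factor_mg_per_mcg = (50, 100)    # ChatGPT's claimed oral mg per intrathecal mcg

for factor in claimed_factor_mg_per_mcg:
    oral_mg_per_day = intrathecal_mcg_per_day * factor
    per_dose_mg = oral_mg_per_day / 4    # divided QID, i.e., every 6 hours
    print(f"{factor} mg/mcg -> {oral_mg_per_day} mg/day ({per_dose_mg:.0f} mg per dose)")

# 50 mg/mcg  corresponds to 5,000 mg/day  (1,250 mg every 6 hours)
# 100 mg/mcg corresponds to 10,000 mg/day (2,500 mg every 6 hours)
# Doses of this magnitude are far beyond anything the resident could find in the EHR.
```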

The medical field has seen a notable influx of artificial intelligence (AI) tools aimed at enhancing efficiency, diagnostic accuracy, and patient care. Among these, generative pretrained transformers, particularly ChatGPT, have garnered significant attention. ChatGPT, developed by OpenAI, is a language model that interacts in a conversational manner. For anesthesiologists and anesthetists, understanding how ChatGPT works, along with its potential applications and limitations, can provide valuable insight into its role in health care and help avoid errors like the one described in the case.

ChatGPT is an AI model based on the transformer architecture, first described in a 2017 paper by Vaswani et al. It’s a type of large language model (LLM) that uses deep learning techniques to produce human-like text based on the input it receives. The model is “pretrained” on a diverse range of internet text. After this initial training, it undergoes “fine-tuning,” where it is further trained on specific tasks to improve its accuracy and relevance in those areas. If you are interested in additional detail on how ChatGPT works, MIT’s Rama Ramakrishnan provides an excellent primer (asamonitor.pub/3K8i9bn).
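
As a rough intuition for what “producing human-like text” means mechanically, the toy sketch below (which is emphatically not how ChatGPT is implemented) shows the core autoregressive idea: repeatedly sampling the next word from probabilities learned during training.

```python
import random

# Toy next-word model: a lookup table of learned continuation probabilities.
# A real LLM learns these patterns with a transformer over billions of parameters.
toy_model = {
    "intrathecal": [("baclofen", 0.9), ("infusion", 0.1)],
    "baclofen": [("is", 0.6), ("dosing", 0.4)],
    "is": [("delivered", 0.7), ("converted", 0.3)],
}

def generate(prompt: str, max_words: int = 4) -> str:
    words = [prompt]
    for _ in range(max_words):
        options = toy_model.get(words[-1])
        if not options:                     # no learned continuation: stop
            break
        candidates, weights = zip(*options)
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)

print(generate("intrathecal"))
# Nothing in this loop checks whether the generated statement is true;
# fluent-but-wrong output comes from the same mechanism as a correct answer.
```

This pattern-driven generation is also why the model can produce a confident but fabricated citation, as discussed below.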

Understanding how ChatGPT functions helps explain why it may deliver incorrect information. This has been studied by a team of pharmacist researchers from Long Island University, led by Dr. Sarah Grossman. The team challenged ChatGPT with 39 medication-related queries, and a satisfactory result was returned for only 10. For the remaining 29, ChatGPT did not directly address the question (11), was inaccurate (10), and/or was incomplete (12). ChatGPT was also asked to provide references, and in the eight cases in which it did, every response contained nonexistent references (asamonitor.pub/4dDQU6f).

The author asked ChatGPT (version 4.0) why it cannot cite accurate medical references, and this was the response:

ChatGPT can’t cite medical references for a few key reasons:

  1. Fixed Knowledge Base: ChatGPT’s training includes a vast array of texts from the internet up to a specific cutoff in December 2023, but it does not have real-time access to new publications or databases. This means it can’t access or cite the most recent studies or articles directly from medical journals.
  2. No Internet Access: During its operation, ChatGPT does not have the ability to browse the internet or access external databases, which would be necessary to retrieve and cite current medical literature or specific studies accurately.
  3. Lack of Specificity: While ChatGPT is trained on a diverse set of data, it doesn’t store information in a way that can be directly retrieved with specific details about sources, such as author names, journal titles, or publication dates. Instead, it generates responses based on patterns and information it learned during training.
  4. Generalization Over Precision: The model is optimized for language understanding and generation rather than for providing precise academic citations. It aims to assist with information and conversation based on a generalized understanding of topics, rather than functioning as a research tool that offers specific citations.
  5. Ethical and Legal Considerations: There are also ethical considerations around generating citations without verifying the latest research or the integrity of sources. Misrepresenting medical data or studies could lead to misinformation, which is particularly critical in the healthcare field.

Of note, the above response references the paid version of ChatGPT; the free version is based on information available only up to September 2021. When the author asked ChatGPT what its limitations were for use in the medical field, it cited the list above and added:

  1. Risk of Misinformation: Given the vast amount of medical information and the complexity of healthcare topics, there is a risk of generating responses that might be inaccurate or not applicable to specific situations.
  2. Ethical Concerns: There are ethical considerations around the use of AI in medical decision-making, including concerns about bias, transparency, and accountability.

ChatGPT can generate what AI experts refer to as hallucinations. As ChatGPT itself explained above, the model is trained on a diverse set of information but has no mechanism to ensure that the responses it generates are actually accurate. This also explains why it is capable of making up a reference.

At this point, you might be wondering what ChatGPT is good for and why there is so much interest in this application. ChatGPT has some remarkable attributes, particularly as a digital assistant. EHR vendors have leveraged this skill to draft messages to patients and make replies more efficient (asamonitor.pub/3QFd7ae). The key difference between this use and the case presented is that a human can review the information for accuracy and correct any errors before it is sent to the patient. “Human in the loop” AI decision support has great potential to advance health care by helping a human operator (e.g., the anesthesiologist in the room) recognize hidden threats. For example, several companies now sell arterial-line-based analytic packages that can predict incipient hypotension and thus enable earlier mitigation. Analyzing the EHR to draft a preoperative assessment for clinician review is another potentially labor-saving application.
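
A minimal sketch of the “human in the loop” pattern is shown below. The function and field names are hypothetical and not drawn from any EHR vendor’s product; the point is simply that model-drafted text never reaches the patient without explicit clinician review.

```python
from dataclasses import dataclass

@dataclass
class DraftMessage:
    patient_id: str
    body: str
    approved: bool = False

def ai_draft_reply(patient_question: str) -> DraftMessage:
    # Stand-in for a language-model call that drafts a patient message.
    return DraftMessage(patient_id="hypothetical-123",
                        body=f"Draft reply to: {patient_question}")

def clinician_review(draft: DraftMessage, edited_body: str) -> DraftMessage:
    # The clinician edits the draft and explicitly approves it.
    draft.body = edited_body
    draft.approved = True
    return draft

def send_to_patient(draft: DraftMessage) -> None:
    # Hard stop: unreviewed AI output never leaves the system.
    if not draft.approved:
        raise ValueError("Draft has not been reviewed by a clinician.")
    print(f"Sending to {draft.patient_id}: {draft.body}")

draft = ai_draft_reply("When should I restart my baclofen after surgery?")
send_to_patient(clinician_review(draft, "Please follow the taper plan we discussed."))
```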

I have a confession to make. Every AIRS Case Report to date has been based on a submitted report of a real medical error. In this specific case, we instead used one of the examples from Dr. Grossman’s research. However, when the author entered the same queries, the results were wrong in the opposite direction: here, the suggested conversion would have resulted in a massive overdose, whereas Dr. Grossman’s group found a significant underdose.

While we have not received a submission describing harm related to ChatGPT, we felt it prudent to present how this tool could be misused and result in patient harm. There is a disclaimer posted at the bottom of the website, “ChatGPT can make mistakes. Consider checking important information,” but given the ubiquitous use of AI tools across society, we wanted to highlight the limitations. Have ChatGPT or other AI tools resulted in patient harm in your practice? Even if it was a near miss, please submit an AIRS report, and we will follow up in a future edition.

The AIRS Committee recommends using validated sources for medical reference and approaching any AI tools with caution. While there may be a point in the future when a chatbot can reliably answer questions at a professional level, that future is not today.

This entry was written by Patrick Guffey, MD, on behalf of the AIRS Committee.

∗In addition, the entry was produced with the assistance of OpenAI’s ChatGPT utilizing both the free and paid versions.