Case presentation:
A 10-year-old boy with a complex past medical history notable for neonatal stroke resulting in cortical blindness, tracheoesophageal fistula repaired in infancy, and ASD/VSD repair with subsequent pacemaker placement and pacemaker dependence presented for an iliac bone graft to correct an alveolar maxillary cleft defect. During the procedure, the patient suddenly coughed, with resulting desaturation and increased peak airway pressures, followed by the development of profound hypotension. Treatment was started for presumed anaphylaxis, although the inciting agent was unclear. The patient remained profoundly hypotensive after treatment with multiple vasopressors (including IM epinephrine), famotidine, a steroid, diphenhydramine, and albuterol. Through careful investigation and collaboration, the medical team eventually discovered that Gelfoam®, a product used during the surgery, was the likely cause of the allergic reaction. The surgeon promptly reopened the incision, irrigated the area, and removed the Gelfoam®. This intervention led to a significant improvement in the patient’s condition. Within a few hours of transport to the pediatric intensive care unit, the patient became hemodynamically stable, and he was extubated the following day. A detailed interview with the patient and his family afterward revealed a history of “mouth tingling” when eating certain gummy candies. Further allergy workup confirmed an elevated intraoperative tryptase level, elevated porcine gelatin IgE, and a positive skin prick test for porcine gelatin.
“AI won’t replace humans – but humans with AI will replace humans without AI”
This fascinating case was previously discussed (ASA Monitor 2023;87:11) before the meteoric rise of large language models (LLMs) such as ChatGPT. LLMs are machine learning/artificial intelligence (ML/AI) models trained on large quantities of text to learn how to interpret and generate natural language content. Each LLM has parameters that are adjusted during training; a larger number of parameters generally yields greater complexity and an improved ability to create sophisticated communications of a quality that rivals or exceeds the linguistic capability of many carbon-based rhetoricians (Nat Med 2023;29:1930-40; asamonitor.pub/46s5Qjx).
LLMs were first released widely to the public in late 2022. Today, multiple LLMs are publicly available, including OpenAI’s GPT-4 (powering the infamous ChatGPT), Google’s PaLM 2 (powering Bard), Microsoft’s Turing-NLG, Meta’s Llama 2, and multiple other open-source offerings. The rapid proliferation of such models makes it difficult to imagine a future in which conversationally fluent machines are not ubiquitous in society.
It must be noted that LLMs fundamentally generate text by predicting the most likely next word in a response, based on their parameters and training material. LLMs have no intrinsic reasoning or deduction capabilities, which can lead to “hallucinations,” i.e., confident-sounding answers that are factually inaccurate (Nat Med 2023;29:1930-40). Nevertheless, LLMs have been shown to comfortably pass medical board exams and have been increasingly tested in medical diagnosis and decision-making situations (JAMA 2023;330:792-4). However, questions abound regarding their usefulness and efficacy in complex cases. We presented the above case to the LLMs publicly available as of August 2023 (i.e., OpenAI’s ChatGPT and Google’s Bard) to see whether they could have done better than this medical team in resolving this potentially fatal situation.
On initial presentation (including the gummy bear history and Gelfoam® usage, but without revealing what improved the patient’s condition), the LLMs provided general recommendations that aligned with the medical team’s approach, highlighting the importance of airway and cardiovascular assessment, pacemaker evaluation, and consideration of an allergic reaction. However, the LLMs did not explicitly identify the connection between gummy bear allergy and Gelfoam® usage. Only upon further nudging with relational questions (e.g., “Is there cross-reactivity between gummy bears and surgical foam?”) did the LLMs correctly identify porcine gelatin as a possible trigger.
This case underscores the fact that while LLMs can potentially provide valuable insights and generate reasonable considerations, they have yet to fully replicate the depth of clinical judgment and contextual understanding that experienced medical providers frequently demonstrate (JAMA 2023;330:792-4). The successful outcome of this case hinged on the medical team’s in-depth understanding of medical nuances, strong multidisciplinary collaboration, ability to connect seemingly unrelated information (such as the gummy bear history), and timely and targeted diagnosis and intervention based on their extensive medical knowledge.
Today’s LLMs are already potent tools, but they are an emerging technology whose maturity is still years away. Currently, they can fulfill a wide range of natural-language information requests with an impressive degree of accuracy and eloquence. However, they cannot independently connect rationally adjacent dots to make leaps in logic, even when they appear to possess all the facts necessary to do so. As LLMs improve through advances in training and expansions of their training material (i.e., their knowledge base), their output will more closely approximate the human ability to reason (ASA Monitor 2023;87:11; Nat Med 2023;29:1930-40; JAMA 2023;330:792-4).
In future health care settings, LLMs will likely run in real time alongside other forms of AI, flagging concerning anomalies and providing decision support as needed (JAMA 2023;330:792-4). In the field of anesthesiology, one can imagine a system with access to live patient data, within which a health care-specific LLM constantly assesses every patient’s intraoperative status and provides alerts when it detects significant changes in patient condition or risk. Additionally, the LLM might tirelessly examine each line in a patient’s chart, incorporating every note, lab result, and comment to rapidly answer questions about the patient’s history, identify potential individualized complications for upcoming procedures, and unearth useful clues regarding an elusive diagnosis.
Perioperative anaphylaxis is rare but potentially lethal if not managed effectively and in a timely fashion. This case emphasizes the importance of thorough investigation and the value of history and knowledge in accurate diagnosis and treatment. A multidisciplinary approach empowers health care providers to effectively address edge cases such as anaphylaxis. In the near future, LLMs and other forms of AI will likely begin to augment the health care team by providing real-time monitoring and decision support. As always, effective collaboration with all members of the health care team, including AI, is key to improving patient safety and minimizing complications.