Background

Physiologic data from electronic health records are widely used in research, and these data invariably contain artifacts. Traditionally, artifacts have been handled with simple filter techniques. The authors hypothesized that different artifact detection algorithms, including machine learning, may be necessary to achieve optimal performance for various vital signs and clinical contexts.

Methods

In this retrospective single-center study, electronic health record datasets from the operating room and the intensive care unit (ICU), including heart rate, oxygen saturation, blood pressure, temperature, and capnometry, were analyzed. All records were screened for artifacts by at least two human experts. Classic artifact detection methods (cutoff, multiples of SD [z-value], interquartile range, and local outlier factor) and a supervised learning model based on long short-term memory neural networks were tested for each vital sign against the human expert reference dataset. Sensitivity and specificity were calculated for each artifact detection algorithm.
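
As an illustration of the classic detection rules compared here, the following is a minimal Python sketch of the z-value and interquartile range criteria, together with the sensitivity and specificity calculation against a human reference. The thresholds (3 SD, 1.5 × IQR), function names, and synthetic heart rate series are assumptions for illustration only and do not reproduce the study's implementation or preprocessing.

```python
import numpy as np

def flag_zscore(values, z_thresh=3.0):
    # Flag points whose deviation from the mean exceeds z_thresh standard deviations.
    mean, sd = np.mean(values), np.std(values)
    return np.abs(values - mean) > z_thresh * sd

def flag_iqr(values, k=1.5):
    # Flag points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def sensitivity_specificity(flags, reference):
    # Compare algorithm flags against human expert reference annotations.
    tp = np.sum(flags & reference)
    tn = np.sum(~flags & ~reference)
    fp = np.sum(flags & ~reference)
    fn = np.sum(~flags & reference)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Synthetic heart rate series with two implausible values standing in for artifacts.
heart_rate = np.array([72, 75, 74, 300, 73, 71, 0, 76, 74, 75], dtype=float)
reference = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0], dtype=bool)

for name, flags in [("z-value", flag_zscore(heart_rate)),
                    ("IQR", flag_iqr(heart_rate))]:
    sens, spec = sensitivity_specificity(flags, reference)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Even in this toy example, the two rules disagree on which points are artifacts, which mirrors the study's finding that detection performance depends on the method chosen.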

Results

A total of 106 (53 operating room and 53 ICU) patients were randomly selected, resulting in 392,808 data points. Human experts annotated 5,167 (1.3%) data points as artifacts. The artifact detection algorithms demonstrated large variations in performance. Specificity was above 90% for all detection methods and all vital signs. The neural network showed significantly higher sensitivities than the classic methods for heart rate (ICU, 33.6%; 95% CI, 33.1 to 44.6), systolic invasive blood pressure (in both the operating room [62.2%; 95% CI, 57.5 to 71.9] and the ICU [60.7%; 95% CI, 57.3 to 71.8]), and temperature in the operating room (76.1%; 95% CI, 63.6 to 89.7). The CIs for specificity overlapped for all methods. Sensitivity was generally low; only the z-value method for oxygen saturation in the operating room reached 88.9%, and all other sensitivities were less than 80%.

Conclusions

No single artifact detection method consistently performed well across different vital signs and clinical settings. Neural networks may be a promising artifact detection method for specific vital signs.

Editor’s Perspective
What We Already Know about This Topic
  • Modern perioperative and critical care clinical research often uses electronic health record physiologic data.
  • The ideal physiologic data artifact detection algorithm remains unclear.
What This Article Tells Us That Is New
  • In a single-center retrospective analysis of 53 operating room and 53 intensive care unit (ICU) patients, 5,167 of 392,808 (1.3%) electronic health record measurements (heart rate, oxygen saturation measured by pulse oximetry [Spo2], blood pressure, temperature, and capnometry) were annotated by human reviewers as artifacts.
  • A comparison of classic artifact detection methods (cutoff, multiples of SD [z-value], interquartile range, and local outlier factor) and a supervised neural network against the human reviewer standard demonstrated that no single method was superior from a sensitivity or specificity perspective.
  • For operating room patients, the highest performing method’s sensitivity ranged widely, from 36% for diastolic noninvasive blood pressure to 89% for Spo2; it exceeded 70% for capnometry, Spo2, temperature, and invasive mean arterial pressure.
  • For ICU patients, the highest performing method’s sensitivity ranged from 34% for heart rate to 74% for Spo2; it exceeded 70% for capnometry, Spo2, and temperature.