Comparing multiple people each to the grand mean of log-normally distributed endpoints

Authors: Chen P-F et al.

Journal of Clinical Anesthesia, 2025. DOI: 10.1016/j.jclinane.2025.112083

Summary
Many anesthesiology performance metrics—such as anesthesia time, surgical time, epidural placement time, extubation time, and PACU time—follow log-normal distributions. This study evaluated how different statistical approaches perform when comparing each individual anesthesiologist or surgeon (“person”) with the overall grand mean to identify outliers. Using Monte Carlo simulations, the authors tested fixed effects models, generalized pivotal methods, and mixed effects models under a variety of conditions including unequal sample sizes and the presence or absence of variance adjustments.
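The idea of estimating a Type I error rate by Monte Carlo simulation can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes every person shares the same log-normal distribution (so any flagged person is a false positive), works on the log scale with a pooled variance and a normal critical value, and uses hypothetical names and parameter values (`false_flag_rate`, 20 people, 30 cases each).

```python
import math
import random
import statistics


def false_flag_rate(m=20, n=30, per_test_alpha=0.05, reps=300, seed=1):
    """Share of simulated data sets in which at least one person is
    flagged as differing from the grand mean, although none truly does."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - per_test_alpha / 2)
    runs_with_flag = 0
    for _ in range(reps):
        # Null hypothesis: identical distribution for all m people.
        # Drawing normal values is the same as taking logs of log-normal times.
        logs = [[rng.gauss(4.0, 0.5) for _ in range(n)] for _ in range(m)]
        means = [statistics.fmean(person) for person in logs]
        grand = statistics.fmean(means)
        pooled_var = statistics.fmean([statistics.variance(p) for p in logs])
        # Standard error of (person mean - grand mean) with equal n per person.
        se = math.sqrt(pooled_var / n * (1 - 1 / m))
        if any(abs(mean - grand) / se > z_crit for mean in means):
            runs_with_flag += 1
    return runs_with_flag / reps


if __name__ == "__main__":
    print("unadjusted:", false_flag_rate(per_test_alpha=0.05))
    print("alpha divided by m:", false_flag_rate(per_test_alpha=0.05 / 20))
```

Comparing the two printed rates shows how strongly the chance of at least one false flag depends on the per-comparison threshold when many people are tested at once.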

Fixed effects models without Bonferroni correction performed very poorly, with Type I error rates more than tenfold higher than acceptable levels. Adding Šidák corrections improved accuracy only modestly because the comparisons were not fully independent. Robust (heteroscedasticity-consistent) variance estimators performed better than default variance estimates, but were unreliable when each person contributed only about 10 cases. Even with Bonferroni correction and robust variance, large inaccuracies persisted when sample sizes differed among clinicians, even when the median sample size was around 60 cases per person.
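For readers unfamiliar with the two corrections, the arithmetic behind them is short. The sketch below uses the standard textbook formulas. The figure of 50 clinicians and the 0.05 level are illustrative assumptions, and the familywise formula assumes independent comparisons, which, as noted above, does not hold exactly when each person is compared with a grand mean that includes that person.

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison threshold under Bonferroni: alpha / m."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-comparison threshold under Sidak: 1 - (1 - alpha)^(1/m)."""
    return 1 - (1 - alpha) ** (1 / m)


def familywise_error(per_test_alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - per_test_alpha) ** m


if __name__ == "__main__":
    m = 50  # hypothetical number of clinicians
    print("Bonferroni threshold:", bonferroni_alpha(0.05, m))
    print("Sidak threshold:", sidak_alpha(0.05, m))
    print("Familywise error, no correction:", familywise_error(0.05, m))
    print("Familywise error, Sidak:", familywise_error(sidak_alpha(0.05, m), m))
```

The Šidák threshold is always slightly larger (less conservative) than the Bonferroni threshold, and it restores the intended familywise level exactly only when the comparisons are independent.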

Mixed effects models (using shrinkage/empirical Bayes estimates) performed well only when not paired with Bonferroni correction; when combined, error rates increased substantially. In contrast, generalized pivotal inference—applied separately for each person—produced very small error rates, especially when paired with a Šidák adjustment.
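A generalized pivotal interval for one person's log-normal mean can be sketched as below. This follows the commonly published generalized pivotal quantity for a log-normal mean, exp(mu + sigma^2 / 2), and is not taken from the article itself. The function name, the number of draws, the synthetic data, and the choice of 50 clinicians for the Šidák adjustment are all assumptions made for illustration. How the article combines the per-person result with the grand mean is not reproduced here.

```python
import math
import random
import statistics


def gpq_lognormal_mean_interval(times, alpha=0.05, draws=20000, seed=0):
    """Two-sided generalized pivotal interval for the mean of log-normal data."""
    rng = random.Random(seed)
    logs = [math.log(t) for t in times]
    n = len(logs)
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs)  # sample variance of the logs
    samples = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        # Chi-square with n - 1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2).
        u = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = s2 * (n - 1) / u              # pivotal draw for sigma^2
        mu = ybar - z * math.sqrt(sigma2 / n)  # pivotal draw for mu
        samples.append(mu + sigma2 / 2.0)      # log of the log-normal mean
    samples.sort()
    lower = samples[int((alpha / 2) * draws)]
    upper = samples[int((1 - alpha / 2) * draws) - 1]
    return math.exp(lower), math.exp(upper)


if __name__ == "__main__":
    # Synthetic times for one hypothetical person (not real data).
    data_rng = random.Random(42)
    times = [data_rng.lognormvariate(4.0, 0.5) for _ in range(30)]
    # Sidak-adjusted level, assuming 50 people are being compared.
    adjusted_alpha = 1 - (1 - 0.05) ** (1 / 50)
    print(gpq_lognormal_mean_interval(times, alpha=adjusted_alpha))
```

Because the interval is computed separately for each person, it uses only that person's own variance, which is one plausible reason the approach tolerates unequal sample sizes.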

Key Points
• Anesthesia performance endpoints are typically log-normally distributed, complicating comparisons across clinicians.
• Fixed effects models without Bonferroni correction have extremely high Type I error rates (>10-fold).
• Robust variance estimators help, but remain inaccurate with small or unequal sample sizes.
• Mixed effects models perform poorly when Bonferroni correction is added.
• Generalized pivotal inference with a Šidák adjustment is the most reliable method, especially when sample sizes are not in the hundreds per person.

Thank you for allowing us to use this article from the Journal of Clinical Anesthesia.