Authors: Sun T et al.
Anesthesia & Analgesia, February 3, 2026. DOI: 10.1213/ANE.0000000000007927
This national retrospective cohort study examined whether gender or race is associated with differences in ACGME Milestone ratings during anesthesiology residency. Using linked ACGME Milestone data and AAMC demographic data, the authors analyzed 4,997 residents from 141 programs who graduated between 2017 and 2019.
The primary objective was to determine whether baseline ratings, growth trajectories, or final Milestone scores differed by gender or race. The authors fitted three-level mixed-effects models across 25 subcompetencies.
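This summary does not give the model specification. As an illustrative sketch only, a three-level growth model of this general kind (rating occasions nested within residents, residents nested within programs) is often written as below. Every symbol, the coding of time, and the choice of random effects are assumptions made for illustration, not the authors' published model.

```latex
% Illustrative three-level growth model (assumed form, not taken from the article)
% t = rating occasion, i = resident, j = program
% Time is assumed to be coded 0 at the first rating; G is a 0/1 group indicator
Y_{tij} = \beta_0 + \beta_1\,\mathrm{Time}_{tij} + \beta_2\,G_{ij}
        + \beta_3\,(\mathrm{Time}_{tij} \times G_{ij})
        + u_{0j} + r_{0ij} + r_{1ij}\,\mathrm{Time}_{tij} + e_{tij}
% u_{0j}: program-level random intercept
% r_{0ij}, r_{1ij}: resident-level random intercept and slope
% e_{tij}: occasion-level residual
```

Under this sketch, \beta_2 is the group difference at baseline and \beta_3 is the group difference in growth rate, so the expected gap at a final time point T is \beta_2 + \beta_3 T. A pattern of no baseline difference with a gap at graduation would then correspond to \beta_2 near zero and a nonzero \beta_3.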
Key Findings
- Baseline ratings
At entry into training, there were no statistically significant differences in Milestone ratings by gender or race.
- Growth trajectories by race
Residents identified as underrepresented in medicine (URiM) demonstrated slower growth in Milestone scores over time.
By graduation, URiM residents had significantly lower ratings across all 25 subcompetencies.
- Growth trajectories by gender
Overall growth rates did not significantly differ between male and female residents.
However, by graduation, women received lower Milestone ratings in 7 of the 25 subcompetencies.
- Areas of greatest difference
The largest effect sizes were observed in Medical Knowledge-1 (MK-1).
Gender differences were primarily concentrated in Patient Care (PC) and Medical Knowledge (MK) domains.
Most other observed differences had small effect sizes.
Interpretation
The most important insight is that disparities were not present at baseline. Instead, modest but cumulative differences in growth rates over time resulted in measurable differences at graduation. This suggests that differential evaluation patterns may emerge during training rather than being explained by initial ability differences at entry.
The fact that MK-1 demonstrated the largest disparities raises important questions, as this subcompetency typically reflects foundational medical knowledge expected for independent anesthesiology practice.
Because Milestones are criterion-referenced rather than norm-referenced, systematic differences by demographic group raise concerns about evaluation bias, differential mentorship, assessment culture, or structural factors within competency-based medical education (CBME) frameworks.
Key Points
• No baseline gender or racial differences in Milestone ratings at entry to residency.
• URiM residents demonstrated slower growth across training, resulting in lower graduation ratings in all subcompetencies.
• Women had lower graduation ratings in 7 of 25 subcompetencies, particularly in Patient Care and Medical Knowledge domains.
• The largest observed disparity was in MK-1.
• Differences were generally modest in magnitude but cumulative over time.
What You Should Know
For program leadership and department chairs, this study emphasizes the importance of longitudinal evaluation patterns rather than focusing only on initial trainee performance. Small rating differences that appear negligible early in training can compound over years.
For those who lead anesthesia groups or oversee training environments, these data reinforce the need for:
• Structured evaluator training
• Routine equity audits of Milestone data
• Transparent faculty feedback calibration
• Objective anchors for Medical Knowledge assessments
The study does not demonstrate causation or intentional bias, but it clearly identifies a persistent pattern that warrants closer examination at the program level.
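As one possible starting point for the routine equity audits listed above, a program could compare final ratings between two groups using a standardized mean difference. The sketch below is a minimal example under stated assumptions: the function name `cohens_d`, the choice of Cohen's d with a pooled standard deviation, and the sample ratings are all hypothetical, and this summary does not state which effect-size measure the study itself used.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two ratings")
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("ratings have zero variance; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical final ratings for one subcompetency (made-up numbers, not study data)
group_a = [4.0, 4.5, 4.0, 3.5, 4.5]
group_b = [4.0, 3.5, 4.0, 3.5, 4.0]

d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

A descriptive check like this only flags a gap. It does not account for clustering of residents within programs or for repeated ratings over time, so any flagged difference would still need a proper longitudinal analysis before conclusions are drawn.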
Thank you to Anesthesia & Analgesia for allowing us to summarize and share this article.