Comparison of digital and artificial intelligence (AI)-computational algorithms for quantifying low/ultralow human epidermal growth factor receptor 2 (HER2) protein expression in metastatic breast cancer (mBC) from clinical samples

Published 2026

Savitri Krishnamurthy, Dhanrajan Tiruchinapalli, Clara Lam, Simon M. Collin, Rosemary Taylor, Linlin Luo, Anupriya Dutta, Ehab A. Elgabry, Michele S. Woo, Grace E. Kwon, Robert Egger, Jennifer A. Hipp, Lauren Brunner, Jeppe S. Thagaard, Thomas W. Ramsing, Henrik Høeg, Wonkyung Jung, Heon Song, Chang Ho Ahn, Vladimir Kravtsov, Patrick Frey, Ralf Banisch, Stella Redpath

AACR, 2026

Abstract

Background Based on the DESTINY-Breast04 (HER2-low) and DESTINY-Breast06 (HER2-low/-ultralow) trials, trastuzumab deruxtecan (T-DXd) is approved for HER2-low (immunohistochemistry [IHC] 1+ or IHC 2+/in situ hybridization negative) or HER2-ultralow (IHC 0 with membrane staining in ≤10% of tumor cells) mBC. Whole slide images (WSIs) from mBC biopsy samples originally scored as HER2 IHC 0 or 1+ were rescored manually by pathologists and with digital pathology (DP) tools to evaluate concordance between the two approaches.

Methods This retrospective real-world evidence study included 384 WSIs collected in 2020-2023, stained with the PATHWAY HER2 (4B5) assay, and originally scored as HER2 IHC 0 (n = 246) or 1+ (n = 138). Three pathologists each performed 2 blinded readings per WSI using the 2023 ASCO/CAP guidelines; if a pathologist's 2 readings differed, a reconciled score was used. Consensus was defined as agreement by ≥2 of 3 pathologists. The same WSIs were analyzed with 4 AI-computational DP tools in development. Concordance of each DP tool with the manual consensus was measured by overall percentage agreement (OPA) and Cohen κ, and review time was recorded.
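The consensus rule and the two concordance measures named in the Methods can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names and the toy score lists are hypothetical, the input is assumed to be one reconciled score per pathologist, and the abstract does not say how WSIs with missing DP output were handled, so the sketch simply assumes paired, complete labels.

```python
from collections import Counter


def consensus(scores):
    """Return the score given by at least 2 of the 3 pathologists, else None."""
    label, count = Counter(scores).most_common(1)[0]
    return label if count >= 2 else None


def overall_percent_agreement(reference, test):
    """Percentage of paired cases in which both methods assign the same category."""
    agree = sum(r == t for r, t in zip(reference, test))
    return 100.0 * agree / len(reference)


def cohen_kappa(reference, test):
    """Unweighted Cohen kappa: agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = sum(r == t for r, t in zip(reference, test)) / n
    ref_counts, test_counts = Counter(reference), Counter(test)
    # Chance agreement: product of the two marginal proportions, summed over categories.
    p_expected = sum(ref_counts[c] * test_counts[c] for c in ref_counts) / n**2
    # Note: undefined (division by zero) if both raters use a single identical category.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data (NOT study data): manual consensus vs one DP tool.
manual = ["0-absent", "0-with", "1+", "1+", "0-with", "1+", "2+", "0-absent"]
dp_tool = ["0-absent", "1+", "1+", "1+", "0-with", "0-with", "2+", "0-absent"]

opa = overall_percent_agreement(manual, dp_tool)  # 6 of 8 pairs agree -> 75.0
kappa = cohen_kappa(manual, dp_tool)              # about 0.652 for this toy data
```

OPA counts exact category matches only, while κ discounts the agreement expected by chance from the category frequencies, which is why the two measures can rank tools differently when one category dominates.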

Results Of 384 WSIs, 375 reached a consensus HER2 IHC score by pathologist review; 9 remained discordant. Among consensus cases, 2/3 agreement occurred in 154 WSIs (41.1%) and 3/3 agreement in 221 (58.9%); 81 (21.6%) WSIs were reclassified as IHC 0 absent membrane staining, 85 (22.7%) as IHC 0 with membrane staining, 203 (54.1%) as IHC 1+, and 6 (1.6%) as IHC 2+. HER2 IHC rescoring results with the 4 DP tools are shown in Table 1. OPA (95% CI) between consensus and DP-assisted scores across all HER2 IHC score categories was 74% (69-78%), 73% (68-77%), 69% (64-74%), and 55% (50-60%) for tools RV73X, MQ52G, KL84Q, and ZX19P, respectively. Cohen κ ranged from 0.33 to 0.59. Median review times were shorter with DP (0.7 to 2.4 minutes for the 3 tools with available data) than with manual review (6.7 minutes).
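The abstract does not state how the 95% CIs around OPA were computed. As one common option for a binomial proportion, a Wilson score interval can be sketched as below; the function name and the example counts (80 agreements out of 100 cases) are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion, returned as (low, high)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical example (NOT study data): 80 agreeing cases out of 100.
low, high = wilson_interval(80, 100)
```

Other interval methods (for example exact Clopper-Pearson or bootstrap) give slightly different bounds, so reported CIs should only be compared when the method is known.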

Conclusion Preliminary analysis suggests integrating AI-computational DP tools into HER2 IHC clinical workflows may reduce pathologist review time. Further analysis is underway to assess concordance of DP tools with manual scoring.

Table 1. HER2 IHC Re-scores (including HER2-low/ultralow) of WSIs and Median Time to Review

| Scoring method | N | HER2 IHC 0 absent membrane staining, n (%) | HER2 IHC 0 with membrane staining, n (%) | HER2 IHC 1+, n (%) | HER2 IHC 2+, n (%) | HER2 IHC 3+, n (%) | Missing resultsᵃ, n (%) | OPA between manual consensus and DP tool, % (95% CI) | Cohen κ (95% CI) | Review time, median (range), minutes |
| Manual consensus | 375 | 81 (21.6) | 85 (22.7) | 203 (54.1) | 6 (1.6) | 0 | 0 | Not applicable | Not applicable | 6.7 (3.0-11.5) |
| DP tool RV73X | 375 | 72 (19.2) | 100 (26.7) | 176 (46.9) | 5 (1.3) | 0 | 22 (5.9) | 74 (69-78) | 0.59 (0.52-0.66) | 2.1 (0.3-50.2) |
| DP tool MQ52G | 375 | 77 (20.5) | 126 (33.6) | 168 (44.8) | 4 (1.1) | 0 | 0 | 73 (68-77) | 0.57 (0.51-0.64) | 0.7 (0.1-8.5) |
| DP tool KL84Q | 375 | 82 (21.9) | 137 (36.5) | 145 (38.7) | 10 (2.7) | 1 (0.3) | 0 | 69 (64-74) | 0.53 (0.46-0.60) | 2.4 (0.5-25.8) |
| DP tool ZX19P | 375 | 27 (7.2) | 201 (53.6) | 130 (34.7) | 6 (1.6) | 3 (0.8) | 8 (2.1) | 55 (50-60) | 0.33 (0.27-0.39) | Not available |

ᵃ Full dataset: 375 WSIs; missing results = WSIs without a DP tool output.

 
