
Agreement Across 10 Artificial Intelligence Models in Assessing HER2 in Breast Cancer Whole Slide Images: Findings from the Friends of Cancer Research Digital PATH Project

Brittany McKelvey et al. - SABCS 2024

AUTHORS

Brittany McKelvey1, Pedro A. Torres-Saavedra2, Jessica Li2, Glenn Broeckx3, Frederik Deman3, Siraj Ali4, Hillary Andrews1, Salim Arslan5, Santhosh Balasubramanian6, J. Carl Barrett7, Peter Caie8, Ming Chen9, Daniel Cohen10, Tathagata Dasgupta11, Brandon Gallas12, George Green13, Mark Gustavson14, Sarah Hersey15, Ana Hidalgo-Sastre14, Shahanawaz Jiwani16, Wonkyung Jung4, Kimary Kulig17, Vladimir Kushnarev18, Xiaoxian Li19, Meredith Lodge8, Joan Mancuso20, Mike Montalto21, Satabhisa Mukhopadhyay11, Matthew Oberley9, Pahini Pandya5, Oscar Puig22, Edward Richardson23, Alexander Sarachakov18, Or Shaked22, Mark Stewart1, Lisa M. McShane2, Roberto Salgado3, Jeff Allen1

1Friends of Cancer Research, 2Division of Cancer Treatment and Diagnosis, National Cancer Institute, 3ZAS Hospitals, 4Lunit, 5Panakeia, 6PathAI, 7University of North Carolina at Chapel Hill, 8Indica Labs, 9Caris Life Sciences, 10GlaxoSmithKline, 114D Path, 12Center for Devices and Radiological Health, U.S. Food and Drug Administration, 13GA Green Consulting LLC, 14AstraZeneca, 15Bristol Myers Squibb, 16Molecular Characterization Laboratory, Frederick National Lab, National Cancer Institute, 17Kulig Consulting, 18BostonGene, 19Emory University, 20Patient Advocate, 21Amgen, 22Nucleai, 23Merck & Co., Inc.

PUBLISHED

SABCS 2024

Introduction

• Recent successes of HER2 antibody-drug conjugates (ADCs) have expanded patient eligibility for HER2-targeted therapy.

• Accurate and consistent identification of patients who may benefit from ADCs, by assessing HER2 expression, is critical.

• Previous studies of agreement in HER2 scoring between pathologists highlight areas of discordance.

• AI models have the potential to deliver more quantitative and reproducible HER2 assessments.

• Large-scale comparative evaluations of these models’ performance are currently lacking.

• Friends of Cancer Research created a research partnership, the Digital PATH Project, to describe and evaluate agreement in HER2 assessment across independently developed AI models.


Materials & Methods

Patient Samples

Whole slide images (WSIs), both H&E-stained and HER2 IHC (N=1,124), from patients diagnosed with breast cancer in 2021 (N=733) were obtained from a single laboratory (ZAS Hospital, Antwerp, Belgium). Available pathology and specimen metadata include HER2 scores (ASCO/CAP³) assigned by three breast pathologists, as well as information on slide processing and digitization.

Computational Pathology Models

Known commercial developers of HER2 computational pathology models were invited to participate in the project, resulting in 9 developers representing 10 models. Model attributes (e.g., input WSI type, HER2 output, key training/validation methods) were provided by the developers. The 10 AI models assessed HER2 status on all cases.

Statistical Analysis

No reference standard was used. Agreement was evaluated using overall percent agreement (OPA) and Cohen's kappa coefficient for all possible pairs of models across samples. Statisticians from the NCI Biometric Research Program independently performed pairwise comparisons of each model's HER2 outputs to determine the level of agreement. The results shown reflect agreement across the 7 models that provided predicted ASCO/CAP scores.
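The pairwise analysis described above can be illustrated with a short sketch. This is not the NCI statisticians' code: it assumes unweighted Cohen's kappa on categorical ASCO/CAP scores (0, 1+, 2+, 3+), which the poster does not specify. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the OPA and p_e is the agreement expected by chance from each model's score frequencies. The model names and scores in the example are hypothetical toy data, not study results.

```python
from collections import Counter
from itertools import combinations


def overall_percent_agreement(a, b):
    """Fraction of cases on which two models report the same HER2 category."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa (assumption: the poster does not state the variant)."""
    n = len(a)
    p_o = overall_percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each model's marginal category frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        # Both models put every case in one identical category; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def pairwise_agreement(scores):
    """Return {(model_1, model_2): (OPA, kappa)} for every pair of models.

    scores maps a model name to its list of HER2 scores, with cases
    listed in the same order for every model.
    """
    results = {}
    for m1, m2 in combinations(sorted(scores), 2):
        a, b = scores[m1], scores[m2]
        results[(m1, m2)] = (overall_percent_agreement(a, b), cohens_kappa(a, b))
    return results


if __name__ == "__main__":
    # Hypothetical scores from two hypothetical models on six toy cases.
    toy_scores = {
        "A": ["0", "1+", "2+", "3+", "3+", "1+"],
        "B": ["0", "2+", "2+", "3+", "3+", "1+"],
    }
    for (m1, m2), (opa, kappa) in pairwise_agreement(toy_scores).items():
        print(f"{m1} vs {m2}: OPA={opa:.3f}, kappa={kappa:.3f}")
    # prints "A vs B: OPA=0.833, kappa=0.778"
```

Reporting both measures matters because OPA alone can look high when most cases fall into one category, while kappa discounts the agreement expected by chance given each model's score distribution.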


Conclusions

This unique partnership allowed us to evaluate agreement in HER2 biomarker assessment across independently developed computational pathology models.

• Cases reported as HER2 3+ had the least variability and highest level of agreement across models.

• Cases reported as HER2 1+ and 2+ showed greater inter-model variation.

• The trends in agreement between models across reported HER2 scores are similar to published measures of agreement between pathologists.

This ongoing partnership will enable a better understanding of variability across AI models under development, support the establishment of best practices for measuring and reporting AI-driven biomarker assessments in drug development and clinical practice, and inform approaches to the use of reference sets.
