Stability and reliability of artificial intelligence models in embryo selection for in vitro fertilization.

Overview

abstract

OBJECTIVE: To evaluate the stability and reliability of artificial intelligence (AI) models and approaches in embryo selection and rank ordering for in vitro fertilization (IVF).

DESIGN: A laboratory-based study evaluating the stability and consistency of single instance learning models that assess embryos individually, predicting live-birth outcomes based solely on each embryo's morphological features. Fifty replicate convolutional neural networks with varying initialization parameters were trained and tested across two independent fertility center datasets. Model performance was assessed through embryo rank ordering, critical error rates, and intermodel variability. Interpretability analyses using gradient-weighted class activation mapping and t-distributed stochastic neighbor embedding were conducted to explore decision-making discrepancies among replicate models.

SUBJECTS: The study utilized retrospective embryo datasets from Massachusetts General Hospital and Weill Cornell Fertility Center, including images from 1,258 patients and 10,713 embryos from Massachusetts General Hospital, and 53 patients with 648 embryos from Cornell.

MAIN OUTCOME MEASURES: Consistency in embryo ranking (Kendall's W), frequency of critical errors (instances where low-quality embryos were top-ranked), and intermodel variability across datasets.

RESULTS: The AI models demonstrated poor consistency in embryo rank ordering (Kendall's W approximately 0.35) and exhibited high critical error rates (approximately 15%), often ranking lower-quality embryos above viable ones. Significant intermodel variability was observed even among models with similar predictive accuracies (area under curve approximately 60%). When tested on data from a different fertility center, model instability increased (error variance delta: 46.07%²), highlighting sensitivity to distribution shifts. Interpretability analyses revealed divergent decision-making strategies among replicate models, despite identical architectures and training protocols.

CONCLUSION: Single instance learning AI models for IVF embryo selection exhibit substantial instability and inconsistency, undermining their clinical reliability. High intermodel variability and critical error rates raise concerns about their suitability for real-world deployment. This study highlights the need for more stable AI frameworks and robust evaluation metrics tailored to the clinical demands of IVF.
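The abstract reports ranking consistency as Kendall's W (coefficient of concordance) across replicate models. As a rough illustration of what that statistic measures, the sketch below computes W from a matrix of per-model embryo scores. It is not the authors' code: the function name `kendalls_w`, the input layout (one row per model, one column per embryo), and the absence of a tie correction are all assumptions made for this example.

```python
import numpy as np


def kendalls_w(scores):
    """Kendall's W for an (m models x n embryos) score matrix.

    Hypothetical helper for illustration only. Assumes each model's
    scores contain no ties, so no tie correction is applied.
    """
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape
    # Convert each model's scores to ranks 1..n (argsort twice gives ranks).
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1
    # Sum the ranks each embryo received across all models.
    rank_sums = ranks.sum(axis=0)
    # Squared deviation of the rank sums from their mean.
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    # W = 12 S / (m^2 (n^3 - n)); 1 means perfect agreement, 0 means none.
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Three hypothetical models scoring four embryos in the same order.
    agree = [[0.1, 0.2, 0.3, 0.4]] * 3
    print(kendalls_w(agree))
```

Under this definition, replicate models that rank every embryo identically give W = 1, while rankings that cancel each other out give W = 0, so the reported value of approximately 0.35 sits closer to disagreement than to full concordance.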

authors

Thirumalaraju, Prudhvi
Kanakasabapathy, Manoj Kumar
Kandula, Hemanth
Kandula, Tinendra
Reddy Katkuri, Aditya Vardhan
Cipriano, Cameron
Malmsten, Jonas E.
Zaninovic, Nikica
Bormann, Charles L
Shafiee, Hadi

publication date

August 26, 2025

published in

Fertility and Sterility

Research

keywords

Artificial Intelligence
Embryo Transfer
Fertilization in Vitro
Infertility

Identity

PubMed Central ID

PMC12494150

Scopus Document Identifier

105016848601

Digital Object Identifier (DOI)

10.1016/j.fertnstert.2025.08.021

PubMed ID

40876725

Additional Document Info

has global citation frequency

1

volume

125

issue

2

VIVO Weill Cornell Medical College

Stability and reliability of artificial intelligence models in embryo selection for in vitro fertilization. Academic Article
