Stability and reliability of artificial intelligence models in embryo selection for in vitro fertilization. Academic Article

Overview

abstract

  • OBJECTIVE: To evaluate the stability and reliability of artificial intelligence (AI) models and approaches in embryo selection and rank ordering for in vitro fertilization (IVF).
    DESIGN: A laboratory-based study evaluating the stability and consistency of single instance learning models that assess embryos individually, predicting live-birth outcomes based solely on each embryo's morphological features. Fifty replicate convolutional neural networks with varying initialization parameters were trained and tested across two independent fertility center datasets. Model performance was assessed through embryo rank ordering, critical error rates, and intermodel variability. Interpretability analyses using gradient-weighted class activation mapping and t-distributed stochastic neighbor embedding were conducted to explore decision-making discrepancies among replicate models.
    SUBJECTS: The study utilized retrospective embryo datasets from Massachusetts General Hospital and Weill Cornell Fertility Center, including images from 1,258 patients and 10,713 embryos from Massachusetts General Hospital, and 53 patients with 648 embryos from Cornell.
    MAIN OUTCOME MEASURES: Consistency in embryo ranking (Kendall's W), frequency of critical errors (instances where low-quality embryos were top-ranked), and intermodel variability across datasets.
    RESULTS: The AI models demonstrated poor consistency in embryo rank ordering (Kendall's W approximately 0.35) and exhibited high critical error rates (approximately 15%), often ranking lower-quality embryos above viable ones. Significant intermodel variability was observed even among models with similar predictive accuracies (area under curve approximately 60%). When tested on data from a different fertility center, model instability increased (error variance delta: 46.07%²), highlighting sensitivity to distribution shifts. Interpretability analyses revealed divergent decision-making strategies among replicate models, despite identical architectures and training protocols.
    CONCLUSION: Single instance learning AI models for IVF embryo selection exhibit substantial instability and inconsistency, undermining their clinical reliability. High intermodel variability and critical error rates raise concerns about their suitability for real-world deployment. This study highlights the need for more stable AI frameworks and robust evaluation metrics tailored to the clinical demands of IVF.
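
    For readers unfamiliar with the two outcome measures named in the abstract, the sketch below shows one way they could be computed for a single patient's embryo cohort scored by several replicate models. It is an illustration under stated assumptions, not the study's code: the function names, the input layout (one score list per model), the absence of tied scores, and the reading of a "critical error" as "the top-scored embryo is flagged low quality" are all assumptions made here. Kendall's W is computed with the standard formula without tie correction, W = 12S / (m²(n³ − n)), where m is the number of models, n the number of embryos, and S the sum of squared deviations of the per-embryo rank sums from their mean.

```python
from typing import List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Rank items 1..n by descending score (1 = best).

    Assumption: scores within one model are not tied.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's coefficient of concordance for m rankings of the same n items.

    Uses the standard formula without tie correction.
    Returns 1.0 for perfect agreement and 0.0 for no agreement.
    """
    m = len(rankings)        # number of raters (here: replicate models)
    n = len(rankings[0])     # number of items (here: embryos)
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def critical_error_rate(
    model_scores: Sequence[Sequence[float]],
    low_quality: Sequence[bool],
) -> float:
    """Fraction of models whose top-scored item is flagged low quality.

    Hypothetical definition chosen for illustration; the study's exact
    criterion may differ.
    """
    errors = 0
    for scores in model_scores:
        top = max(range(len(scores)), key=lambda i: scores[i])
        errors += int(bool(low_quality[top]))
    return errors / len(model_scores)


if __name__ == "__main__":
    # Toy data: 3 hypothetical replicate models scoring 4 embryos.
    scores = [
        [0.9, 0.7, 0.4, 0.1],
        [0.8, 0.3, 0.6, 0.2],
        [0.2, 0.6, 0.5, 0.8],
    ]
    flags = [False, False, False, True]  # last embryo flagged low quality
    rankings = [ranks_from_scores(s) for s in scores]
    print("Kendall's W:", kendalls_w(rankings))
    print("Critical error rate:", critical_error_rate(scores, flags))
```

    In a multi-patient setting, both quantities would typically be computed per cohort and then summarized across patients; how the study aggregated them is not stated in the abstract.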

publication date

  • August 26, 2025

Research

keywords

  • Artificial Intelligence
  • Embryo Transfer
  • Fertilization in Vitro
  • Infertility

Identity

PubMed Central ID

  • PMC12494150

Scopus Document Identifier

  • 105016848601

Digital Object Identifier (DOI)

  • 10.1016/j.fertnstert.2025.08.021

PubMed ID

  • 40876725

Additional Document Info

volume

  • 125

issue

  • 2