mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis. Academic Article

Overview

abstract

  • Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing the accompanying textual pathology information. mTREE combines the localization of key areas ("global-to-local") and the development of a WSI-level image-text representation ("local-to-global") into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: first, it functions as an attention map to accurately identify key areas, and second, it acts as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks, classification and survival prediction, showcasing its superiority over baselines. Code and trained models are made available at https://github.com/hrlblab/mTREE.

publication date

  • January 1, 2025

Identity

PubMed Central ID

  • PMC12662735

Scopus Document Identifier

  • 105000833089

Digital Object Identifier (DOI)

  • 10.2352/ei.2025.37.12.hpci-183

PubMed ID

  • 41323017

Additional Document Info

volume

  • 37