Multimodal out-of-distribution individual uncertainty quantification enhances binding affinity prediction for polypharmacology. Academic Article uri icon

Overview

abstract

  • Polypharmacology, a single drug that targets multiple proteins, holds promise for addressing unmet medical needs. Achieving accurate, reliable and scalable predictions of protein-ligand binding affinity across multiple proteins is crucial to realizing the potential of polypharmacology. Machine learning offers a powerful tool for multitarget binding affinity prediction. However, three major challenges remain: generalizing predictions to out-of-distribution compounds that are structurally different from those in the training data; quantifying the uncertainty of predictions in out-of-distribution scenarios where the assumption underlying existing methods does not hold; and scaling to billions of compounds, which remains unattainable for current structure-based methods. Here, to overcome these challenges, we propose a model-agnostic anomaly detection-based individual uncertainty quantification method: embedding Mahalanobis Outlier Scoring and Anomaly Identification via Clustering (eMOSAIC). eMOSAIC features the divergence between the multimodal representations of known cases and unseen instances and quantifies individual prediction uncertainty on a compound-by-compound basis. We integrate eMOSAIC with a multimodal deep neural network for multitarget ligand binding affinity predictions, leveraging a structure-informed large protein language model. Comprehensive validation in out-of-distribution settings demonstrates that eMOSAIC significantly outperforms state-of-the-art sequence-based and structure-based methods as well as existing uncertainty quantification approaches. These findings underscore eMOSAIC's potential to advance real-world polypharmacology and other applications that require robust predictions and scalable solutions.

publication date

  • December 18, 2025

Identity

PubMed Central ID

  • PMC12798713

Scopus Document Identifier

  • 105025357622

Digital Object Identifier (DOI)

  • 10.1038/s42256-025-01151-2

PubMed ID

  • 41537122

Additional Document Info

volume

  • 7

issue

  • 12