Highly accurate prophage island detection with PIDE. Academic Article

Overview

abstract

  • As important mobile elements in prokaryotes, prophages shape the genomic context of their hosts and regulate the structure of bacterial populations. However, precisely identifying prophages by computational methods remains challenging. Here, we introduce PIDE for identifying prophages in bacterial genomes or metagenome-assembled genomes. PIDE integrates a pre-trained protein language model with a gene density clustering algorithm to distinguish prophages. Benchmarking on induced prophage sequencing datasets demonstrates that PIDE pinpoints prophages with precise boundaries. Applying PIDE to 4,744 representative human gut genomes reveals 24,467 prophages with widespread functional capacity. PIDE is available at https://github.com/chyghy/PIDE, with model training code at https://zenodo.org/records/16457629.

publication date

  • August 20, 2025

Research

keywords

  • Genomic Islands
  • Prophages
  • Software

Identity

PubMed Central ID

  • PMC12366036

Scopus Document Identifier

  • 105013789299

Digital Object Identifier (DOI)

  • 10.1186/s13059-025-03733-0

PubMed ID

  • 40836306

Additional Document Info

volume

  • 26

issue

  • 1