Deciphering Phage-Host Specificity Based on the Association of Phage Depolymerases and Bacterial Surface Glycan with Deep Learning.

Overview

abstract

  • Phage tailspike proteins are depolymerases that target diverse bacterial surface glycans with high specificity, determining the host-specificity of numerous phages. To address the challenge of identifying tailspike proteins due to their sequence diversity, we developed SpikeHunter, an approach based on the ESM-2 protein language model. Using SpikeHunter, we successfully identified 231,965 tailspike proteins from a dataset comprising 8,434,494 prophages found within 165,365 genomes of five common pathogens. Among these proteins, 143,035 tailspike proteins displayed strong associations with serotypes. Moreover, we observed highly similar tailspike proteins in species that share closely related serotypes. We found extensive domain swapping in all five species, with the C-terminal domain being significantly associated with host serotype, highlighting its role in host range determination. Our study presents a comprehensive cross-species analysis of tailspike protein to serotype associations, providing insights applicable to phage therapy and biotechnology.

publication date

  • June 16, 2023

Identity

PubMed Central ID

  • PMC10370184

Digital Object Identifier (DOI)

  • 10.1101/2023.06.16.545366

PubMed ID

  • 37503040