SimText: a text mining framework for interactive analysis and visualization of similarities among biomedical entities.

Overview

abstract

  • SUMMARY: Literature exploration in PubMed on a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities based on text. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) analysis and visualization of data using unsupervised learning techniques in an interactive app. AVAILABILITY AND IMPLEMENTATION: We developed SimText as open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as a Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
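The idea of comparing entities "based on text" can be illustrated with a minimal sketch. This is not SimText's implementation (SimText is written in R and its exact methods are not described here). It assumes each entity has already been reduced to a list of words extracted from its PubMed texts, represents each entity as a word-count vector, scores every pair by cosine similarity, and then groups entities with a simple single-linkage threshold as a stand-in for the unsupervised learning techniques the tool offers. The function names, the 0.5 threshold, and the example entities are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(words_by_entity: dict[str, list[str]]):
    """Pairwise cosine similarities; rows and columns follow sorted entity names."""
    names = sorted(words_by_entity)
    vectors = {n: Counter(words_by_entity[n]) for n in names}
    return names, [[cosine(vectors[x], vectors[y]) for y in names] for x in names]


def threshold_clusters(names, sim, threshold=0.5):
    """Hypothetical single-linkage grouping: link entities whose similarity >= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())
```

For example, with made-up word lists, `similarity_matrix({"ENTITY_A": ["epilepsy", "seizure", "channel"], "ENTITY_B": ["epilepsy", "seizure", "neuron"], "ENTITY_C": ["insulin", "glucose", "pancreas"]})` gives ENTITY_A and ENTITY_B a similarity of 2/3 (two shared words out of three each), and `threshold_clusters` then places them in one group with ENTITY_C on its own. A real workflow would typically weight words (e.g. TF-IDF) and use richer clustering or dimensionality reduction.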

publication date

  • November 18, 2021

Research

keywords

  • Data Mining
  • Software

Identity

PubMed Central ID

  • PMC9502138

Scopus Document Identifier

  • 85127613794

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btab365

PubMed ID

  • 34037702

Additional Document Info

volume

  • 37

issue

  • 22