The impact of commercial health datasets on medical research and health-care algorithms.
Review
Overview
abstract
As the health-care industry emerges into a new era of digital health driven by cloud data storage, distributed computing, and machine learning, health-care data have become a premium commodity with value for private and public entities. Current frameworks of health data collection and distribution, whether from industry, academia, or government institutions, are imperfect and do not allow researchers to leverage the full potential of downstream analytical efforts. In this Health Policy paper, we review the current landscape of commercial health data vendors, with special emphasis on the sources of their data, challenges associated with data reproducibility and generalisability, and ethical considerations for data vending. We argue for sustainable approaches to curating open-source health data to enable global populations to be included in the biomedical research community. However, to fully implement these approaches, key stakeholders should come together to make health-care datasets increasingly accessible, inclusive, and representative, while balancing the privacy and rights of individuals whose data are being collected.