Simulants: Synthetic Clinical Trial Data via Subject-Level Privacy-Preserving Synthesis.
Academic Article
Overview
abstract
Clinical trials capture high-quality data for millions of patients each year, yet these data are largely unavailable for research beyond the scope of any individual trial due to a combination of regulatory, intellectual property, and patient privacy barriers. Synthetic clinical trial data that captures the analytical properties of the source data, could provide significant value for research and drug development by making insights widely available while protecting the privacy of the participants. We present a method "Simulants" for generating research-grade synthetic clinical trial data from a real data source. We compared the fidelity and privacy preservation performance of Simulants to the state-of-the-art deep learning synthesizers and found that Simulants had superior performance when applied to clinical trial data as assessed both by established metrics and when considering critical clinical features. We also demonstrate how Simulants' privacy settings may be configured to conform to specific privacy policies governing data sharing.