SeqTrial: Utility Preserving Sequential Clinical Trial Data Generator.
Academic Article
abstract
Clinical trial data collected to evaluate new treatments have value beyond the original studies, but privacy concerns restrict access to these data and make further use challenging. Digital twins offer a solution: by simulating patient outcomes, they allow less restricted data access, reduce costs, and increase sample sizes. However, existing research focuses on synthetic Electronic Healthcare Records (EHRs) and does not address personalized patient record generation. This paper introduces SeqTrial, a framework for generating personalized digital twins for sequential clinical trial event data. The method uses BioBERT word embeddings to capture the semantics of biomedical terms and an attention mechanism to model the relationships between visits, and it synthesizes a digital twin for each patient. SeqTrial generates utility-preserving digital twins capable of estimating clinical outcomes, and it addresses data scarcity through self-supervised pretraining. The method demonstrates high fidelity and utility in generating synthetic sequential clinical trial data for patient outcome prediction while ensuring privacy protection. The code is available at.
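To illustrate what an attention mechanism over visits computes, the following is a minimal sketch of scaled dot-product self-attention applied to one patient's sequence of visit vectors. It is not the SeqTrial implementation: the function names, the 4-dimensional toy vectors (stand-ins for pooled BioBERT embeddings of the terms recorded at each visit), and the absence of learned query, key, and value projections are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def visit_self_attention(visits):
    """Scaled dot-product self-attention over a patient's visit vectors.

    Assumption: queries, keys, and values are the visit vectors themselves;
    a trained model would first apply learned linear projections.
    Returns (contextualized visit vectors, attention weight matrix).
    """
    d = len(visits[0])
    scale = math.sqrt(d)
    contextualized, weights = [], []
    for q in visits:
        # How strongly this visit attends to every visit in the sequence.
        w = softmax([dot(q, k) / scale for k in visits])
        weights.append(w)
        # Weighted average of all visit vectors.
        contextualized.append(
            [sum(w_j * v[i] for w_j, v in zip(w, visits)) for i in range(d)]
        )
    return contextualized, weights


# Hypothetical embeddings for three visits of a single patient.
visits = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.9, 0.7, 0.0],
]
context, attn = visit_self_attention(visits)
```

Each row of `attn` is a probability distribution over the patient's visits, so visits with similar content (here the first two) attend to each other more strongly than to a dissimilar one.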