Longitudinal Synthetic Data Generation by Artificial Intelligence to Accelerate Clinical and Translational Research in Breast Cancer

Industrial

This study investigates the use of AI-generated synthetic data (SD) to overcome challenges in breast cancer research, such as privacy concerns, incomplete data, and data fragmentation. Using generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Language Models (LMs), the study created synthetic longitudinal datasets that replicate disease progression, treatment patterns, and clinical outcomes. These datasets achieve a high fidelity score (0.94) while protecting privacy, improving predictive modeling, and supporting clinical trial design.

The study evaluated synthetic data in three areas: (1) integrating privacy-preserving longitudinal synthetic datasets with platforms such as i2b2; (2) improving translational research by enhancing multi-state disease progression models that predict changes in disease state; and (3) accelerating clinical research by generating synthetic control arms for clinical trials. The results show that synthetic data improved predictive performance, supported clinical trial design by producing valid control groups, and enhanced the modeling of disease progression. This AI-driven approach improves data accessibility, protects privacy, and scales to larger datasets, advancing breast cancer research and clinical studies.
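
To make the idea of a multi-state disease progression model concrete, the following is a minimal sketch, not the study's method. It assumes a simple first-order Markov model: transition probabilities between disease states are estimated from per-patient state sequences, and the real and synthetic estimates are then compared. The state names, the function names `transition_matrix` and `max_abs_difference`, and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical disease states; the study's actual state definitions are not given here.
STATES = ["disease_free", "recurrence", "death"]


def transition_matrix(sequences, states=STATES):
    """Estimate first-order transition probabilities from per-patient state sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each consecutive pair of states (one visit to the next).
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for a in states:
        total = sum(counts[(a, b)] for b in states)
        # Rows with no observed transitions are left as all zeros.
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0) for b in states}
    return matrix


def max_abs_difference(m1, m2, states=STATES):
    """Largest absolute gap between two transition matrices (0.0 means identical)."""
    return max(abs(m1[a][b] - m2[a][b]) for a in states for b in states)


# Toy sequences for illustration only; these are not study data.
real = [
    ["disease_free", "disease_free", "recurrence"],
    ["disease_free", "recurrence", "death"],
]
synthetic = [
    ["disease_free", "disease_free", "recurrence"],
    ["disease_free", "disease_free", "disease_free"],
]

real_matrix = transition_matrix(real)
synthetic_matrix = transition_matrix(synthetic)
gap = max_abs_difference(real_matrix, synthetic_matrix)
```

A small gap between the two matrices would suggest that the synthetic cohort reproduces the transition behavior of the real one under this simplified model. A fuller evaluation would typically use time-to-event multi-state models rather than visit-to-visit counts.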

#ArtificialIntelligence #BreastCancer #DataAugmentation #GANs #LLMs #LongitudinalData #MachineLearning #PrivacyProtection #SyntheticData #VAEs