Enhancing Data Simulation & Augmentation With AI
When recruiting patients and running a clinical trial, the ideal is to enroll patients who represent the broader population as closely as possible. But this is not always feasible. Recruiting any patient to a clinical trial is challenging, and achieving diversity by assembling a patient pool that accurately reflects real-world demographics poses additional hurdles. Underrepresentation of certain demographic groups in clinical trials has been a long-standing problem (Tufts University, 2022). Without an adequately diverse patient population, sponsors may be less likely to win approval for their drugs, and the overall drug development process can slow down or stall.
Synthetic clinical trial data, created using generative AI (gen AI), is a potential solution to some of the common problems in clinical trials. Life sciences companies can use gen AI and synthetic data to augment their existing datasets, allowing them to:
- Upsample underrepresented groups within a clinical trial sample with synthetic data, bringing the dataset's composition closer to that of the overall population and making findings more generalizable.
- Run faster, more successful clinical trial programs, saving sponsors money and reducing the burden on both patients and sites.
- Model multiple trial scenarios and designs to determine the most effective way to run their clinical trial program.
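The upsampling idea in the first bullet starts with simple arithmetic, done before any data is generated: decide how many synthetic records each group needs so that the augmented dataset's group shares approximate the population's. The Python sketch below is illustrative only. It assumes hypothetical group names and integer population weights (for example, percentages), keeps every real patient record, and computes only the counts. Generating the records themselves would be the job of a generative model such as Medidata Simulants, and this sketch does not describe how Simulants works.

```python
from typing import Dict


def synthetic_counts_needed(
    observed: Dict[str, int], population_weights: Dict[str, int]
) -> Dict[str, int]:
    """Return how many synthetic records to add per group so that group
    shares approximate population shares, without dropping real records.

    observed: real enrolled patients per (hypothetical) demographic group.
    population_weights: assumed relative population share per group as
        integers (e.g. percentages); integers keep the arithmetic exact.
    """
    if observed.keys() != population_weights.keys():
        raise ValueError("observed and population_weights must cover the same groups")
    if any(w <= 0 for w in population_weights.values()):
        raise ValueError("population weights must be positive")
    total_weight = sum(population_weights.values())

    # Smallest augmented size N at which every group's real count fits inside
    # its population share: N >= observed[g] * total_weight / weight[g].
    # -(-a // b) is integer ceiling division.
    target_size = max(
        -(-observed[g] * total_weight // population_weights[g]) for g in observed
    )

    # Each group's target is its share of N, rounded down; by construction it
    # is never smaller than the number of real patients already in the group.
    targets = {
        g: population_weights[g] * target_size // total_weight for g in observed
    }
    return {g: targets[g] - observed[g] for g in observed}


# Hypothetical example: group_b makes up 20% of enrollment but 40% of the population.
observed = {"group_a": 80, "group_b": 20}
population_weights = {"group_a": 60, "group_b": 40}
print(synthetic_counts_needed(observed, population_weights))
# {'group_a': 0, 'group_b': 33}
```

Because real records are never removed, the most overrepresented group sets the size of the augmented dataset. In the example, group_a already exceeds its 60% share, so the dataset grows to 133 records after rounding and only group_b receives synthetic records.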
Medidata Simulants is a fit-for-purpose generative AI algorithm for generating synthetic data for clinical development, enabling sponsors to learn about the effects of their drugs while protecting patient and sponsor privacy.
High-fidelity synthetic data can augment existing datasets in indications where medical need remains unmet and where current real-world data and literature lack adequate depth. Companies may use these augmented datasets to proactively identify which patients are likely to experience adverse events or other negative outcomes, and to mitigate those risks accordingly.
Beyond augmenting patient-level data, generative AI can help sponsors draw valuable insights from their own data rather than spending unnecessary time and money trying to identify which sites to use and which patients to enroll. Sponsors can instead use synthetic data, or existing datasets augmented with it, to gain insights into site performance, patient demographics, and other relevant factors.
Gen AI has already shown early success in augmenting clinical trial data, highlighting its potential for what is yet to come (NIH, 2024).