
GE HealthCare to Lead Consortium on Synthetic Data Generation for AI in Healthcare

GE HealthCare will take a leadership role in Synthia, a consortium project to evaluate synthetic data generation methods, both for creating synthetic datasets and for using them to develop artificial intelligence (AI) algorithms. GE HealthCare will join other healthcare industry organizations, including Gates Ventures, Novo Nordisk, and Pfizer, as well as academic partners, including La Fe University, the Fraunhofer Institute, and the University of Bologna, in an effort focused on building new synthetic or synthesized datasets for training AI algorithms.

Data is the cornerstone of AI products, from development to deployment. Synthetic data, artificially generated to replicate real patient data, could serve as a valuable alternative for overcoming challenges such as the scarcity of real datasets, biased or non-generalizable training data, and privacy concerns. However, the use of synthetic data also raises questions about the reliability of data generation tools and the quality of the resulting datasets.

The Synthia project aims to evaluate and deliver proven methods, standards and frameworks for building reliable synthetic data generation tools and for using their output in the development, training and validation of AI algorithms. It brings together expertise from healthcare providers, academia and industry to address the challenges of developing algorithms with synthetic data, along with the legal, ethical and regulatory considerations involved, while exploring ways to increase the availability of high-quality training datasets.

The goal is to make data generation workflows, along with assessment frameworks that help evaluate the privacy, quality and applicability of the generated data, available to the research community through a platform. The platform is intended to host a repository of high-quality synthetic datasets, each labeled with its suitability for specific applications.

The tools developed through Synthia will cover multiple data types, including lab results, clinical notes, genomics, imaging and mobile health data, and will support the generation of longitudinal data. To assess the usefulness of synthetic data, the project will address six diseases across oncology (lung cancer and breast cancer), hematology (the blood cancers multiple myeloma and diffuse large B-cell lymphoma), neurology (Alzheimer’s disease) and metabolic health (type 2 diabetes).

Ultimately, the platform could help build trust among stakeholders in the usefulness of synthetic data and facilitate its responsible use by the health research community.