Cedars-Sinai Turns to Synthetic Data to Scale Research and Protect Privacy

Cedars-Sinai is expanding its artificial intelligence and machine learning initiatives by adopting synthetic data as a core component of its digital infrastructure strategy. Through a new partnership with Syntho, the Amsterdam-based developer of privacy-enhancing synthetic data tools, the health system aims to accelerate research, reduce privacy risk, and enable broader collaboration across clinical and academic teams.
The decision reflects a growing trend in healthcare data science. As research demands grow more complex and privacy regulations tighten, organizations are re-evaluating how patient data is accessed, protected, and shared. Synthetic data offers a potential solution by mimicking the statistical properties of real clinical records without exposing identifiable patient information.
Cedars-Sinai’s move is more than an experiment. It positions synthetic data as a foundational asset in the organization’s broader effort to build a connected data ecosystem capable of supporting precision medicine, digital research environments, and scalable innovation.
Speed Without Sacrificing Privacy
Unlike de-identified data, which consists of real records with identifiers removed and retains some risk of re-identification, synthetic data is artificially generated. It preserves the patterns and relationships within a source dataset but contains no actual patient records. This distinction carries operational advantages.
Because synthetic data is not tied to real individuals, it generally does not trigger the same legal and regulatory restrictions on use. Institutional review board approvals, data use agreements, and cross-institutional access barriers can be significantly reduced or, in some cases, eliminated. That translates into faster study launches, more rapid iteration cycles, and easier collaboration among clinical, academic, and commercial stakeholders.
Cedars-Sinai leaders say this agility is key to supporting both discovery and delivery. Synthetic datasets can be generated within hours and used to simulate real-world scenarios, whether testing new clinical decision tools, refining trial protocols, or educating students and trainees through the Cedars-Sinai Health Sciences University.
Embedding Synthetic Data in Institutional Infrastructure
The partnership with Syntho is only one part of Cedars-Sinai’s synthetic data strategy. The organization is also integrating synthetic datasets into its recently launched Digital Innovation Platform, a comprehensive suite of tools designed to address systemic healthcare challenges through cross-functional collaboration.
The initiative includes participation from Cedars-Sinai staff, external investors, and Redesign Health, a venture-building firm focused on healthcare startups. Together, the group aims to use synthetic data to develop new solutions in diagnostics, care coordination, and population health, while reducing dependence on real patient data for every phase of product development.
This approach opens the door to faster prototyping, broader stakeholder involvement, and stronger protection of patient privacy. It also helps Cedars-Sinai standardize innovation workflows without repeatedly navigating the complex regulatory terrain of real-world data access.
Education, Equity, and Early Testing
Beyond operational acceleration, Cedars-Sinai is positioning synthetic data as a democratizing tool. According to the leadership team overseeing computational biomedicine, synthetic data lowers barriers to entry in research. Investigators can test hypotheses without waiting for access to real datasets, and early-career researchers can gain hands-on experience using high-fidelity data simulations.
This accessibility could expand the number and diversity of voices contributing to research and innovation. It also aligns with broader institutional goals around inclusion and equity in medical education and data science.
However, synthetic data is not without limitations. It may not perform well with all data types, particularly genomic or other highly discrete variables, and must be carefully validated to avoid overgeneralization or bias amplification. Cedars-Sinai leadership emphasizes the importance of training users on where synthetic data adds value and where real data is still required.
Balancing Innovation With Guardrails
The adoption of synthetic data also raises important questions about governance. As synthetic data becomes easier to generate and share, institutions must develop frameworks to ensure responsible use. That includes clear policies on data labeling, performance validation, and communication of synthetic data limitations to users and collaborators.
Synthetic datasets should not be treated as interchangeable with real clinical data. Findings derived from them must be validated in real-world settings, and the boundaries of their utility clearly marked. For clinical trials, regulatory submissions, or safety-critical applications, synthetic data may support the exploratory phase, but results must ultimately be confirmed with verified patient data.
Still, the appeal is clear. As privacy concerns grow and data demands increase, synthetic data offers a middle path, enabling speed, flexibility, and protection in equal measure.
A Test Case for System-Level Data Strategy
Cedars-Sinai’s implementation of synthetic data tools is one of the clearest examples to date of a health system embedding the technology into enterprise-level strategy. It is not a side project or academic trial. It is part of the system’s effort to create a durable data infrastructure capable of supporting clinical, research, and commercial innovation simultaneously.
The outcome will likely influence how other health systems assess the value of synthetic data. If Cedars-Sinai can demonstrate faster research cycles, enhanced privacy protection, and more scalable innovation, all without compromising data quality or scientific validity, synthetic data is likely to move from technical novelty to operational necessity.
The challenge now is to translate theoretical benefit into measurable performance. Institutions considering synthetic data adoption must track not only speed and access improvements, but also the downstream effects on research output, model performance, and patient safety.
In an era where data is both an asset and a liability, synthetic data may help health systems recalibrate. The goal is not to replace real data but to use real data more strategically, protecting what matters most while expanding what is possible.