
Mount Sinai: New AI Tool Addresses Accuracy and Fairness in Data to Improve Health Algorithm

September 9, 2025

Jason Free, Senior Managing Editor

Artificial intelligence in healthcare has long carried a promise: faster diagnoses, predictive risk scores, and precision interventions, all at scale. But recent years have exposed a caveat baked into that optimism: if the training data isn’t representative, the output can’t be trusted. The result is clinical inequity.

Researchers at the Icahn School of Medicine at Mount Sinai are attempting to move this conversation from problem statement to engineered solution. With the development of AEquity, a software tool designed to detect and correct bias in health datasets, Mount Sinai is signaling that fairness cannot be an afterthought in algorithm development. It must be built in from the start.

Published in the Journal of Medical Internet Research, the study outlines how AEquity works across multiple types of data inputs, from medical images to lab records, and can be applied to a wide range of machine-learning models, including those underlying generative AI and large language models. But the real innovation may not be technical. It is conceptual: bias is treated as an implementation risk, something to be tested for during development.

Audit, Detect, Adjust: Making Fairness Programmable

AEquity was developed to interrogate training data before models are deployed, identifying underrepresentation or misrepresentation of specific demographic groups. In trials, it uncovered both expected and previously undetected forms of bias within datasets ranging from public health surveys to clinical imaging banks.

The tool’s architecture allows developers to examine model inputs and outputs, flagging disparities in data distribution, prediction accuracy, and subgroup learnability. In short, it lets researchers test whether the model “knows less” about certain populations, and offers pathways to correct for those gaps before clinical harm occurs.
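The kind of check described above can be illustrated with a minimal sketch. This is not AEquity's actual method or API, which the article does not detail. The function name `audit_subgroups`, the record keys, the thresholds, and the toy data are all hypothetical, and the sketch covers only two of the signals mentioned: data distribution and prediction accuracy per subgroup.

```python
from collections import defaultdict


def audit_subgroups(records, min_share=0.2, max_accuracy_gap=0.1):
    """Summarize representation and accuracy per group and flag disparities.

    Each record is a dict with the hypothetical keys "group", "label",
    and "prediction". Thresholds are illustrative, not recommendations.
    Expects a non-empty list of records.
    """
    counts = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        counts[r["group"]] += 1
        correct[r["group"]] += int(r["label"] == r["prediction"])

    total = len(records)
    accuracy = {g: correct[g] / counts[g] for g in counts}
    best = max(accuracy.values())

    report = {}
    for g in counts:
        share = counts[g] / total
        report[g] = {
            "share": share,
            "accuracy": accuracy[g],
            # Flag groups that make up less than min_share of the data.
            "underrepresented": share < min_share,
            # Flag groups whose accuracy trails the best group by too much.
            "accuracy_gap": best - accuracy[g] > max_accuracy_gap,
        }
    return report


# Made-up toy data: group "B" is both rare and predicted less accurately.
records = (
    [{"group": "A", "label": 1, "prediction": 1}] * 45
    + [{"group": "A", "label": 0, "prediction": 1}] * 5
    + [{"group": "B", "label": 1, "prediction": 1}] * 6
    + [{"group": "B", "label": 1, "prediction": 0}] * 4
)

report = audit_subgroups(records)
for group, stats in sorted(report.items()):
    print(group, stats)
```

In this toy data, group "B" is flagged on both counts while group "A" is not. A real audit would also need to address subgroup learnability, which this sketch does not attempt.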

According to lead author Faris Gulamali, MD, the intent was to create something useful at the point of development, not just for post-hoc academic critique. “We want to help ensure these tools work well for everyone,” Gulamali stated, “not just the groups most represented in the data.”

This distinction matters. Algorithmic fairness tools are often theoretical or applied retroactively. By contrast, AEquity is intended as a step in the standard pipeline, functionally no different from checking for model drift or overfitting.

The Feedback Loop of Digital Inequity

The stakes are not academic. AI systems already influence clinical pathways, prior authorization decisions, and diagnosis workflows. A 2024 study in Health Affairs found that risk-scoring algorithms used by payers and health systems routinely underestimate the needs of Black patients because they rely on proxies such as healthcare spending rather than clinical severity.

Left unaddressed, these feedback loops reinforce disparities. Patients who are poorly represented in training data may receive lower risk scores, leading to less intensive care management, which in turn suppresses their recorded utilization data, further skewing future predictions.
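A toy simulation can make this loop concrete. It is purely illustrative and does not model any real risk-scoring system. The function `simulate_feedback_loop`, its update rule, and all numbers are assumptions chosen only to show how a proxy-based score can compound an initial gap.

```python
def simulate_feedback_loop(true_need, recorded_utilization, rounds=5):
    """Track recorded utilization over several scoring rounds.

    Assumptions (all hypothetical):
    - the risk score is simply the recorded utilization (a proxy label);
    - care management is allocated in proportion to score / true need;
    - less care management means fewer recorded encounters next round.
    """
    history = [recorded_utilization]
    utilization = recorded_utilization
    for _ in range(rounds):
        risk_score = utilization
        # Fraction of the needed care management actually allocated.
        care_level = min(1.0, risk_score / true_need)
        # Recorded utilization shrinks when care falls short of need.
        utilization = utilization * (0.5 + 0.5 * care_level)
        history.append(utilization)
    return history


# Two patients with the same true need but different recorded histories.
well_recorded = simulate_feedback_loop(true_need=10.0, recorded_utilization=10.0)
under_recorded = simulate_feedback_loop(true_need=10.0, recorded_utilization=6.0)

print("well recorded: ", [round(u, 2) for u in well_recorded])
print("under recorded:", [round(u, 2) for u in under_recorded])
```

Under these assumptions, the patient whose utilization matches their need keeps a stable score, while the under-recorded patient's score drops every round even though the underlying need never changes.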

AEquity’s intervention point, early in development, before models are locked and scaled, offers a practical method to prevent these cycles. But it also raises the bar for developers. If tools now exist to audit bias before deployment, claiming ignorance is no longer a defensible position.

Fairness Requires More Than Code

While AEquity represents technical progress, Mount Sinai leaders are careful not to frame it as a silver bullet. Girish N. Nadkarni, MD, MPH, Chief AI Officer of Mount Sinai Health System, emphasized that tools like AEquity must be paired with cultural and procedural shifts around how data is collected, labeled, and governed.

“Technical fairness is necessary, but not sufficient,” Nadkarni said. “The foundation matters, and it starts with the data.”

This insight echoes growing sentiment among AI ethicists: that bias is not simply a quirk of algorithms, but a byproduct of systemic design decisions. If marginalized populations are underdiagnosed, misdiagnosed, or underrepresented in medical literature, that pattern will replicate in training data, regardless of how advanced the model architecture may be.

A recent commentary in JAMA Network Open highlighted this point, noting that efforts to “de-bias” algorithms must also confront the legacy data environments from which they’re built. Otherwise, AI will simply inherit and automate historical inequities with greater speed.

A Tool for Regulators, Not Just Developers

One notable aspect of AEquity is its intended audience. Beyond engineers and data scientists, the tool is positioned as valuable for regulators, auditors, and institutional ethics boards. As frameworks like the FDA’s approach to AI/ML-based Software as a Medical Device (SaMD) evolve, real-time auditability is becoming an operational requirement, not just a compliance formality.

By enabling transparent evaluation of how models behave across subgroups, AEquity could be used to validate models submitted for regulatory clearance or federal funding. In theory, tools like this may eventually be required as part of premarket submissions or reimbursement pathways tied to value-based care.

For health system CIOs and compliance officers, this reframes fairness auditing from a theoretical concern to a strategic need. Algorithm procurement, implementation, and ongoing monitoring must now include documentation of bias mitigation measures, and the ability to defend those measures under scrutiny.

Equity, Re-Engineered

What Mount Sinai is offering with AEquity is not a final answer but a framework shift. Rather than asking whether AI is fair, the question becomes: How was fairness designed, tested, and verified? That evolution reflects a maturing industry. The debate is about who is responsible for correcting bias, and when.

For digital health leaders, this means the burden has shifted. It is no longer sufficient to rely on vendor claims or hope that broader datasets will dilute systemic flaws. Bias detection is now a task that must be resourced, tracked, and continuously improved.

The long-term value of AI in health care will be judged not only by its speed or scale, but by its capacity to serve all patients with equal accuracy. That starts by recognizing that data bias is a design flaw. And like any flaw, it can only be corrected when acknowledged and addressed directly.