AI-Enhanced Chart Review Is an Urgent Redesign of Clinical Work

June 9, 2025

Victoria Morain, Contributing Editor

Stanford’s ChatEHR pilot is more than a clever proof of concept. It represents a necessary confrontation with the cognitive overload and administrative drag embedded in modern EHR systems. Rather than tacking on another layer of vendor gloss, Stanford has begun reengineering the clinical experience by embedding large language model capabilities directly within existing workflows. The result is not just faster access to patient data but a systemic reorientation of how clinicians prepare, reason, and act.

The current model of chart navigation treats clinicians as data archaeologists. Faced with fragmented, note-heavy interfaces, they must excavate care histories through dozens of clicks and scrolls. Encounter prep often consumes thirty minutes or more. The promise of ChatEHR is not novelty but subtraction—stripping away the barriers to information access and collapsing routine tasks into seconds. By enabling clinicians to ask specific, natural-language questions like “Does this patient have any contraindications to heparin?” or “Summarize this patient’s last hospital admission,” Stanford’s tool refocuses clinical attention where it belongs.

Crucially, ChatEHR is integrated with the Epic backend, not layered on top of it. This is what distinguishes it from generic AI chatbots that sit outside the record and hallucinate from detached data. Stanford's model is grounded in actual EHR content, delivering answers with citations to source fields. The ambition here is not just speed, but trust. And that is the design philosophy other academic and enterprise health systems need to observe closely.

Stanford’s initial deployment in the emergency department tells a larger story. In high-intensity environments, clinicians face decision fatigue and fragmented context. Here, ChatEHR is not a novelty; it is a necessity. Its ability to retrieve medication histories, locate prior imaging findings, and summarize hospital stays in real time has already proven its value in acute care triage. The implications are broader than documentation support. This is about how we structure cognition under pressure.

The project is also pushing beyond passive assistance. Stanford is now piloting automation use cases—having the AI screen patients for hospice eligibility, flag them for transfer readiness, and alert physicians to missed advance care planning documentation. These are not just workflow improvements. They hint at a near future in which clinical reasoning is distributed across AI agents that proactively monitor record patterns and prompt human action. In this model, the AI becomes a participant in care navigation, not just a query tool.

That shift raises critical governance questions. Stanford has wisely partnered with MedPerf and the MedAL platform to evaluate ChatEHR using the MedHELM framework, an open-source benchmarking protocol for healthcare LLMs. This initiative represents an emerging best practice for validating AI in high-risk settings. But it is only the beginning. As tools like ChatEHR move from pilot to platform, institutions will need persistent governance structures to manage drift, retraining, and domain-specific safety thresholds. One-off validation will not suffice.

Policy frameworks are trailing this reality. The FDA’s current approach to clinical decision support software does not yet address the specific characteristics of generative models, especially those embedded inside EHRs and used in real-time decision pathways. ONC’s recent guidance on AI transparency in its HTI-1 rule offers early traction, but no cohesive regulatory apparatus exists to monitor AI-generated summaries, eligibility suggestions, or predictive flags inside clinical records. That leaves institutions responsible not just for implementation, but for the legal and ethical design of entire AI ecosystems.

The Stanford effort sends a signal to CIOs, CMIOs, and vendor leadership alike. Generative AI is no longer an R&D sideshow. It is becoming clinical infrastructure. The challenge ahead is not just deploying powerful tools, but embedding them in ways that respect clinician agency, promote traceability, and align with safety-first cultures.

Health systems that want to follow Stanford’s lead should start by focusing on the use cases that cause the most pain: chart review, care transitions, eligibility documentation, referral prep, and consult triage. These are the frontiers where AI has the clearest path to impact and the lowest risk of misalignment. But none of it works without infrastructure. That includes secure model access, data provenance, real-time audit trails, and clinician training. And it means treating AI deployments not as IT projects, but as clinical transformation campaigns.

Stanford has not solved the EHR problem. But it has reframed it. Instead of chasing marginal efficiency gains through new forms and dashboards, it has asked a deeper question: what if the EHR could speak in plain language? What if it could answer, not just store? That is the future being prototyped now. Health systems that treat this as a curiosity will fall behind. Those that treat it as a foundational redesign of clinical cognition may finally deliver on the long-promised digital dividend.