New Genomics Tool Accelerates Biomedical Breakthroughs

A scientist at the University of Virginia School of Medicine has created a new tool that could significantly streamline genomic research and fast-track medical breakthroughs.
Nathan Sheffield, PhD, and his collaborators have spent four years developing a new data standard called refget Sequence Collections. It helps researchers ensure they’re comparing the same reference sequences when analyzing genomic data — a major challenge in the field.
Genomic analysis involves sifting through massive datasets to understand how our cells work and what goes wrong in disease. But researchers often use different naming systems for “reference sequences” — essential datasets that serve as the baseline for identifying genetic differences. These inconsistencies make it harder to replicate results and collaborate across studies.
Refget Sequence Collections addresses this by making it easier to identify, compare, and track which reference sequences are being used. That means less guesswork, more automation, and better reproducibility in research.
Sheffield compares the problem to a classroom where every student has a different version of the same textbook — with pages out of order and chapter titles changed. “If students could identify each version and see how they differ, it’d be much easier to learn together,” he said. “That’s what this tool does for genomics.”
The new standard builds on refget, an earlier standard from the Global Alliance for Genomics and Health (GA4GH) that gives each individual sequence a unique identifier computed from its contents. Refget Sequence Collections goes further, giving consistent, trackable identifiers to whole groups of sequences, such as all the sequences that make up a reference genome.
GA4GH is a nonprofit that promotes standards and data sharing in genomics. Refget Sequence Collections is the latest of more than 40 tools developed through its network.
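For technically minded readers, the idea of content-derived identifiers can be sketched in a few lines of Python. This is a simplified illustration, not the official refget implementation. It assumes the truncated SHA-512 digest (`sha512t24u`) used in GA4GH specifications, substitutes plain sorted-key JSON for the standard's exact canonicalization rules, and uses hypothetical helper names (`sequence_digest`, `attribute_digests`, `collection_digest`). It shows how two genome files that label chromosomes differently, like the textbook with renamed chapters, can still be recognized as containing the same sequences.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    # Truncated SHA-512: keep the first 24 bytes, then base64url-encode (32 characters).
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def sequence_digest(seq: str) -> str:
    # Hypothetical helper: an ID derived from the sequence letters alone,
    # so the same sequence gets the same ID no matter what it is called.
    return "SQ." + sha512t24u(seq.upper().encode("ascii"))


def canonical(obj) -> bytes:
    # Assumption: simplified canonical JSON (sorted keys, no spaces) standing in
    # for the canonicalization rules the real standard defines.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attribute_digests(names: list[str], seqs: list[str]) -> dict[str, str]:
    # Hypothetical helper: one digest per attribute array, so two collections
    # can be compared attribute by attribute.
    arrays = {
        "names": names,
        "lengths": [len(s) for s in seqs],
        "sequences": [sequence_digest(s) for s in seqs],
    }
    return {key: sha512t24u(canonical(values)) for key, values in arrays.items()}


def collection_digest(names: list[str], seqs: list[str]) -> str:
    # Hypothetical helper: a single ID for the whole collection, built from the attribute digests.
    return sha512t24u(canonical(attribute_digests(names, seqs)))


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTGGCCAA"]
    style_a = attribute_digests(["chr1", "chr2"], seqs)
    style_b = attribute_digests(["1", "2"], seqs)
    # Same sequences under different names: the per-attribute digests show exactly what differs.
    for key in style_a:
        print(key, "match" if style_a[key] == style_b[key] else "differ")
    print(collection_digest(["chr1", "chr2"], seqs) == collection_digest(["1", "2"], seqs))
```

Run as a script, this prints that the names differ while the lengths and sequences match, and the final comparison prints `False`: the two collections get different overall identifiers, yet a tool can still see that their sequence content is identical.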
By reducing tedious manual work, the new standard gives scientists more time to focus on interpreting data and pushing discoveries forward.
“I hope this helps solve long-standing issues in integrating genomic and epigenomic data,” Sheffield said. “A shared standard for referencing datasets can help us learn faster from more experiments.”
Collaboration Behind the Tool
This work was led by Sheffield in collaboration with:
- Timothé Cezard and Andy Yates at EMBL’s European Bioinformatics Institute
- Sveinung Gundersen at ELIXIR Norway
- Shakuntala Baichoo at Peter Munk Cardiac Centre-Artificial Intelligence
- Rob Davies at Wellcome Sanger Institute
It was supported by Reggan Thomas and co-leads Oliver Hofmann (University of Melbourne) and Geraldine Van der Auwera (Seqera).
Sheffield holds positions in UVA’s School of Medicine (Departments of Genome Sciences, Biochemistry and Molecular Genetics), the School of Data Science, and the Department of Biomedical Engineering.