Rare Daily Staff
Long-read sequencing has the potential to improve the rate of rare disease diagnosis while reducing the time to diagnosis from years to days through a single test and at a much lower cost, researchers at the University of California, Santa Cruz report in a new study.
The study, which focused on rare monogenic diseases, was published in The American Journal of Human Genetics and led by core members of the UCSC Genomics Institute.
Scientists diagnose genetic diseases by searching through a person’s genetic material to find differences in a gene that may prevent it from functioning properly. The typical approach for finding these so-called variants uses short-read sequencing, a technique that reads DNA in stretches of about 150 to 250 base pairs at a time.
The problem is that short-read sequencing can miss crucial information in certain regions of the genome, such as sequence patterns that span far more than 250 base pairs. It also can’t determine which variants are inherited from the mother and which are from the father. Knowing the parental origin of a variant can be very useful for genetic diagnoses, especially when parental data is not available.
In contrast, long-read sequencing can read lengthy stretches of DNA at once, eliminating gaps that may lead scientists and clinicians to miss important information about gene variation. Long-read sequencing also provides data about which parent a variant is inherited from, as well as information about methylation, a chemical modification of DNA that can turn genes “on or off” and contribute to disease.
“Today, the diagnostic yield of genetic sequencing is frustratingly low. One likely cause is the incomplete sequencing methods used in clinical practice,” said UCSC Genomics Institute Professor of Biomolecular Engineering Benedict Paten. “In this work, we test the hypothesis that new, more comprehensive long-read sequencing can generate additional information useful for genetic diagnosis.”
Paten said the use of long-read sequencing allowed the researchers to discover numerous additional potentially interesting genetic variants and epigenetic signals in their cohort. While there is great promise in this information, he said it is still early days and it will take time for the community to interpret and fully understand much of it.
The UC Santa Cruz team partnered with clinicians to work on the cases of 42 patients with rare diseases. Some of these people received a diagnosis through the use of short-read methods or other specialized testing, and some were still undiagnosed. In some cases, the researchers had access to parental genetic information, but in others, they did not.
After sequencing and analyzing the patient data, the researchers found that long reads provided a more exhaustive dataset than can be derived from short-read sequencing.
Long-read sequencing delivered a conclusive diagnosis for 11 of the 42 patients in the cohort, providing everything that was known from the short-read data as well as additional information, including additional rare candidate variants, long-range phasing, and methylation, all in a single, cost-efficient, and rapid protocol.
On average, each patient had 280 genes (including some Mendelian disease genes, which are linked to inherited disorders caused by single-gene mutations) with significant protein-coding regions uniquely covered by long reads and undetected by short reads.
The diagnosed cases include four cases of congenital adrenal hyperplasia, a rare condition where the adrenal glands are enlarged and fail to function properly. The gene responsible for this disease is in a particularly challenging region of the genome that can’t be characterized with short-read sequencing technology, and the researchers said the current clinical test is cumbersome and incomplete.
To solve the cases, the researchers developed a new pangenomic tool that integrates high-quality assemblies like the ‘telomere-to-telomere’ reference genome. They said many rare diseases involve regions of the human genome that have been historically difficult to study and noted their results encourage them to extend their approach to more of those diseases that have been at a standstill for a long time.
Shloka Negi, a UC Santa Cruz biomedical engineering doctoral student who is the paper’s first author, said long-read sequencing can serve as a single diagnostic test, reducing the need for multiple clinical visits and transforming a years-long diagnostic journey into a matter of hours.
“There’s so much more of the genome that the long reads can unlock,” Negi said. “But it will take some time until we can fully interpret this new information revealed by long reads. This data has been absent from our clinical databases, which were built using short-read analysis and mapping to the standard reference. We showed that long reads are uncovering about 5.8 percent more of the telomere-to-telomere genome that short reads simply couldn’t access.”
Photo: Shloka Negi, a UC Santa Cruz biomedical engineering doctoral student and the paper’s first author

