How New Sequencing Technology Promises to Alter the Diagnostic Odyssey
October 28, 2022
Just three years after Children’s Mercy Research Institute launched its Genomic Answers for Kids program, it reported that it had reached the milestone of providing 1,000 rare disease diagnoses to families. One reason for the GA4K program’s success has been its use of advanced genomic sequencing that captures both the full genome and the methylome, revealing parts of the human genome that have never been clinically tested and making it possible to interpret changes beyond the genetic code. We spoke to Tomi Pastinen, director of the Genomic Medicine Center at Children’s Mercy Kansas City, about the GA4K program, how new sequencing technology is allowing it to diagnose rare disease patients who previously were undiagnosable, and how it has the potential to alter the diagnostic odyssey for patients with rare, genetic diseases.
Daniel Levine: Tomi, thanks for joining us.
Tomi Pastinen: Thanks, Dan, for inviting me. I’m happy to be here and talk about our science.
Daniel Levine: We’re going to talk about Children’s Mercy’s Genomic Answers for Kids, or GA4K, program, its use of cutting-edge sequencing technology, and what the results have been so far. We’ve discussed the GA4K program on the show previously with your colleague Tom Curran, but for listeners who may not be familiar with it, can you explain what it is and how it works?
Tomi Pastinen: Genomic Answers for Kids was launched in 2019, almost exactly three years ago, as a first-of-its-kind pediatric data repository for rare disease in children. Our goal over several years is to enroll 30,000 children and their family members to build a catalog of nearly 100,000 genome datasets to better understand and diagnose rare disease.
Daniel Levine: Well, what have the results been so far? How many rare disease patients have you been able to diagnose with the technology?
Tomi Pastinen: So, now at the three-year anniversary of the program, we’ve reached over 10,000 individuals participating in the study, of whom 5,112 are patients from 4,230 families. Among those families, we’ve already made 1,145 diagnoses. And that’s remarkable, because we’ve only fully analyzed about 2,300 genomes. So, the diagnostic rate is currently over 40 percent. And we are quickly working through the remaining analysis with several sets of tools that we’ve developed here in Kansas City.
Daniel Levine: One of the frustrations for people suspected of having a rare genetic disease who get whole genome sequencing performed is that it often doesn’t provide the diagnosis that they were expecting. How often does it result in a diagnosis and why is it that many people are still left without a name to put to their condition?
Tomi Pastinen: That’s a great question. The diagnostic rate in our own institution at the start of our program was just below 30 percent. Currently, like I said, we are beyond 40 percent, but we believe that with some of the new technologies, like the 5-base sequencing that we recently launched, we will get over the 50 percent mark, so that most of the kids who enter sequencing in our program with a suspected genetic diagnosis will actually get an answer. That would be a major milestone for rare disease. The reasons why it’s still not going to be 100 percent are multiple. One of the greatest reasons we won’t reach 100 percent in the near future is that there are still many thousands of rare diseases out there that we don’t know about. To discover these new diseases, we collaborate widely with other researchers and institutions, combining their genomic sequencing resources with our genomic sequencing and technologies to define new rare diseases. So, the lack of a full catalog of rare disease knowledge is one of the reasons. The second reason is that some diseases may look like genetic diseases, severe, early-onset diseases where physicians suspect a genetic cause, but there might be an unknown environmental cause. These are very difficult to study when the primary assumption is that the cause is genetic. So, I think once we are good enough to say that a genome is truly negative, we can start to study other mechanisms of rare, severe, early-onset pediatric disease.
Daniel Levine: In October 2020, you began a collaboration with Pacific Biosciences to bring long read, or so-called HiFi sequencing into GA4K. How does HiFi sequencing differ from short read whole genome sequencing?
Tomi Pastinen: So, HiFi sequencing differs from short read sequencing in several ways. The first, most prominent, and most discussed feature of HiFi sequencing is that it produces individual DNA sequence reads of over 10,000 base pairs in length, whereas typical short read whole genome sequencing produces reads of 100 to 200 base pairs. So, there’s roughly a 100-fold difference in the length of the individual DNA pieces that one can look at in a single read. What that allows you to do is see very complex variation, especially what we call structural variation in the genome: changes bigger than single nucleotide mutations, for example insertions of new pieces of DNA many base pairs in length, or deletions, the removal of DNA pieces hundreds to thousands of kilobases in length. So, it allows us to see all types of variation at a resolution that is not possible with short reads. The second feature of long read sequencing, or HiFi sequencing specifically, is that the reads are very accurate. The long molecules are actually sequenced multiple times over to achieve high accuracy in each read. This allows us to do various things, including building personal assemblies, or genomes from scratch, by putting these long DNA molecules one after another and building what we call a “reference-free personal genome” from the long reads. And the last thing that is unique to HiFi genome sequencing over short read sequencing is that the DNA molecules themselves are unaltered, long pieces of DNA taken directly from patient cells.
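The idea of sequencing the same long molecule “multiple times over” to reach high per-read accuracy can be illustrated with a toy majority-vote consensus. This is a minimal sketch under loudly simplifying assumptions: the passes are already aligned base for base, errors are independent substitutions (no insertions or deletions), and the 10 percent error rate and 9 passes are arbitrary illustrative values. PacBio’s actual circular consensus algorithm is more sophisticated and is not reproduced here; the function names are hypothetical.

```python
import random
from collections import Counter


def noisy_pass(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy the template, substituting each base with probability error_rate (toy error model)."""
    bases = []
    for base in template:
        if rng.random() < error_rate:
            # Substitution error: pick one of the three other bases at random.
            bases.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            bases.append(base)
    return "".join(bases)


def majority_consensus(passes: list[str]) -> str:
    """Take the most common base at each position across equal-length, pre-aligned passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def error_fraction(read: str, truth: str) -> float:
    """Fraction of positions where the read disagrees with the true sequence."""
    return sum(a != b for a, b in zip(read, truth)) / len(truth)


rng = random.Random(42)
template = "".join(rng.choice("ACGT") for _ in range(10_000))
passes = [noisy_pass(template, error_rate=0.10, rng=rng) for _ in range(9)]

single_pass_error = error_fraction(passes[0], template)
consensus_error = error_fraction(majority_consensus(passes), template)
print(f"single pass: {single_pass_error:.4f}  9-pass consensus: {consensus_error:.4f}")
```

Because independent errors rarely agree on the same wrong base, the per-position vote recovers the template far more often than any single pass does.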
And because this is single-molecule sequencing, you can see how fast the DNA is being read along this long molecule, and from the speed of reading through the molecule, we can call additional information from the DNA. That is the additional information that the PacBio sequencing system currently produces automatically: the 5-base readout, which includes not only As, Cs, Ts, and Gs, but also methyl-C, or DNA methylation. So, this 5-base information is a completely novel feature that became available only this year.
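The kinetic principle behind the 5-base readout can be sketched in a few lines. PacBio instruments record the interpulse duration (IPD), the pause between successive base incorporations, and a modified base such as 5-methylcytosine shifts the polymerase’s timing. The toy caller below flags a CpG cytosine as methylated when its observed IPD is well above an expected, unmodified value. The 1.5 ratio cutoff, the IPD numbers, and the function names are hypothetical, and production tools use trained models on kinetic features rather than a single fixed cutoff like this.

```python
def cpg_sites(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def call_cpg_methylation(
    seq: str,
    observed_ipd: list[float],
    expected_ipd: list[float],
    ratio_cutoff: float = 1.5,  # hypothetical cutoff, not a PacBio default
) -> dict[int, bool]:
    """Call a CpG methylated when its observed/expected IPD ratio exceeds the cutoff.

    Simplification: the real 5mC kinetic signal is subtle and spread over
    neighboring positions; here it is read from the cytosine position alone.
    """
    if not (len(seq) == len(observed_ipd) == len(expected_ipd)):
        raise ValueError("sequence and IPD tracks must have the same length")
    return {
        i: observed_ipd[i] / expected_ipd[i] > ratio_cutoff
        for i in cpg_sites(seq)
    }


seq = "ACGTTCGA"
observed = [1.0, 2.4, 1.0, 1.1, 0.9, 1.1, 1.0, 1.0]  # invented IPD values
expected = [1.0] * len(seq)                           # invented unmodified baseline
print(call_cpg_methylation(seq, observed, expected))
# {1: True, 5: False}
```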
Daniel Levine: If you’re able to get a sense of methylation, I take it this is giving you an epigenetic view where you can see how molecules outside of the DNA may be either suppressing or inactivating a gene. How might this accelerate the search for answers?
Tomi Pastinen: That’s a great question. There are two features of DNA methylation across our genome. One of them is exactly what you describe. The environment can influence how our genome is methylated, that is, how the cytosine base in each of our cells and tissues gets methylated, and that is completely independent of the genetic variation we inherit from our parents. But about half of the methylation variation in our genomes, as my lab has studied before, is actually inherited from our parents. And that’s the component we are very excited about, because now we can suddenly interpret genetic variation through the lens of DNA methylation variation. Why is this important? Because DNA methylation variation tells us about genome function outside the coding region. Methylation of DNA is highly linked to gene regulation, so if we see, for example, rare genetic variation that leads to rare methylation variation, we can ask whether it could potentially inactivate a gene and lead to disease.
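The screening logic described here, looking for rare genetic variation that coincides with rare methylation variation, could be sketched as a simple outlier test. In this hypothetical example, a patient’s methylation level at one regulatory region is converted to a z-score against a reference cohort, and the region is flagged only if it is a strong outlier and the patient also carries a rare variant there. The cohort values, the z cutoff of 3, and the function names are illustrative assumptions, not the GA4K pipeline.

```python
from statistics import mean, stdev


def methylation_z_score(patient_level: float, cohort_levels: list[float]) -> float:
    """Standardize a patient's methylation level at one region against a reference cohort."""
    return (patient_level - mean(cohort_levels)) / stdev(cohort_levels)


def flag_candidate_region(
    patient_level: float,
    cohort_levels: list[float],
    carries_rare_variant: bool,
    z_cutoff: float = 3.0,  # hypothetical outlier threshold
) -> bool:
    """Flag a region only if methylation is an outlier AND the patient has a rare variant in it."""
    z = methylation_z_score(patient_level, cohort_levels)
    return carries_rare_variant and abs(z) >= z_cutoff


# Hypothetical fraction of methylated CpGs in one promoter across a reference cohort.
cohort = [0.10, 0.12, 0.08, 0.11, 0.09]
print(flag_candidate_region(0.85, cohort, carries_rare_variant=True))   # True
print(flag_candidate_region(0.85, cohort, carries_rare_variant=False))  # False
```

Requiring both signals keeps the list short: a methylation outlier alone may be environmental, while a rare variant alone may have no regulatory effect.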
Daniel Levine: There’s a gap in our knowledge of genes that drive rare disease. What portion of the genome is thought to be undetectable by short read whole genome sequencing? And to what extent do you think HiFi sequencing can provide answers?
Tomi Pastinen: So, among the roughly 4,400 well-characterized disease genes in the genome, 400 are to some extent difficult to sequence by short read genome sequencing. Pieces of those genes look very much like other pieces of the genome, which can confuse short read genome sequencing. When you sequence them with long reads, you can easily tell that a longer piece of DNA is unique and stems from that disease gene. So, the 5 to 10 percent of disease genes that sit in duplicated, difficult-to-sequence regions of the genome benefit directly from the better nucleotide resolution in these duplicated regions. The second area where long reads help is, as I mentioned earlier, structural variation. In many cases we have, for example, a recessive disease where one allele is inherited from one parent, but the second allele that we expect to be inherited from the other parent is missing from the short read sequencing data. In many cases, that second hit on the gene is actually a structural variant that is not seen by short read sequencing but can be detected by long read sequencing. So, those are two of the key mechanisms by which long read sequencing lets us assign more genotype-based diagnoses. But the third aspect is related to methylation detection, which opens up vast areas of the genome for inspection. As I said before, methylation variation can be linked to gene regulation variation, which now allows us to interpret regions in the noncoding DNA. Ninety-eight percent of the human genome is noncoding, and that 98 percent of the genome is not currently interpreted in clinical tests.
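The duplicated-gene problem can be illustrated with exact string matching on a toy reference that contains two identical copies of a segment. A short read lying entirely inside the segment matches both copies and cannot be placed, whereas a longer read that extends into the unique flanking sequence matches only once. This is a deliberately simplified sketch: the sequences and function name are invented, real aligners score alignments and tolerate mismatches rather than searching for exact matches, and real paralogs usually differ at a few positions.

```python
def count_exact_matches(reference: str, read: str) -> int:
    """Count (possibly overlapping) exact occurrences of a read in the reference."""
    count = 0
    start = reference.find(read)
    while start != -1:
        count += 1
        start = reference.find(read, start + 1)
    return count


# Hypothetical segment present twice in the toy reference, like a duplicated disease gene.
paralog = "GATTACAGGCTTACCGATAGCATTG"
reference = "AAAAACCCCC" + paralog + "TTTTTGGGGG" + paralog + "CGCGCGCGCG"

short_read = paralog[5:20]            # falls entirely inside the duplicated segment
long_read = "CCCCC" + paralog + "TTTTT"  # spans into unique flanks on both sides

print(count_exact_matches(reference, short_read))  # 2 -> ambiguous placement
print(count_exact_matches(reference, long_read))   # 1 -> unique placement
```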
Daniel Levine: Based on the success of the early collaboration with Pacific Biosciences, you’ve expanded on it. What have you done to date and what’s the ultimate goal of the collaboration? You mentioned 30,000 genomes. Is that an increase from what you previously expected to sequence?
Tomi Pastinen: Well, I think the scope of the collaboration with Pacific Biosciences on long read, or HiFi, genome sequencing and 5-base sequencing is really to define the best algorithm for when to use long read sequencing. Long read sequencing continues to be more expensive than short read genome sequencing. So, we want to find the optimal approach to deploy long read sequencing as early as possible in finding answers for a family, but to do it efficiently, because not every patient needs long read sequencing; we can still find most answers by short read sequencing. So, one of the current emphasis areas is really to find the best algorithm to define which patients would benefit from long read sequencing, and to do that as early as possible. The second part of the current extension of our collaboration with Pacific Biosciences relates to developing a single comprehensive test to detect all genetic variation in the human genome diagnostically. Currently, even when you do short read sequencing, you typically couple it with other technologies to find other types of variation. In principle, long read sequencing should be able to yield all types of genetic variants that can be diagnosed today in a single test, and we are in the process of getting that single genomic test clinically validated. I think having everything packaged in a single test would be a major milestone for long read sequencing and for rare disease investigation in general.
Daniel Levine: You mentioned that one of the things you’re trying to do is determine who might be most appropriate for this type of long read sequencing. Is it something other than someone suspected of a genetic disease who went through whole genome sequencing and didn’t get an answer?
Tomi Pastinen: That’s exactly how we are currently prioritizing patients, whether from our own project or referred from elsewhere, who have gone through current clinical and research short read sequencing, where the physicians and research teams still strongly suspect that something was missed. Those are the samples that are currently streamlined into HiFi genome sequencing, our long read genome sequencing, in our center. We do believe we could potentially identify types of rare disease that would benefit from HiFi sequencing first, for example early-onset hypotonia, what is sometimes called floppy baby syndrome, which is really a neurological dysfunction in newborns. In many of those cases, the genes underlying the hypotonia are actually very resistant to diagnosis by short read genome sequencing and require different tests today. So, we are trying to find these areas, the indications among patients who would benefit from primary HiFi genome sequencing, but most cases, like you said, are channeled to us after everything else has failed.
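The prioritization described in this answer can be written down as a small rule set. This is a hypothetical sketch: the `Referral` fields, the route labels, and the idea of a fixed set of HiFi-first indications are assumptions for illustration, with early-onset hypotonia included only because it is mentioned here as a candidate indication that is still being studied.

```python
from dataclasses import dataclass

# Hypothetical set of indications considered for primary HiFi sequencing.
CANDIDATE_HIFI_FIRST_INDICATIONS = {"early-onset hypotonia"}


@dataclass
class Referral:
    indication: str
    short_read_negative: bool       # prior short read testing found no diagnosis
    high_clinical_suspicion: bool   # clinicians still suspect a missed genetic cause


def route(referral: Referral) -> str:
    """Return a (hypothetical) sequencing route for a referral."""
    if referral.indication in CANDIDATE_HIFI_FIRST_INDICATIONS:
        return "HiFi first"
    if referral.short_read_negative and referral.high_clinical_suspicion:
        return "HiFi after negative short read"
    return "short read"


print(route(Referral("early-onset hypotonia", False, False)))  # HiFi first
print(route(Referral("developmental delay", True, True)))      # HiFi after negative short read
print(route(Referral("developmental delay", False, True)))     # short read
```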
Daniel Levine: You mentioned that HiFi sequencing is more expensive than short read whole genome sequencing. We’ve seen a big drop in the cost of sequencing. Is HiFi sequencing enjoying the same type of price drop, the Moore’s Law of genetic sequencing, or is it significantly more expensive and likely to remain so?
Tomi Pastinen: Well, I do believe that with our work and the work of many other groups that have started using long read sequencing in recent years, the user pool has grown significantly. And it’s not only Pacific Biosciences; there’s also a company called Oxford Nanopore, and the two are competing. So, I believe this competition will lead to lower-cost sequencing, perhaps not as quickly as in the early years of short read sequencing, but we have high hopes that the price gap between long and short read sequencing will narrow over time, maybe even in the near future. That will somewhat change the landscape of when to deploy long read sequencing as a primary test rather than traditional short read sequencing.
Daniel Levine: There are many institutions that will sequence a patient and not be able to come up with an answer. Are you collaborating with any other institutions that have undiagnosed patients?
Tomi Pastinen: We collaborate in many different ways with other institutions and other rare disease programs. One way is to accept samples into our HiFi genome sequencing pipeline from centers that have less access to this technology. For example, for a year and a half now we have worked with NYU’s Undiagnosed Disease Project, and they have been sending us families that have gone through extensive molecular analysis in New York without finding an answer. And we have solved cases for them using HiFi genome sequencing. Another way of collaborating is through our data: we’ve built a large repository, actually the world’s largest disease-oriented long read sequencing database, which currently includes 1,072 human genomes. With that large compendium of long read genome sequences from different individuals and rare disease patients, we can provide other groups that are independently pursuing HiFi genome sequencing in their rare disease programs with a background database against which they can compare their own sequences. This is how we collaborate, for example, with the University of Utah as well as the Care4Rare Canada program run from Toronto and Ottawa.
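Comparing a patient’s variants against a background database of long read genomes can be sketched as a frequency filter: variants carried by many people in the background cohort are unlikely to cause a rare disease and can be set aside. In this hypothetical example, variants are keyed by chromosome, position, reference allele, and alternate allele; the background size of 1,072 genomes is the number quoted above, while the 1 percent cutoff, the carrier counts, and the variant coordinates are placeholders. Matching structural variants across genomes usually needs tolerant, breakpoint-aware comparison rather than the exact keys used here.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def filter_rare(
    patient_variants: list[Variant],
    background_carriers: dict[Variant, int],
    n_background_genomes: int,
    max_carrier_freq: float = 0.01,  # hypothetical rarity cutoff
) -> list[Variant]:
    """Keep variants whose carrier frequency in the background cohort is at or below the cutoff."""
    kept = []
    for variant in patient_variants:
        freq = background_carriers.get(variant, 0) / n_background_genomes
        if freq <= max_carrier_freq:
            kept.append(variant)
    return kept


# Placeholder variants and carrier counts, for illustration only.
common = ("chr1", 1_000_000, "A", "G")
seen_twice = ("chr2", 2_000_000, "C", "T")
novel = ("chr3", 3_000_000, "G", "A")
background = {common: 300, seen_twice: 2}

print(filter_rare([common, seen_twice, novel], background, n_background_genomes=1072))
```

Here the common variant (300 of 1,072 genomes) is dropped, while the variant seen twice and the one absent from the background are kept for review.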
Daniel Levine: If there are patients who are undiagnosed, are there ways they can get access to this technology? Can they participate in a study you’re doing if they are interested?
Tomi Pastinen: Yes. As I said, we have accepted patients through these institutional-level collaborations. But we’ve also accepted patients from individual physicians, physician scientists, and researchers who have compelling cases where they feel long read sequencing might provide the answer. So, we currently receive individual samples from various locations in the U.S. and Canada.
Daniel Levine: Ultimately, how do you think this technology will change the diagnostic odyssey, and what will it take to make it more broadly available to patients suspected of having a genetic disease?
Tomi Pastinen: Starting with the last point, how to get this to a greater number of patients: we already experience suboptimal reimbursement by insurers for various genetic tests in children with rare disease. Either Medicaid or private insurers may decline genetic tests, and we see that in our own program. A number of families have entered the program after their insurance carrier declined coverage of even clinical genetic testing, and we can still find a diagnosis in these families. So, I think accelerating the cycle of bringing these new technologies into the reimbursement landscape for molecular tests is key. We do want to provide sufficient scientific and follow-up evidence to help insurers accelerate their plans to reimburse these modern molecular tests. Beyond that reimbursement barrier, there is the data. With the HiFi genome sequences we are producing now, we are at over a thousand human genome datasets. But we probably have to get to tens or even hundreds of thousands of patients with 5-base sequencing to really capture all the types of new variants that can be seen by these new technologies. Short read sequencing has benefited from over 10 years of community efforts in building reference databases. We’re only at the start with long read genome sequencing, and its full benefit comes to life when multiple groups start to use it and, like us, share the data with the community to build that large resource against which every new patient sequence can be compared, so we can extract all the benefits of long reads and 5-base sequencing.
Daniel Levine: Tomi Pastinen, director of the Center for Pediatric Genomic Medicine at Children’s Mercy. Tomi, thanks so much for your time today.
Tomi Pastinen: Thanks for having me.
This transcript has been edited for clarity and readability.