Powering a New Era of Genetic Medicine

April 29, 2022

Genomics England is working to embed genomics into healthcare, enable research, and improve the diagnosis and treatment of patients. In 2018, it completed enrollment of its first initiative, the 100,000 Genomes Project, and is working on a new initiative to explore the benefits and challenges of sequencing and analyzing the genomes of newborns. We spoke to Ellen Thomas, clinical director and director of quality for Genomics England, about the outcomes from the 100,000 Genomes Project, its Newborn Genomes Programme, and the potential for genome sequencing to alter the diagnostic odyssey for people with rare disease.


Daniel Levine: Ellen, thanks for joining us.

Ellen Thomas: Thank you very much for the invitation.

Daniel Levine: We’re going to talk about the work of Genomics England, the 100,000 Genomes Project, and the chance to change the diagnosis and understanding of rare diseases with genomic sequencing. Perhaps we can begin with the 100,000 Genomes Project. Can you explain what that was?

Ellen Thomas: Yes, of course. The 100,000 Genomes Project was launched 10 years ago now. And it was in the context of a time when there had been an exponential expansion in our capacity to sequence DNA. What we’d seen with that expansion was that a lot of genomic diagnoses were being made for patients with rare diseases in the context of individual research projects. But there was a bit of a gap in terms of knowing how to use this technology in the context of a live healthcare setting. So, what the U.K. government did was set up the 100,000 Genomes Project, which was a hybrid project both to carry out research into rare diseases, but also to look at the implementation of how you use genomic technologies in the healthcare context. The project was strongly founded within the U.K.’s National Health Service and the goals were to investigate the real world application of using whole genome sequencing in rare disease patients. These were patients who had been through all the standard diagnostic testing that was available to them at that time and still didn’t have a diagnosis. The project did also have a second arm, which was looking at sequencing the genomes of tumors in cancer patients, but that’s clearly not the area that Rarecast is interested in, so we’ll be focusing and discussing the rare disease part of the project today.

Daniel Levine: In November of 2021, the 100,000 Genomes Project issued a preliminary report in the New England Journal of Medicine on the role of genome sequencing in patients with undiagnosed rare disease. This was a pilot study that involved 4,660 patients. Can you explain what you did in that group and who was included in this study?

Ellen Thomas: Yes, absolutely. This was a report of a cohort of early participants who joined right at the first stage of the project. Now, all these patients had samples taken for whole genome sequencing. They also had their healthcare data collected to contribute to the analysis, and results were produced from the analysis of their genome data and the health data, which were then returned to genetics laboratories in the NHS for diagnostic reporting back to those patients. The cohort included over 2000 rare disease patients and a number of their family members and this was a really powerful part of the project. It really helps when you are doing a genomic analysis to be able to compare the genome of a person who has a rare disorder with the genomes of their close relatives, both those who share the same disorder and those who don’t share the same disorder, and this is called segregation. So, having those family members in the project helped us very strongly with doing that segregation analysis.

Daniel Levine: And what did you find? How many patients were able to get a definitive diagnosis?

Ellen Thomas: Yeah, so the cohort really went through two phases of analysis. In the first phase of the analysis, there was a semi-automated pipeline, which took in the genomic data and the clinical data, and then compared those and returned some prioritized variants back to the NHS clinical scientists for review. And then there was a second round of analysis, which was much more manual and done by a team of researchers who really looked at the data from every possible angle to try to find out the full breadth of the sorts of diagnoses that you could mine from a whole genome sequence, but which were difficult at that stage to do in an automated kind of way. And some of those conditions were childhood onset conditions, and some were adult onset conditions. There are over 150 different conditions in the program overall and we found a diagnosis in 25 percent of the families that took part.

Daniel Levine: One thing I find surprises people in the rare disease community is when they learn that whole genome sequencing often fails to deliver a definitive answer. Why is that?

Ellen Thomas: Yes, it’s a very interesting question. And I think the first element to the answer to that question is really that it isn’t always possible to tell in advance of a genomic investigation whether a patient does have a genomic condition or not. There are some conditions that may be environmental in their origin, or may be caused by a more complex or less strong type of genomic predisposition. And for those sorts of conditions, there is no answer to find with a genome. In every cohort of patients who go through genome sequencing, there will be some patients where there isn’t a diagnosis to find. So, really with all genomic projects, we are not aiming for a hundred percent diagnostic yield because we know that doesn’t exist. There isn’t a hundred percent diagnostic yield. That said, though, there are a number of reasons why there may be a genome sequence, which is done in somebody who does have a genomic diagnosis, but we don’t immediately find that. And probably the biggest cause of that is due to genetic changes, which we can’t interpret. So, I’m sure many of your listeners will be familiar with the fact that all of us are very different from each other in terms of our genomic sequence. We have about 5 million places across our genomes, where we all differ from each other. And it’s really very difficult to tell the difference between which of those 5 million variants are just part of what makes us all different from each other, and which are the specific, rare, and important changes, which are having a really big impact on our health. So, understanding the genetic causes, which for example, don’t affect the parts of the genome that code for proteins, is really an area of research that is accelerating fast. But at the moment, there are many variants there that probably do cause people’s diseases, but we can’t at the moment be clear about or clear enough about to be able to use that information in their healthcare. 
There are also likely to be other sorts of variants that we could interpret, but at the moment we can’t detect technologically. So, the whole genome sequence can detect a large range of different types of changes in the genome, but there are always still some tricky regions left where those may be still difficult to mine using a genome. And then finally, there’s also probably a group of very rare disease genes that we haven’t discovered yet. And some of these may only affect a handful of families in the world. So, until we can really share that data about different families with different conditions all over, some of these disease genes may be very difficult to pick out and spot.

Daniel Levine: Within the realm of rare genetic diseases is one approach better at detecting some types of diseases than another? Is it just a matter of how well defined a specific condition is?

Ellen Thomas: Yes. In general, the more specific and well informed the question is that you are targeting at genome data, the greater the chance you have of success at answering the question. In my experience, if you carry out a very extensive genomic test without any sense of what your diagnostic hypothesis might be, without a sense of where you are looking in the genome, then there is a considerably lower chance of success in terms of making a diagnosis compared with a situation where you are really very clear what you are looking for and where you might find it. And there are some conditions that have a very clear relationship between a very recognizable clinical presentation and variants in a specific set of genes, and you do get a higher diagnostic rate in those circumstances. For example, retinal disorders in this cohort had one of the highest diagnostic yields. And if you are an expert ophthalmologist, if you look at the back of somebody’s eye and you see the pattern of changes in their retina, you can often be sure that you are dealing with a genetic condition. You can get a readout just directly by looking at the back of the eye with that eye of experience. And there are some conditions where an ophthalmologist can even look at the back of the eye and tell you which gene you should be looking at because the pattern is so specific. In those circumstances, you are able to really be sure that you are looking in the right area for the right thing. There are other conditions that may be quite difficult to differentiate from environmental causes of the same condition. One example of that would be hearing loss, where we know that congenital infection can cause hearing loss and so can a genetic condition. So, if you’re testing a mixture of patients with different causes, then your yield is likely to be lower because there is less there to find.
There are other conditions where you get quite a big overlap with a more complex kind of genetic predisposition. If you take some of the disorders of development, for example, there are some of those that are likely to, in some cases, be more due to multiple genetic effects, each of which is much smaller in its effect, a sort of bad hand of genetic variants inherited in a bad combination from both parents, rather than a single gene cause of the condition. And at this time of genomic testing, we can detect those variants, but we can’t clearly understand the combined impact of those less unusual and less powerful variants on any one individual. In those circumstances, the output of a genome sequence is, at the moment, less helpful for individual patients. I guess there are also some other complexities in this area. There are some specific genes that are much more complex than others. There are some genes that are more amenable to analysis than others, and some genes are much more variable than others. So, those genes are often quite difficult to interpret. You are definitely right that there is quite a wide range of outcomes when you target genome sequencing at different disorders and different conditions. And that’s one of the things that we’ve been really looking at and understanding and working out how we apply these technologies most effectively using the data from the 100,000 Genomes Project.

Daniel Levine: These patients were recruited between 2014 and 2016. Has the technology and our ability to interpret results improved since then? Are there new approaches, such as long read technology, improving the diagnostic rate?

Ellen Thomas: Yes. These patients were indeed recruited to the project in the early phase. The analysis of the data from these patients has really continued over a number of years with different approaches evolving over that time to add extra diagnoses. One example would be the triplet repeat disorders. Your listeners may have heard of Huntington disease, which is a neurological condition caused by a sort of stutter in the DNA sequence where the same little DNA sequence repeats over and over again. And if that expands and causes that stutter to get bigger, then over the generations that can turn into a gene that then causes Huntington disease. And when we first started the 100,000 Genomes Project, we would’ve said categorically that a whole genome sequence was not going to be able to detect conditions like Huntington disease, which were caused by these triplet repeats. But then during the course of the early project it became clear that groups around the world working with short-read genome sequencing had developed a way of specifically targeting questions about triplet repeat disorders to the genomes. So, we were then able to go back to the genomes of the patients in the pilot and carry out that triplet repeat analysis, and that yielded quite a few extra diagnoses for patients with that variant type. So, over time we have been using pipelines like the triplet repeat pipeline, looking at more specific ways to detect much larger pieces of either extra or missing DNA known as copy number variants, and also using alternative prioritization approaches such as Exomiser, really to go through this cohort of patients in a lot of detail and pick out as many diagnoses as possible. Another big factor with this is that gene discovery continues apace, with many more disease genes being discovered each year, so that, over time, definitely increases diagnostic yield.
We do also have increasingly better ways to follow up on genomic results and work out whether a particular variant is affecting a protein and its function and could therefore be causing disease. That includes RNA-based tests, such as a transcriptome sequence, which tells us whether a change in the gene is leading to a change in the RNA sequence, which then goes on to determine the protein sequence. We also have the beginnings of artificial intelligence approaches to predict things like splicing variants, which are variants that change the way in which the DNA is then processed into a protein. As you say, there are new technologies such as long-read technologies coming online. It’s likely that long-read technology is going to offer quite a lot of advantages in terms of bigger structural variants, bigger changes in the genome, and also knowing which variants sit on which copy of our genome. We all have two copies of all our genes and knowing which variants sit on which copy of our DNA is quite important in reporting disease-causing variants. So, that’s also really helpful. At the moment, I think long-read technology is in the phase where we can see those technological advances, but there’s a difference between knowing that you can use a technology to detect variants and being able to scale that and implement it robustly and cost effectively within routine healthcare. At the moment, our diagnostic genome sequencing in the U.K. is still based on the short-read technology, but there are a number of research projects going on that are investigating how we can convert the advantages of long-read sequencing into a robust diagnostic offering.
And I guess the final thing in this area that I would say is that probably the biggest immediate win for diagnostic yield is bigger and more detailed variant databases, which contain information about the health situation of the patients who have those variants (not the details of those patients, just the link between the variant and the health condition), and matching services, for example, which match genes across different data sets across the world. Obviously there are worries about data sharing and confidentiality in genomics, but there are increasingly ways in which you can link up different data sets in such a way that you can target specific questions at them without being able to see an inappropriate amount of the data set. Those sorts of matching services and linking services between different genomic data sets are probably the biggest route that we have at the moment to increase the diagnostic yield from genome sequencing.

Daniel Levine: At the end of a sequencing test, there is a patient and a family. You’ve been able to give diagnosis to a large number of undiagnosed patients. What’s the impact of that diagnosis? How does it change outcomes? How does it change their ability to just deal with their situation?

Ellen Thomas: Yeah, so I think the first element of that is ending the diagnostic odyssey that many of these patients have been through. We know that it is very important to patients and families to understand why a condition has happened and to be able to contact other families who are similarly affected and maybe to predict something of what will happen in the future. So that sense of understanding and control is probably the first element of receiving a diagnosis. In this context, it can also be very useful for predicting for families what the chances of the same condition arising again in the future in that family would be and informing people’s reproductive choices. In some cases, making a diagnosis can mean that somebody is eligible for additional healthcare surveillance. For example, if you have a rare predisposition to a cancer syndrome, then if you can demonstrate that somebody does have that, then they’re likely to be eligible for screening to try and pick up those cancers sooner. In some cases, we found that our patients, when they had a diagnosis, became eligible for a clinical trial, and particularly for retinal disorders, there are clinical trials now that require the genetic explanation for the retinal disease for a patient to be able to enter those trials. And we had examples of that in the project. There were some key examples where there was a major change in the management for a patient. For example, there was a girl who was diagnosed with a type of immunodeficiency, which is caused by a gene called CTPS1, which is a relatively high risk immunodeficiency and the recommended treatment for that is actually a stem cell transplant. So having that diagnosis really did change the way that little girl was then managed.
So, it was a relatively small minority at the time, when we were returning these diagnoses, for which there was a targeted treatment, but I think there is a sea change that we see happening now, and I know that you’ve talked on Rarecast to a number of guests about the sorts of nucleic acid therapies that are really coming online now for patients who have a genetic diagnosis. That means that making these diagnoses in the future is likely to lead to a whole new world of opportunity in terms of trials or N=1 therapies. So, while the proportion of patients five years or so ago who were able to come out of a diagnosis and go straight into a therapeutic option was relatively small, we can definitely see that world opening up dramatically in the forthcoming years, which makes the diagnostic part of the endeavor so very much more important.

Daniel Levine: Can anything be said from the study about how it alters the diagnostic odyssey for patients? Is whole genome sequencing something that will clearly get patients to answers faster?

Ellen Thomas: Yes, absolutely. The median duration of the diagnostic odyssey for the patients in this cohort was over six years, and that was the median diagnostic odyssey from the beginning of symptoms to the point where they received this diagnosis, and their median number of hospital visits was 68 over that time. We know that because patients consent to have their de-identified hospital data available alongside their genome. So, you are able to see the uptick in activity in the run up to a rare disease diagnosis through that data, which is a very powerful way of examining this difficult phenomenon, which is the rare disease diagnostic odyssey. So, the aim with whole genome sequencing really is to achieve a single test that can be the wet lab part, the lab part of the test that is just a one off. And then you can generate that data and target multiple different questions at it, much more efficiently, either simultaneously or in a short space of time. You can do some testing, which is based on comparing the genomes of a patient and their parents. You can target particular panels of known disease-causing genes and look for both small genomic variants and also the large copy number and structural variants. At the same time, you can look at the triplet repeat disorders and we are increasingly coming online now with our specific modules for specific complex genes, such as the gene for SMA and also mitochondrial variants. So, it really is a sort of one stop shop for the wet lab part of the testing, whereas in the past, you would have to do those tests one by one.
So, many patients who came into this project had first of all, a less detailed chromosome test, and then a more detailed chromosome test, and then maybe tests of one or two individual genes, and then maybe some smaller gene panel tests, one after the other, after the other as those things became available or became relevant to them, and then they ended up with their diagnosis coming from a genome test. What we are now doing in the U.K. is starting with the genome test as the first diagnostic test, and then targeting all the questions at the same data, and that is definitely a much more efficient process, a much more efficient way of getting to a diagnosis for patients.

Daniel Levine: The cost of the technology has fallen considerably, but many patients still, at least in the United States, have to fight for access. Did your study make a case for the cost effectiveness of whole genome sequencing?

Ellen Thomas: Yes. We did definitely look into that with this study. We know that the cost of the diagnostic odyssey is huge, and obviously, the most important cost of the diagnostic odyssey is the impact on patients: the fact that they’re coming back again and again to have more and more tests, some of which may be quite invasive. That’s the most important impact, but we also know that the cost to the healthcare system is very high. We found that there are over 180,000 episodes of hospital care that had gone into trying to diagnose and treat the patients in this cohort and it was estimated that the cost of that was nearly £90 million, which is over $120 million. Obviously not all those costs would’ve been the diagnostic odyssey; some of them would be treatment and management of those patients, but a substantial proportion of those costs will have been diagnostic investigations. So, if you can really cut to the chase with a genomic investigation, which for example means that you don’t necessarily have to do a muscle biopsy or a kidney biopsy or a scan, or other forms of imaging, then that really does lower the cost of the testing and does match well with the costs that you are saving. And when you use a genome, you can use that one wet lab test to target different types of questions. For example, in the past we had to do the triplet repeat testing separately, and that meant that you had to pay for two tests. You had to pay for the gene sequencing for the genes of interest, and then separately for the triplet repeat test. Now you don’t have to remember to do the triplet repeat test separately, and you don’t have to do the separate wet lab test for all the patients looking for the triplet repeat disorder. So, there are definitely savings there in terms of the lab costs as well.

Daniel Levine: At the end of last year, Genomics England embarked on the Newborn Genomes Programme. What is that program and how does it work?

Ellen Thomas: This is a research pilot project, which is really designed to answer questions about how we might use genome sequencing in the context of newborn screening. The intention is that babies who are born in NHS hospitals will be offered this screening under consent, which will be provided by their parents. The genome sequencing will be carried out and analyzed as quickly as possible following the birth of the baby to try and identify babies who do have rare genetic disorders, which could be treated before any irreversible harm has occurred.

Daniel Levine: There are still parents who seem nervous about anything from privacy to finding out information that’s not actionable. What’s your sense of parent willingness to participate in this?

Ellen Thomas: Yes, absolutely. That’s an important question in this very sensitive area and the program has been designed incorporating a very extensive public dialogue, which has been going on for the last year or so now. And that’s been held with multiple stakeholders, including patients who themselves have lived experience of rare diseases, with parents and new parents, with different groups within society. And really the consensus that’s emerging from that public dialogue is that if we maintain a carefully targeted focus on disorders that affect babies and very small children where there’s also an effective treatment, and in the context of strong information governance and with informed consent, people generally feel that that is the right way to be starting to ask these questions. And I think the key thing here is that there are a lot of questions that we don’t know the answers to about how we might use genome sequencing in this context. The logic is that we can use genome sequencing to make diagnoses after a baby has presented with symptoms, so can we use that technology to look before the symptoms have started and, where possible, prevent those symptoms from causing irreversible damage? It is such a strong driver. So, we really need to answer the questions about how we do this. And the questions include scientific questions, for example, how well can you detect disorders in advance of any symptoms presenting using genome sequencing? There are some crucial patient experience questions: how are families affected by the process of undergoing this kind of screening or by the process of receiving a positive result? And there are also health system questions: what samples can you take from a small baby like that, which you can collect quickly and practically and effectively in the pathway of a baby being born and then being discharged from hospital?
And then also research questions: can we, for example, use data from the project to run clinical trials following pre-symptomatic identification of rare disorders? So, it is going to take time to generate all this data. And now with the knowledge that the nucleic acid-based therapies are just coming around the corner and will be more effective if they can be applied in advance, and that there’s a workup time as described so eloquently by Julia Vitarello in one of your recent Rarecast episodes, it’s important that we try to make the most of that window. And I think the consensus that’s developing is that the best way to answer these questions is really to address them in the context of a national program that has been set up based on public dialogue and under conditions of informed consent.

Daniel Levine: And how long is that program expected to run, and what’s the ultimate goal of it?

Ellen Thomas: Well, we’re currently in the planning and consultation phase. We’re due to sequence the first babies during 2023 and run the pilot over a period of three years. Our primary endpoint is really exploring the role of early diagnosis in conditions for which treatment is available. The data set that is generated will also allow extensive research into the diagnosis and treatment of rare disorders. And then it also allows the potential to research the role of genome data during the whole of life and the role of the genome as a resource to come back to during later life. But that will be very much in the research context, and the information about disorders that could present later in life will not be returned to the families. That will be kept entirely in a de-identified context where it’s answering questions in the context where that data won’t be returned, which is in line with the comfort zones that we discovered in our public dialogue.

Daniel Levine: And we talked a little bit about the hesitancy some parents might have to participate, but what case would you make to them? Why should people participate?

Ellen Thomas: Well, I think this research pilot is open to parents who want to participate. Anybody who doesn’t want to participate will continue to have the completely standard commissioned newborn screening via the blood spot test via completely standard care. So, it doesn’t affect standard care. It’s very much a research pilot. I think the aim of the program is very much to explore how we can treat severe rare disorders at a pre-symptomatic stage. I think that that goal is one which potentially resonates with parents and I think getting this program launched now so that we are discovering how to generate the data, how to run the program, how to operate in a way that is acceptable and comfortable for parents, how to present results back to families, how to follow up those results before we get to the phase where the nucleic acid based therapies and other sorts of gene trials are really scaling up and wanting to recruit these patients for future research is really important. I think there’s a mixture of potential outcomes for individual families from the research, but also the potential to answer some very important questions for society about how we want to take this forward and how we want to use genomics in the context of healthy babies.

Daniel Levine: Genomics England is building an important set of genomic data. As with anything with rare disease, the more of it you can get, the better. Is Genomics England doing anything to share that data with others?

Ellen Thomas: Yeah. All of our data from the 100,000 Genomes Project, and prospectively from other cohorts, is made available to researchers via our National Genomic Research Library and, according to the terms of our consent, this is a reading library, not a lending library. We have a system where we have an access review committee, which has members who are participants, whose data is in the data set, who join in the process of approving researchers for access to the data. And then researchers can only access data within our protected research environment and results can only be removed after they’ve been fully analyzed without identifying patients. I think a lot of big genomics projects around the world have similar sorts of undertakings to ensure that data is being used in a way in which the people who’ve donated it are comfortable with. The question then is how do we federate that data across multiple different data sets to harness the power of those data sets without compromising people’s privacy. And that is a very big area of international discussion and collaboration at the moment, which I think will yield excellent dividends in the coming years.

Daniel Levine: What would you say the long term potential to use whole genome sequencing for newborn screening is? What will it take to see this technology used that way? And are there cost, interpretation, or other issues that remain a barrier?

Ellen Thomas: Yes, I think the cost of genome sequencing is likely to continue falling and as a larger proportion of rare disorders do become treatable, the health economic case for early detection and treatment will strengthen over the coming years. Interpretation, as with all genomic analysis, is the biggest hurdle and making improvements in our ability to interpret genomic variation will be really crucial to maximize the sensitivity of the test while minimizing false alarm results. I think there are a few other things that need sorting out. There are some practicalities that need to be investigated. For example, what sample types you can take, how quickly you can turn around the sequencing and the analysis. But I think these are likely to be amenable to quite rapid improvement as we pilot and iterate the potential approaches to those practical questions. It’s likely that other approaches are going to continue being needed alongside whole genome sequencing. A very good example is hypothyroidism, which is obviously a treatable condition, which is currently screened for by the newborn blood spot screening. There are some forms of hypothyroidism that are detectable by genome sequencing because they’re genetic forms of hypothyroidism. There are some that are not genetic, they’re structural, congenital malformation type forms of hypothyroidism, which can only be detected by looking at thyroid hormone levels and can’t be detected by looking at the genome. And there are others that can be detected by both. So, you really need the genomic approach and the protein-based approach to pick up all cases of hypothyroidism. We certainly had patients in the 100,000 Genomes Project, for example, who had one of the forms of hypothyroidism that could be detected by a genome, but couldn’t be detected by current newborn screening.
And those patients missed out on five years of treatment with thyroxine, which could have helped their development at an earlier stage and health at an earlier stage in their life. So, there’s likely to need to be a balanced set of approaches. We do need to make sure that genomics is really working for everyone. We know that at the moment, we do have certain populations who are under-sequenced, and thus, it is more difficult to interpret their genomic data. So, making sure that we reach better equality in the quality of interpretation is important. And then I think finally, we need to think about the evidence thresholds, which we need for adopting commissioned screening more broadly, regardless of the technology that we use. There are some conditions which are so rare that there will never be sufficient evidence to meet the classical criteria that we use for deciding on whether to implement population screening. But if those are very, very rare conditions but they are still very treatable conditions, and given that there are thousands of such conditions, a route does need to be found to make pragmatic decisions about the scope of future newborn screening. I think that sort of commissioning evidence conversation is an important one to have alongside investigating the technology and also the societal impact and societal levels of comfort with the ways in which we use this powerful data.

Daniel Levine: Ellen Thomas, clinical director and director of quality at Genomics England. Ellen, thanks so much for your time today.

Ellen Thomas: Thank you very much, Danny. I very much enjoyed talking to you.

This transcript has been edited for clarity and readability.


