Everything’s up to Date in Kansas City, at Least When It Comes to Genomics

January 5, 2021

by Danny Levine

At the end of October, the National Institutes of Health introduced a new data management and sharing policy to see that publicly funded biomedical research is shared and made available broadly.

NIH said it hopes the Final NIH Policy for Data Management and Sharing, which replaces an existing policy from 2003, will help move toward a culture change in which data management and sharing are seen as a central component of research. The new policy will go into effect in January 2023.

“Responsible data management and sharing is good for science,” wrote Carrie Wolinetz, associate director for Science Policy, on the NIH website. “It maximizes availability of data to the best and brightest minds, underlies reproducibility, honors the participation of human participants by ensuring their data is both protected and fully utilized and provides an element of transparency to ensure public trust and accountability.”

It has taken five years for NIH to develop the policy, which incorporates feedback from stakeholders. It said the policy sought to strike a balance between reasonable expectations for data sharing and flexibility to allow for a diversity of data types and circumstances. It applies to all research funded or conducted by NIH that results in the generation of scientific data.

The policy doesn’t require data to be shared. Instead, it requires researchers to submit a data management and sharing plan. It acknowledges that there may be legal or other factors that impede the ability to save and share data.

The new policy also emphasizes the importance of good data management practices. It establishes the expectation for maximizing the appropriate sharing of scientific data generated from NIH-funded or conducted research, with justified limitations or exceptions.

John Wilbanks, chief commons officer of Sage Bionetworks and a governance and policy advisor to RARE-X, praised NIH for what he called “a pretty good policy,” but said it will take time for the agency to implement and refine it.

“Until we see it in deployment, we don’t really know what it’s going to look like,” he said. “With open policies, there’s a long tradition where you have to iterate on them because they represent such a big change. You have to push them out piece by piece, monitor them, and keep fixing them.”

Wilbanks noted that a new policy was needed because new kinds of data have proliferated since the 2003 policy was put into place. The iPhone, digital photography, inexpensive genomic data, wearable sensors, and other technologies have transformed the open data concept.

He said the challenge with biological data is that it requires time and investment to annotate and curate to make it useful. “It’s not like GPS data where what you need is a database on the Internet and some rules over who gets to download it,” he said. “You have to have so much professional expert science work done to make data usable by a third party that you have an entirely professional class of data curation that is emerging as a job. The changes in the document reflect that growth in biology and that growth in the understanding of all the stuff around the raw data that gives it meaning that you’ve also got to have to make open data usable.”

Wilbanks thinks the new policy will benefit RARE-X. With an analogous effort underway at NIH, the organization can point to it to help others make sense of its own mission to aggregate data.

“That it makes it easier in the aggregate for RARE-X to achieve its mission. To the extent that RARE-X is out there making arguments that no one else is making, it’s hard for RARE-X to change the game,” he said. “If RARE-X can be out there and say the kind of stuff we’re talking about is the kind of stuff the U.S. government’s talking about and what we’re doing is building the same kind of systems they want to build, but for us and by us, I think that’s a pretty significant change in the information sphere.”

The Genomic Medicine Center at Children’s Mercy Hospital is working to sequence 100,000 genomes of children, siblings, and their parents with the hope of producing the largest data set of genetic information about children with rare diseases created to date.

The effort, known as the Genomic Answers for Kids program, is a seven-year, $80 million project at the Kansas City, Missouri-based hospital. The expectation is that studying the genetics of thousands of children will fuel progress in understanding rare diseases, identify undiscovered ones, and lead to faster diagnosis and new treatments.

“Families have struggled sometimes for decades, not getting a diagnosis,” said Dr. Tom Curran, senior vice president, chief scientific officer, and executive director of the Children’s Mercy Research Institute. “Providing that upfront as part of a research study, but under conditions that allows the data to be returned to the families, meant that we could provide answers for a very large number of families while advancing the field.”

The program is using a variety of technologies to capture pathogenic mutations that might otherwise go undetected. It’s using a method known as long-read sequencing, which Curran said provides a higher hit rate for rare genetic mutations. The program is also using single-cell genomics for complex disorders that involve somatic mutations as well as germline mutations.

The program, which launched before the COVID-19 pandemic, has enrolled 1,650 families with about 3,900 total participants to date. The project has already provided 160 new diagnoses, identifying 80 genes that may be contributing to disease in a research study of 720 families.

Curran called the data challenge involved in the project “unbelievably enormous” and said new ways are needed to deal with clinical informatics and genomic data.

“I don’t think there’s anyone who can tell you that they’re confident that they’re making the best use of data. This is an evolving situation,” he said. “We, along with many other institutions, are using AI approaches, but there is a long way to go. And this is where it’s very important to engage with the best and the brightest. One way to do that is to make your data available, so someone anywhere in the world can make a contribution to the interpretation of your data.”

The project has made a significant commitment to data sharing. Its data will be uploaded to the National Center for Biotechnology Information’s Database of Genotypes and Phenotypes (dbGaP), which was developed to archive and distribute data and results from studies that have investigated the interaction of genotype and phenotype in humans. Curran said the research center is also developing a platform for real-time sharing that ultimately will be expanded for wide access.

“No single group has all the answers. And particularly with rare diseases, it’s very important to share information. You can learn from the experiences of others,” said Curran, who pointed to the case of one family with a rare mutation of unknown significance. When the program shared that information, it found an investigator who had identified a small number of families with the same disorder and the same mutation.

“The more we share, the more we impact the data mix,” he said. “And our philosophy at Children’s Mercy in Kansas City is that although we’re very focused on our local demographic population, we actually believe we’re working for children everywhere in the world.”

RARE-X believes in that commitment to data sharing. It’s central to RARE-X’s entire reason for being. RARE-X understands that when it comes to rare disease data, the more that can be aggregated, the faster researchers will be able to gain new insights, and the faster we will be able to diagnose and treat patients.
