Massive Datasets

Data scientists in the spotlight

CU Anschutz researchers are harnessing the power of the largest patient-privacy-limited dataset in US history


Written by Wendy Meyer on July 20, 2022

Melissa Haendel, PhD, professor in the Department of Biochemistry and Molecular Genetics at the University of Colorado School of Medicine, and her team of data scientists have been working at a lightning-fast pace for two years, unlocking some of the mysteries of long COVID. Not only have they been instrumental in the development of the largest national, publicly available HIPAA-limited dataset in U.S. history – the National COVID Cohort Collaborative (N3C) – but their research using the data is making headlines and getting the attention of the White House.

This spring, they published a paper in The Lancet Digital Health describing how they used a machine-learning approach to identify who has long COVID in the United States. The paper has been featured in outlets ranging from the NIH Director’s Blog to Marketplace, and N3C itself has been highlighted in Nature, Newsweek, MIT Review and STAT News.

Anita Walden, MS, associate director of the National Center for Data to Health, works with Haendel, chief research informatics officer, and was one of the original architects of N3C.

“What I tell my family is I do the science behind the technology,” said Walden, associate research professor in the Department of Biomedical Informatics. “We are here to help make information more available and more usable – to take the data and create knowledge out of it.” 

Tell Bennett, MD, MS, vice chair of clinical informatics for the Department of Biomedical Informatics, also played a key role in developing N3C and has published multiple papers focused on the pediatric population.

Haendel, Walden and Bennett serve in leadership roles in the Colorado Clinical and Translational Sciences Institute. All three were part of the team that created a machine-learning algorithm to identify patients who may have long COVID before they receive a diagnosis. “Using the machine-learning model, we have identified over 150,000 adult patients in N3C with high confidence who may have long COVID,” Haendel said.

She explained that researchers are validating these findings at sites across the country. Physicians will review the charts and records of individuals the algorithm flagged as possibly having long COVID to see whether the data scientists’ findings are accurate.

“Validation will help us refine our algorithms,” Haendel continued. Once the findings are validated, scientists may be able to develop tools that doctors can use in the clinic to suggest diagnostic and treatment options.

Now that Haendel and her team have identified potential long COVID patients, that information can help find candidates for clinical trials of treatments for the disease. Haendel said her team also has multiple studies underway, for example, on the impact of vaccination on long COVID and on reinfections during different variant waves.

Haendel, Walden and the N3C team have also been studying the effectiveness of the antiviral treatment Paxlovid. Coronavirus experts from President Biden’s administration approached Haendel and her colleagues to ask about Paxlovid’s effectiveness in keeping people out of the hospital, the drug’s effects on the kidneys and the so-called rebound phenomenon, in which some patients test positive again after treatment.

The data from N3C on Paxlovid are limited, so it is difficult to draw definitive conclusions. However, Haendel said she is confident that her team will supply additional information to the president’s administration in the near future. 

“For the White House to come and ask us for data is pretty special,” Haendel said. 

Haendel said N3C has successfully demonstrated the use of real-world data to evaluate comorbidities, interventions and outcomes, and that there is much more to learn from massive datasets. She noted that new data governance structures are being piloted to create novel, shared infrastructure for use in all disease areas, which could enable new types of research on Alzheimer’s disease, diabetes and other conditions.

“If we can continue to leverage the incredible community work on data governance, data harmonization and collaborative analytics for other issues like opioid abuse or rare diseases, this would be amazing,” Walden said.

Topics: COVID-19

Featured Experts

Melissa Haendel, PhD

Tell Bennett, MD, MS

Anita Walden, MS