Since the onset of the global pandemic in 2020, researchers in the Department of Biomedical Informatics (DBMI) at the University of Colorado School of Medicine have spearheaded an innovative method to synthesize and compile accurate information about COVID-19.
Reviewing thousands of research articles about COVID-19, a consortium led by Casey Greene, PhD, founding chair of the DBMI, and Halie Rando, PhD, a computational biologist who works in the department’s Greene Lab, assembled the results in a manuscript of more than 600 pages, titled “SARS-CoV-2 and COVID-19: An Evolving Review of Diagnostics and Therapeutics,” on the cloud-based collaboration platform GitHub.
“We have written what is essentially a book about COVID-19 that has been relatively well cited,” explains Rando. “It’s much more dynamic than a typical manuscript. Our understanding of this weird virus continues to evolve, and this manuscript tells the story.”
The living manuscript provides a rare historical online record of research produced throughout the pandemic. Greene says the effort, which involved forming worldwide research connections, sorting through waves of new data each day, and battling misinformation, went far beyond leveraging technology.
“How do you connect this set of very interested people to take on a project that has turned out to be a completely insurmountable challenge and make something out of it?” he asks. “That’s what Halie did.”
Rando, who began her postdoctoral research career in early 2020, was just starting work in the lab when pandemic lockdowns upended her plans to collect DNA samples. As the virus rapidly spread, hampering lab-based research like Rando’s, more than 20,000 articles were published in the first four months of the pandemic, creating a new need for a community to evaluate, synthesize, and summarize the findings.
“A lot of us were isolated at home, reading the literature in our houses, and missing the community that would normally help us make sense of all sorts of new data coming out,” Rando says. “For early-career people, it was devastating not being in the lab. Our careers depend on getting research out.”
Many of the early articles on COVID-19 were accelerated by traditional publishers and preprint servers as researchers attempted to understand this new pathogen. Preprints are not peer-reviewed, and some traditional publishers accelerated their editorial processes, including peer review, to fast-track the dissemination of COVID-19 content early in the pandemic.
Rando set out to review this flood of content under the direction of Greene. She and fellow researchers launched a COVID-19 review consortium on March 20, 2020. The members all worked remotely, dedicating their efforts to summarizing, synthesizing, and reviewing the literature on COVID-19. The project gained attention from the international journal Nature as an interesting use case of platforms for distributed authoring.
Since the project began, the team has expanded to over 50 authors, written over 200,000 words, and reviewed more than 2,000 papers on COVID-19. They have released seven preprints, three of which have since been published in the journal mSystems, with two more coming out in early 2023, and presented work at conferences about open-source software and “living documents.”
Traditionally, scientific collaborations involve emailing a Word document among collaborators or using a shared document on Google Workspace.
Anticipating that the consortium would become a major collaboration, the reviewers instead used Manubot, “a really cool piece of software,” Rando says. Manubot had been developed in Greene’s lab to facilitate a literature review of advances in deep learning, another fast-moving field that required real-time collaboration among authors. Manubot continuously integrates the work of multiple collaborators, with line-by-line revision marking showing the date and source of each revision. By managing Manubot projects with GitHub, collaborators could engage remotely with shared documents.
Manubot allowed the COVID-19 consortium to add and edit content using a version-control tool to track revisions and contributions over time. This made it possible for Rando, the team leader, to manage and coordinate the contributions of dozens of authors. In addition to writing her own reviews, she created training materials and provided one-on-one technical support to researchers who had never used GitHub, much less Manubot.
The team also expanded Manubot to make it possible to integrate data from external sources tracking COVID-19 data directly into the manuscript, allowing figures, tables, statistics, and even the text itself to update in response to the rapidly changing pandemic.
Initially, the consortium focused on the pathogenesis of COVID-19 and potential therapeutics. As the pandemic spread, the team broadened into other areas of COVID-19 research, including diagnostics, social determinants, nutraceuticals, and vaccines.
One researcher who became a prolific contributor to the project was Ronan Lordan, PhD, a postdoc at the University of Pennsylvania with a background in clinical nutrition and biochemistry. He had previously served as a peer reviewer for journals such as The Lancet and was initially motivated to contribute his expertise to review articles touting hydroxychloroquine.
“There simply weren’t trials of hydroxychloroquine at the time,” he says. “It was important to get the right information out. We were pretty efficient. As information evolved, we would correct the record.”
Lordan continues to review articles for the consortium, focusing on supplements and nutrition.
The project has no end in sight as the COVID-19 literature continues to grow. The team is always looking for new contributors, Rando says. The project can be accessed on GitHub.