
Researchers at the University of Colorado Develop Summix2 to Improve Understanding of how Genetics Impact Health and Disease

Audrey Hendricks, Hayley Stoneman, and Adelle Price develop tools to enhance the accuracy of health risk predictions by integrating the complexities of genetic substructure from summary data.


by Melinda Lammert | February 6, 2025

Genetic summary data is useful for predicting disease risk and for understanding the biology underlying health and disease. However, comprehensive and accurate use of all genetic summary data has been impossible because of unaccounted-for genetic substructure. This substructure arises from distinct combinations of genetic ancestry produced by migration and other historical events. African American and Latino populations, for instance, often have more substantial genetic substructure, but everyone has some form of it, although sometimes from more similar groups, such as groups within a continent (e.g., Europe, Asia). When health risks are predicted without fully considering these genetic nuances, research is often less accurate, limiting precision medicine and our understanding of the full genomics and health landscape for all people.

In response to this challenge, Audrey Hendricks, PhD, associate professor of biomedical informatics, first authors Hayley Stoneman (Wolff), PhD, and Adelle Price, MS, and their team at the University of Colorado School of Medicine developed Summix2, an open source software package, to improve the use of genetic summary data to understand how genetics impact health and disease.

Traditional genetic research typically focuses on individual-level data, whereas Summix2 enables the use of data aggregated over entire groups of people (otherwise called summary data). This group-based model is particularly useful for large-scale studies, where individual-level data may not be available. Price explained that because Summix2 requires only summary-level data, genetic substructure estimation and adjustment can be completed on one’s personal computer, without the use of high-performance computational clusters, and runs in under 10 minutes regardless of the number of variants and reference groups. This helps make genetic research more accessible to all people.

Summix2 uses a mathematical approach called mixture modeling to create a genetic similarity map between the summary data and reference populations. Using the genetic similarity map, genetic patterns in a group of people can be adjusted to match a person or another sample. While the first version of Summix, published in AJHG in 2021, used five continental-level reference genetic ancestry groups (African-like, East Asian-like, European-like, Indigenous American-like, and South Asian-like), Summix2 uses fifty finer-scale reference groups, creating a denser and more complete similarity map and moving closer to a continuous representation of the genetic space. A continuous representation of genetic ancestry is important because no two people are exactly alike and there are no clearly delineated ancestry groups.
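To make the idea of mixture modeling on summary data concrete, here is a minimal sketch in Python. It is not the Summix2 implementation. It assumes a simple setup in which the observed allele frequency at each variant is modeled as a weighted average of reference-group allele frequencies, and the weights (the mixture proportions) are found by least squares under the constraint that they are non-negative and sum to one. The function names, the projected-gradient optimizer, and the simulated data are all illustrative assumptions.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, n_iter=2000):
    """Estimate mixture proportions from summary allele frequencies.

    observed:  list of N observed allele frequencies (one per variant)
    reference: list of N rows, each with K reference-group allele frequencies
    Minimizes sum_i (sum_k pi_k * reference[i][k] - observed[i])**2
    over the probability simplex using projected gradient descent.
    """
    n_groups = len(reference[0])
    # Safe step size: 2 * (squared Frobenius norm) bounds the objective's curvature.
    step = 1.0 / (2.0 * sum(a * a for row in reference for a in row))
    pi = [1.0 / n_groups] * n_groups  # start from equal proportions
    for _ in range(n_iter):
        grad = [0.0] * n_groups
        for row, obs in zip(reference, observed):
            resid = sum(p * a for p, a in zip(pi, row)) - obs
            for k, a in enumerate(row):
                grad[k] += 2.0 * resid * a
        pi = project_to_simplex([p - step * g for p, g in zip(pi, grad)])
    return pi


# Simulated example (hypothetical data): 200 variants, 3 reference groups.
rng = random.Random(0)
true_pi = [0.6, 0.3, 0.1]
reference = [[rng.uniform(0.05, 0.95) for _ in true_pi] for _ in range(200)]
observed = [sum(p * a for p, a in zip(true_pi, row)) for row in reference]

estimate = estimate_proportions(observed, reference)
print([round(p, 3) for p in estimate])
```

In this noise-free simulation the estimated proportions should recover the simulated ones closely. With real summary data the fit is only approximate, and with many finer-scale reference groups a more robust constrained optimizer would likely be preferable to this simple loop.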

This new tool has been successfully used in several studies. Price shared, “Summix2 has been used to identify potential regions of the genome under evolutionary selection pressure, identify likely ascertainment bias (i.e., selection bias), and as a quality control step to ensure appropriate data matching during association analyses.” Hendricks added, “The second example of selection bias, where the sample is not representative of the general population, is especially interesting.” Using Summix2, the team was able to detect that the proportion of people with a genetic risk of prostate cancer was greater for older age groups in the Colorado Center for Personalized Medicine (CCPM) Biobank. The team believes this increase in genetic risk is likely due to people with prostate cancer receiving treatment at the University of Colorado Hospital and subsequently being recruited into the biobank.

To ensure Summix2 works correctly, the team assessed its performance in computational simulations and in real data with known substructure across a wide variety of genetic substructure patterns, including patterns seen in African American, Latino, and finer-scale European groups. This helped confirm its accuracy across different populations, including those with more complex genetic backgrounds.

Price emphasized that, as is common in method development, the team went through trial and error, particularly in developing a statistical test to detect regions where the local substructure differed from the average across the whole genome.

The team is now working to improve Summix2 by accounting for hidden genetic differences that reference groups don't capture. This is particularly important for studying underrepresented populations.

As more comprehensive genetic reference data becomes available, Summix2 will also be able to better support research in unique populations. Ultimately, Summix2 enables more comprehensive and accurate use of genetic summary data for all people, improving precision medicine and our understanding of the genetic factors that contribute to health and disease.

Featured Experts

Audrey Hendricks, PhD

Adelle Price, MS

Hayley Stoneman, PhD