
Best holdout assessment is sufficient for cancer transcriptomic model selection

by Cell | January 3, 2025

Existing recommendations in statistics and machine learning suggest that smaller, or simpler, predictive models are more likely to generalize well. In cancer transcriptomics, this manifests as a preference for small “gene signatures,” or groups of genes whose expression is used to define subtypes or suggest therapeutic interventions. This study uses public datasets to test the generalization performance of cancer gene expression-based predictive models both across datasets (from cell lines to tumor samples and vice versa) and across cancer types/tissues of origin. In general, we do not observe strong evidence that simpler models inherently generalize more effectively than more complex ones. Our results underscore the importance of defining clear goals in machine learning-based transcriptomic analyses. If the goal is to achieve robust performance across contexts or datasets, then we recommend directly evaluating generalization whenever possible; otherwise, we recommend choosing the model that performs best on unseen data via cross-validation.
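
As a rough illustration of the final recommendation, the sketch below picks the candidate model with the highest mean cross-validation score on held-out folds, with no extra preference for smaller models. The function name `select_best_by_cv`, the candidate names, and the per-fold scores are all hypothetical and invented for this example; none of them come from the study.

```python
from statistics import mean


def select_best_by_cv(cv_scores):
    """Return the candidate whose mean held-out score is highest.

    cv_scores maps a candidate name to its list of per-fold scores
    (higher is better). Model size plays no role in the choice.
    """
    if not cv_scores:
        raise ValueError("no candidates to choose from")
    return max(cv_scores, key=lambda name: mean(cv_scores[name]))


# Hypothetical per-fold scores for three candidates of increasing size.
# These numbers are invented for illustration, not results from the study.
toy_scores = {
    "small_signature": [0.71, 0.69, 0.73, 0.70, 0.72],
    "medium_model": [0.78, 0.74, 0.77, 0.76, 0.75],
    "large_model": [0.77, 0.75, 0.76, 0.74, 0.76],
}

chosen = select_best_by_cv(toy_scores)
print(chosen)
```

When the goal is robustness across datasets or cancer types, the same selection could instead be driven by scores measured directly on the target context, in line with the first half of the recommendation.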

Topics: Press Coverage