Comorbidities are often calculated with International Classification of Diseases (ICD) codes as an input variable, but changes in coding, retirement of codes, and other factors can cause inconsistencies in running comorbidity algorithms—especially across long periods of time. To address this challenge, Peter DeWitt, PhD, assistant research professor at the Department of Biomedical Informatics (DBMI) at the University of Colorado Anschutz (CU Anschutz), developed the medicalcoder package.
This is the seventh installment in our ongoing series spotlighting the DBMI Wall of Software, an interactive online hub showcasing the latest open-source software and data tools for researchers. View the Wall of Software here.
Comorbidities, Comorbidity Algorithms and ICD Diagnosis Codes
Comorbidities matter for patient care because treatments and interventions must account for each patient's unique set of comorbidities. They are also essential in retrospective analysis, where they are used for risk adjustment and cohort characterization.
A comorbidity classification algorithm is a tool that analyzes healthcare data—especially ICD diagnosis codes—to identify and weight coexisting diseases in patients. These classification algorithms return a score that gives researchers and clinicians insight into the probability of a patient’s mortality, hospital readmission and more.
There are a variety of comorbidity classification algorithms, including but not limited to the Pediatric Complex Chronic Conditions (PCCC), the Charlson and the Elixhauser indices.
These comorbidity classification algorithms often rely on ICD diagnosis codes, which are standardized codes used by healthcare providers to classify and document morbidities. In 2015, the medical field in the United States transitioned from ICD-9 to ICD-10, which opened the door to some inconsistencies and challenges that inspired DeWitt to begin developing medicalcoder.
The Development of the medicalcoder Package
The medicalcoder package grew out of a previous project in which DeWitt was working with the pccc R package, which implements version 2 of the PCCC comorbidity algorithm as defined by Feudtner and colleagues in BMC Pediatrics in 2014.
In 2024, version 3 of PCCC was published in JAMA Network Open by Feinstein and colleagues, but the pccc R package was still built on version 2. So, DeWitt went to work adjusting the package to account for PCCC version 3.
As he implemented the update, DeWitt found mismatches between the output of his code and the example dataset he had been provided. When he dug into this, he found that some of the ICD codes built into the package had been retired. He also started to realize that the change in the number of digits in the codes would cause challenges when comorbidity algorithms were applied to data spanning many years.
So first, DeWitt developed a robust database of ICD-9 and ICD-10 codes that could handle datasets containing both ICD-9 and ICD-10 codes from various fiscal years. This new database would serve as a reliable foundation for the PCCC package, ensuring it could account for these intricacies.
But then DeWitt saw an opportunity to expand this work to other comorbidity algorithms, such as the Charlson and Elixhauser indices. This led DeWitt to create the medicalcoder R package.
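To make the idea of a fiscal-year-aware code database concrete, here is a minimal Python sketch. It is not the medicalcoder API or its data: the table entries are placeholders rather than real ICD codes or real validity ranges, and the names CodeEntry, LOOKUP and is_valid are hypothetical. It only illustrates why a lookup needs the ICD version and the fiscal year alongside the code itself.

```python
from typing import NamedTuple, Optional


class CodeEntry(NamedTuple):
    icd_version: int           # 9 or 10
    code: str                  # the diagnosis code as recorded
    first_year: int            # first fiscal year the code was valid
    last_year: Optional[int]   # last valid fiscal year; None means still active


# Placeholder entries for illustration only; not real ICD validity data.
LOOKUP = [
    CodeEntry(9, "PLACEHOLDER_A", 1997, 2010),   # retired partway through ICD-9
    CodeEntry(9, "PLACEHOLDER_B", 1997, 2015),
    CodeEntry(10, "PLACEHOLDER_C", 2016, None),
]


def is_valid(icd_version, code, fiscal_year, table=LOOKUP):
    """Return True if the code existed in that ICD version during that fiscal year."""
    for entry in table:
        if entry.icd_version != icd_version or entry.code != code:
            continue
        retired = entry.last_year is not None and fiscal_year > entry.last_year
        if entry.first_year <= fiscal_year and not retired:
            return True
    return False
```

With a table like this, the same code string can be accepted for a 2005 record and rejected for a 2012 record, which is the kind of mismatch a retired code would otherwise cause silently.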
About the medicalcoder Package
The medicalcoder R package can work with multiple comorbidity algorithms, accounting for ICD-9 and ICD-10 codes across varying fiscal years. This means that the package can account for comorbidities longitudinally (across time), ensuring that the algorithms are fed as much of the relevant data as possible.
“The medicalcoder package is a utility for applying comorbidity algorithms to a dataset. It is designed to work in base R, to account for both ICD-9 and ICD-10 codes in the same patient record, and to apply comorbidities over longitudinal records.”
- Peter DeWitt, PhD
The medicalcoder package is built so that it does not require anything besides base R (version 3.5 or newer). No other namespaces or dependencies are required, which allows it to work in healthcare settings where devices may have limited flexibility, or may be facing inconsistent internet access.
However, DeWitt has also implemented conditional optimizations that allow the package to run more efficiently when optional dependencies happen to be available, and only then. The result is a package that adapts to varying circumstances, optimizing based on the user’s unique situation.
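The general pattern behind this kind of conditional optimization can be sketched briefly. The Python example below is an analogy, not code from medicalcoder: it assumes numpy as the optional accelerator, and the function total_score is hypothetical. In R, the equivalent availability check is typically done with requireNamespace().

```python
import importlib.util

# Check once whether the optional accelerator is installed.
# Nothing fails if it is missing; the fallback path is used instead.
HAS_NUMPY = importlib.util.find_spec("numpy") is not None


def total_score(weights):
    """Sum integer weights, using numpy only when it is available."""
    if HAS_NUMPY:
        import numpy as np  # imported lazily, only on the optimized path
        return int(np.sum(np.asarray(weights, dtype=np.int64)))
    # Standard-library fallback: same result, no extra dependency.
    return sum(weights)
```

Both paths return the same value, so users without the optional package get correct results and users with it get the faster path without changing how they call the function.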
One of the challenges DeWitt encountered in developing the medicalcoder package was staying within the 5 MB package size limit of the Comprehensive R Archive Network (CRAN) while still including all of the relevant ICD data. To address this, he built the package around a dictionary that maps the relevant components of the ICD codes, including versions, descriptions, and the codes themselves, to integer codes. The full tables can then be constructed on the end user's machine rather than stored in the package itself. Because DeWitt took the time to source and construct these datasets, the medicalcoder package contains an entire database of ICD-9 codes from 1997–2015 and ICD-10 codes from 2001–2026, while the entire package remains under 3 MB.
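The space saving comes from a technique commonly called dictionary encoding: each distinct string is stored once, rows hold only small integers, and the readable table is rebuilt on demand. The Python sketch below shows the idea under stated assumptions. The functions encode and decode and the sample rows are hypothetical placeholders, and the package's actual internal layout may differ.

```python
def encode(rows):
    """Replace each distinct string with an integer id.

    Returns (dictionary, encoded_rows), where dictionary[i] is the
    string that id i stands for.
    """
    dictionary = []   # id -> string
    index = {}        # string -> id
    encoded = []
    for row in rows:
        ids = []
        for value in row:
            if value not in index:
                index[value] = len(dictionary)
                dictionary.append(value)
            ids.append(index[value])
        encoded.append(tuple(ids))
    return dictionary, encoded


def decode(dictionary, encoded):
    """Rebuild the original rows from the dictionary and the integer ids."""
    return [tuple(dictionary[i] for i in row) for row in encoded]


# Placeholder rows (version, code, description); not real ICD data.
rows = [
    ("ICD-9", "CODE_A", "example description one"),
    ("ICD-9", "CODE_B", "example description one"),
    ("ICD-10", "CODE_C", "example description two"),
]

dictionary, encoded = encode(rows)
restored = decode(dictionary, encoded)
```

Repeated values such as the version label or a shared description are stored only once, so the benefit grows with the amount of repetition across fiscal years.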
The medicalcoder package is a vital tool for clinicians and researchers who are interested in implementing comorbidity algorithms across longitudinal data. The package can be accessed at the medicalcoder page on CRAN, or it can be found on the DBMI Wall of Software, where the medicalcoder package’s documentation is linked.
Presentations on the medicalcoder Package
DeWitt is presenting on the medicalcoder R package at the virtual R/Medicine 2026 Conference on Friday, May 8, 2026, at 2:06 p.m. Eastern Daylight Time (EDT). He is also presenting on the package to the Colorado/Wyoming Chapter of the American Statistical Association on Friday, May 8, 2026.