Calculate comorbidities, Charlson and van Walraven scores, perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. This package enables a work flow from raw lists of ICD codes in hospital databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ included. Common ambiguities and code formats are handled.
`icd` is used by many researchers around the world who work in public health, epidemiology, clinical research, nutrition, journalism, health administration and more. I'm grateful for contact from people in these fields for their feedback and code contributions, and I'm pleased to say that `icd` has been used in works like the Pulitzer finalist work on maternal death by ProPublica.
See also the vignettes and the examples embedded in the help for each function. Here's a taste:
```r
library(icd)

# Typical diagnostic code data, with many-to-many relationship
patient_data <- data.frame(
  visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002, 1000),
  icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011", NA),
  stringsAsFactors = FALSE
)
patient_data
#>   visit_id  icd9
#> 1     1000 40201
#> 2     1000  2258
#> 3     1000  7208
#> 4     1000 25001
#> 5     1001 34400
#> 6     1001  4011
#> 7     1002  4011
#> 8     1000  <NA>

# get comorbidities using Quan's application of Deyo's Charlson comorbidity groups
comorbid_charlson(patient_data)
#>         MI   CHF   PVD Stroke Dementia Pulmonary Rheumatic   PUD LiverMild
#> 1000 FALSE  TRUE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1001 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#> 1002 FALSE FALSE FALSE  FALSE    FALSE     FALSE     FALSE FALSE     FALSE
#>         DM  DMcx Paralysis Renal Cancer LiverSevere  Mets   HIV
#> 1000  TRUE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE
#> 1001 FALSE FALSE      TRUE FALSE  FALSE       FALSE FALSE FALSE
#> 1002 FALSE FALSE     FALSE FALSE  FALSE       FALSE FALSE FALSE

# or go straight to the Charlson scores:
charlson(patient_data)
#> 1000 1001 1002
#>    2    2    0

# for more examples, see this and other vignettes
vignette("introduction", package = "icd")
```
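The Charlson scores in the example above are weighted sums of the comorbidity flags. The base R sketch below reproduces them from the flags printed by `comorbid_charlson()`. It assumes the standard Charlson weights and a simple hierarchy rule (the more severe form of a condition replaces the milder one); it is an illustration, not the package's own `charlson()`, whose details may differ.

```r
# Standard Charlson weights for the Quan/Deyo groups (assumed for this sketch)
weights <- c(MI = 1, CHF = 1, PVD = 1, Stroke = 1, Dementia = 1,
             Pulmonary = 1, Rheumatic = 1, PUD = 1, LiverMild = 1,
             DM = 1, DMcx = 2, Paralysis = 2, Renal = 2, Cancer = 2,
             LiverSevere = 3, Mets = 6, HIV = 6)

# Comorbidity flags copied from the comorbid_charlson() output above
cmb <- matrix(FALSE, nrow = 3, ncol = length(weights),
              dimnames = list(c("1000", "1001", "1002"), names(weights)))
cmb["1000", c("CHF", "DM")] <- TRUE
cmb["1001", "Paralysis"] <- TRUE

# Hierarchy rule (an assumption of this sketch): drop the milder condition
# when the more severe form of the same condition is also present
cmb[, "LiverMild"] <- cmb[, "LiverMild"] & !cmb[, "LiverSevere"]
cmb[, "DM"] <- cmb[, "DM"] & !cmb[, "DMcx"]
cmb[, "Cancer"] <- cmb[, "Cancer"] & !cmb[, "Mets"]

# Weighted sum per visit: logical matrix times numeric weight vector
scores <- drop(cmb %*% weights)
scores
```

Visit 1000 scores 1 (CHF) + 1 (DM), visit 1001 scores 2 for paralysis, and visit 1002 has no Charlson comorbidity, matching the `charlson()` output above.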
ICD-9 codes are still in heavy use around the world, particularly in the USA, where ICD-9-CM (Clinical Modification) was in widespread use until October 2015. ICD-10 has been used worldwide for reporting cause of death for more than a decade, and ICD-11 is due to be released in 2018. ICD-10-CM is now the primary coding scheme for US hospital admission and discharge diagnoses used for regulatory purposes and billing. A vast amount of electronic patient data is recorded with ICD-9 codes of some kind: this package enables their use in R alongside ICD-10.
A common requirement for medical research involving patients is determining new or existing comorbidities. These are often reported in Table 1 of research papers to demonstrate the similarities or differences between groups of patients. This package is focussed on fast and accurate generation of this comorbidity information from raw lists of ICD-9 codes.
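Once comorbidities are available as a logical matrix (one row per patient or visit, one column per comorbidity, the shape returned by `comorbid_charlson()` above), a Table 1 style summary is a per-group percentage. The matrix and the `arm` grouping below are hypothetical stand-ins; this is a base R sketch, not a function provided by `icd`.

```r
# Hypothetical logical comorbidity matrix: rows are visits, columns comorbidities
cmb <- matrix(
  c(FALSE, TRUE,  TRUE,
    FALSE, FALSE, FALSE,
    TRUE,  FALSE, TRUE,
    FALSE, FALSE, TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("v1", "v2", "v3", "v4"), c("MI", "CHF", "DM"))
)

# Hypothetical study arm for each visit, in the same row order
arm <- c("A", "A", "B", "B")

# Count visits with each comorbidity per arm, then divide each row by the
# arm size (rowsum() and table() both sort the arm labels the same way)
table1 <- rowsum(cmb * 1, arm) / as.vector(table(arm)) * 100
table1
```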
ICD-9 codes are not numbers, and great care is needed when matching individual codes and ranges of codes. It is easy to make mistakes, hence the need for this package. ICD-9 codes can be presented in short format, up to five characters with no decimal point, or in decimal format, with a decimal point separating the code into major and minor parts. There are also codes beginning with V and E, which have different validation rules. Zeroes after the decimal point are meaningful, so numeric ICD-9 codes cannot be used in most cases. In addition, most clinical databases contain invalid codes, and even decimal and non-decimal format codes in different places. This package primarily deals with ICD-9-CM (Clinical Modification) codes, but should be applicable or easily extensible to the original WHO ICD-9 system.
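A short base R sketch of why numeric storage fails and of what decimal-to-short conversion involves. The helper `to_short_sketch()` is hypothetical, not part of the `icd` API, and assumes well-formed input: numeric major parts are zero-padded to three digits, while V and E major parts are kept exactly as written. The package's own conversion and validation functions (see the help files) handle far more cases.

```r
# Numeric conversion destroys meaning: these are two different ICD-9-CM codes
as.numeric("250.00") == as.numeric("250.0")  # TRUE, yet the codes differ
# ...and leading zeros vanish
as.numeric("003.0")  # 3

# Hypothetical helper: decimal format -> short format
to_short_sketch <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)
  vapply(parts, function(p) {
    major <- p[1]
    minor <- if (length(p) > 1) p[2] else ""
    # Purely numeric majors are three digits, so restore leading zeros
    if (grepl("^[0-9]+$", major)) {
      major <- formatC(as.integer(major), width = 3, flag = "0")
    }
    paste0(major, minor)
  }, character(1))
}

to_short_sketch(c("3.0", "250.00", "250.0", "V10.3", "E812.0"))
# returns "0030" "25000" "2500" "V103" "E8120"
```

Note that "250.00" and "250.0" stay distinct in short format, which is exactly the information a numeric column would have lost.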
ICD-10 has a somewhat simpler format, with consistent use of a letter, then two alphanumeric characters. However, especially for ICD-10-CM, there is a multitude of qualifiers, e.g. specifying recurrence or laterality, which vastly increase the number of possible codes. This package can check the validity of codes by syntax alone, or by whether the codes appear in a canonical list. The current ICD-10-CM master list is the 2016 set. There is no capability of converting between ICD-9 and ICD-10, but comorbidities can be generated from older ICD-9 codes and newer ICD-10 codes in parallel, and the comorbidities can then be compared.
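The difference between the two kinds of validity can be sketched in base R. The regular expression is a rough assumption based on the description above (a letter, two alphanumeric characters, then an optional decimal point and up to four qualifier characters), and `canonical` is a tiny hypothetical stand-in for a real code list; neither is the rule `icd` actually uses.

```r
# Rough syntactic check (assumed pattern, not icd's own rule)
icd10_syntax_ok <- function(x) {
  grepl("^[A-Z][0-9A-Z]{2}(\\.?[0-9A-Z]{1,4})?$", x)
}

# Tiny hypothetical stand-in for a canonical list, in short format (no dots)
canonical <- c("A0100", "I10", "S52521A")

# Defined check: strip the decimal point and look the code up in the list
icd10_defined <- function(x) {
  gsub(".", "", x, fixed = TRUE) %in% canonical
}

x <- c("A01.00", "I10", "A01.99", "401.1")
icd10_syntax_ok(x)  # TRUE TRUE TRUE FALSE
icd10_defined(x)    # TRUE TRUE FALSE FALSE
```

"A01.99" passes the syntax check but is absent from the stand-in list, while the ICD-9-style "401.1" fails both.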
Look at the help files for details and examples of almost every function in this package. There are several vignettes showing the main features. Many users have emailed me directly for help, and I’ll do what I can, but it is often better to examine or add to the list of issues so we can help each other. Advanced users may look at the source code, particularly the extensive test suite which exercises all the key functions.
```r
?comorbid
?comorbid_hcc
?explain_code
?is_valid
# first show the list
vignette(package = "icd")
vignette("pccc", package = "icd")
```
Note that reformatting from wide to long and back is not as straightforward as using the various Hadley Wickham tools for doing this: knowing the more detailed structure of the data lets us do this better for the case of dealing with ICD codes.
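A base R sketch of the wide/long round trip shows where the ICD-specific structure matters: the wide form must be padded to the largest number of codes per visit, and converting back must drop that padding while keeping each visit's codes in their original order. The data are hypothetical and this is not how `icd` implements its own conversions.

```r
# Hypothetical long-format data: one row per (visit, diagnosis code)
long <- data.frame(
  visit_id = c("1000", "1000", "1000", "1001", "1001", "1002"),
  code = c("40201", "2258", "7208", "34400", "4011", "4011"),
  stringsAsFactors = FALSE
)

# Position of each code within its visit (1, 2, 3, ...)
long$pos <- ave(seq_along(long$code), long$visit_id, FUN = seq_along)

# Long -> wide: one row per visit, NA-padded to the largest number of codes
visits <- unique(long$visit_id)
n_cols <- max(long$pos)
wide <- matrix(NA_character_, nrow = length(visits), ncol = n_cols,
               dimnames = list(visits, paste0("dx", seq_len(n_cols))))
wide[cbind(match(long$visit_id, visits), long$pos)] <- long$code

# Wide -> long: drop the NA padding, then restore code order within each visit
idx <- which(!is.na(wide), arr.ind = TRUE)
back <- data.frame(visit_id = rownames(wide)[idx[, "row"]],
                   code = wide[idx], stringsAsFactors = FALSE)
back <- back[order(back$visit_id, idx[, "col"]), ]
rownames(back) <- NULL
back
```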
The latest version is available from the `jackwasey/icd` repository on GitHub, and can be installed with:
```r
install.packages("devtools")
devtools::install_github("jackwasey/icd")
```
A substantial amount of code has now been contributed to the package.
Contributions of any kind to `icd` are very welcome. See the [GitHub issues page](https://github.com/jackwasey/icd/issues) for open issues and feature requests. Documentation, vignettes and examples are very welcome, especially if accompanied by some real-world data.
To build `icd`, `Rcpp` must be compiled from source. This happens automatically on Linux, but on Mac and Windows the following may sometimes be required, especially after upgrading R itself. This is a limitation of the R build system.
```r
install.packages("Rcpp", type = "source")
```
`restore_id_order = FALSE` in comorbidity calculations.
`icd9_comorbid_charlson` and `icd10_comorbid_charlson` as synonyms for the Quan/Deyo comorbidity calculations. `comorbid_charlson` will infer the ICD type.
`pkgdown` generated site in `gh-pages` branch.
`library(icd)` first
`rticles` and `tinytex`
`comorbid_charlson(patient_data)`
`comorbid(patient_data)`, and `icd::comorbid` may also be used, which many consider good practice.
`explain_icd` synonym avoids a name conflict with the popular `dplyr` package, but `icd::explain` is also a nice option.
`icd9` still available in CRAN repo, but not being updated. This greatly speeds up and simplifies the test suite.
`explain_table` which tabulates the results of looking up various information about a list of ICD codes. This is a new feature which may change as it is used in the real world and more tests are developed.
`stringr` family of dependencies: it was often slower than built-ins on benchmarking, with no clear benefits other than internally consistent syntax, and `stringr` updates caused CRAN warnings due to a documentation change.
`icd9` should now be uninstalled.
`icd9ExplainShort` becomes `icd_explain`
`as.icd10("A01")` or `as.icd9cm("0101")`. This will help avoid mistakes when working with mixed data.
`icd9` prefix functions, now this package equally covers ICD-10. The new naming scheme follows Hadley Wickham's preferred coding style, using underscores. Most public functions begin with `icd_`. Package data, and version-specific functions, are named with `icd_`, `icd9_` or `icd10_` prefixes, e.g. `icd10_chapters` and `icd9cm_hierarchy`. All deprecated functions will still work, but they give warnings (sometimes many). The warnings can be turned off with an option. The original test suite from `icd9` runs and passes on the `icd` package, with only minimal changes.
`icd9ValidDecimal`
`testthat` which has backward-incompatible changes
`icd` does import `stringi` via `stringr` to give cleaner string processing. Base string processing is still used as it is often faster. `magrittr` is now too useful not to import, has no dependencies of its own, and is imported by `stringr` anyway. CRAN now also seems to need base packages to be listed as imports.
`fastmatch` for fast factor generation, but with the tweak of not sorting the levels. This had been by far the slowest step in generating comorbidities.
`icd9` commands. These are available in the package data `icd9Billable`. See vignette for examples.
`icd9` can now parse this eclectically formatted document to extract all the headings, so it is now possible to do `icd9Explain` on a non-billable four-digit code, e.g. 643.0 (Mild hyperemesis of pregnancy). Previously only three-digit and billable (i.e. leaf node) codes were used. In principle, the RTF parsing code could be run on previous versions going back to about the year 2000. It seems that most years are the same as, or expand on, previous years, although there are a few deletions. Ideally, we would know what year/version a given ICD-9 code was coded under, and then validate or interpret it accordingly. This can indeed be done for billable codes, but not for headings until the RTF is parsed for previous years.
`icd:::icd9PartsToShort` etc.
`lintr` package from @jimhester
`vermont_dx`.
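The changelog entry on generating factors with `fastmatch` without sorting the levels can be illustrated in base R. This sketch shows the idea only, not the package's implementation: by default `factor()` sorts the unique values to build its levels, whereas taking the levels in order of first appearance with `unique()` needs no sort, and the integer codes are then just a `match()` against those levels (`fastmatch` provides a faster drop-in for `match()`).

```r
# Hypothetical vector of ICD codes, in the order they appear in the data
codes <- c("4011", "25001", "40201", "4011", "2258", "25001")

# Default factor(): levels are the sorted unique values
sorted_f <- factor(codes)

# Unsorted alternative: levels in order of first appearance, no sort needed
unsorted_f <- factor(codes, levels = unique(codes))

# The same factor built directly from match() against the unsorted levels
lev <- unique(codes)
manual_f <- structure(match(codes, lev), levels = lev, class = "factor")

levels(sorted_f)    # "2258" "25001" "4011" "40201"
levels(unsorted_f)  # "4011" "25001" "40201" "2258"
```

All three encode the same codes; only the level order, and therefore the underlying integers, differ.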