Recognize and Handle Data in Formats Used by Swedish Cancer Centers

Handle data in formats used by cancer centers in Sweden, both from 'INCA' (<https://rcc.incanet.se>) and from the older register platform 'Rockan'. All variables are coerced to suitable classes based on their format. Dates (in various formats, such as with a missing month or day, with or without a century prefix, or with just a week number) are all recognized as dates and coerced to the ISO 8601 standard (Y-m-d). Boolean variables (internally stored as 0/1, or as "True"/"False"/blanks when exported) are coerced to logical. Variables with names ending in '_Beskrivning' or '_Varde' will be character, and 'PERSNR' will be coerced (if possible) to a valid personal identification number 'pin' (by the 'sweidnumbr' package). The package also allows the user to choose interactively whether a variable should be coerced into a potential format even though not all of its values conform to the recognized pattern. It also contains a caching mechanism that temporarily stores data sets with their newly decided formats, so that the identification process is not rerun each time. The package also includes a mechanism to aid the documentation process for projects built on data from 'INCA'. From version 0.7, some general help functions are also included, as previously found in the 'rccmisc' package.




Project status: Active. The project has reached a stable, usable state and is being actively developed.

The goal of incadata is to provide basic functionality to handle data from INCA and the Regional cancer centers in Sweden.

Installation

You can install the released version of incadata from CRAN with:

install.packages("incadata")

And the development version from Bitbucket with:

# install.packages("devtools")
devtools::install_bitbucket("cancercentrum/incadata")

Standardised data sets

The function as.incadata standardizes data from INCA and Rockan:

  • All date formats used by Rockan are recognized as dates and coerced to such (for example: 1985-05-04, "", 19850504, 19850500, 19850000 and 8513).
  • Booleans are numeric vectors in INCA: c(0, 1, 0, 1, 0, 0), but are coerced to character when exported: c(NA, "True", NA, "True", NA, NA). The package recognises this peculiarity and coerces such variables to logical.
  • Personal identity numbers are recognised even if they end with "X" etcetera (used in Rockan).
  • Standard numerical codes from Rockan are decoded (using the decoder package).
  • Column names are always coerced to lower case, since these are generally easier to work with.
  • Data frames are coerced to tibbles.
  • An id column is always added to data frames in order to always have an identification variable at hand (regardless of whether the data has none or one of "PERSNR", "PNR" or "PAT_ID").
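
To make a few of the steps above concrete, here is a minimal base R sketch. It is not the code used by incadata: the helpers sketch_logical and sketch_date and the column names PAT_ID, DIAGDAT and AVLIDEN are hypothetical, replacing a missing month or day ("00") with "01" is an assumption made only for this illustration, and week-number forms such as 8513 are simply left as NA here. Decoding, the id column and the tibble conversion are not covered.

```r
# Illustrative helpers only; these are NOT functions from incadata.

# Booleans: "True" or "1" become TRUE; NA, blanks, "False" and "0" become FALSE.
sketch_logical <- function(x) {
  x <- trimws(as.character(x))
  x %in% c("True", "1")
}

# Dates: accept "1985-05-04" and the compact form "19850504".
# A missing month or day ("00") is replaced by "01" (assumption for this sketch).
# Anything else (blanks, week numbers such as "8513") becomes NA.
sketch_date <- function(x) {
  x <- gsub("-", "", trimws(as.character(x)), fixed = TRUE)
  ok <- !is.na(x) & grepl("^[0-9]{8}$", x)
  year  <- substr(x, 1, 4)
  month <- sub("^00$", "01", substr(x, 5, 6))
  day   <- sub("^00$", "01", substr(x, 7, 8))
  out <- rep(as.Date(NA), length(x))
  out[ok] <- as.Date(paste(year, month, day, sep = "-")[ok], format = "%Y-%m-%d")
  out
}

# Made-up data with hypothetical column names.
raw <- data.frame(
  PAT_ID  = 1:4,
  DIAGDAT = c("1985-05-04", "19850500", "8513", ""),
  AVLIDEN = c(NA, "True", NA, "True"),
  stringsAsFactors = FALSE
)

std <- raw
names(std) <- tolower(names(std))           # lower-case column names
std$diagdat <- sketch_date(std$diagdat)     # character -> Date
std$avliden <- sketch_logical(std$avliden)  # exported Boolean -> logical
str(std)
```

With the package installed, as.incadata is meant to take care of these conversions in one call, so helpers like the ones above should not be needed in practice.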

Register documentation

The package also provides functionality for easier access to and archiving of register documentation (see the vignette "incadoc") and function documents.

Additional functionality

The package also lets you ...

  • ... cache your data sets between work sessions in order to speed up the data loading and munging process
  • ... use a single data reading/munging function regardless of whether you work on INCA or locally
  • ... interactively engage in the coercing process of variable formats. This is handy for example if a variable is almost a date but has some additional entries that are not recognised as such.
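
The general idea behind caching between sessions can be sketched in base R as follows. This is only an illustration of the concept and not the mechanism implemented in incadata; the helper cached_load, the loader function and the file name are all hypothetical.

```r
# Hypothetical helper, not part of incadata: store the result of an
# expensive loading step as an .rds file and reuse it on later calls.
cached_load <- function(path, loader) {
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: skip the expensive step
  }
  data <- loader()         # cache miss: load and munge the data
  saveRDS(data, path)
  data
}

cache_file <- file.path(tempdir(), "incadata_cache_sketch.rds")
unlink(cache_file)  # start from a clean state for this example

calls <- 0
slow_loader <- function() {
  calls <<- calls + 1  # count how often the expensive step really runs
  data.frame(id = 1:3, diagdat = as.Date("1985-05-04") + 0:2)
}

first  <- cached_load(cache_file, slow_loader)  # runs slow_loader
second <- cached_load(cache_file, slow_loader)  # read from the cache file
calls
```

Because classes such as Date survive the round trip through saveRDS and readRDS, formats decided once do not have to be identified again on the next load.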

Code of conduct

Please note that the 'incadata' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

News

incadata 0.8.1

  • Removed non-ASCII encoding from example data.

incadata 0.8

  • Refactored functions for document. They no longer rely on a static link file included in the package.

incadata 0.7

  • Incorporated relevant functions from the rccmisc package to simplify dependencies.
  • Extended the test suite to reach higher test coverage.

incadata 0.6.4

  • Added a NEWS.md file to track changes to the package.
  • Fixed bugs for CRAN.
  • Used pkgdown to create the website.


0.8.2 by Erik Bulow, 7 months ago


https://cancercentrum.bitbucket.io/incadata


Report a bug at https://www.bitbucket.org/cancercentrum/incadata/issues


Browse source code at https://github.com/cran/incadata


Authors: Erik Bulow [aut, cre]




GPL-2 license


Imports backports, decoder, dplyr, rvest, sweidnumbr, xml2

Suggests testthat, knitr, rmarkdown, R.rsp

