Simulation of Study Data


The simstudy package is a collection of functions that allow users to generate simulated data sets in order to explore modeling techniques or better understand data generating processes. The user specifies a set of relationships between covariates and generates data based on these specifications. The final data sets can represent data from randomized controlled trials, repeated measure (longitudinal) designs, and cluster randomized trials. Missingness can be generated under several mechanisms: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR).
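The three missingness mechanisms differ only in what the probability that a value goes missing depends on. The following base-R sketch makes that concrete. It does not use simstudy and is not how the package generates missing data; the logistic coefficients (-1.5, 0.8) and the 20% MCAR rate are arbitrary illustrative choices.

```r
# Base-R illustration of MCAR, MAR and NMAR (not simstudy's implementation).
set.seed(123)
n <- 10000
x <- rnorm(n, mean = 10, sd = sqrt(2))   # fully observed covariate
y <- 3 + 0.5 * x + rnorm(n)              # outcome that will lose values

# Probability that y is missing under each mechanism (arbitrary coefficients).
p_mcar <- rep(0.2, n)                      # MCAR: constant, unrelated to the data
p_mar  <- plogis(-1.5 + 0.8 * (x - 10))    # MAR: depends only on observed x
p_nmar <- plogis(-1.5 + 0.8 * (y - 8))     # NMAR: depends on y itself

y_mcar <- ifelse(runif(n) < p_mcar, NA, y)
y_mar  <- ifelse(runif(n) < p_mar,  NA, y)
y_nmar <- ifelse(runif(n) < p_nmar, NA, y)

# Observed-data means: MCAR stays close to the full-data mean, while NMAR
# (and, through the x-y correlation, MAR) shift it downward because larger
# values are more likely to be missing.
round(c(full = mean(y),
        mcar = mean(y_mcar, na.rm = TRUE),
        mar  = mean(y_mar,  na.rm = TRUE),
        nmar = mean(y_nmar, na.rm = TRUE)), 2)

# Under MAR, a complete-case regression of y on x still recovers the slope.
coef(lm(y_mar ~ x))
```

The point of the comparison is that MAR missingness can be handled by conditioning on the observed covariate, whereas NMAR missingness cannot be fixed that way.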

Here is some simple sample code (the vignette has much more):

library(simstudy)
def <- defData(varname="x", formula = 10, variance = 2)
def <- defData(def, varname="y", formula = "3 + 0.5 * x", variance = 1)
dt <- genData(250, def)
 
dt <- trtAssign(dt, nTrt = 4, grpName = "grp", balanced = TRUE)
 
dt
##       id grp         x        y
##   1:   1   3 10.393817 7.805703
##   2:   2   1 10.235161 5.705590
##   3:   3   1 11.517813 8.210183
##   4:   4   1 12.068125 8.618601
##   5:   5   1 10.078817 5.780655
##  ---                           
## 246: 246   4 11.419577 8.442363
## 247: 247   3 10.567231 9.808930
## 248: 248   1 10.451896 7.720858
## 249: 249   3  7.633381 6.861638
## 250: 250   2  9.347781 6.094965
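For readers who want to see what these definitions describe, here is a rough base-R equivalent of the same data generating process. It assumes, as the argument name suggests, that `variance` in `defData` is the variance of a normal variable (so the standard deviation is its square root). The shuffled-labels scheme used for balanced assignment is only one simple way to do it and is not necessarily what `trtAssign` does internally.

```r
# Base-R sketch of the data generating process defined above (not simstudy code).
set.seed(2017)
n <- 250
x <- rnorm(n, mean = 10, sd = sqrt(2))       # variance 2, so sd = sqrt(2)
y <- rnorm(n, mean = 3 + 0.5 * x, sd = 1)    # variance 1 around the linear mean

# Balanced assignment to 4 groups: repeat the labels, then shuffle them,
# so group sizes differ by at most one.
grp <- sample(rep(1:4, length.out = n))

dt_base <- data.frame(id = seq_len(n), grp = grp, x = x, y = y)
table(dt_base$grp)
coef(lm(y ~ x, data = dt_base))              # roughly 3 and 0.5
```

With 250 subjects and 4 groups, this sketch puts 63 subjects in groups 1 and 2 and 62 in groups 3 and 4.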

News

simstudy 0.1.1

  • This is the first submission of simstudy, so there is no news yet!

simstudy 0.1.2

  • Fixed index variable issue related to generating categorical data
  • Fixed index variable issue related to generating longitudinal data
  • Fixed an issue that arose when creating a categorical variable in the first field
  • Reduced the time required to generate categorical data with large sample sizes
  • Categorical data can now accommodate probabilities conditional on covariates
  • Fixed genMissDataMat, which was broken by data.table 1.10.0
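Categorical data whose probabilities depend on covariates can be pictured as each row drawing from its own probability vector. A base-R sketch of that idea follows; the binary covariate and the two probability vectors are hypothetical, and this is not the package's code.

```r
set.seed(42)
n <- 1000
x <- rbinom(n, size = 1, prob = 0.5)   # hypothetical binary covariate

# Probabilities for 3 categories, chosen per row according to x (hypothetical).
p_x0 <- c(0.5, 0.3, 0.2)
p_x1 <- c(0.2, 0.3, 0.5)

cat3 <- vapply(x, function(xi) {
  p <- if (xi == 1) p_x1 else p_x0
  sample(1:3, size = 1, prob = p)
}, integer(1))

# Row proportions: category 3 should be more common when x == 1.
round(prop.table(table(x, cat3), margin = 1), 2)
```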

simstudy 0.1.3

  • Modified "nonrandom" data generation to allow "log" and "logit" link options.
  • Added function genCorGen - generate a new data.table with correlated data from various distributions.
  • Added function addCorData - add correlated data from various distributions to existing data.tables.
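Correlated data with non-normal margins is often generated with a Gaussian copula: draw correlated normals, convert them to uniforms with the normal CDF, then apply the quantile function of each target distribution. The sketch below does this in base R for two Poisson variables. The page does not say which method `genCorGen` or `addCorData` uses, so treat this as an illustration of the general technique; the correlation and Poisson means are arbitrary.

```r
set.seed(1)
n <- 5000
rho <- 0.6
R <- matrix(c(1, rho, rho, 1), nrow = 2)

# Correlated standard normals: iid normals times the upper Cholesky factor of R.
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)

# Normal CDF gives correlated uniforms; Poisson quantiles give the margins.
u  <- pnorm(z)
y1 <- qpois(u[, 1], lambda = 3)
y2 <- qpois(u[, 2], lambda = 8)

c(mean(y1), mean(y2), cor(y1, y2))
```

The correlation of the Poisson outcomes comes out somewhat below `rho`, because no transformation of a bivariate normal pair can increase the magnitude of its correlation.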

simstudy 0.1.4

  • Added error check to verify that specified distributions are valid
  • Added function genFactor - converts an existing (non-double) field in a data.table to a factor
  • Added function genDummy - creates dummy variables from an integer or factor field in a data.table
  • Added function defCondition - define distributions conditional on existing fields
  • Added function defReadCond - read in conditional definitions from an external csv file
  • Added function addCondition - generate data based on conditional definitions
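The idea behind conditional definitions, factors, and dummy variables can be sketched in base R. Here the distribution of `y` depends on the value of an integer field `grp`, and `grp` is then converted to a factor and to indicator columns. The group-specific distributions and the labels "a", "b", "c" are hypothetical, and this is not how `defCondition`, `addCondition`, `genFactor`, or `genDummy` are implemented.

```r
set.seed(7)
n <- 1000
grp <- sample(1:3, n, replace = TRUE)

# Conditional definition: each value of grp gets its own distribution
# (hypothetical choices: normal, Poisson, uniform).
y <- numeric(n)
y[grp == 1] <- rnorm(sum(grp == 1), mean = 5, sd = 1)
y[grp == 2] <- rpois(sum(grp == 2), lambda = 10)
y[grp == 3] <- runif(sum(grp == 3), min = 0, max = 2)

# Factor version of the integer field, then one 0/1 indicator column per level.
fgrp <- factor(grp, levels = 1:3, labels = c("a", "b", "c"))
dummies <- sapply(levels(fgrp), function(l) as.integer(fgrp == l))

head(cbind(grp, dummies))
tapply(y, grp, mean)
```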

simstudy 0.1.5

  • Added uniform integer distribution (uniformInt)
  • Added negative binomial distribution (negBinomial)
  • Added exponential distribution (exponential)
  • Added function delColumns - deletes one or more columns from data.table
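The three new distributions correspond to standard base-R generators. The sketch below assumes a mean/dispersion parameterization for the negative binomial (variance = mu + d * mu^2) and a mean parameterization for the exponential; these are assumptions for illustration, so check the reference manual for the exact arguments simstudy expects.

```r
set.seed(11)
n <- 20000

# Uniform integers between 1 and 6, inclusive.
u_int <- sample(1:6, n, replace = TRUE)

# Negative binomial with mean mu and dispersion d (assumed parameterization):
# size = 1 / d gives variance mu + d * mu^2.
mu <- 4
d  <- 0.5
nb <- rnbinom(n, size = 1 / d, mu = mu)

# Exponential with mean 2; base R uses the rate, which is 1 / mean.
ex <- rexp(n, rate = 1 / 2)

c(mean_nb = mean(nb), var_nb = var(nb), target_var = mu + d * mu^2,
  mean_exp = mean(ex))
```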

simstudy 0.1.6

  • Fixed function genSurv
  • Added spline generating functions
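The page does not document the arguments of the spline generating functions. As an illustration of the underlying idea only, the base-R sketch below builds a cubic B-spline basis with `splines::bs` and generates an outcome as a weighted sum of the basis columns plus noise; the knots and coefficients are hypothetical.

```r
library(splines)

set.seed(3)
n <- 500
x <- runif(n)

# Cubic B-spline basis with interior knots at 0.25, 0.5 and 0.75.
# With intercept = TRUE there are 3 knots + degree 3 + 1 = 7 columns.
B <- bs(x, knots = c(0.25, 0.5, 0.75), degree = 3, intercept = TRUE)

# One hypothetical coefficient per basis column defines the curve.
theta <- c(0.1, 0.8, 0.2, 0.9, 0.3, 0.7, 0.4)
y <- as.vector(B %*% theta) + rnorm(n, sd = 0.05)

plot(x, y)
```

Because the basis columns sum to one at every x, the underlying curve stays between the smallest and largest coefficient.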

simstudy 0.1.7

  • Added function genOrdCat - creates ordinal categorical data
  • Added function genFormula - creates a linear formula in the form of a string
  • Added function updateDef - modify existing data definition table (to be used in genData())
  • Added function updateDefData - modify existing data definition table (to be used in addColumns())
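Ordinal categorical outcomes are commonly simulated from a cumulative logit (proportional odds) model. The base-R sketch below uses hypothetical cutpoints and a hypothetical covariate effect; whether `genOrdCat` uses exactly this model is not stated here.

```r
set.seed(5)
n <- 2000
x <- rnorm(n)

# Proportional odds model with 4 ordered categories:
# P(Y <= k | x) = plogis(alpha_k - beta * x), with increasing cutpoints alpha_k.
alpha <- c(-1, 0, 1.5)   # hypothetical cutpoints
beta  <- 0.8             # hypothetical effect of x

cum <- sapply(alpha, function(a) plogis(a - beta * x))   # n x 3 matrix
u <- runif(n)

# Category = 1 + number of cumulative probabilities that u exceeds,
# so P(Y <= k) equals the k-th cumulative probability.
y_ord <- 1 + rowSums(u > cum)

table(y_ord)
tapply(x, y_ord, mean)   # higher categories go with larger x when beta > 0
```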

simstudy 0.1.8

  • Fixed function updateDef
  • Fixed bug in internal function genbinom
  • Added function genCorFlex - generate correlated data from variables that have different marginal distributions
  • Added function addCorFlex - generate correlated data from variables that have different marginal distributions, can be dependent on previously defined data


install.packages("simstudy")

Version 0.1.9 by Keith Goldfeld


Browse source code at https://github.com/cran/simstudy


Authors: Keith Goldfeld [aut, cre]




GPL-3 license


Imports Rcpp, mvnfast

Depends on data.table

Suggests testthat, knitr, rmarkdown, ggplot2, scales, grid, gridExtra, survival, gee, splines, formatR, mgcv

Linking to Rcpp

