Simulation of Study Data

The simstudy package is a collection of functions for generating simulated data sets, which can be used to explore modeling techniques or to better understand data-generating processes. The user specifies a set of relationships between covariates and generates data based on these specifications. The final data sets can represent data from randomized controlled trials, repeated-measure (longitudinal) designs, and cluster-randomized trials. Missingness can be generated using various mechanisms (MCAR, MAR, NMAR).

Here is a simple example; the vignette contains many more:

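library(simstudy)

# Define x ~ Normal(mean 10, variance 2), then y as a linear function of x
def <- defData(varname = "x", formula = 10, variance = 2)
def <- defData(def, varname = "y", formula = "3 + 0.5 * x", variance = 1)

# Generate 250 observations from the definitions
dt <- genData(250, def)

# Randomly assign each observation to one of 4 balanced treatment groups
dt <- trtAssign(dt, nTrt = 4, grpName = "grp", balanced = TRUE)

dt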
##       id grp         x        y
##   1:   1   3 10.393817 7.805703
##   2:   2   1 10.235161 5.705590
##   3:   3   1 11.517813 8.210183
##   4:   4   1 12.068125 8.618601
##   5:   5   1 10.078817 5.780655
##  ---                           
## 246: 246   4 11.419577 8.442363
## 247: 247   3 10.567231 9.808930
## 248: 248   1 10.451896 7.720858
## 249: 249   3  7.633381 6.861638
## 250: 250   2  9.347781 6.094965
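
The overview also mentions missingness mechanisms. As a hedged sketch, assuming the package's defMiss, genMiss, and genObs functions behave as described in the vignette, the following makes y missing at random (MAR), with a probability of missingness that depends on x. The coefficients -6 and 0.5 are illustrative values, not taken from the package documentation.

```r
library(simstudy)

# Complete data: same definitions as the example above
def <- defData(varname = "x", formula = 10, variance = 2)
def <- defData(def, varname = "y", formula = "3 + 0.5 * x", variance = 1)
dtFull <- genData(250, def)

# MAR: the log-odds that y is missing rise with x (illustrative coefficients)
defM <- defMiss(varname = "y", formula = "-6 + 0.5 * x", logit.link = TRUE)

# Table of 0/1 missingness indicators, one row per id
missMat <- genMiss(dtFull, defM, idvars = "id")

# Observed data: values flagged as missing are set to NA
dtObs <- genObs(dtFull, missMat, idvars = "id")
```

A constant formula (for example, formula = 0.1 with logit.link = FALSE) would give MCAR instead, since the probability of missingness would no longer depend on any variable.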

News

simstudy 0.1.1

  • This is the first submission of simstudy, so there is no news yet!

simstudy 0.1.2

  • Fixed index variable issue related to generating categorical data
  • Fixed index variable issue related to generating longitudinal data
  • Fixed issue that arose when creating categorical variable in first field
  • Sped up generation of categorical data with large sample sizes
  • Categorical data can now accommodate probabilities conditional on covariates
  • Fixed genMissDataMat, which was broken by package data.table 1.10.0

simstudy 0.1.3

  • Modified "nonrandom" data generation to allow "log" and "logit" link options.
  • Added function genCorGen - generate a new data.table with correlated data from various distributions.
  • Added function addCorData - add correlated data from various distributions to existing data.tables.
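
As a rough illustration of genCorGen, the sketch below generates three correlated variables under a compound-symmetry ("cs") correlation structure. The sample size, marginal parameters, and rho are arbitrary choices for illustration.

```r
library(simstudy)

# Three correlated binary variables with marginal probabilities 0.3, 0.5, 0.7
dtBin <- genCorGen(1000, nvars = 3, params1 = c(0.3, 0.5, 0.7),
                   dist = "binary", rho = 0.3, corstr = "cs", wide = TRUE)

# Same correlation structure with Poisson margins (means 2, 4, 6)
dtPois <- genCorGen(1000, nvars = 3, params1 = c(2, 4, 6),
                    dist = "poisson", rho = 0.3, corstr = "cs", wide = TRUE)

# Empirical correlations, excluding the id column
round(cor(as.matrix(dtBin[, -1])), 2)
```

With wide = TRUE, each row holds one id and one column per generated variable.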

simstudy 0.1.4

  • Added error check to verify that specified distributions are valid
  • Added function genFactor - converts an existing (non-double) field in a data.table to a factor
  • Added function genDummy - creates dummy variables from an integer or factor field in a data.table
  • Added function defCondition - define distribution conditional on existing fields
  • Added function defReadCond - read in conditional definitions from external csv file
  • Added function addCondition - generate data based on conditional definition
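
A minimal sketch of how defCondition and addCondition might be combined: each condition gets its own distribution, and addCondition then generates the new field row by row according to whichever condition holds. The variable names and numeric values here are hypothetical.

```r
library(simstudy)

# Base data: a binary indicator
def <- defData(varname = "male", formula = 0.5, dist = "binary")
dt <- genData(200, def)

# One conditional definition per subgroup; together they cover every row
defC <- defCondition(condition = "male == 1", formula = 176,
                     variance = 49, dist = "normal")
defC <- defCondition(defC, condition = "male == 0", formula = 163,
                     variance = 49, dist = "normal")

# Generate the new field according to the conditions
dt <- addCondition(defC, dt, newvar = "height")
```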

simstudy 0.1.5

  • Added uniform integer distribution (uniformInt)
  • Added negative binomial distribution (negBinomial)
  • Added exponential distribution (exponential)
  • Added function delColumns - deletes one or more columns from data.table
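
The distributions added in this release can be sketched as follows, assuming the conventions described in the package documentation: uniformInt takes a "min;max" range as its formula, negBinomial takes the mean as its formula and a dispersion parameter as its variance, and exponential takes the mean as its formula. All values are illustrative.

```r
library(simstudy)

def <- defData(varname = "die", formula = "1;6", dist = "uniformInt")
def <- defData(def, varname = "count", formula = 3, variance = 0.5,
               dist = "negBinomial")
def <- defData(def, varname = "wait", formula = 2, dist = "exponential")

dt <- genData(500, def)

# Drop a column that is no longer needed
dt <- delColumns(dt, "wait")
```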

simstudy 0.1.6

  • Fixed function genSurv
  • Added spline generating functions

simstudy 0.1.7

  • Added function genOrdCat - creates ordinal categorical data
  • Added function genFormula - creates a linear formula in the form of a string
  • Added function updateDef - modify existing data definition table (to be used in genData())
  • Added function updateDefData - modify existing data def table (to be used in addColumns())
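
As a hedged sketch of genFormula and updateDef: genFormula builds a formula string from coefficients and variable names (when there is one more coefficient than variables, the first is treated as the intercept), and updateDef changes one entry of an existing definition table before data are generated. Variable names and values are hypothetical.

```r
library(simstudy)

def <- defData(varname = "a", formula = 0, variance = 1)
def <- defData(def, varname = "b", formula = 0, variance = 1)

# Build the linear formula string for y from coefficients and variable names
f <- genFormula(c(0.5, 2, 4), c("a", "b"))
def <- defData(def, varname = "y", formula = f, variance = 1)

# Change the variance of y without redefining the whole table
def2 <- updateDef(def, changevar = "y", newvariance = 4)

dt <- genData(100, def2)
```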

simstudy 0.1.8

  • Fixed function updateDef
  • Fixed bug in internal function genbinom
  • Added function genCorFlex - generate correlated data from variables that have different marginal distributions
  • Added function addCorFlex - generate correlated data from variables that have different marginal distributions, can be dependent on previously defined data
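
A minimal sketch of genCorFlex, which takes an ordinary definition table and induces correlation across variables with different marginal distributions. The definitions and rho value below are illustrative.

```r
library(simstudy)

# Marginal distributions are defined as usual
def <- defData(varname = "xNorm", formula = 0, variance = 4, dist = "normal")
def <- defData(def, varname = "xGamma", formula = 15, variance = 2,
               dist = "gamma")
def <- defData(def, varname = "xBin", formula = 0.5, dist = "binary")

# Generate 1000 rows with a compound-symmetry correlation structure
dt <- genCorFlex(1000, def, rho = 0.3, corstr = "cs")
```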

simstudy 0.1.9

  • Added function catProbs - to be used to generate categorical data
  • Added binomial distribution
  • Added ability to specify formula in variance
  • Added function genMultiFac - generates multi-factorial design data
  • Added function addMultiFac - adds multi-factorial design data
  • Added function iccRE - generates required random effect variance for specified intra-class correlation coefficients (ICCs)
  • Fixed bug in function genCorFlex
  • Fixed bug in numerous functions related to error checking and scoping
  • Fixed bug in function addCondition

simstudy 0.1.10

  • Added function genCorMat - generate an n x n correlation matrix
  • Added function genCorOrdCat - generate correlated ordinal categorical data
  • Added beta distribution option to function defData (and associated functions)
  • Added function betaGetShapes
  • Implemented the Emrich and Piedmonte algorithm for correlated binary data in functions genCorGen and addCorGen
  • Modified function genOrdCat - allows adjVar = NULL
  • Fixed bug in function addCorFlex
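
A short sketch of the beta distribution option, assuming the mean/precision parameterization described in the package documentation (formula is the mean, variance is the precision). betaGetShapes converts a mean and precision into the usual two shape parameters; the values 0.25 and 8 are illustrative.

```r
library(simstudy)

# Convert mean and precision to shape parameters:
# shape1 = mean * precision, shape2 = (1 - mean) * precision
shapes <- betaGetShapes(mean = 0.25, precision = 8)

# Define and generate a beta-distributed variable with the same parameters
def <- defData(varname = "p", formula = 0.25, variance = 8, dist = "beta")
dt <- genData(500, def)

summary(dt$p)
```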

simstudy 0.1.11

  • Added negative binomial option to genCorGen, addCorGen, genCorFlex, and addCorFlex
  • Fixed bug in function genFactor
  • Added LAG() functionality to missing data generation: updated function genMiss and added two new internal functions, .checkLags and .addLags
  • Function catProbs now accepts a vector of probabilities or weights as an argument
  • Fixed bugs in function addCondition
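
As a hedged example of catProbs with a vector argument: the function returns the semicolon-separated probability string that the "categorical" distribution expects, and weights that sum to more than 1 are assumed to be rescaled to probabilities, as described in the function's documentation. The weights below are illustrative.

```r
library(simstudy)

# Weights 1:1:2 are assumed to be standardized to probabilities
p <- catProbs(c(1, 1, 2))

# Use the resulting string as the formula of a categorical variable
def <- defData(varname = "cat", formula = p, dist = "categorical")
dt <- genData(400, def)

table(dt$cat)
```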

simstudy 0.1.12

  • Fixed genCorFlex and addMultiFac to accommodate bug fixes in package data.table

install.packages("simstudy")

0.1.14 by Keith Goldfeld, a month ago


Browse source code at https://github.com/cran/simstudy


Authors: Keith Goldfeld [aut, cre]



GPL-3 license


Imports Rcpp, data.table, mvnfast, mvtnorm

Suggests testthat, knitr, rmarkdown, ggplot2, grid, gridExtra, survival, splines, formatR, mgcv

Linking to Rcpp

