Online data collection tools like Google Forms often export multiple-response questions with data concatenated in cells. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced"---something which reshape (from base R) does not handle, and which melt and dcast from reshape2 do not easily handle.
R functions to split concatenated data, stack columns of your datasets, and convert your data into different shapes.
cSplit: A core function that collects the functionality of several of the concat.split family of functions.
concat.split: A set of functions to split strings where data have been concatenated into a single value, as is common when getting data collected with tools like Google Forms. (Use cSplit_l to return a list and cSplit_e to return an "expanded" view of the input data.)
Stacked: A function to create a list of stacked sets of variables. Similar to melt from "reshape2", but doesn't put everything into one very long
Reshape: A function to allow base R's reshape function to work with "unbalanced" datasets.
stratified: A function to take random row samples by groups, similar to
getanID: A function for creating a secondary ID when duplicated "id" variables are present.
expandRows: "Expands" the rows of a dataset.
listCol_l, listCol_w: Unlists (long) or flattens (wide) a column in a data.table stored as a list. Neither is vectorized.
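As a rough illustration of the splitting described above, the following sketch runs cSplit on a small made-up survey export. The data frame `survey` and its column names are hypothetical and exist only for this example; the call assumes the CRAN release of the package is installed.

```r
library(splitstackshape)

# Hypothetical survey export: one multiple-response column, comma-separated
survey <- data.frame(
  id = 1:3,
  tools = c("R,Python", "R", "Python,SQL,R"),
  stringsAsFactors = FALSE
)

# Wide: one new column per split position, padded with NA where a row
# has fewer responses than the longest row
wide <- cSplit(survey, splitCols = "tools", sep = ",", direction = "wide")

# Long: one row per individual response
long <- cSplit(survey, splitCols = "tools", sep = ",", direction = "long")

print(wide)
print(long)
```

The wide form keeps one row per respondent, while the long form is usually easier to tabulate or aggregate.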
The package is on CRAN. You can install it using:
install.packages("splitstackshape")
To install the V2 beta version, use:
devtools::install_github("mrdwab/splitstackshape", ref = "v2.0")
To install the V1 development version, use:
devtools::install_github("mrdwab/splitstackshape", ref = "v1_development")
Current CRAN version: 1.4.8
Updated to pass CRAN tests due to changes in the RNG sample kinds.
22 July 2018
Interim release to help the data.table team with reverse dependency checks.
05 April 2018
Preparing for transition to V2 of the splitstackshape package.
.Deprecated(). These include
concat.split.compact, both of which can now just directly use
cSplit_f has been removed as it would no longer be relevant in V2 of the package and isn't entirely reliable the way it has been written.
fread, on which the function was based, has undergone many changes since the function was written.
Tests have been added covering most basic cases, but not for all potential bugs that have been fixed in V2 of the package.
stratified has been fixed.
cSplit_f has been removed.
29 March 2018
Reshape() bugfix. Reported at https://stackoverflow.com/q/49281838/1270695.
listCol_w() bugfix. Thanks to @jazzurro.
cSplit_e() bugfix. Reported at https://stackoverflow.com/q/48576331/1270695.
20 March 2018
23 October 2014
listCol_w added as utilities for unlisting or flattening columns stored as
18 October 2014
:::.stripWhite when using "|" as a delimiter fixed.
13 October 2014
See 1.3.0 -- 1.3.8 for details of changes.
cSplit now replaces
cSplit_f has been introduced as a related function. Other new
12 October 2014
The "_f" is both representative of fread, which this function uses to split the concatenated cells, and "fixed", which is indicative of the fact that this function would only work if the number of resulting columns is the same for each row in the input.
"Expand" the rows of a data.frame or a data.table either by values specified in a column of the input dataset or by a vector specifying the number of times to repeat each row.
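The two ways of expanding rows might look like the sketch below. The `orders` data frame is invented for illustration, and the example assumes the expandRows signature from the CRAN release (a `count` argument plus a `count.is.col` flag).

```r
library(splitstackshape)

# Hypothetical input: "qty" records how many times each row should repeat
orders <- data.frame(
  id = 1:3,
  item = c("a", "b", "c"),
  qty = c(2, 1, 3)
)

# Expand by the values in a column of the input (the default mode)
by_col <- expandRows(orders, "qty")

# Expand by an external vector giving one repeat count per row
by_vec <- expandRows(orders, count = c(1, 2, 1), count.is.col = FALSE)

print(by_col)
print(by_vec)
```

Expanding by a column is convenient when the counts already live in the data; the vector form is for cases where they are computed separately.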
merged.stack now try to guess the "id.vars" values based on the values in "var.stubs". The values can still be specified manually.
08/10 October 2014
Incremental cleanups and additions to get ready for V1.4.0.
concat.split.multiple are now simply wrappers for cSplit and no longer use :::read.concat to split up the values.
concat.split.expanded given short name forms (
Before the release of 1.4.0, the basic concat.split* functions would become simple wrappers for cSplit, which is much more efficient than the previous implementations. The earlier functions will remain for compatibility purposes.
cSplit is already in use, it will be an exported function.
A function to take fixed or proportional samples by group from a
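A small sketch of fixed versus proportional sampling by group, using the package's stratified function. The data frame `dat` and its columns are made up for this example, and the interpretation of `size` (a value of 1 or more as a fixed count per group, a value below 1 as a proportion) follows the function's documented behaviour in the CRAN release.

```r
library(splitstackshape)

set.seed(1)  # make the random samples repeatable

# Hypothetical data: group "a" has 10 rows, group "b" has 20 rows
dat <- data.frame(
  grp = rep(c("a", "b"), times = c(10, 20)),
  x = seq_len(30)
)

# Fixed: 3 rows from each group
fixed <- stratified(dat, group = "grp", size = 3)

# Proportional: 20% of each group (2 rows from "a", 4 rows from "b")
prop <- stratified(dat, group = "grp", size = 0.2)

print(table(fixed$grp))
print(table(prop$grp))
```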
27 October 2013
concat.split now have an additional argument,
type, which takes a value of either
"character". It is set to a default of
type = "numeric" in the case of
type = NULL in the case of
valueMat for numeric data.
charBinaryMat for string data.
Due to changes introduced after recommendations by @flodel, the following
functions have been rewritten as
20 October 2013
New function added:
concat.split.expanded did not previously support expanding "character" data.
Due to prompting by @juba,
charBinaryMat has been included to handle such cases.
27 August 2013
merge.stack is now faster than Reshape, at least for large datasets.
18 August 2013
merge.stack now made MUCH faster using almost a pure
17 August 2013
Stacked results in a list of length 1, it is "unlisted" before being returned.
Reshape (and as a result, concat.split.multiple(..., direction = "long")) has been enhanced by the addition of a feature to automatically add an ID variable if the present "IDs" are not unique.
New functions added:
16 August 2013
read.concat updated to use count.fields to determine the correct number of columns that the resulting
Reshape now has an option to remove the rownames from the output, set to
12 August 2013
Initial commit of splitstackshape with the following main functions:
concat.split.multiple) -- To split concatenated data into more manageable data formats.
Reshape -- To help base R's reshape function handle unbalanced data and simplify the reshape syntax (wide to long only).
Stacked -- To selectively stack columns of a data.frame.
Non-exported functions are indicated with ::: before their names.