Random Cluster Generation (with Specified Degree of Separation)

We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population versions) for pairs of clusters or cluster distributions, and drawing 1-D and 2-D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, and number of noisy variables.
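As a rough illustration of the main entry point, the sketch below generates one data set of three clusters with a chosen separation value. The argument names are written as I recall them from the package manual and should be checked against ?genRandomClust; the seed, file name, and sizes are arbitrary choices made only for this example.

```r
# Sketch only: runs when the clusterGeneration package is installed.
if (requireNamespace("clusterGeneration", quietly = TRUE)) {
  library(clusterGeneration)  # also attaches MASS, which the package depends on

  set.seed(123)  # arbitrary seed, for reproducibility of this sketch
  res <- genRandomClust(
    numClust        = 3,      # number of clusters
    sepVal          = 0.2,    # target degree of separation
    numNonNoisy     = 2,      # number of non-noisy variables
    numNoisy        = 0,      # number of noisy variables
    numReplicate    = 1,      # generate a single data set
    fileName        = "demo", # hypothetical name, used to label the output
    clustszind      = 2,      # equal cluster sizes, given by clustSizeEq
    clustSizeEq     = 50,
    outputDatFlag   = FALSE,  # do not write files to disk
    outputLogFlag   = FALSE,
    outputEmpirical = FALSE,
    outputInfo      = FALSE
  )

  dat <- res$datList[[1]]  # generated data matrix
  mem <- res$memList[[1]]  # cluster membership of each row
  print(table(mem))
}
```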


News

v1.3.2 >>>> Feb 14, 2015

(1) fixed a few bugs in function 'getSepProjData':

'u.cl <- unique(cl)' should be

'u.cl <- sort(unique(cl))'

'yi <- y[cl == u.cl[i], , drop = FALSE]'

should be

'yi <- y[which(cl == u.cl[i]), , drop = FALSE]'
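The changelog does not say what went wrong, so the following is only a guess at the kind of problem these two changes address. A minimal base R sketch with made-up data: unique() returns labels in order of first appearance, and logical indexing with an NA label yields a row of NAs, whereas which() keeps only the positions that are TRUE.

```r
# Toy data (hypothetical): 5 observations, 2 variables, one missing label.
y  <- matrix(1:10, ncol = 2)
cl <- c(2, 1, NA, 2, 1)

# unique() keeps the order of first appearance; sort() gives a fixed label
# order and drops NA by default.
u.cl <- sort(unique(cl))

# Logical indexing: the NA in the index produces an extra row filled with NA.
y_logical <- y[cl == u.cl[1], , drop = FALSE]

# which() returns only the indices where the comparison is TRUE.
y_which <- y[which(cl == u.cl[1]), , drop = FALSE]

nrow(y_logical)  # 3
nrow(y_which)    # 2
```

With no missing labels the two indexing forms select the same rows, so the difference only shows up when 'cl' contains NA.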

v1.3.1 >>>> Jan 7, 2013

(1) added a space between 'Weiliang Qiu' and '[email protected]' in the 'Maintainer' slot in the DESCRIPTION file

Thanks to Dr. Kurt Hornik for his kind help!

v1.3.0 >>>> Jan 6, 2013

(1) renamed the 'log.txt' file to 'NEWS'.

Thanks to Mr. Suraj Gupta ([email protected]) for this suggestion!

v1.2.9 >>>> April 2, 2012

(1) fixed a bug pointed out by Dr. Anton Korobeynikov ([email protected]):

Dear Dr. Weiliang Qiu,

Recently we tried to use your package clusterGeneration but found that the behavior of genRandomClust() with clustszind == 3 is definitely wrong compared to the one documented.

After looking into the implementation it became obvious that genMemSize() function does wrong things: it tries to sample from 1:G using provided clusterSizes as weights. Surely the output clusters have wrong sizes (not the ones specified).

The fix is pretty simple: change the code for clustszind == 3 to something like this:

mem <- sample(unlist(lapply(1:G, function(x) rep.int(x, times = clustSizes[x]))))

N <- sum(clustSizes)

Or, maybe if you want to keep the current behavior it'd be better to introduce new clustszind variant.

(2) added 'clustSizes <- as.integer(clustSizes)' before checking '!is.integer(clustSizes[i])'
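The two v1.2.9 changes can be sketched in base R. This is not the package's own code: the weighted sampling line is my reading of the behavior described in the report, and the values of G, clustSizes, and the seed are made up for the example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch

G          <- 3
clustSizes <- c(10, 20, 30)  # numeric literals are stored as doubles in R

# Without coercion, an is.integer() check rejects whole-number doubles.
is.integer(clustSizes)       # FALSE
clustSizes <- as.integer(clustSizes)
is.integer(clustSizes)       # TRUE

N <- sum(clustSizes)

# Weighted sampling (assumed form of the old behavior): cluster sizes are
# random and match clustSizes only in expectation.
mem_weighted <- sample(1:G, size = N, replace = TRUE, prob = clustSizes)

# Suggested fix: repeat each label exactly clustSizes[x] times, then shuffle.
mem_exact <- sample(unlist(lapply(1:G, function(x) rep.int(x, times = clustSizes[x]))))

table(mem_weighted)  # sizes vary from run to run
table(mem_exact)     # sizes equal clustSizes by construction
```

Shuffling a vector of repeated labels guarantees the requested sizes while still randomizing which observation lands in which cluster.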

v1.2.8 >>>> March 19, 2012

(1) fixed a few warning messages:

(a)

* checking R code for possible problems ... NOTE

genNoisyMeanCov: warning in eigen(Sigma.noisy, sym = TRUE): partial argument match of 'sym' to 'symmetric'

(b)

** running examples for arch 'i386' ... WARNING

Found the following significant warnings:

Warning: sd() is deprecated.

Warning: sd() is deprecated.

Warning: sd() is deprecated.

Warning: sd() is deprecated.

Deprecated functions may be defunct as soon as of the next release of R.

See ?Deprecated.

** running examples for arch 'x64' ... WARNING

Found the following significant warnings:

Warning: sd() is deprecated.

Warning: sd() is deprecated.

Warning: sd() is deprecated.

Warning: sd() is deprecated.

Deprecated functions may be defunct as soon as of the next release of R.

See ?Deprecated.

(2) added 'na.rm=TRUE' to calls to functions 'min', 'max', 'sum', 'mean', 'median', etc.
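The kinds of changes listed for v1.2.8 can be illustrated with a short base R sketch. The matrices are invented stand-ins, and I am assuming the sd() warning refers to calling sd() on a matrix or data frame, which the changelog does not state explicitly.

```r
# A small symmetric matrix standing in for Sigma.noisy (hypothetical values).
Sigma <- matrix(c(2.0, 0.5,
                  0.5, 1.0), nrow = 2, byrow = TRUE)

# Spell the argument name out in full; 'sym = TRUE' relies on partial
# argument matching, which R CMD check reports as a NOTE.
e <- eigen(Sigma, symmetric = TRUE)
e$values

# Column-wise standard deviations without passing a whole matrix to sd().
x <- cbind(a = c(1, 2, NA, 4), b = c(2, 4, 6, 8))
col_sd <- apply(x, 2, sd, na.rm = TRUE)

# Summary functions return NA when a value is missing unless na.rm = TRUE.
mean(x[, "a"])                # NA
mean(x[, "a"], na.rm = TRUE)  # mean of 1, 2, 4
```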


install.packages("clusterGeneration")

1.3.4 by Weiliang Qiu, 5 years ago


Browse source code at https://github.com/cran/clusterGeneration


Authors: Weiliang Qiu <[email protected]> , Harry Joe <[email protected]>.




Task views: Cluster Analysis & Finite Mixture Models, Multivariate Statistics


GPL (>= 2) license


Depends on MASS


Imported by mlVAR, phytools, rEMM, stream.

Suggested by qVarSel.

