Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering high-dimensional data, introduced by Murphy et al. (2020).
The IMIFA package provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametric model-based clustering of high-dimensional data, introduced by Murphy et al. (2017) <arXiv:1701.07010v4>. The IMIFA model assumes factor analytic covariance structures within mixture components and simultaneously achieves dimension reduction and clustering without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.
The package also contains three data sets: olive, USPSdigits, and coffee.
You can install the latest stable official release of the IMIFA package from CRAN:
install.packages("IMIFA")
or the development version from GitHub:
# If required install devtools:
# install.packages('devtools')
devtools::install_github('Keefe-Murphy/IMIFA')
In either case, you can then explore the package with:
library(IMIFA)
help(mcmc_IMIFA) # Help on the main modelling function
Generally, mcmc_IMIFA() is used for running the model and creating a raw results object, on which get_IMIFA_results() is then called to prepare these results for posterior inference. The output of the second call can be visualised in many ways using plot.Results_IMIFA().
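As a rough illustration of this three-step workflow, the sketch below fits the IMIFA method to the bundled olive data, post-processes the raw output, and plots one summary. The number of iterations and the choice of plot.meth = "GQ" are assumptions made to keep the example short, not recommended settings; see help(mcmc_IMIFA) for the full set of arguments. The code only runs if the package is installed.

```r
# Minimal sketch of the mcmc_IMIFA() -> get_IMIFA_results() -> plot() workflow.
# Settings (n.iters, plot.meth) are illustrative assumptions only.
if (requireNamespace("IMIFA", quietly = TRUE)) {
  library(IMIFA)
  data(olive)

  # Step 1: run the sampler and create the raw results object
  sim <- mcmc_IMIFA(olive, method = "IMIFA", n.iters = 2000, verbose = FALSE)

  # Step 2: prepare the raw output for posterior inference
  res <- get_IMIFA_results(sim)

  # Step 3: visualise, here the posterior numbers of clusters and factors
  plot(res, plot.meth = "GQ")
} else {
  message("IMIFA is not installed; run install.packages(\"IMIFA\") first.")
}
```

A short chain like this is only suitable for checking that the pipeline runs; substantive analyses would typically use many more iterations.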
For a more thorough intro, the vignette document is available as follows:
vignette("IMIFA", package="IMIFA")
However, if the package is installed from GitHub, the vignette is not built automatically. To build it while installing from GitHub, use:
devtools::install_github('Keefe-Murphy/IMIFA', build_vignettes = TRUE)
Alternatively, the vignette is available on the package's CRAN page.
Murphy, K., Gormley, I. C. and Viroli, C. (2018) Infinite Mixtures of Infinite Factor Analysers. To appear. <arXiv:1701.07010v4>
mgpControl gains the arguments cluster.shrink and sigma.hyper:
cluster.shrink governs invocation of cluster shrinkage MGP hyperparameter for MIFA/OMIFA/IMIFA methods.
sigma.hyper controls the gamma hyperprior on this parameter. The posterior mean is reported, where applicable.
alpha to be learned via MH steps for the OM(I)FA models.
bnpControl args. learn.alpha, alpha.hyper, zeta, & tune.zeta become relevant for OM(I)FA models.
get_IMIFA_results (with associated plots):
scores_MAP to decompose factor scores summaries from get_IMIFA_results into submatrices corresponding to the MAP partition.
sim_IMIFA_model to call sim_IMIFA_data using Results_IMIFA objects.
get_IMIFA_results arg. vari.rot allows loadings templates to be varimax rotated, FALSE).
plot.Results_IMIFA argument common governing plot.meth="means" plots (details in documentation).
sigma.mu defaults to 1 s.t. the hypercovariance is the identity for the prior on the means; sigma.mu=NULL.
prec.mu defaults to 0.01 s.t. the prior on the cluster means is flat by default.
learn.d defaults to TRUE s.t. a PYP prior is assumed for IM(I)FA models by default.
alpha.hyper now has a larger hyper-rate by default, to better encourage clustering.
alpha.d1 & alpha.d2 now set to 2.1/3.1 rather than 2/6 to discourage exponentially fast shrinkage.
z.init now defaults to "hc": model-based agglomerative hierarchical clustering.
psi_hyper
(details in documentation) for:
N <= P data where the sample covariance matrix is not invertible.
type="isotropic" uniquenesses.
scores & loadings can now be supplied to sim_IMIFA_data directly; non.zero controls the # effective factors (per column & cluster) when loadings are instead simulated.
hc can now be passed when z.init="mclust" also "hc"), thus controlling how Mclust is itself initialised.
criterion to be passed via ... in mixfaControl to choose between mclustBIC/mclustICL to determine optimum model to initialise with when z.init="mclust" & also sped-up mclust initialisation in the process.
stop.AGS arg. to mgpControl: renamed adapt.at to start.AGS for consistency.
start.zeta & stop.zeta options to tune.zeta argument in bnpControl.
breaks in the plotting functions mat2cols & heat_legend.
pareto_scale().
get_IMIFA_results
for clustering methods:
>= the max of the modal estimates of the same >= the corresponding modal estimate were used):
range.G and trunc.G defaults fixed, especially for small sample size settings.
zlabels are supplied to get_IMIFA_results; uni.type.
get_IMIFA_results.
get_IMIFA_results.
mcmc_IMIFA & sim_IMIFA_data.
Q cannot exceed no. of observations in the corresponding cluster in sim_IMIFA_data.
alpha=0 for IM(I)FA models; discount when fixing alpha<=0.
hc model types for initialisation purposes via ... in mixfaControl.
dimnames of get_IMIFA_results output in x$Loadings & x$Scores.
burnin=0.
zlabels supplied.
show_IMIFA_digit to better account for missing pixels &/or the data having been centered/scaled.
psi when not supplied to sim_IMIFA_data to IG rather than GA.
Q to be supplied to get_IMIFA_results for infinite factor methods.
plot.meth="zlabels".
show_digit.
get_IMIFA_results.
Procrustes now works when X has fewer columns than Xstar.
scores & loadings in trace & density plots.
Ledermann and related warnings to account for case of isotropic uniquenesses.
cat/message/warning calls for printing clarity.
IMIFA-package help file (formerly just IMIFA).
CITATION file and authorship.
mcmc_IMIFA by consolidating arguments using new helper functions (with defaults):
mixfaControl.
mgpControl for infinite factor models.
bnpControl for infinite mixture models.
storeControl.
error.metrics argument to get_IMIFA_results.
plot.meth="errors" to plot.Results_IMIFA.
mixfaControl gains the arg. prec.mu to control the degree of flatness of the prior for the means.
get_IMIFA_results) & visualisable (plot.Results_IMIFA, plot.meth="zlabels"), via new function post_conf_mat, to further assess clustering uncertainty.
plot.Results_IMIFA when plot.meth="zlabels".
get_IMIFA_results now also returns the last valid samples for parameters of interest, plot.Results_IMIFA gains new arg. show.last that replaces any instance of showing the posterior mean plot.meth="means" or plot.meth="parallel.coords").
equal.pro argument for M(I)FA models:
PGMM_dfree accordingly and forced non-storage of mixing proportions when equal.pro is TRUE.
sim_IMIFA_data
also extended to work for univariate data, as well as sped-up.
nu & nuplus1 to mgpControl, replaced by ability to specify more general gamma prior, phi.hyper arg. specifying shape and rate - MGP_check has also been modified accordingly.
Zsimilarity sped-up via the comp.psm & cltoSim functions s.t. when # observations < 1000.
get_IMIFA_results.
psi.alpha no longer needs to be strictly greater than 1, unless the default psi.beta is invoked; mixfaControl.
"hc" option to z.init to initialise allocations via hierarchical clustering (using mclust::hc).
... in mixfaControl.
mu argument to sim_IMIFA_data to allow supplying true mean parameter values directly.
aicm/bicm model selection criteria now computed and returned.
Rfast utility functions: colTabulate & matrnorm.
matrixStats, on which IMIFA already depends.
adapt=FALSE for infinite factor models with fixed high truncation level.
plot.Results_IMIFA, plot.meth="zlabels" and the true zlabels are supplied.
mixfaControl gains arg. drop0sd to control removal of zero-variance features (defaults to TRUE).
heat_legend
gains cex.lab argument to control magnification of legend text.
mat2cols gains the transparency argument.
PGMM_dfree to include the 4 extra models from the EPGMM family.
zlabels to get_IMIFA_results will now match the cluster labels and parameters to
zlabels to plot.Results_IMIFA when plot.meth="zlabels" no longer does
get_IMIFA_results: now plot(get_IMIFA_results(sim), plot.meth="zlabels", zlabels=z) gives different results from plot(get_IMIFA_results(sim, zlabels=z), plot.meth="zlabels") as only the latter will permute.
sigma.mu & psi.beta values.
get_IMIFA_results.
get_IMIFA_results for IMFA/OMFA models when range.Q is a range.
aicm, bicm and dic criteria: all results remain the same.
alpha when discount is being learned.
uni.prior="isotropic" when uni.type is (un)constrained.
mcmc_IMIFA.
get_IMIFA_results when there are empty clusters.
print and summary functions for objects of class IMIFA and Results_IMIFA.
zeta when adaptively targeting alpha's optimal MH acceptance rate.
alpha be tiny for (O)M(I)FA models (provided z.init != "priors" for overfitted models).
get_IMIFA_results when conditioning on G for IM(I)FA/OM(I)FA models.
MGP_check that alpha.d2 be moderately large relative to alpha.d1.
sigma.mu hyperparameter arg. is always coerced to diagonal entries of a covariance matrix.
plot.Results_IMIFA now depends on device's support of semi-transparency.
is.list(x) with inherits(x, "list") for stricter checking.
check.margin=FALSE to calls to sweep.
Ledermann, MGP_check, and PGMM_dfree are now properly vectorised.
USPSdigits
data set (training and test), show_digit and show_IMIFA_digit.
olive, coffee and vignette data and used LazyData: true.
call.=FALSE to stop() messages and immediate.=TRUE to certain warning() calls.
adrop, e1071, graphics, grDevices, plotrix, stats & utils libraries.
Rfast w/ own versions of colVars, rowVars, & standardise.
IMIFA_news for accessing this NEWS file.
CITATION file.
Collate: field to DESCRIPTION file.
usage sections of multi-argument functions.
G_expected & G_variance.
range.G contains 1.
get_IMIFA_results from working properly when true labels are NOT supplied.
"constrained" & "single" to mcmc_IMIFA's uni.type argument:
mcmc_IMIFA gains the tune.zeta argument, a list of heat, lambda & target parameters, to invoke alpha
"constrained" or "single", "unconstrained" or "isotropic", utilising pre-computation and empty assignment.
is.cols, Ledermann, Procrustes & shift_GA.
is.posi_def
gains make argument, merging it with previously hidden function .make_posdef:
log.like arg. removed from gumbel_max; function stands alone, now only stored log-likelihoods computed.
psi argument added to sim_IMIFA_data to allow supplying true uniqueness parameter values directly.
bw="SJ" everywhere density is invoked for plotting (bw="nrd0" is invoked if this fails).
isotropic (I)FA models.
isotropic uniquenesses plots.
learn.d is TRUE but learn.alpha is FALSE.
discount when mutation rate is too low (i.e. too many zeros).
byrow=TRUE:
load.meth argument replaced by logical heat.map in plot.Results_IMIFA.
mat2cols gains compare argument to yield common palettes/breaks for heat maps of multiple matrices:
plot_cols function also fixed, and now unhidden.
IMIFA no longer depends on the corpcor, gclus, MASS, matrixcalc, or MCMCpack libraries.
par()$bg (i.e. default "white") for plotting zero-valued entries of similarity matrix.
heat_legend calculated correctly.
mcmc_IMIFA's verbose argument now governs printing of message & cat calls, but not stop or warning.
NEWS.md to build.
discount & alpha parameters via Metropolis-Hastings now implemented.
discount: size of spike controlled by arg. kappa.
param argument gains the option discount for posterior inference.
gumbel_max replaces earlier function to sample cluster labels and is now unhidden/exported/documented.
plot.meth="GQ" for OM(I)FA/IM(I)FA models depicting trace of #s of active/non-empty clusters.
Zsimilarity to summarise posterior clustering by the sampled labels with minimum
get_IMIFA_results, the similarity matrix can be plotted via plot.meth="zlabels".
alpha when discount is non-zero, rather than usual Gibbs.
discount parameter.
aic.mcmc & bic.mcmc criteria when uniquenesses are isotropic:
PGMM_dfree, which calculates # 'free' parameters for finite factor analytic mixture models is exported/documented.
G_priorDensity now better reflects discrete nature of the density, and plots for non-zero PY discount values.
heat_legend.
MCMCpack:rdirichlet:
rDirichlet replaces earlier function to sample mixing proportions and is now unhidden/exported/documented.
dimnames attributes in mcmc_IMIFA to get_IMIFA_results: lower memory burden/faster simulations.
get_IMIFA_results to reduce size/simplify access.
trunc.G, the max allowable # active clusters, and # active clusters now stored.
active G=1 by not simulating labels for IM(I)FA models.
score.switch defaults to FALSE if # models ran is large.
Rfast::sort_unique and rotating properly.
rnorm columns to scores matrix during adaptation, esp. when widest loadings matrix grows/shrinks.
N < P.
alpha parameter now correctly depend on current # non-empty rather than active clusters.
discount.
mcmc_IMIFA output.
stop(...) for finite factor models to warning(...).
get_IMIFA_results)/printed (plot.Results_IMIFA) even when zlabels not supplied.
verbose=FALSE.
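Several entries above refer to gumbel_max, the exported function for sampling cluster labels. For readers unfamiliar with the idea, the generic Gumbel-max trick can be sketched in base R as below. This is an illustration of the general technique under simple assumptions, not the package's implementation: the name gumbel_max_sketch and its single argument are hypothetical, and the real function's interface is documented in help(gumbel_max).

```r
# Generic Gumbel-max trick (hypothetical helper, NOT the IMIFA function).
# log_probs: N x G matrix of (possibly unnormalised) log-probabilities,
#            one row per observation and one column per cluster.
gumbel_max_sketch <- function(log_probs) {
  # Independent standard Gumbel noise: -log(-log(U)), with U ~ Uniform(0, 1)
  noise <- -log(-log(matrix(runif(length(log_probs)), nrow = nrow(log_probs))))
  # The column holding each row's maximum is a draw from that row's
  # categorical distribution; no normalisation of the weights is needed
  max.col(log_probs + noise, ties.method = "first")
}

set.seed(1)
# 10,000 observations sharing the same cluster probabilities (0.7, 0.2, 0.1)
lp <- matrix(log(c(0.7, 0.2, 0.1)), nrow = 10000, ncol = 3, byrow = TRUE)
z  <- gumbel_max_sketch(lp)
round(prop.table(table(z)), 2)
```

The appeal of this approach is that labels are sampled directly on the log scale, which avoids exponentiating and normalising very small probabilities.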