Latent Unknown Clustering with Integrated Data

An implementation for the 'LUCID' method to jointly estimate latent unknown clusters/subgroups with integrated data. An EM algorithm is used to obtain the latent cluster assignment and model parameter estimates. Feature selection is achieved by applying the regularization method.


BuildStatus CRAN_Status_Badge

The LUCIDus R package is an integrative tool to obtain a joint estimation of latent or unknown clusters/subgroups with multi-omics data and phenotypic traits. This package is an implementation for the novel statistical method proposed in the research paper “Latent Unknown Clustering Integrating Multi-Omics Data with Phenotypic Traits (LUCID)[1].”

Installation

You will be able to install the released version of LUCIDus from CRAN soon with:

install.packages("LUCIDus")

For now, it can be installed from GitHub using the following codes:

install.packages("devtools")
devtools::install_github("USCbiostats/LUCIDus")

Otherwise, one can download the package from GitHub, and run the following codes from the parent working directory that contains the LUCIDus folder:

install.packages("devtools")
setwd("..")
devtools::install("LUCIDus")

Fitting the latent cluster models

library(LUCIDus)

Three functions, including est_lucid(), sem_lucid(), and tune_lucid(), are currently available for model fitting and selection. The model outputs can be summarized and visualized using summary_lucid() and plot_lucid() respectively. Predictions could be made with pred_lucid().

est_lucid()

Estimating latent clusters with multi-omics data

Example

For a testing dataset with 10 genetic features (5 causal) and 4 biomarkers (2 causal)

Integrative clustering without feature selection
set.seed(10)
IntClusFit <- est_lucid(G=G1,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE)

Checking important model outputs with summary_lucid()

summary_lucid(IntClusFit)

Visualize the results with Sankey diagram using plot_lucid()

plot_lucid(IntClusFit)

Re-run the model with covariates in the G->X path

IntClusCoFit <- est_lucid(G=G1,CoG=CoG,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE)

Check important model outputs

summary_lucid(IntClusCoFit)

Visualize the results

plot_lucid(IntClusCoFit)

sem_lucid()

Supplemented EM-algorithm for latent cluster estimation

Example

set.seed(100)
sem_lucid(G=G2,Z=Z2,Y=Y2,useY=TRUE,K=2,Pred=TRUE,family="normal",Get_SE=TRUE,
            def_initial(),def_tol(MAX_ITR=1000,MAX_TOT_ITR=3000))

tune_lucid()

Example

Grid search for tuning parameters using parallel computing

# Better be run on a server or HPC
set.seed(10)
GridSearch <- tune_lucid(G=G1, Z=Z1, Y=Y1, K=2, Family="binary", USEY = TRUE,
                           LRho_g = 0.008, URho_g = 0.012, NoRho_g = 3,
                           LRho_z_invcov = 0.04, URho_z_invcov = 0.06, NoRho_z_invcov = 3,
                           LRho_z_covmu = 90, URho_z_covmu = 110, NoRho_z_covmu = 2)
GridSearch$Results
GridSearch$Optimal

Run LUCID with best tuning parameters and select informative features

set.seed(10)
IntClusFit <- est_lucid(G=G1,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE,
                        tunepar = def_tune(Select_G=TRUE,Select_Z=TRUE,
                                           Rho_G=0.01,Rho_Z_InvCov=0.06,Rho_Z_CovMu=90))
# Identify selected features
summary_lucid(IntClusFit)$No0G; summary_lucid(IntClusFit)$No0Z
colnames(G1)[summary_lucid(IntClusFit)$select_G]; colnames(Z1)[summary_lucid(IntClusFit)$select_Z]
# Select the features
if(!all(summary_lucid(IntClusFit)$select_G==FALSE)){
    G_select <- G1[,summary_lucid(IntClusFit)$select_G]
}
if(!all(summary_lucid(IntClusFit)$select_Z==FALSE)){
    Z_select <- Z1[,summary_lucid(IntClusFit)$select_Z]
}

Re-fit with selected features

set.seed(10)
IntClusFitFinal <- est_lucid(G=G_select,Z=Z_select,Y=Y1,K=2,family="binary",Pred=TRUE)

Visualize the results with a Sankey diagram

plot_lucid(IntClusFitFinal)

Re-run feature selection with covariates in the G->X path

IntClusCoFit <- est_lucid(G=G1,CoG=CoG,Z=Z1,Y=Y1,K=2,family="binary",Pred=TRUE,
                          initial=def_initial(), itr_tol=def_tol(),
                          tunepar = def_tune(Select_G=TRUE,Select_Z=TRUE,Rho_G=0.02,Rho_Z_InvCov=0.1,Rho_Z_CovMu=93))
summary_lucid(IntClusCoFit)

Re-fit with selected features with covariates

IntClusCoFitFinal <- est_lucid(G=G_select,CoG=CoG,Z=Z_select,Y=Y1,K=2,family="binary",Pred=TRUE)

Visualize the results

plot_lucid(IntClusCoFitFinal)

For more details, see documentations for each function in the R package.

Built With

  • devtools
    • Tools to Make Developing R Packages Easier
  • roxygen2
    • In-Line Documentation for R

Versioning

The current version is 0.9.0.

For the versions available, see the Release on this repository.

Authors

  • Cheng Peng

License

This project is licensed under the GPL-2 License.

Acknowledgments

  • David V. Conti, Ph.D.
  • Zhao Yang, Ph.D.
  • USC IMAGE P1 Group
  1. Under development, citation coming soon

News

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("LUCIDus")

0.9.0 by Cheng Peng, 8 months ago


https://github.com/USCbiostats/LUCIDus


Browse source code at https://github.com/cran/LUCIDus


Authors: Cheng Peng , Zhao Yang , David V. Conti


Documentation:   PDF Manual  


GPL-2 license


Imports mvtnorm, nnet, glmnet, glasso, Matrix, lbfgs, stats, methods, networkD3, foreach, doParallel

Suggests testthat, knitr, rmarkdown


See at CRAN