A Biomarker Validation Approach for Classification and Predicting Survival Using Metabolomics Signature

An approach that identifies a metabolic biomarker signature from metabolic data by discovering predictive metabolites, which are then used to predict survival and classify patients into risk groups. Classifiers are constructed as a linear combination of predictive/important metabolites, prognostic factors and, if necessary, treatment effects. Several methods are implemented to reduce the metabolomics matrix, such as the principal component analysis of Svante Wold et al. (1987), the LASSO method by Robert Tibshirani (1998), and the elastic net approach by Hui Zou and Trevor Hastie (2005). A sensitivity analysis on the quantile used for classification can also be performed to check how the classification groups change with the specified quantile. Large-scale cross-validation can be performed to investigate the most frequently selected predictive metabolites and for internal validation. During the evaluation process, validation is assessed using the distribution of hazard ratios (HR) in the test set, and inference is mainly based on resampling and permutation techniques.
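One way to read "linear combination" here is through a Cox proportional hazards model. The notation below is introduced only for illustration and is an assumption, not taken from the package: $M_{ij}$ is the level of selected metabolite $j$ for patient $i$, $Z_i$ collects the prognostic factors, and $q_{\alpha}$ is the chosen quantile of the risk scores.

```latex
h_i(t) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{k}\beta_j M_{ij} + \gamma^{\top} Z_i\Big),
\qquad
r_i = \sum_{j=1}^{k}\hat{\beta}_j M_{ij},
\qquad
G_i =
\begin{cases}
\text{high risk}, & r_i > q_{\alpha}(r_1,\ldots,r_n),\\
\text{low risk},  & \text{otherwise.}
\end{cases}
```

Under this reading, the hazard ratio between the two groups $G_i$ is the quantity that the validation steps evaluate.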

R package: a biomarker validation approach for predicting survival using a metabolic signature. The package develops biomarker signatures for metabolic data. It contains a set of functions and cross-validation methods to validate and select biomarkers when the outcome of interest is survival. The package takes prognostic factors and, mainly, a metabolite matrix as input, and can serve as a biomarker validation tool.

Why use the package

  • It can be used with any form of high-dimensional/omics data, such as metabolic data or a gene expression matrix. If you do not have data, it can simulate a hypothetical scenario of high-dimensional data based on the desired biological parameters
  • It can develop any form of signature from the high-dimensional data, to be used for other purposes
  • It also employs data reduction techniques such as PCA, PLS and Lasso
  • It classifies subjects into low- and high-risk groups based on the signature
  • It incorporates subject prognostic information to enhance the biomarker for classification
  • It gives information about the survival rate of subjects depending on the classification


You can install the released version of MetabolicSurv from CRAN with:
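Assuming the package is still listed on CRAN and the machine has network access, the standard installation call would look like this:

```r
# Install MetabolicSurv from CRAN only if it is not already available.
if (!requireNamespace("MetabolicSurv", quietly = TRUE)) {
  install.packages("MetabolicSurv")
}
```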


Illustration: simulating a metabolomic profile matrix

Apart from survival prediction and classification, MetabolicSurv can also be used to generate an artificial metabolomic profile matrix, survival data (survival time and censoring indicator) and clinical covariates, referred to here as prognostic factors, for further analysis or other purposes. Since few metabolic profile matrices are publicly available, the package can first be used to simulate each of these datasets, which are required to evaluate the other basic and advanced functions in the package.

    Data <- MSData(nPatients = 200, nMet = 3000, Prop = 0.5)
    Metdata <- Data$Mdata
    Survdata <- Data$Survival
    Censordata <- Data$Censor
    Progdata <- Data$Prognostic

The code above simulates metabolomic, survival and prognostic data for 200 patients, with 3000 metabolites in the metabolomic profile matrix, assuming that the proportion of patients at low risk is 0.5. The proportion can be adjusted depending on whether equal or unequal group sizes are expected, based on biological findings or an informed guess. The metabolomic profile matrix is stored in Metdata, the survival time in Survdata, the censoring information in Censordata and the prognostic factors/clinical covariates in Progdata.
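A quick look at the simulated object can confirm that the pieces have the expected sizes. The sketch below uses only the component names shown in the code above; the seed value is arbitrary, and no particular orientation of the metabolite matrix is assumed.

```r
library(MetabolicSurv)

set.seed(123)  # the simulation is random; a fixed seed makes the run repeatable
Data <- MSData(nPatients = 200, nMet = 3000, Prop = 0.5)

str(Data, max.level = 1)  # top-level components of the returned object
dim(Data$Mdata)           # dimensions of the metabolomic profile matrix
table(Data$Censor)        # distribution of the censoring indicator
summary(Data$Survival)    # range of the simulated survival times
```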

A quick demonstration to solve a problem

Problem of interest: "Given a set of subjects with known risk scores and prognostic features, how can we use this information to obtain their risk of surviving, and which group does each subject belong to?"

    ##  Loading the package
    library(MetabolicSurv)
    ##  Loading one of the inbuilt data
    data(DataHR)
    ##  Classification, survival estimation and visualization (arguments beyond Data.Survival left at their defaults)
    Result = EstimateHR(Risk.Scores=DataHR[,1],Data.Survival=DataHR[,2:3])
    ## Survival information (element name assumed; confirm with names(Result))
    Result$SurvResult
    ## Group information (element name assumed; confirm with names(Result))
    Result$Riskgroup
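To make the idea behind this demonstration concrete, here is a minimal sketch using only the survival package. It is not the package's internal implementation: the data are simulated, and the helper `classify_and_fit` is a hypothetical name. It splits subjects at a quantile of the risk score, estimates the high-versus-low hazard ratio with a Cox model, and then repeats this over several quantiles as a simple sensitivity check.

```r
library(survival)

set.seed(42)
n <- 200
# Hypothetical risk scores; in practice these would come from a fitted signature.
risk_score <- rnorm(n)
# Simulated survival times whose hazard increases with the risk score.
time   <- rexp(n, rate = 0.1 * exp(0.8 * risk_score))
status <- rbinom(n, 1, 0.8)  # 1 = event observed, 0 = censored

# Split subjects at quantile q of the score and fit a Cox model on the groups.
classify_and_fit <- function(score, time, status, q = 0.5) {
  cutoff <- quantile(score, probs = q)
  group  <- factor(ifelse(score > cutoff, "High risk", "Low risk"),
                   levels = c("Low risk", "High risk"))
  fit <- coxph(Surv(time, status) ~ group)
  list(group = group, hr = unname(exp(coef(fit))), fit = fit)
}

res <- classify_and_fit(risk_score, time, status, q = 0.5)
table(res$group)  # group sizes
res$hr            # estimated HR, high vs. low risk

# Sensitivity of the estimated HR to the classification quantile.
qs  <- c(0.3, 0.4, 0.5, 0.6, 0.7)
hrs <- sapply(qs, function(q) classify_and_fit(risk_score, time, status, q)$hr)
data.frame(quantile = qs, HR = round(hrs, 2))
```

The median split is only one choice; the loop over `qs` shows how the estimated HR can shift as the cutoff moves.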

Functions in the package

| Category | Function | Description |
|----------|----------|-------------|
| Basic | MSpecificCoxPh | Metabolite-by-metabolite Cox proportional hazards analysis |
| Basic | SurvPcaClass | Classifier based on the first PCA component |
| Basic | SurvPlsClass | Classifier based on the first PLS component |
| Basic | Majorityvotes | Classification by majority votes |
| Basic | Lasoelacox | Wrapper function for glmnet |
| Basic | MSData | Generate artificial metabolic survival data |
| Advanced | CVLasoelacox | Cross-validation for Lasso and elastic net predictive models and classification |
| Advanced | CVSim | Cross-validation for the top $K_{1}, \ldots, K_{n}$ metabolites |
| Advanced | CVPcaPls | Cross-validation for PCA- and PLS-based methods |
| Advanced | CvMajorityvotes | Cross-validation for majority votes |
| Advanced | MetFreq | Frequency of selected metabolites from the metabolite-specific cross-validation |
| Advanced | QuantileAnalysis | Sensitivity of the quantile used for classification |
| Advanced | Icvlasoel | Inner and outer cross-validation for shrinkage methods |
| Advanced | DistHR | Null distribution of the estimated HR |
| Advanced | SIMet | Sequentially increase the number of top $K$ metabolites |
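The idea of a null distribution for the estimated HR can be illustrated with a generic permutation sketch. This is an assumption-laden illustration built on the survival package alone, not the code behind DistHR: the data are simulated, and `median_split_hr` is a hypothetical helper.

```r
library(survival)

set.seed(7)
n <- 150
risk_score <- rnorm(n)  # hypothetical signature scores
time   <- rexp(n, rate = 0.1 * exp(0.7 * risk_score))
status <- rbinom(n, 1, 0.8)  # 1 = event observed, 0 = censored

# HR of high vs. low risk after a median split of the scores.
median_split_hr <- function(score, time, status) {
  high <- as.integer(score > median(score))
  unname(exp(coef(coxph(Surv(time, status) ~ high))))
}

observed_hr <- median_split_hr(risk_score, time, status)

# Permuting the scores breaks any link with survival, giving a null reference.
n_perm  <- 200
null_hr <- replicate(n_perm, median_split_hr(sample(risk_score), time, status))

# One-sided permutation p-value with the usual +1 correction.
p_value <- (sum(null_hr >= observed_hr) + 1) / (n_perm + 1)
c(observed_HR = observed_hr, p_value = p_value)
```

Comparing `observed_hr` with the spread of `null_hr` indicates whether the observed separation between risk groups is larger than chance alone would produce.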


version 1.0.0

The first version of the package

  • A software paper associated with the MetabolicSurv package will be available in a few weeks.



  • added NEWS.md creation



1.0.0 by Olajumoke Evangelina Owokotomo, 3 months ago


Report a bug at https://github.com/OlajumokeEvangelina/MetabolicSurv/issues/new

Browse source code at https://github.com/cran/MetabolicSurv

Authors: Olajumoke Evangelina Owokotomo [aut, cre] , Ziv Shkedy [aut]


GPL-3 license

Imports superpc, glmnet, matrixStats, survminer, survival, rms, tidyr, pls, Rdpack, methods, stats, gplots, ggplot2

Suggests knitr, rmarkdown
