Supervised Component Generalized Linear Regression

An extension of the Fisher Scoring Algorithm to combine PLS regression with GLM estimation in the multivariate context. Covariates can also be grouped in themes.



Introduction

SCGLR is an open-source implementation of Supervised Component Generalized Linear Regression (Bry et al. 2013, 2016, 2018), which identifies, among a large set of potentially multicollinear predictors, the strong dimensions most predictive of a set of responses.

SCGLR is an extension of partial least squares regression (PLSR) to the uni- and multivariate generalized linear framework. PLSR is particularly well suited to analyzing a large array of explanatory variables, and many studies have demonstrated its predictive performance in various biological fields such as genetics (Boulesteix and Strimmer 2007) or ecology (Carrascal, Galván, and Gordo 2009). PLSR is well adapted to continuous variables: it maximizes the covariance between linear combinations of the dependent variables and linear combinations of the covariates. SCGLR, in contrast, is suited to non-Gaussian outcomes and non-continuous covariates.
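The covariance criterion behind PLSR can be illustrated in a few lines of base R. The sketch below is not part of SCGLR: it uses simulated data (the dimensions, coefficients and variable names are all made up for illustration) and a single centred response, in which case the unit-norm weight vector maximizing the covariance between the component and the response is proportional to X'y.

```r
# Sketch of the first PLS component for one centred response (base R only).
# All data below are simulated; nothing here calls the SCGLR package.
set.seed(1)
n <- 100
p <- 5
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- as.numeric(X %*% c(2, -1, 0, 0, 0.5) + rnorm(n))
y <- y - mean(y)

# Among unit-norm weight vectors w, cov(X w, y) is maximal for w proportional to X'y.
w_pls <- as.numeric(crossprod(X, y))
w_pls <- w_pls / sqrt(sum(w_pls^2))
t1 <- as.numeric(X %*% w_pls)  # scores of the first component

# Covariance between the component built from w and the response.
cov_along <- function(w) cov(as.numeric(X %*% w), y)

# Compare with 1000 random unit-norm directions.
random_w <- replicate(1000, {
  w <- rnorm(p)
  w / sqrt(sum(w^2))
})
best_random <- max(apply(random_w, 2, cov_along))
cov_pls <- cov_along(w_pls)

cat("covariance along PLS direction:     ", cov_pls, "\n")
cat("best covariance over random weights:", best_random, "\n")
```

No random direction can beat the PLS direction here, by the Cauchy-Schwarz inequality. SCGLR replaces this purely covariance-based criterion with one adapted to generalized linear models.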

SCGLR is a model-based approach that extends PLS (Tenenhaus 1998), PCA on instrumental variables (Sabatier, Lebreton, and Chessel 1989), canonical correspondence analysis (Ter Braak 1987), and other related empirical methods, by capturing the trade-off between goodness-of-fit and common structural relevance of explanatory components. The notion of structural relevance was introduced by Bry and Verron (2015).

SCGLR can deal with covariates partitioned into several groups called “themes”, plus a group of additional covariates. Within each theme, SCGLR searches for orthogonal components that represent the theme's variables in the model, whereas the additional covariates enter the model directly, without the mediation of a component (Bry et al. 2019).

Installation

# Install release version from CRAN
install.packages("SCGLR")
 
# Install development version from GitHub
devtools::install_github("SCnext/SCGLR")

Main functions and works in progress

SCGLR is designed to deal with outcomes from multiple distributions (Gaussian, Bernoulli, binomial and Poisson), separately or simultaneously (Bry et al. 2013). SCGLR can also handle multiple conceptually homogeneous groups of explanatory variables (Bry et al. 2018).

SCGLR is a set of R functions illustrated on a floristic data set, genus. scglr fits the model with a single group of regressors, and scglrTheme fits it with one or more thematic groups of regressors. scglrCrossVal and scglrThemeBackward select the number of components for scglr and scglrTheme models, respectively. print, summary and plot methods are also available for the results of scglr and scglrTheme.
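A typical call sequence might look like the sketch below, which fits a Poisson model to the genus data with scglr. Several details are assumptions rather than statements from this page: that the response columns of genus are those whose names start with "gen", that "surface" serves as an offset and "geology" as an additional covariate, and the argument names passed to multivariateFormula and scglr. The choice K = 4 is arbitrary; check the reference manual before relying on any of this.

```r
library(SCGLR)

# Load the floristic data set shipped with the package.
data(genus)

# Assumption: responses are the columns whose names start with "gen";
# the remaining columns are candidate covariates.
n  <- names(genus)
ny <- n[grep("^gen", n)]
nx <- n[-grep("^gen", n)]

# Assumption: "surface" is used as an offset and "geology" as an
# additional covariate, so both are removed from the component covariates.
nx <- nx[!nx %in% c("geology", "surface")]

# Build the multivariate formula; A lists the additional covariates.
form <- multivariateFormula(ny, nx, A = c("geology"))

# One distribution per response (all Poisson in this sketch).
fam <- rep("poisson", length(ny))

# Fit with an arbitrary number of components (K = 4).
genus.scglr <- scglr(formula = form, data = genus, family = fam,
                     K = 4, offset = genus$surface)

print(genus.scglr)
```

In practice the number of components would be chosen with scglrCrossVal rather than fixed by hand.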

Several extensions are in progress, for instance the inclusion of random effects, which extends SCGLR to the generalized linear mixed model framework (Chauvet, Trottier, and Bry 2018a, 2018b), and an adaptation to the Cox regression model.

References

Boulesteix, Anne-Laure, and Korbinian Strimmer. 2007. “Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data.” Briefings in Bioinformatics 8 (1): 32–44. http://bib.oxfordjournals.org/content/8/1/32.short.

Bry, Xavier, Catherine Trottier, Frédéric Mortier, and Guillaume Cornu. 2019. “Component-Based Regularisation of a Multivariate GLM with a Thematic Partitioning of the Explanatory Variables.” Statistical Modelling 19 (0): 00–00 (to appear). <https://doi.org/TO BE ADDED>.

Bry, X., C. Trottier, F. Mortier, and G. Cornu. 2018. “Component-Based Regularisation of a Multivariate GLM with a Thematic Partitioning of the Explanatory Variables.” Statistical Modelling, In press.

Bry, X., C. Trottier, F. Mortier, G. Cornu, and T. Verron. 2016. “Supervised-Component-Based Generalised Linear Regression with Multiple Explanatory Blocks: THEME-SCGLR.” In The Multiple Facets of Partial Least Squares and Related Methods, edited by H. Abdi, V.E. Vinzi, V. Russolillo, G. Saporta, and L. Trinchera, 141–54. Switzerland: Springer Proceedings in Mathematics & Statistics.

Bry, X., C. Trottier, T. Verron, and F. Mortier. 2013. “Supervised Component Generalized Linear Regression Using a PLS-Extension of the Fisher Scoring Algorithm.” Journal of Multivariate Analysis 119: 47–60. http://www.sciencedirect.com/science/article/pii/S0047259X13000407.

Bry, X., and T. Verron. 2015. “THEME: THEmatic Model Exploration Through Multiple Co-Structure Maximization.” Journal of Chemometrics 29 (12): 637–47. http://onlinelibrary.wiley.com/doi/10.1002/cem.2759/full.

Carrascal, Luis M., Ismael Galván, and Oscar Gordo. 2009. “Partial Least Squares Regression as an Alternative to Current Regression Methods Used in Ecology.” Oikos 118 (5): 681–90. http://onlinelibrary.wiley.com/doi/10.1111/j.1600-0706.2008.16881.x/full.

Chauvet, J., C. Trottier, and X. Bry. 2018a. “Component-Based Regularisation of Multivariate Generalised Linear Mixed Models.” Journal of Computational and Graphical Statistics, In press.

———. 2018b. “Regularisation of Generalised Linear Mixed Models with Autoregressive Random Effect.” Journal of Computational and Graphical Statistics, In prep.

Sabatier, R., J. D. Lebreton, and D. Chessel. 1989. “Principal Component Analysis with Instrumental Variables as a Tool for Modelling Composition Data.” Multiway Data Analysis, 341–52.

Tenenhaus, M. 1998. La Régression PLS: Théorie et Pratique. Paris: Editions Technip. https://books.google.fr/books?hl=fr&lr=&id=OesjK2KZhsAC&oi=fnd&pg=PA1&dq=Tenenhaus+PLS&ots=EvUst85CEP&sig=EpksVNlZFUVoYLX7JX952PIGaHU.

Ter Braak, Cajo JF. 1987. “The Analysis of Vegetation-Environment Relationships by Canonical Correspondence Analysis.” In Theory and Models in Vegetation Science, 69–77. Springer. https://link.springer.com/chapter/10.1007/978-94-009-4061-1_7.

News

version 3.0

This major version introduces a new feature that allows covariates to be grouped into so-called themes.

  • added scglrTheme and scglrThemeBackward to handle theme-oriented selection
  • reworked multivariateFormula to handle themes
  • added new plots targeting themes
  • deprecated barplot in favor of screeplot (same parameters)

version 2.1

  • Removed LPLS legacy method
  • changed from ING to PING

version 2.0

  • New method available: SR (Structural Relevance); see the vignette
  • Major rewrite of plot styling (not backward compatible)
  • Various fixes and improvements (especially when dealing with a single dependent variable)

version 1.0

Initial version of SCGLR

  • method LPLS (Local PLS)


3.0 by Guillaume Cornu, 10 months ago


https://scnext.github.io/SCGLR, https://github.com/SCnext/SCGLR, https://cran.r-project.org/package=SCGLR


Report a bug at https://github.com/SCnext/SCGLR/issues


Browse source code at https://github.com/cran/SCGLR


Authors: Guillaume Cornu [aut, cre], Frederic Mortier [aut], Catherine Trottier [aut], Xavier Bry [aut], Sylvie Gourlet-Fleury [dtc] (http://www.coforchange.eu/), Claude Garcia [dtc] (http://www.cofortips.org/)




CeCILL-2 | GPL-2 license


Imports Matrix, Formula, expm, graphics, ggplot2, grid, pROC, ade4

Suggests parallel, gridExtra

