Functions for dissimilarity analysis and memory-based learning
(MBL, a.k.a local modeling) in complex spectral data sets.
Most of these functions are based on the methods presented in
Ramirez-Lopez et al. (2013).
Leo Ramirez-Lopez & Antoine Stevens
resemble site here
Installing the package is very simple:
If you do not have the following packages installed, it may be better to install them first:
install.packages('Rcpp')
install.packages('RcppArmadillo')
install.packages('foreach')
install.packages('iterators')
Note: Apart from these packages, we strongly recommend downloading and installing Rtools (directly from here or from CRAN: https://cran.r-project.org/bin/windows/Rtools/).
This is important for obtaining the proper C++ toolchain that you might need for using resemble.
install.packages('C:/MyFolder/resemble-1.2.2.zip', repos = NULL)
The development version can be obtained at the package website
After installing resemble, you should also be able to run the following lines:
require(resemble)
help(mbl)

#install.packages('prospectr')
require(prospectr)
data(NIRsoil)

Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]

# Example of the mbl function
# A mbl approach (the spectrum-based learner) as implemented in Ramirez-Lopez et al. (2013)
# An example where Yu is supposed to be unknown, but the Xu (spectral variables) are known
ctrl <- mblControl(sm = 'pc', pcSelection = list('opc', 40), valMethod = 'NNv', center = TRUE)
sbl.u <- mbl(Yr = Yr, Xr = Xr, Yu = NULL, Xu = Xu, mblCtrl = ctrl, dissUsage = 'predictors', k = seq(40, 150, by = 10), method = 'gpr')
getPredictions(sbl.u)
resemble implements a function dedicated to non-linear modelling of complex visible and infrared spectral data based on memory-based learning (MBL, a.k.a. instance-based learning or local modelling in the chemometrics literature). The package also includes functions for: computing and evaluating spectral similarity/dissimilarity matrices; projecting the spectra onto low dimensional orthogonal variables; removing irrelevant spectra from a reference set; etc.
The functions for computing and evaluating spectral similarity/dissimilarity matrices can be summarized as follows:
fDiss: Euclidean and Mahalanobis distances as well as the cosine dissimilarity (a.k.a spectral angle mapper)
corDiss: correlation and moving window correlation dissimilarity
sid: spectral information divergence between spectra or between the probability distributions of spectra
orthoDiss: principal components and partial least squares dissimilarity (including several options)
simEval: evaluates a given similarity/dissimilarity matrix based on the concept of side information
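As a rough illustration of what some of these measures compute, the base R sketch below builds Euclidean, cosine (spectral angle) and correlation dissimilarity matrices between two small toy matrices. It does not call resemble: the helper names (euclid_diss, cosine_diss, cor_diss) and the toy values are hypothetical, the correlation dissimilarity is taken here as (1 - r) / 2, and the package functions may centre, scale or normalise their output differently.

```r
# Toy "spectra": rows are samples, columns are wavelengths (made-up values)
Xr <- rbind(c(0.10, 0.20, 0.30, 0.40),
            c(0.20, 0.40, 0.60, 0.80),
            c(0.40, 0.30, 0.20, 0.10))
Xu <- rbind(c(0.10, 0.20, 0.30, 0.40),
            c(0.30, 0.30, 0.30, 0.20))

# Euclidean distance between every row of A and every row of B
euclid_diss <- function(A, B) {
  sq <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  sqrt(pmax(sq, 0))  # guard against tiny negative values from rounding
}

# Cosine dissimilarity (spectral angle, in radians)
cosine_diss <- function(A, B) {
  cosine <- tcrossprod(A, B) / outer(sqrt(rowSums(A^2)), sqrt(rowSums(B^2)))
  acos(pmin(pmax(cosine, -1), 1))  # clamp to [-1, 1] before acos
}

# Correlation dissimilarity, assumed here to be (1 - r) / 2 (between 0 and 1)
cor_diss <- function(A, B) {
  (1 - cor(t(A), t(B))) / 2
}

# Each result has one row per sample in Xr and one column per sample in Xu
d_euclid <- euclid_diss(Xr, Xu)
d_cosine <- cosine_diss(Xr, Xu)
d_cor <- cor_diss(Xr, Xu)
```

In this toy case the second row of Xr is a scaled copy of the first row of Xu, so their Euclidean distance is non-zero while their spectral angle is (numerically) zero, which is why the choice of measure matters when searching for neighbours.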
The functions for projecting the spectra onto low dimensional orthogonal variables are:
pcProjection: projects the spectra onto a principal component space
plsProjection: projects the spectra onto a partial least squares component space (a.k.a projection to latent structures)
orthoProjection: reproduces either the pcProjection or the plsProjection functions
The projection functions also offer different options for optimizing/selecting the number of components involved in the projection.
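To make the idea of projecting spectra onto a few orthogonal variables concrete, here is a minimal base R sketch of a principal component projection. It is not the package's implementation: the toy data are random, and the rule used to pick the number of components (smallest number explaining at least 90% of the variance) is only one simple, assumed selection rule with an arbitrary threshold.

```r
set.seed(1)
# Toy data: 20 reference and 5 prediction samples, 6 variables (random values)
Xr <- matrix(rnorm(120), nrow = 20, ncol = 6)
Xu <- matrix(rnorm(30), nrow = 5, ncol = 6)

# Centre both sets with the column means of the reference set
mu <- colMeans(Xr)
Xr_c <- sweep(Xr, 2, mu)
Xu_c <- sweep(Xu, 2, mu)

# Principal components of the reference set via singular value decomposition
sv <- svd(Xr_c)
explained <- sv$d^2 / sum(sv$d^2)

# Keep the smallest number of components explaining at least 90% of variance
n_comp <- which(cumsum(explained) >= 0.90)[1]
loadings <- sv$v[, seq_len(n_comp), drop = FALSE]

# Scores: projections of both sets onto the retained components
scores_r <- Xr_c %*% loadings
scores_u <- Xu_c %*% loadings
```

The prediction samples are projected with the loadings and centring vector estimated from the reference set, so both sets end up in the same component space and distances between them can be computed there.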
The functions for modelling the spectra using memory-based learning are:
mblControl: controls some modelling aspects of the mbl function
mbl: models the spectra by memory-based learning
Some additional miscellaneous functions are:
print.mbl: prints a summary of the results obtained by the mbl function
plot.mbl: plots a summary of the results obtained by the mbl function
print.localOrthoDiss: prints local distance matrices generated with the orthoDiss function
To explain the mbl function in a little more detail, let's first define the basic input datasets:
Reference (training) set: Dataset with n reference samples (e.g. spectral library) to be used in the calibration of spectral models. Xr represents the matrix of samples (containing the spectral predictor variables) and Yr represents a given response variable corresponding to Xr.
Prediction set: Dataset with m samples where the response variable (Yu) is unknown. However, it can be predicted by applying a spectral model (calibrated using Xr and Yr) to the spectra of these samples (Xu).
In order to predict each value in Yu, the mbl function takes each sample in Xu and searches in Xr for its k-nearest neighbours (most spectrally similar samples). Then a (local) model is calibrated with these (reference) neighbours and it immediately predicts the corresponding value in Yu from Xu. In the function, the k-nearest neighbour search is performed by computing spectral similarity/dissimilarity matrices between samples. The mbl function offers the following regression options for calibrating the (local) models:
'gpr': Gaussian process with linear kernel
'pls': Partial least squares
'wapls1': Weighted average partial least squares 1
'wapls2': Weighted average partial least squares 2 (no longer supported)
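The neighbour search and local calibration described above can be sketched in a few lines of base R. This is only a conceptual illustration under simplifying assumptions, not how mbl is implemented: the function local_predict is hypothetical, the dissimilarity is a plain Euclidean distance, and an ordinary least-squares fit stands in for the 'gpr', 'pls' and 'wapls1' regressions, which is only workable here because the toy data have three predictors.

```r
set.seed(42)
# Toy reference set: 60 samples, 3 predictors, non-linear response
Xr <- matrix(runif(180, -2, 2), nrow = 60, ncol = 3)
Yr <- sin(Xr[, 1]) + 0.5 * Xr[, 2]^2 + rnorm(60, sd = 0.05)
# Toy prediction set: 5 samples whose response is treated as unknown
Xu <- matrix(runif(15, -1.5, 1.5), nrow = 5, ncol = 3)

local_predict <- function(Xr, Yr, Xu, k) {
  vapply(seq_len(nrow(Xu)), function(i) {
    # 1. Dissimilarity between the i-th prediction sample and all references
    d <- sqrt(rowSums(sweep(Xr, 2, Xu[i, ])^2))
    # 2. Indices of the k nearest neighbours in the reference set
    nn <- order(d)[seq_len(k)]
    # 3. Local model fitted only on those neighbours (intercept + predictors)
    fit <- lm.fit(cbind(1, Xr[nn, , drop = FALSE]), Yr[nn])
    # 4. Prediction for the i-th sample from its own local model
    sum(c(1, Xu[i, ]) * fit$coefficients)
  }, numeric(1))
}

# One local model per prediction sample, each built from 15 neighbours
pred_u <- local_predict(Xr, Yr, Xu, k = 15)
```

Because a separate model is fitted for every sample in Xu, the overall predictor can follow a non-linear relationship even though each local model is linear; the value of k controls how local each fit is.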