An implementation of the data processing and data analysis portion of the PepSAVI-MS pipeline, which is currently under development by the Hicks laboratory at the University of North Carolina. The statistical analysis package presented herein provides a collection of software tools that facilitate the prioritization of putative bioactive peptides from a complex biological matrix. Tools are provided to deconvolute mass spectrometry features into a single representation for each peptide charge state, to filter compounds so that only those possibly contributing to the observed bioactivity remain, and to rank the remaining compounds by how likely they are to contribute to the activity in each bioactivity data set.
The PepSAVIms R package provides a collection of software tools that facilitate the prioritization of putative bioactive compounds from a complex biological matrix. The package implements the statistical portion of the combined laboratory and statistical procedure proposed by Kirkpatrick et al. in The PepSAVI-MS Pipeline for Natural Product Bioactive Peptide Discovery.
The software in this package aims to perform the following steps, described in more detail below.
The mass spectrometry abundance data can optionally undergo two preprocessing steps. The first is a consolidation step, whose goal is to consolidate mass spectrometry observations that are believed to belong to the same underlying compound. The instrument may record several abundance readings that in fact belong to one compound. In that case, all of those observations should be attributed to that single compound.
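The consolidation step can be illustrated with a minimal Python sketch. The sketch makes three hypothetical assumptions:

- Observations are merged when they share a charge state.
- To merge, they must also lie within a mass-to-charge tolerance (`mz_tol`) and a retention-time tolerance (`rt_tol`) of a group's first member.
- Merged abundances are summed fraction by fraction.

The names `Observation` and `consolidate`, the tolerance values and the summing rule are illustrative assumptions. They are not the PepSAVIms interface or its exact consolidation criteria.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    mz: float          # mass-to-charge ratio
    rt: float          # retention time (minutes)
    charge: int        # charge state
    abundance: list    # abundance in each fraction


def consolidate(obs, mz_tol=0.05, rt_tol=1.0):
    """Greedily merge observations assumed to come from the same compound.

    Hypothetical rule: same charge, |delta m/z| <= mz_tol and
    |delta rt| <= rt_tol relative to the group's first (representative)
    member; abundances of merged observations are summed per fraction.
    """
    groups = []  # each entry: [representative Observation, summed abundances]
    for o in sorted(obs, key=lambda o: (o.charge, o.mz, o.rt)):
        for g in groups:
            rep = g[0]
            if (rep.charge == o.charge
                    and abs(rep.mz - o.mz) <= mz_tol
                    and abs(rep.rt - o.rt) <= rt_tol):
                g[1] = [a + b for a, b in zip(g[1], o.abundance)]
                break
        else:
            # No existing group matched: this observation starts a new one.
            groups.append([o, list(o.abundance)])
    return [Observation(g[0].mz, g[0].rt, g[0].charge, g[1]) for g in groups]
```

The greedy grouping depends on the order in which observations are visited. A more careful implementation might cluster on all pairwise distances instead.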
The second optional preprocessing step is a filtering step. Its goal is to reduce the data set further, retaining only those compounds that could plausibly contribute to the bioactivity in the region of interest. The filtering criteria also aim to remove some of the noise detected in the data set. Filtering the candidate set before the statistical analysis makes it easier for the analysis to distinguish compounds that plausibly drive the bioactivity from those that do not.
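A filtering step of this kind can be sketched as follows. The sketch uses three criteria:

- The compound's overall abundance peak must fall inside the bioactivity region.
- That peak must exceed a noise floor, `min_peak`.
- The fractions immediately flanking the region must stay at or below `max_flank_ratio` times the peak.

These criteria are hypothetical stand-ins chosen for illustration, as are the function name `filter_compounds` and its default values. The package's actual filtering criteria are described in its documentation.

```python
def filter_compounds(abundances, region, min_peak=1000.0, max_flank_ratio=0.5):
    """Return indices of compounds passing three hypothetical filters.

    abundances: list of per-compound abundance vectors (one value per fraction).
    region: (lo, hi) inclusive fraction indices where bioactivity was observed.
    """
    lo, hi = region
    kept = []
    for idx, a in enumerate(abundances):
        in_region_peak = max(a[lo:hi + 1])
        # Criterion 1: the compound's global maximum lies inside the region.
        if in_region_peak < max(a):
            continue
        # Criterion 2: the in-region peak rises above an assumed noise floor.
        if in_region_peak < min_peak:
            continue
        # Criterion 3: fractions just outside the region are comparatively low.
        flanks = [a[i] for i in (lo - 1, hi + 1) if 0 <= i < len(a)]
        if any(f > max_flank_ratio * in_region_peak for f in flanks):
            continue
        kept.append(idx)
    return kept


# Example: only compound 0 peaks inside fractions 2-3 with low flanks.
data = [
    [0, 100, 5000, 4000, 100, 0],    # kept
    [0, 3000, 200, 100, 100, 0],     # peak outside the region
    [0, 100, 500, 400, 100, 0],      # below the noise floor
    [0, 4000, 5000, 4000, 100, 0],   # flanking fraction too high
]
print(filter_compounds(data, region=(2, 3)))
```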
After any optional preprocessing, a statistical procedure is performed to search for putative bioactive peptides. The procedure fixes the level of the L2 penalty parameter in the elastic net penalty. It then tracks, along the elastic net path, the point at which the coefficient corresponding to each compound enters the nonzero set. The candidate compounds are ranked by the order in which their coefficients entered the nonzero set.
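The ranking idea can be sketched in plain Python. This minimal version assumes the objective (1/(2n))||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2. Here y holds the bioactivity values across n fractions and column j of X holds the abundances of compound j. With lam2 held fixed, the sketch solves the objective by cyclic coordinate descent with warm starts over a decreasing grid of lam1 values. It records the first grid value at which each coefficient becomes nonzero. The function name `entry_order`, the grid spacing and the tolerances are hypothetical choices. The solver is not necessarily the one PepSAVIms uses.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) arising from the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def entry_order(x_cols, y, lam2, n_lambda=50, ratio=1e-3, max_sweeps=500, tol=1e-10):
    """Return compound indices in the order their coefficients first become nonzero.

    Minimizes (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2 with lam2
    fixed and lam1 decreasing along a log-spaced grid (n_lambda >= 2).
    x_cols[j] holds the abundances of compound j across the n fractions.
    """
    n = len(y)
    cols = [center(c) for c in x_cols]
    resid = center(y)  # residual equals centered y because b starts at 0
    p = len(cols)
    sq = [sum(v * v for v in c) / n for c in cols]
    # Smallest lam1 at which every coefficient is zero; the grid starts here.
    lam_max = max(abs(sum(c[i] * resid[i] for i in range(n))) / n for c in cols)
    grid = [lam_max * ratio ** (k / (n_lambda - 1)) for k in range(n_lambda)]
    beta = [0.0] * p  # carried over between grid values (warm starts)
    order = []
    for lam1 in grid:
        for _ in range(max_sweeps):
            biggest_change = 0.0
            for j in range(p):
                if sq[j] == 0.0:
                    continue  # constant column: it can never enter the model
                c = cols[j]
                rho = sum(c[i] * resid[i] for i in range(n)) / n + sq[j] * beta[j]
                new = soft_threshold(rho, lam1) / (sq[j] + lam2)
                delta = new - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= c[i] * delta
                    beta[j] = new
                    biggest_change = max(biggest_change, abs(delta))
            if biggest_change < tol:
                break  # coordinate descent has converged for this lam1
        # Record first entries at this lam1; ties are broken by column index.
        for j in range(p):
            if beta[j] != 0.0 and j not in order:
                order.append(j)
    return order
```

Coefficients that enter at the same grid value are ordered by column index, so a finer grid yields a finer ranking. For real analyses, an established elastic net solver such as R's glmnet package would be the more practical choice. Note that glmnet parameterizes the penalty with a mixing weight alpha and an overall lambda rather than separate L1 and L2 parameters.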
Please see the R function documentation or the package vignettes for more detailed information on using this package.
One of the laboratories that contributed a bioactivity data set reported that it believes its data are unreliable. We have therefore removed that data set from the data provided with the package and rebuilt the vignettes without it.
Initial release of the PepSAVIms R package. This is an implementation of the data processing and data analysis portion of the pipeline proposed in Kirkpatrick et al.: The PepSAVI-MS Pipeline for Natural Product Bioactive Peptide Discovery.