Large-Scale Bayesian Variable Selection Using Variational Methods

The varbvs package provides fast algorithms for fitting Bayesian variable selection models and computing Bayes factors, in which the outcome (or response variable) is modeled using a linear regression or a logistic regression. The algorithms are based on the variational approximations described in "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies" (P. Carbonetto & M. Stephens, 2012). This software has been applied to large data sets with over a million variables and thousands of samples.



See also the varbvs R package website generated using pkgdown.

Citing varbvs

If you find that this software is useful for your research project, please cite our paper:

Carbonetto, P. and Stephens, M. (2012). Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Analysis 7, 73-108.

License

Copyright (c) 2012-2017, Peter Carbonetto.

The varbvs source code repository by Peter Carbonetto is free software: you can redistribute it under the terms of the GNU General Public License. All the files in this project are part of varbvs. This project is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See file LICENSE for the full text of the license.

Setup

To install the varbvs release from CRAN, run in R:

install.packages("varbvs")

Alternatively, you can install the most up-to-date development version. The easiest way to do this is with the devtools package:

install.packages("devtools")
library(devtools)
install_github("pcarbo/varbvs",subdir = "varbvs-R")

Without devtools, installation takes a few more steps, but it is not hard. Begin by downloading the GitHub repository for this project. The simplest way to do this is to download the repository as a ZIP archive. Once you have extracted the files from the compressed archive, you will see that the main directory has two subdirectories, one containing the MATLAB code, and the other (varbvs-R) containing the files for the R package.

The varbvs-R subdirectory has all the necessary files to build and install a package for R. To install this package, follow the standard instructions for installing an R package from source. On a Unix or Unix-like platform (e.g., Mac OS X), the following steps should install the R package:

mv varbvs-R varbvs
R CMD build varbvs
R CMD INSTALL varbvs_2.4-0.tar.gz
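The last step can also be run from inside R rather than from the shell. The following is a minimal sketch, assuming the tarball built above is in the current working directory; the helper name install_from_tarball is hypothetical and not part of varbvs.

```r
# Hypothetical helper (not part of varbvs): install a locally built
# source tarball from within R.
install_from_tarball <- function(tarball = "varbvs_2.4-0.tar.gz") {
  if (!file.exists(tarball)) {
    message("Tarball not found: ", tarball)
    return(invisible(FALSE))
  }
  # repos = NULL tells install.packages to treat the path as a local file
  # instead of looking the package up in a repository.
  install.packages(tarball, repos = NULL, type = "source")
  invisible(TRUE)
}
```

Calling install_from_tarball() with no arguments uses the file name from the shell commands above; adjust the name if you build a different version.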

Using the package

Once you have installed the package, load the package in R by entering

library(varbvs)

To get an overview of the package, enter

help(package = "varbvs")

The key function in this package is varbvs. Here is an example in which we fit the variable selection model to the Leukemia data:

library(varbvs)
data(leukemia)
fit <- varbvs(leukemia$x,NULL,leukemia$y,family = "binomial",
              logodds = seq(-3.5,-1,0.1),sa = 1)
print(summary(fit))
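The logodds argument in this example is a grid of candidate settings rather than a single value. As a rough guide to what the grid covers, the sketch below converts each setting to a prior inclusion probability. It assumes that logodds is the base-10 logarithm of the prior inclusion odds, which is my reading of the package documentation; check help(varbvs) for your installed version. The variable name prior_prob is illustrative.

```r
# Assumption: logodds = log10(q / (1 - q)), where q is the prior
# probability that a variable is included in the model.
logodds <- seq(-3.5, -1, 0.1)

# Invert the base-10 log-odds to recover q for each grid point.
prior_prob <- 1 / (1 + 10^(-logodds))

# Smallest and largest prior inclusion probabilities in the grid.
print(range(prior_prob))
```

Under this assumption, the grid spans prior inclusion probabilities from well below 1 in 1,000 up to 1 in 11.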

To get more information about this function, type

help(varbvs)
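Beyond the printed summary, it is often useful to rank variables by their posterior inclusion probability (PIP). The sketch below averages the per-setting inclusion probabilities over the logodds grid, weighting each setting in proportion to exp(logw). The helper average_pip is hypothetical, and the field names fit$alpha and fit$logw are assumptions based on the package documentation; they may differ between versions. The model-fitting part is skipped if varbvs is not installed.

```r
# Hypothetical helper: average per-setting inclusion probabilities
# (one column of alpha per hyperparameter setting) using weights
# proportional to exp(logw).
average_pip <- function(alpha, logw) {
  logw <- c(logw)
  w <- exp(logw - max(logw))  # subtract the max for numerical stability
  w <- w / sum(w)
  c(alpha %*% w)
}

if (requireNamespace("varbvs", quietly = TRUE)) {
  library(varbvs)
  data(leukemia)
  fit <- varbvs(leukemia$x, NULL, leukemia$y, family = "binomial",
                logodds = seq(-3.5, -1, 0.1), sa = 1)

  # Assumption: fit$alpha is a (variables x settings) matrix and
  # fit$logw has one log-weight per setting.
  pip <- average_pip(fit$alpha, fit$logw)

  # Column indices of the five variables with the largest PIPs.
  top <- order(pip, decreasing = TRUE)[1:5]
  print(data.frame(variable = top, pip = pip[top]))
}
```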

Working examples

We have provided several R scripts in the vignettes and testthat folders to illustrate application of varbvs to small and large data sets:

  • Script demo.qtl.R demonstrates how to use the varbvs function for mapping a quantitative trait (i.e., a continuously valued outcome) in a small, simulated data set. Script demo.cc.R demonstrates mapping of a binary valued outcome in a simulated data set.

  • The leukemia.Rmd vignette demonstrates application of both glmnet and varbvs to the Leukemia data. The main aim of this script is to illustrate some of the different properties of varbvs (Bayesian variable selection) and glmnet (penalized sparse regression).

  • Like demo.qtl.R, the cfw.Rmd vignette demonstrates varbvs for mapping genetic factors contributing to a quantitative trait, but here it is applied to an actual data set generated from an outbred mouse study.

  • Finally, the cd.Rmd and cytokine.Rmd vignettes illustrate how the varbvs package can be applied to a very large data set to map genetic loci and test biological hypotheses about genetic factors contributing to human disease risk. Although we cannot share the data needed to run these scripts due to data privacy restrictions, we have included them because the steps they walk through are still instructive.
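For a quick experiment without any of these scripts or data sets, the following sketch simulates a small quantitative-trait data set and fits the linear regression model. It is an illustration written for this README, not the contents of demo.qtl.R; the sample sizes, the logodds grid, and all variable names are arbitrary choices. The call to varbvs is skipped if the package is not installed.

```r
# Illustrative simulation (not taken from demo.qtl.R).
set.seed(1)
n <- 200  # number of samples
p <- 500  # number of candidate variables
k <- 5    # number of variables with nonzero effects

# Simulate the variables, the sparse effects, and a continuous outcome.
X <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)
causal <- sample(p, k)
beta[causal] <- rnorm(k)
y <- c(X %*% beta + rnorm(n))

if (requireNamespace("varbvs", quietly = TRUE)) {
  library(varbvs)
  # family = "gaussian" selects the linear regression model.
  fit <- varbvs(X, NULL, y, family = "gaussian",
                logodds = seq(-3, -1, 0.5))
  print(summary(fit))
  cat("Simulated nonzero effects at columns:", sort(causal), "\n")
}
```

Comparing the variables highlighted in the summary against the simulated columns gives a quick sanity check on the fit.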

How to build static HTML documentation

These are the R commands to build the website (make sure you are connected to the Internet while running these commands, and that the working directory is set to varbvs-R):

library(pkgdown)
build_site(examples = FALSE,mathjax = FALSE)

After updating the webpages, I reorder the vignettes manually and change the unordered list to an ordered list.

Credits

The varbvs software package was developed by:
Peter Carbonetto
Dept. of Human Genetics, University of Chicago
2012-2017

Xiang Zhou, Xiang Zhu, Matthew Stephens and others have also contributed to the development of this software.

Package details

Version: 2.4-0
Authors: Peter Carbonetto [aut, cre], Matthew Stephens [aut], David Gerard [aut]
License: GPL (>= 3)
URL: http://github.com/pcarbo/varbvs
Bug reports: http://github.com/pcarbo/varbvs/issues
Source code (CRAN mirror): https://github.com/cran/varbvs
Imports: methods, Matrix, stats, graphics, lattice, latticeExtra, Rcpp
Suggests: glmnet, qtl, knitr, rmarkdown, testthat
Linking to: Rcpp