# Bayes Factors, Model Choice and Variable Selection in Linear Models

Conceived to calculate Bayes factors in Linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in this package is placed on the prior distributions and it allows a wide range of them: Jeffreys (1961); Zellner and Siow(1980); Zellner and Siow(1984); Zellner (1986); Fernandez et al. (2001); Liang et al. (2008) and Bayarri et al. (2012). The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored providing the user very valuable information (like marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; the median probability model, MPM) about the structure of the true -data generating- model. Additionally, this package incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands, Garcia-Donato and Martinez-Beneito (2013). It also allows problems with p>n and p>>n and also incorporates routines to handle problems with variable selection with factors.

Hypothesis testing, model selection and model averaging are important statistical problems that have in common the explicit consideration of the uncertainty about which is the true model. The formal Bayesian tool to solve such problems is the Bayes factor (Kass and Raftery, 1995) that reports the evidence in the data favoring each of the entertained hypotheses/models and can be easily translated to posterior probabilities.

This package has been specifically conceived to calculate Bayes factors in linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in the package is placed on the prior distributions (a very delicate issue in this context) and `BayesVarSel` allows using a wide range of them: Jeffreys-Zellner-Siow (Jeffreys, 1961; Zellner and Siow, 1980,1984) Zellner (1986); Fernandez et al. (2001), Liang et al. (2008) and Bayarri et al. (2012).

# Installation

The stable version can be installed using:

``````install.packages("BayesVarSel")
``````

### Development release

You can track, download the latest version or contribute to the development of `BayesVarSel` at https://github.com/comodin19/BayesVarSel. To install the most recent version of the package (1.8.0) you should:

1. Install devtools from CRAN with `install.packages("devtools")`.

2. Install the development version of `BayesVarSel` from GitHub:

devtools::install_github("comodin19/BayesVarSel")

# Features

The interaction with the package is through a friendly interface that syntactically mimics the well-known lm command of `R`. The resulting objects can be easily explored providing the user very valuable information (like marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; the median probability model, MPM) about the structure of the true -data generating- model. Additionally, `BayesVarSel` incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions (Garcia-Donato and Martinez-Beneito, 2013) of the main commands.

# Usage

## Variable selection

``````library(BayesVarSel)

data(Hald)
hald_Bvs <- Bvs(formula = y ~ x1 + x2 + x3 + x4, data = Hald)
#> Info. . . .
#> Most complex model has 5 covariates
#> From those 1 is fixed and we should select from the remaining 4
#> x1, x2, x3, x4
#> The problem has a total of 16 competing models
#> Of these, the  10 most probable (a posteriori) are kept
#> Working on the problem...please wait.
summary(hald_Bvs)
#>
#> Call:
#> Bvs(formula = y ~ x1 + x2 + x3 + x4, data = Hald)
#>
#> Inclusion Probabilities:
#>    Incl.prob. HPM MPM
#> x1     0.9762   *   *
#> x2     0.7563   *   *
#> x3     0.2624
#> x4     0.4153
#> ---
#> Code: HPM stands for Highest posterior Probability Model and
#>  MPM for Median Probability Model.
#>
colMeans(predict(hald_Bvs, Hald[1:2,]))
#>
#> Simulations obtained using the best 10 models
#> that accumulate 1 of the total posterior probability
#>  78.86902 73.09265

# Simulate coefficients
set.seed(171) # For reproducibility of simulations.

sim_coef <- BMAcoeff(hald_Bvs);
#>
#> Simulations obtained using the best 10 models
#> that accumulate 1 of the total posterior probability

colMeans(sim_coef)
#>  Intercept         x1         x2         x3         x4
#> 70.9736117  1.4166974  0.4331986 -0.0409743 -0.2170662
``````

## Hypothesis testing

``````library(BayesVarSel)
data(Hald)
fullmodel <- y ~ x1 + x2 + x3 + x4
reducedmodel <- y ~ x1 + x2
nullmodel <- y ~ 1
Btest(models = c(H0 = nullmodel, H1 = fullmodel, H2 = reducedmodel), data = Hald)
#> ---------
#> Models:
#> \$H0
#> y ~ 1
#>
#> \$H1
#> y ~ x1 + x2 + x3 + x4
#>
#> \$H2
#> y ~ x1 + x2
#>
#> ---------
#> Bayes factors (expressed in relation to H0)
#>  H0.to.H0  H1.to.H0  H2.to.H0
#>       1.0   44300.8 3175456.4
#> ---------
#> Posterior probabilities:
#>    H0    H1    H2
#> 0.000 0.014 0.986
``````

## References

• Bayarri, M.J., Berger, J.O., Forte, A. and Garcia-Donato, G. (2012). Criteria for Bayesian Model choice with Application to Variable Selection. Annals of Statistics, 40: 1550-1577. DOI: 10.1214/12-aos1013
• Fernandez, C., Ley, E. and Steel, M.F.J. (2001). Benchmark priors for Bayesian model averaging. Journal of Econometrics, 100: 381-427. DOI: 10.1016/s0304-4076(00)00076-2
• Garcia-Donato, G. and Martinez-Beneito, M.A. (2013). On sampling strategies in Bayesian variable selection problems with large model spaces. Journal of the American Statistical Association, 108: 340-352. DOI: 10.1080/01621459.2012.742443
• Liang, F., Paulo, R., Molina, G., Clyde, M. and Berger, J.O. (2008). Mixtures of g-priors for Bayesian Variable Selection. Journal of the American Statistical Association, 103: 410-423.DOI: 10.1198/016214507000001337
• Zellner, A. and Siow, A. (1980). Posterior Odds Ratio for Selected Regression Hypotheses. Trabajos de Estadística y de Investigación Operativa, 31: 585. DOI: 10.1007/bf02888369
• Zellner, A. and Siow, A. (1984). Basic Issues in Econometrics. Chicago: University of Chicago.
• Zellner, A. (1986). On Assessing Prior Distributions and Bayesian Regression Analysis with g-prior Distributions. In A. Zellner (ed.), Bayesian Inference and Decision techniques: Essays in Honor of Bruno de Finetti, 389-399. Edward Elgar Publishing Limited. DOI: 10.2307/2233941

# BayesVarSel 1.8.0

• Merged Bvs and PBvs in just one function (called Bvs and old PBvs disappears). Bvs now has two extra parameters to control parallelization: parallel and n.nodes.

• Several changes in Btest: 1) it now allows for unnamed lists of models and if unnamed lists are provided, default names are given by the function. 2) The prior probabilities argument priorprobs does not have to be named (by default the order of the models in argument models is used). 3) Deprecated argument relax.nest, replaced by the explicit definition of the null model by the user via the argument null.model

• In Bvs and GibbsBvs the order of arguments has slightly changed and now data is the second argument followed by formula.

• Now plotBvs is a S3 function defined as plot.Bvs

• Now predictBvs is a S3 function defined as predict.Bvs

• In Bvs the argument fixed.cov is deprecated, now replaced by the formula for the null model in the argument null.model

• Option "trace" added to plot

• print updated to show the 10 most probable models (among the visited) for method Gibbs

• removed the comment at initialization

• BMAcoeff no longer shows the graphic for all the variables, instead use histBMA to plot the posterior distribution of the coefficients.

# Reference manual

install.packages("BayesVarSel")

2.0.1 by Anabel Forte, a year ago

https://github.com/comodin19/BayesVarSel

Report a bug at https://github.com/comodin19/BayesVarSel/issues

Browse source code at https://github.com/cran/BayesVarSel

Authors: Gonzalo Garcia-Donato [aut] , Anabel Forte [aut, cre] , Carlos Vergara-Hernández [ctb]

Documentation:   PDF Manual