The R package 'ashr' implements an Empirical Bayes
approach for large-scale hypothesis testing and false discovery
rate (FDR) estimation based on the methods proposed in
M. Stephens, 2016, "False discovery rates: a new deal",
This repository contains an R package for performing "Adaptive Shrinkage."
To install the ashr package first you need to install devtools:
install.packages("devtools")library(devtools)install_github("stephens999/ashr")
To use the interior-point solver (optmethod = "mixIP"
), you need to
install MOSEK and the
Rmosek package. We have
provided some Mac-specific and
Linux-specific instructions for installing MOSEK.
The main function in the ashr package is ash
. To get minimal help:
library(ashr)?ash
The ashr ("Adaptive SHrinkage") package aims to provide simple, generic, and flexible methods to derive "shrinkage-based" estimates and credible intervals for unknown quantities $\beta=(\beta_1,\dots,\beta_J)$, given only estimates of those quantities ($\hat\beta=(\hat\beta_1,\dots, \hat\beta_J)$) and their corresponding estimated standard errors ($s=(s_1,\dots,s_J)$).
The "adaptive" nature of the shrinkage is two-fold. First, the appropriate amount of shrinkage is determined from the data, rather than being pre-specified. Second, the amount of shrinkage undergone by each $\hat\beta_j$ will depend on the standard error $s_j$: measurements with high standard error will undergo more shrinkage than measurements with low standard error.
The methods are based on treating the vectors $\hat\beta$ and $s$ as "observed data", and then performing inference for $\beta$ from these observed data, using a standard hierarchical modelling framework to combine information across $j=1,\dots,J$.
Specifically, we assume that the true $\beta_j$ values are independent
and identically distributed from some unimodal distribution $g$. By
default we assume $g$ is unimodal about zero and symmetric. You can
specify or estimate a different mode using the mode
parameter. You
can allow for asymmetric $g$ by specifying
mixcompdist="halfuniform"
.
Then, we assume that the observations $\hat\beta_j \sim
N(\beta_j,s_j)$, or alternatively the normal assumption can be
replaced by a $t$ distribution by specifying df
, the number of
degrees of freedom used to estimate $s_j$. Actually this is
important: do be sure to specify df
if you can.
There are a few major changes in output and input that will likely break existing dependencies. Here are the highlights
the main output (lfsr, lfdr, etc) is rearranged into a dataframe,
called result
.
so, for example, the lfsr is now a$result$lfsr
instead of
a$lfsr
or, better, use the accessor function get_lfsr(a)
to extract
the lfsr etc
I added accessor functions get_lfsr
, get_lfdr
, get_pm
,
get_psd
etc to access the lfsr, lfdr, posterior mean and posterior
standard deviation. Using these functions to access results will
help ensure your code remains valid if I happen to change the
internal structure of the results again (although not
anticipated...)
output fitted.g
is renamed fitted_g
, and flash.data
becomes
flash_data
to make the whole package convention more
consistent. Also fit
becomes fit_details
.
function prefixes comppost
and compdens
replaced with
comp_post
and comp_dens
, again for consistency.
nonzeromode
option is replaced with the option mode
to specify
mode. Or use mode="estimate"
to estimate the mode.
more flexible control of output. For example, you can say you want
only the logLR output by specifying outputlevel = c("lfsr","logLR")
, or only posterior mean by outputlevel = c("PosteriorMean")
.