Generalized Berk-Jones Test for Set-Based Inference in Genetic Association Studies

Implements the Generalized Berk-Jones (GBJ) test for set-based inference in genetic association studies. GBJ is designed as an alternative to tests such as Berk-Jones (BJ), Higher Criticism (HC), Generalized Higher Criticism (GHC), Minimum p-value (minP), and the Sequence Kernel Association Test (SKAT). All of these other methods except SKAT are also implemented in this package, along with an omnibus test (OMNI) that integrates information from each of the tests. GBJ has been shown to outperform other tests in genetic association studies when signals are correlated and moderately sparse. Please see the vignette for a quickstart guide or the paper for full details.

What is GBJ?

The Generalized Berk-Jones statistic was developed to perform set-based inference in genetic association studies. It is an alternative to tests such as the Sequence Kernel Association Test (SKAT), Generalized Higher Criticism (GHC), and Minimum p-value (minP).

Why use GBJ?

GBJ is a generalization of the Berk-Jones (BJ) statistic, which offers - in a certain sense - asymptotic power guarantees for detection of rare and weak signals. GBJ modifies BJ to account for correlation between factors in a set. GBJ has been demonstrated to outperform other tests when signals are moderately sparse (more precisely, when the number of signals is between d^(1/4) and d^(1/2), where d is the number of factors in the set).
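
To make the sparsity range concrete, here is a minimal base R sketch for a set of d = 50 factors. The set size is an assumption chosen only to match the 50-SNP example in this document, and the bounds are read as the fourth root and the square root of d.

```r
# Sparsity range in which GBJ is reported to perform well, for a set of d factors.
d <- 50                 # assumed set size (matches the 50-SNP example)
lower <- d^(1/4)        # fourth root of d, about 2.66
upper <- d^(1/2)        # square root of d, about 7.07

# Whole numbers of signals that fall inside the range
signal_range <- c(ceiling(lower), floor(upper))
signal_range
#> [1] 3 7
```

Under this reading, a 50-SNP set with roughly 3 to 7 truly associated SNPs sits in the "moderately sparse" regime described above.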

Other advantages include:

  1. Analytic p-value calculation (no need for permutation inference).
  2. Can be applied to individual-level genotype data or GWAS summary statistics.
  3. No tuning parameters. Accepts standard inputs (similar to the glm() function).
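
As a rough illustration of the summary-statistics route, the sketch below builds Z statistics from effect estimates and standard errors and pairs them with a correlation matrix. All data are simulated and hypothetical. Estimating the correlation matrix as the plain sample correlation of reference-panel genotypes is a simplifying assumption made here, not necessarily the approach the package authors recommend; the reference manual is the place to check. The call to GBJ() runs only if the package is installed.

```r
set.seed(1)

# Hypothetical GWAS summary statistics for a set of 50 SNPs
d <- 50
beta_hat <- rnorm(d, mean = 0, sd = 0.05)   # simulated effect estimates
se_hat   <- rep(0.05, d)                    # simulated standard errors
z_stats  <- beta_hat / se_hat               # marginal Z statistics

# Hypothetical reference panel: 500 individuals genotyped at the same 50 SNPs.
# Assumption: the sample correlation of these genotypes stands in for the
# correlation between the test statistics.
ref_genotypes <- matrix(rbinom(n = 500 * d, size = 2, prob = 0.3), nrow = 500)
ref_cor <- cor(ref_genotypes)

# Same call as in the individual-level example; skipped if GBJ is not installed
if (requireNamespace("GBJ", quietly = TRUE)) {
  print(GBJ::GBJ(test_stats = z_stats, cor_mat = ref_cor))
}
```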


We show a simple example for testing the association between a set of 50 SNPs (which could be, for example, from the same gene or pathway) and a binary outcome.

library(GBJ)

# Simulated binary outcome: 500 cases followed by 500 controls
cancer_status <- c(rep(1,500), rep(0,500))
# We have 50 SNPs each with minor allele frequency of 0.3 in this example
genotype_data <- matrix(data=rbinom(n=1000*50, size=2, prob=0.3), nrow=1000)
age <- round( runif(n=1000, min=30, max=80) )
gender <- rbinom(n=1000, size=1, prob=0.5)     
# Fit the null model, calculate marginal score statistics for each SNP
# (asymptotically equivalent to those calculated by, for example, PLINK)
null_mod <- glm(cancer_status~age+gender, family=binomial(link="logit"))
log_reg_stats <- calc_score_stats(null_model=null_mod, factor_matrix=genotype_data, link_function="logit")
# Run the test
GBJ(test_stats=log_reg_stats$test_stats, cor_mat=log_reg_stats$cor_mat)
#> $GBJ
#> [1] 1.43984
#> $GBJ_pvalue
#> [1] 0.330911
#> $err_code
#> [1] 0
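
The returned list can be inspected like any other R list. The sketch below copies the values printed above into a list by hand and applies a significance threshold. The threshold of 0.05 is a hypothetical choice, and treating an err_code of 0 as "nothing was flagged" is an assumption; the reference manual defines the field.

```r
# Values copied by hand from the output printed above
res <- list(GBJ = 1.43984, GBJ_pvalue = 0.330911, err_code = 0)

alpha <- 0.05   # hypothetical significance threshold

# Assumption: err_code == 0 means the p-value calculation raised no flag
if (!isTRUE(res$err_code == 0)) {
  warning("GBJ returned a non-zero err_code; check the reference manual")
}

# TRUE would mean the set-level null hypothesis is rejected at level alpha
reject_null <- res$GBJ_pvalue < alpha
reject_null
#> [1] FALSE
```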

What else is in here?

We may not have convinced you that GBJ is the best option for your application. If that is the case, then you may still be interested in trying the Berk-Jones (BJ), Generalized Higher Criticism (GHC), Higher Criticism (HC), or Minimum p-value (minP) tests, which can be run with the same inputs, e.g. GHC(test_stats=score_stats, cor_mat=cor_Z) to run the GHC. We have also developed an omnibus test which integrates information from multiple different methods. Please see the vignette for more details.
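
A short sketch of running several of these tests on one set of inputs follows. The inputs are simulated: score_stats and cor_Z are the placeholder names used above, filled here with hypothetical null data under an assumed exchangeable correlation of 0.3. The loop assumes each test accepts the test_stats and cor_mat arguments as described above, and it runs only if the GBJ package is installed.

```r
set.seed(2)

# Hypothetical inputs: 20 correlated test statistics under the null
d <- 20
cor_Z <- matrix(0.3, nrow = d, ncol = d)   # assumed exchangeable correlation
diag(cor_Z) <- 1

# chol() returns upper-triangular R with t(R) %*% R equal to cor_Z, so
# t(R) %*% z has correlation cor_Z when z is standard normal
score_stats <- as.numeric(t(chol(cor_Z)) %*% rnorm(d))

if (requireNamespace("GBJ", quietly = TRUE)) {
  for (test_name in c("GBJ", "BJ", "GHC", "HC", "minP")) {
    test_fun <- getExportedValue("GBJ", test_name)
    cat("----", test_name, "----\n")
    print(test_fun(test_stats = score_stats, cor_mat = cor_Z))
  }
}
```

Looking the functions up by name keeps the inputs fixed, which makes it easy to compare the results side by side.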




0.5.3 by Ryan Sun, 2 years ago


Authors: Ryan Sun [aut, cre]


GPL-3 license

Imports Rcpp, mvtnorm, SKAT, stats

Suggests knitr, rmarkdown, bindata, rje, testthat

Linking to Rcpp, BH

Imported by sGBJ, sumFREGAT.
