MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format

Provides the 'Molecular Signatures Database' (MSigDB) gene sets typically used with the 'Gene Set Enrichment Analysis' (GSEA) software (Subramanian et al. 2005, Liberzon et al. 2015) in a standard R data frame with key-value pairs. Included are the original human gene symbols and Entrez IDs as well as the equivalents for various frequently studied model organisms such as mouse, rat, pig, fly, and yeast.



The msigdbr R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:

  • in an R-friendly format (a data frame in a "long" format with one gene per row)
  • for multiple frequently studied model organisms (human, mouse, rat, pig, fly, yeast, etc.)
  • as both gene symbols and Entrez Gene IDs (for better compatibility with pathway enrichment tools)
  • that can be used in a script without requiring additional external files
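
As a minimal sketch of why the "long" format is convenient, the base R snippet below turns a data frame with one gene per row into a named list of gene sets, a shape many enrichment functions accept. The data frame `toy_df` and its set names are made up for illustration and only mimic the `gs_name` and `gene_symbol` columns; they are not real MSigDB content.

```r
# Toy data frame mimicking the long format (one gene per row).
# Set names and memberships are hypothetical, for illustration only.
toy_df <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gene_symbol = c("ABCC4", "ACTN4", "ACTN4", "ACVR1", "ADAM9"),
  stringsAsFactors = FALSE
)

# Group gene symbols by gene set name: a named list of character vectors.
gene_sets <- split(toy_df$gene_symbol, toy_df$gs_name)

# Number of genes in each set.
lengths(gene_sets)
```

The same `split()` call should work on a data frame returned by the package, assuming it carries the `gs_name` and `gene_symbol` columns.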

Details and examples are described in the vignette.


The package can be installed from CRAN.

install.packages("msigdbr")



Load package.

library(msigdbr)


Check the available species.

msigdbr_show_species()

#>  [1] "Bos taurus"               "Caenorhabditis elegans"   "Canis lupus familiaris"  
#>  [4] "Danio rerio"              "Drosophila melanogaster"  "Gallus gallus"           
#>  [7] "Homo sapiens"             "Mus musculus"             "Rattus norvegicus"       
#> [10] "Saccharomyces cerevisiae" "Sus scrofa"

Retrieve all human gene sets.

m_df = msigdbr(species = "Homo sapiens")
head(m_df)
#> # A tibble: 6 x 9
#>   gs_name    gs_id gs_cat gs_subcat human_gene_symb… species_name entrez_gene gene_symbol
#>   <chr>      <chr> <chr>  <chr>     <chr>            <chr>              <int> <chr>      
#> 1 AAACCAC_M… M126… C3     MIR       ABCC4            Homo sapiens       10257 ABCC4      
#> 2 AAACCAC_M… M126… C3     MIR       ACTN4            Homo sapiens          81 ACTN4      
#> 3 AAACCAC_M… M126… C3     MIR       ACVR1            Homo sapiens          90 ACVR1      
#> 4 AAACCAC_M… M126… C3     MIR       ADAM9            Homo sapiens        8754 ADAM9      
#> 5 AAACCAC_M… M126… C3     MIR       ADAMTS5          Homo sapiens       11096 ADAMTS5    
#> 6 AAACCAC_M… M126… C3     MIR       AGER             Homo sapiens         177 AGER       
#> # ... with 1 more variable: sources <chr>
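
Because all collections come back in a single data frame, a specific collection can be selected afterwards by filtering on the `gs_cat` and `gs_subcat` columns. The sketch below uses a small hand-made data frame (`toy_df`, hypothetical) with those columns rather than the real data, so the set names and the `C2`/`CP` row are placeholders.

```r
# Hand-made stand-in for the full result; rows are illustrative only.
toy_df <- data.frame(
  gs_name = c("TOY_MIR_SET", "TOY_MIR_SET", "TOY_HALLMARK_SET", "TOY_CP_SET"),
  gs_cat = c("C3", "C3", "H", "C2"),
  gs_subcat = c("MIR", "MIR", "", "CP"),
  gene_symbol = c("ABCC4", "ACTN4", "ABCA1", "ACVR1"),
  stringsAsFactors = FALSE
)

# Keep only the rows of one category/sub-category combination.
mir_df <- toy_df[toy_df$gs_cat == "C3" & toy_df$gs_subcat == "MIR", ]

# Keep only the hallmark rows (empty sub-category).
hallmark_df <- toy_df[toy_df$gs_cat == "H", ]

mir_df
```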

Retrieve mouse hallmark collection gene sets.

m_df = msigdbr(species = "Mus musculus", category = "H")
head(m_df)
#> # A tibble: 6 x 9
#>   gs_name    gs_id gs_cat gs_subcat human_gene_symb… species_name entrez_gene gene_symbol
#>   <chr>      <chr> <chr>  <chr>     <chr>            <chr>              <int> <chr>      
#> 1 HALLMARK_… M5905 H      ""        ABCA1            Mus musculus       11303 Abca1      
#> 2 HALLMARK_… M5905 H      ""        ABCB8            Mus musculus       74610 Abcb8      
#> 3 HALLMARK_… M5905 H      ""        ACAA2            Mus musculus       52538 Acaa2      
#> 4 HALLMARK_… M5905 H      ""        ACADL            Mus musculus       11363 Acadl      
#> 5 HALLMARK_… M5905 H      ""        ACADM            Mus musculus       11364 Acadm      
#> 6 HALLMARK_… M5905 H      ""        ACADS            Mus musculus       11409 Acads      
#> # ... with 1 more variable: sources <chr>
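
Some enrichment tools expect a two-column table of gene set and gene identifier rather than the full data frame. As a sketch under that assumption, the snippet below reduces a hand-made data frame (`toy_mouse_df`, hypothetical, with a deliberately repeated row) to unique `gs_name` and `entrez_gene` pairs.

```r
# Hand-made stand-in for a species-specific result; the set name is hypothetical.
toy_mouse_df <- data.frame(
  gs_name = c("TOY_HALLMARK_SET", "TOY_HALLMARK_SET", "TOY_HALLMARK_SET"),
  entrez_gene = c(11303L, 74610L, 11303L),
  gene_symbol = c("Abca1", "Abcb8", "Abca1"),
  stringsAsFactors = FALSE
)

# Keep the gene set name and Entrez Gene ID, dropping repeated pairs.
term2gene <- unique(toy_mouse_df[, c("gs_name", "entrez_gene")])

term2gene
```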


msigdbr 6.2.1

  • Based on MSigDB v6.2 release.

msigdbr 6.1.1

  • Based on MSigDB v6.1 release.
  • Initial CRAN submission.


6.2.1 by Igor Dolgalev, 4 months ago

Authors: Igor Dolgalev [aut, cre]

License: MIT + file LICENSE

Imports magrittr, rlang

Depends on dplyr, tibble

Suggests testthat, knitr, rmarkdown