MSigDB Gene Sets for Multiple Organisms in a Tidy Data Format

Provides the 'Molecular Signatures Database' (MSigDB) gene sets typically used with the 'Gene Set Enrichment Analysis' (GSEA) software (Subramanian et al. 2005, Liberzon et al. 2015) in a standard R data frame with key-value pairs. The package includes the original human gene symbols and NCBI/Entrez IDs as well as the equivalents for frequently studied model organisms such as mouse, rat, pig, fly, and yeast.

The msigdbr R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:

  • in an R-friendly format (a data frame in a "long" format with one gene per row)
  • for multiple frequently studied model organisms (human, mouse, rat, pig, fly, yeast, etc.)
  • as both gene symbols and Entrez Gene IDs (for better compatibility with pathway enrichment tools)
  • that can be used in a script without requiring additional external files
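To make the "long" format concrete, here is a minimal sketch in base R. The data frame is a toy stand-in, not real msigdbr output: the gene symbols and Entrez IDs are taken from the human example rows in this README, while the set names SET_A and SET_B are hypothetical. It shows one gene per row and how such a table can be collapsed into a named list of gene sets, a shape some enrichment tools expect.

```r
# Toy stand-in for msigdbr output: one row per gene per gene set.
# SET_A and SET_B are hypothetical names, not real MSigDB gene sets.
toy_df <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_B", "SET_B"),
  gene_symbol = c("ABCC4", "ACTN4", "ACVR1", "ADAM9"),
  entrez_gene = c(10257L, 81L, 90L, 8754L),
  stringsAsFactors = FALSE
)

# Collapse the long table into a named list: one character vector per gene set.
gene_sets <- split(toy_df$gene_symbol, toy_df$gs_name)

# Number of genes in each set.
set_sizes <- lengths(gene_sets)
print(set_sizes)
```

The same split() call should apply to a data frame returned by msigdbr(), assuming it carries the gs_name and gene_symbol columns shown in the example output.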

Details and examples are described in the vignette.


The package can be installed from CRAN.

install.packages("msigdbr")



Load the package.

library(msigdbr)


Check the available species.

msigdbr_show_species()

#>  [1] "Bos taurus"               "Caenorhabditis elegans"   "Canis lupus familiaris"  
#>  [4] "Danio rerio"              "Drosophila melanogaster"  "Gallus gallus"           
#>  [7] "Homo sapiens"             "Mus musculus"             "Rattus norvegicus"       
#> [10] "Saccharomyces cerevisiae" "Sus scrofa"

Retrieve all human gene sets.

m_df = msigdbr(species = "Homo sapiens")
head(m_df)
#> # A tibble: 6 x 9
#>   gs_name    gs_id gs_cat gs_subcat human_gene_symb… species_name entrez_gene gene_symbol
#>   <chr>      <chr> <chr>  <chr>     <chr>            <chr>              <int> <chr>      
#> 1 AAACCAC_M… M126… C3     MIR       ABCC4            Homo sapiens       10257 ABCC4      
#> 2 AAACCAC_M… M126… C3     MIR       ACTN4            Homo sapiens          81 ACTN4      
#> 3 AAACCAC_M… M126… C3     MIR       ACVR1            Homo sapiens          90 ACVR1      
#> 4 AAACCAC_M… M126… C3     MIR       ADAM9            Homo sapiens        8754 ADAM9      
#> 5 AAACCAC_M… M126… C3     MIR       ADAMTS5          Homo sapiens       11096 ADAMTS5    
#> 6 AAACCAC_M… M126… C3     MIR       AGER             Homo sapiens         177 AGER       
#> # ... with 1 more variable: sources <chr>

Retrieve mouse hallmark collection gene sets.

m_df = msigdbr(species = "Mus musculus", category = "H")
head(m_df)
#> # A tibble: 6 x 9
#>   gs_name    gs_id gs_cat gs_subcat human_gene_symb… species_name entrez_gene gene_symbol
#>   <chr>      <chr> <chr>  <chr>     <chr>            <chr>              <int> <chr>      
#> 1 HALLMARK_… M5905 H      ""        ABCA1            Mus musculus       11303 Abca1      
#> 2 HALLMARK_… M5905 H      ""        ABCB8            Mus musculus       74610 Abcb8      
#> 3 HALLMARK_… M5905 H      ""        ACAA2            Mus musculus       52538 Acaa2      
#> 4 HALLMARK_… M5905 H      ""        ACADL            Mus musculus       11363 Acadl      
#> 5 HALLMARK_… M5905 H      ""        ACADM            Mus musculus       11364 Acadm      
#> 6 HALLMARK_… M5905 H      ""        ACADS            Mus musculus       11409 Acads      
#> # ... with 1 more variable: sources <chr>
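
Some enrichment tools take gene sets as a two-column table of set name and gene ID instead of a list. A minimal sketch of that reshaping follows. Note that to_term2gene() is a hypothetical helper, not part of msigdbr, and the toy rows reuse two of the mouse Entrez IDs shown above under a made-up set name.

```r
# Hypothetical helper (not part of msigdbr): keep only the gene set name and
# Entrez Gene ID columns, dropping any duplicated pairs.
to_term2gene <- function(gene_sets_df) {
  unique(gene_sets_df[, c("gs_name", "entrez_gene")])
}

# Toy rows; HYPOTHETICAL_SET is a made-up gene set name.
toy_mouse <- data.frame(
  gs_name     = "HYPOTHETICAL_SET",
  entrez_gene = c(11303L, 74610L, 74610L),
  gene_symbol = c("Abca1", "Abcb8", "Abcb8"),
  stringsAsFactors = FALSE
)

term2gene <- to_term2gene(toy_mouse)
print(term2gene)

# With the package loaded, the same helper could be applied to real output:
# term2gene <- to_term2gene(msigdbr(species = "Mus musculus", category = "H"))
```

Using Entrez IDs here instead of symbols is a design choice: they tend to be more stable across annotation releases, but the gene_symbol column can be substituted if the downstream tool expects symbols.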


msigdbr 6.2.1

  • Based on MSigDB v6.2 release.

msigdbr 6.1.1

  • Based on MSigDB v6.1 release.
  • Initial CRAN submission.

