Download and Explore Datasets from UCSC Xena Data Hubs

UCSCXenaTools is an R package for downloading and exploring datasets from UCSC Xena data hubs, a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx and CCLE. The databases are normalized so that they can be combined, linked, filtered, explored and downloaded.

Install the stable release from CRAN with:

install.packages("UCSCXenaTools")

You can also install the development version of UCSCXenaTools from GitHub with:

# install.packages("remotes")
remotes::install_github("ShixiangWang/UCSCXenaTools", build_vignettes = TRUE)

Data Hub List

All datasets are available at

Currently, UCSCXenaTools supports all 9 data hubs of UCSC Xena.

If the URL of a data hub changes, please let me know by emailing [email protected] or opening an issue on GitHub.


Downloading UCSC Xena datasets and loading them into R with UCSCXenaTools is a five-step workflow: generate, filter, query, download and prepare. These steps are implemented by the XenaGenerate, XenaFilter, XenaQuery, XenaDownload and XenaPrepare functions, respectively. The functions are straightforward to use and combine well with other packages such as dplyr.

To show the basic usage of UCSCXenaTools, we will download clinical data for the LUNG, LUAD and LUSC cohorts from the TCGA (hg19 version) data hub.

XenaData data.frame

UCSCXenaTools uses XenaData, a data.frame object built into the package, to generate an instance of the XenaHub class, which communicates with the API of the UCSC Xena data hubs.

You can load XenaData after loading UCSCXenaTools into R.

library(UCSCXenaTools)
#> =========================================================================
#> UCSCXenaTools version 1.0.0
#> Github page:
#> Documentation:
#> If you use it in published research, please cite:
#> Wang, Shixiang, et al. "APOBEC3B and APOBEC mutational signature
#>     as potential predictive markers for immunotherapy
#>     response in non-small cell lung cancer." Oncogene (2018).
#> =========================================================================
data(XenaData)

head(XenaData)
#>                         XenaHosts XenaHostNames
#> 1   UCSC_Public
#> 2   UCSC_Public
#> 3   UCSC_Public
#> 4   UCSC_Public
#> 5   UCSC_Public
#> 6   UCSC_Public
#>                                     XenaCohorts
#> 1 Acute lymphoblastic leukemia (Mullighan 2008)
#> 2 Acute lymphoblastic leukemia (Mullighan 2008)
#> 3 Acute lymphoblastic leukemia (Mullighan 2008)
#> 4                   Breast Cancer (Caldas 2007)
#> 5                   Breast Cancer (Caldas 2007)
#> 6                   Breast Cancer (Caldas 2007)
#>                                               XenaDatasets
#> 1    mullighan2008_public/mullighan2008_500K_genomicMatrix
#> 2 mullighan2008_public/mullighan2008_public_clinicalMatrix
#> 3    mullighan2008_public/mullighan2008_SNP6_genomicMatrix
#> 4              Caldas2007/chinSF2007_public_clinicalMatrix
#> 5             Caldas2007/chinSFGenomeBio2007_genomicMatrix
#> 6                   Caldas2007/naderi2007Exp_genomicMatrix
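
Because XenaData is an ordinary data.frame, it can be explored with base R before any hub is queried. The sketch below is illustrative only: `xena_toy` is a hypothetical stand-in that copies the host name, cohort and dataset columns of the six rows printed above, so the code runs without the package. With UCSCXenaTools loaded, the same calls should work on XenaData itself.

```r
# Hypothetical stand-in for XenaData, copied from the six rows printed above.
xena_toy <- data.frame(
  XenaHostNames = rep("UCSC_Public", 6),
  XenaCohorts = rep(c("Acute lymphoblastic leukemia (Mullighan 2008)",
                      "Breast Cancer (Caldas 2007)"), each = 3),
  XenaDatasets = c(
    "mullighan2008_public/mullighan2008_500K_genomicMatrix",
    "mullighan2008_public/mullighan2008_public_clinicalMatrix",
    "mullighan2008_public/mullighan2008_SNP6_genomicMatrix",
    "Caldas2007/chinSF2007_public_clinicalMatrix",
    "Caldas2007/chinSFGenomeBio2007_genomicMatrix",
    "Caldas2007/naderi2007Exp_genomicMatrix"
  ),
  stringsAsFactors = FALSE
)

# Count how many datasets each cohort provides.
datasets_per_cohort <- table(xena_toy$XenaCohorts)
print(datasets_per_cohort)

# Keep only the rows whose dataset name marks a clinical matrix.
clinical_rows <- xena_toy[grepl("clinicalMatrix", xena_toy$XenaDatasets), ]
print(clinical_rows$XenaDatasets)
```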


Select datasets.

# The options of the XenaFilter function support regular expressions
XenaGenerate(subset = XenaHostNames=="TCGA") %>% 
  XenaFilter(filterDatasets = "clinical") %>% 
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo

df_todo
#> class: XenaHub 
#> hosts():
#> cohorts() (3 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Cancer (LUNG)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (3 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
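
The two XenaFilter calls narrow the selection in sequence: first to clinical datasets, then to the three lung cohorts. The base R sketch below mimics that idea on a handful of dataset names. It is not XenaFilter's implementation, and `dataset_ids` is a hypothetical vector assembled from names that appear in the outputs above.

```r
# Hypothetical input: dataset names taken from the outputs shown above.
dataset_ids <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "Caldas2007/chinSF2007_public_clinicalMatrix",
  "Caldas2007/naderi2007Exp_genomicMatrix"
)

# Step 1: keep the names that match the pattern "clinical".
clinical <- dataset_ids[grepl("clinical", dataset_ids)]

# Step 2: "|" is regex alternation, so any of the three cohort codes matches.
lung_clinical <- clinical[grepl("LUAD|LUSC|LUNG", clinical)]
print(lung_clinical)
```

Applying the filters one after another behaves like a logical AND, while the `|` inside a single pattern behaves like a logical OR.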

Query and download.

XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
#> This will check url status, please be patient.
#> We will download files to directory /var/folders/mx/rfkl27z90c96wbmn3_kjk8c80000gn/T//RtmpOTE23c.
#> Downloading TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz
#> Downloading TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz
#> Downloading TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz
#> Note file names inherit from names in datasets column
#>   and '/' all changed to '__'.
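
The naming rule in the note above can be reproduced with a one-line substitution. This is a minimal sketch of the rule as described, not the code XenaDownload runs; the `.gz` suffix is taken from the download log above, and the variable names are made up for illustration.

```r
# Dataset names as listed in the XenaHub object above.
dataset_ids <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix"
)

# Replace every "/" with "__" and append ".gz", following the note above.
file_names <- paste0(gsub("/", "__", dataset_ids, fixed = TRUE), ".gz")
print(file_names)
```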

Load the data into R for analysis.

cli = XenaPrepare(xe_download)
class(cli)
#> [1] "list"
names(cli)
#> [1] "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz"
#> [2] "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"
#> [3] "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"
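
The list is named after the downloaded files, which makes for long element names. One possible follow-up, sketched here under assumptions, is to shorten them to the cohort code. `cli_toy` is a hypothetical placeholder with the same names as the list above and empty elements, so the code runs without downloading anything.

```r
# Hypothetical stand-in for the prepared list: same names, empty elements.
file_names <- c(
  "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz",
  "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz",
  "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"
)
cli_toy <- setNames(vector("list", length(file_names)), file_names)

# Capture the cohort code between "__" and "_clinicalMatrix.gz".
short_names <- sub("^.*__(.*)_clinicalMatrix\\.gz$", "\\1", names(cli_toy))
names(cli_toy) <- short_names
print(names(cli_toy))
```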

Browse datasets

Create two XenaHub objects:

  • to_browse - a XenaHub object containing one cohort and one dataset.
  • to_browse2 - a XenaHub object containing two cohorts and two datasets.

XenaGenerate(subset = XenaHostNames=="TCGA") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD") -> to_browse

to_browse
#> class: XenaHub 
#> hosts():
#> cohorts() (1 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#> datasets() (1 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix

XenaGenerate(subset = XenaHostNames=="TCGA") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2

to_browse2
#> class: XenaHub 
#> hosts():
#> cohorts() (2 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (2 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix

The XenaBrowse() function opens dataset or cohort links in your default web browser. By default, it is limited to one dataset or cohort per call, to prevent users from opening too many links at once.

# This will open your web browser
XenaBrowse(to_browse, type = "cohort")
# This will throw an error
XenaBrowse(to_browse2)
#> Error in XenaBrowse(to_browse2): This function limite 1 dataset to browse.
#>  Set multiple to TRUE if you want to browse multiple links.
XenaBrowse(to_browse2, type = "cohort")
#> Error in XenaBrowse(to_browse2, type = "cohort"): This function limite 1 cohort to browse. 
#>  Set multiple to TRUE if you want to browse multiple links.

If you are sure you want to open multiple links, set the multiple option to TRUE.

XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
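
The one-link limit is a simple guard pattern. The sketch below shows the general idea in base R; it is not XenaBrowse's source code. `browse_links()` and its arguments are hypothetical, and it returns the links instead of opening a browser so that it can be run safely.

```r
# Hypothetical guard: refuse more than one link unless multiple = TRUE.
browse_links <- function(links, multiple = FALSE) {
  if (length(links) > 1 && !multiple) {
    stop("More than one link requested; set multiple = TRUE to open them all.")
  }
  # A real implementation might call utils::browseURL() on each link here.
  invisible(links)
}

links <- c("TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
           "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix")

# Without multiple = TRUE the call fails; try() captures the error.
blocked <- inherits(try(browse_links(links), silent = TRUE), "try-error")

# With multiple = TRUE both links pass through.
allowed <- browse_links(links, multiple = TRUE)
print(blocked)
```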


For more features and usage, please read the online documentation on CRAN or the GitHub website.


Wang, Shixiang, et al. “APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer.” Oncogene (2018).


This package is based on XenaR; thanks to Martin Morgan for his work.



Please note that code from the XenaR package is under the Apache 2.0 license.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.



Bug fixes

  • fix API functions that could not be called from outside the package

New features

  • Add option to XenaDownload function for downloading gene id mapping data
  • Add XenaQueryProbeMap() for querying probemap of datasets.

Minor changes

  • export XenaHub Class
  • improve XenaQuery function
  • update website and documentation


  • reformat code files
  • update XenaData & XenaDataUpdate() function to obtain more info
  • add jsonlite as import package
  • add metadata for all datasets into package inst dir


  • update doc for APIs
  • modify description for package


  • this version changes many variable and function names. Updating and reading the documentation is highly recommended!
  • add API functions, inspired by the xenaPython package
  • remove useless query statements from old XenaR code
  • rename all hubs
  • single cell hub is added
  • new XenaData contains much more information
  • improve internal code


  • fix some grammar errors in documentation


  • update README, documentation and vignette
  • fix some typos
  • new function: XenaBrowse - open the dataset link using web browser
  • open grel all parameters to XenaFilter function


  • fix bug #2
  • add pipe operator
  • add doc & docs


  • add new datahub ATACseq
  • enhance preparing data to R: provide options to select subset of original file data (see #1)


  • add new datahub PCAWG
  • speed up XenaFilter function


  • add new function getTCGAdata to help user download TCGA data by projects and biological data type


  • add shiny app to show TCGA datasets information of Xena
  • add downloadTCGA function to help user quickly download TCGA datasets


  • fix question about using temp directory


  • Add two hosts: Treehouse and TCGA Pan-Cancer


1.4.7 by Shixiang Wang, a month ago,

Report a bug at

Browse source code at

Authors: Shixiang Wang [aut, cre], Xue-Song Liu [aut], Martin Morgan [ctb], Christine Stawitz [rev] (Christine reviewed the package for ropensci, see <>), Carl Ganz [rev] (Carl reviewed the package for ropensci, see <>)

GPL-3 license

Imports digest, dplyr, httr, jsonlite, magrittr, methods, readr, rlang, utils

Suggests covr, DT, knitr, prettydoc, rmarkdown, shiny, shinydashboard, testthat

Imported by UCSCXenaShiny.

Suggested by sigminer.
