Download and Explore Datasets from UCSC Xena Data Hubs


UCSCXenaTools is an R package for downloading and exploring data from UCSC Xena data hubs, a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. The databases are normalized so they can be combined, linked, filtered, explored, and downloaded.

Installation

Install the stable release from CRAN with:

install.packages("UCSCXenaTools")

You can also install the development version of UCSCXenaTools from GitHub with:

# install.packages("remotes")
remotes::install_github("ShixiangWang/UCSCXenaTools", build_vignettes = TRUE)
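
In scripts that run repeatedly, it can be convenient to install only when the package is missing. The following is a minimal base R sketch; `missing_packages` is a hypothetical helper name used for illustration and is not part of UCSCXenaTools.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages("UCSCXenaTools")

# Only report (and optionally install) when something is actually missing.
if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install call is left commented out so that running the sketch has no side effects beyond the check itself.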

Data Hub List

All datasets are available at https://xenabrowser.net/datapages/.

Currently, UCSCXenaTools supports all 9 data hubs of UCSC Xena.

If the URL of a data hub changes, please let me know by emailing [email protected] or opening an issue on GitHub.

Usage

Downloading UCSC Xena datasets and loading them into R with UCSCXenaTools is a five-step workflow: generate, filter, query, download, and prepare. These steps are implemented by the XenaGenerate, XenaFilter, XenaQuery, XenaDownload, and XenaPrepare functions, respectively. The functions are straightforward to use and combine well with other packages such as dplyr.

To show the basic usage of UCSCXenaTools, we download clinical data for the LUNG, LUAD, and LUSC cohorts from the TCGA (hg19 version) data hub.

XenaData data.frame

UCSCXenaTools uses XenaData, a data.frame bundled with the package, to generate an instance of the XenaHub class, which communicates with the API of the UCSC Xena data hubs.

You can load XenaData after loading UCSCXenaTools into R.

library(UCSCXenaTools)
#> =========================================================================
#> UCSCXenaTools version 1.0.0
#> Github page: https://github.com/ShixiangWang/UCSCXenaTools
#> Documentation: https://github.com/ShixiangWang/UCSCXenaTools
#> If you use it in published research, please cite:
#> Wang, Shixiang, et al. "APOBEC3B and APOBEC mutational signature
#>     as potential predictive markers for immunotherapy
#>     response in non-small cell lung cancer." Oncogene (2018).
#> =========================================================================
#> 
data(XenaData)
 
head(XenaData)
#>                         XenaHosts XenaHostNames
#> 1 https://ucscpublic.xenahubs.net   UCSC_Public
#> 2 https://ucscpublic.xenahubs.net   UCSC_Public
#> 3 https://ucscpublic.xenahubs.net   UCSC_Public
#> 4 https://ucscpublic.xenahubs.net   UCSC_Public
#> 5 https://ucscpublic.xenahubs.net   UCSC_Public
#> 6 https://ucscpublic.xenahubs.net   UCSC_Public
#>                                     XenaCohorts
#> 1 Acute lymphoblastic leukemia (Mullighan 2008)
#> 2 Acute lymphoblastic leukemia (Mullighan 2008)
#> 3 Acute lymphoblastic leukemia (Mullighan 2008)
#> 4                   Breast Cancer (Caldas 2007)
#> 5                   Breast Cancer (Caldas 2007)
#> 6                   Breast Cancer (Caldas 2007)
#>                                               XenaDatasets
#> 1    mullighan2008_public/mullighan2008_500K_genomicMatrix
#> 2 mullighan2008_public/mullighan2008_public_clinicalMatrix
#> 3    mullighan2008_public/mullighan2008_SNP6_genomicMatrix
#> 4              Caldas2007/chinSF2007_public_clinicalMatrix
#> 5             Caldas2007/chinSFGenomeBio2007_genomicMatrix
#> 6                   Caldas2007/naderi2007Exp_genomicMatrix
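
Because XenaData is an ordinary data.frame, it can be inspected with standard R tools. The sketch below does not load the package; it rebuilds three columns of the six rows printed above as a mock table (`xena_mock` is a name chosen here for illustration) and shows typical exploration steps. The same calls should apply to the real XenaData, which has many more rows.

```r
# Mock of the first six XenaData rows printed above (three columns only).
xena_mock <- data.frame(
  XenaHostNames = rep("UCSC_Public", 6),
  XenaCohorts = rep(c("Acute lymphoblastic leukemia (Mullighan 2008)",
                      "Breast Cancer (Caldas 2007)"), each = 3),
  XenaDatasets = c(
    "mullighan2008_public/mullighan2008_500K_genomicMatrix",
    "mullighan2008_public/mullighan2008_public_clinicalMatrix",
    "mullighan2008_public/mullighan2008_SNP6_genomicMatrix",
    "Caldas2007/chinSF2007_public_clinicalMatrix",
    "Caldas2007/chinSFGenomeBio2007_genomicMatrix",
    "Caldas2007/naderi2007Exp_genomicMatrix"
  ),
  stringsAsFactors = FALSE
)

# Which hubs are present, and how many datasets does each cohort have?
hubs <- unique(xena_mock$XenaHostNames)
datasets_per_cohort <- table(xena_mock$XenaCohorts)

# Rows for a single cohort, selected with ordinary data.frame subsetting.
caldas <- subset(xena_mock, XenaCohorts == "Breast Cancer (Caldas 2007)")

print(datasets_per_cohort)
print(caldas$XenaDatasets)
```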

Workflow

Select datasets.

# The filter options of XenaFilter() accept regular expressions
XenaGenerate(subset = XenaHostNames=="TCGA") %>% 
  XenaFilter(filterDatasets = "clinical") %>% 
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo
 
df_todo
#> class: XenaHub 
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (3 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Cancer (LUNG)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (3 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
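
The effect of chaining two filters can be illustrated without the package. The sketch below assumes the patterns behave like base R `grepl()` patterns; it is not the implementation of XenaFilter, and the dataset names are copied from the outputs shown in this README.

```r
# Dataset names taken from the outputs shown in this README.
datasets <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "Caldas2007/chinSF2007_public_clinicalMatrix",
  "Caldas2007/naderi2007Exp_genomicMatrix"
)

# First filter: keep names matching "clinical".
is_clinical <- grepl("clinical", datasets)

# Second filter: "|" is alternation, so any of the three codes matches.
is_lung <- grepl("LUAD|LUSC|LUNG", datasets)

# Chaining two filters keeps only the names that pass both.
selected <- datasets[is_clinical & is_lung]
print(selected)
```

Under this assumption, the Caldas clinical matrix passes the first filter but is dropped by the second, leaving only the three TCGA lung datasets.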

Query and download.

XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
#> This will check url status, please be patient.
#> We will download files to directory /var/folders/mx/rfkl27z90c96wbmn3_kjk8c80000gn/T//RtmpOTE23c.
#> Downloading TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz
#> Downloading TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz
#> Downloading TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz
#> Note file names inherit from names in datasets column
#>   and '/' all changed to '__'.
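
The naming rule in the note above can be reproduced with base R. This is a sketch of the mapping as described in the log (replace '/' with '__' and append ".gz"), not the package's internal code.

```r
datasets <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix"
)

# Replace every "/" with "__", then append the ".gz" suffix seen in the log.
local_names <- paste0(gsub("/", "__", datasets, fixed = TRUE), ".gz")
print(local_names)
```

Knowing this mapping helps when locating a downloaded file from its dataset name.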

Load the downloaded data into R for analysis.

cli = XenaPrepare(xe_download)
class(cli)
#> [1] "list"
names(cli)
#> [1] "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz"
#> [2] "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"
#> [3] "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"
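
Since the result is a list named by file, shorter element names can make later access easier. The sketch below uses a stand-in list with the same names as above; the placeholder data frames and the `sampleID` column are assumptions for illustration, not the real clinical tables.

```r
# Stand-in for the list shown above: same names, placeholder contents.
file_names <- c(
  "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz",
  "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz",
  "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"
)
cli_mock <- lapply(file_names, function(f) {
  data.frame(sampleID = character(0))  # hypothetical column name
})
names(cli_mock) <- file_names

# Extract the cohort code between "__" and "_clinicalMatrix.gz".
short_names <- sub("^.*__(.+)_clinicalMatrix\\.gz$", "\\1", names(cli_mock))
names(cli_mock) <- short_names

# Elements can now be accessed by cohort code, for example cli_mock$LUAD.
print(names(cli_mock))
```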

Browse datasets

Create two XenaHub objects:

  • to_browse - a XenaHub object containing one cohort and one dataset.
  • to_browse2 - a XenaHub object containing two cohorts and two datasets.

XenaGenerate(subset = XenaHostNames=="TCGA") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD") -> to_browse
 
to_browse
#> class: XenaHub 
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (1 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#> datasets() (1 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
 
XenaGenerate(subset = XenaHostNames=="TCGA") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2
 
to_browse2
#> class: XenaHub 
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (2 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (2 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix

The XenaBrowse() function opens dataset or cohort links in your default web browser. By default, it accepts only one dataset or cohort at a time, to prevent you from opening too many links at once.

# This will open your web browser
XenaBrowse(to_browse)
 
XenaBrowse(to_browse, type = "cohort")
# This will throw an error
XenaBrowse(to_browse2)
#> Error in XenaBrowse(to_browse2): This function limite 1 dataset to browse.
#>  Set multiple to TRUE if you want to browse multiple links.
 
XenaBrowse(to_browse2, type = "cohort")
#> Error in XenaBrowse(to_browse2, type = "cohort"): This function limite 1 cohort to browse. 
#>  Set multiple to TRUE if you want to browse multiple links.

If you are sure you want to open multiple links, set the multiple option to TRUE.

XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
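
The single-link guard described above is a common pattern for functions that open browser tabs. The sketch below mirrors that behaviour in a hypothetical helper, `browse_links`; it is not the implementation of XenaBrowse(), and the URLs are placeholders. An `opener` argument is added so the logic can be exercised without launching a browser.

```r
# Hypothetical helper: refuse to open several links unless asked to.
# `opener` defaults to utils::browseURL and can be swapped for a dry run.
browse_links <- function(urls, multiple = FALSE, opener = utils::browseURL) {
  if (length(urls) > 1 && !multiple) {
    stop("More than one link given. Set multiple to TRUE to open them all.")
  }
  for (u in urls) opener(u)
  invisible(length(urls))
}

# Dry run: record the links instead of opening a browser.
opened <- character(0)
record <- function(u) opened <<- c(opened, u)

links <- c("https://example.org/a", "https://example.org/b")  # placeholder URLs

# Without multiple = TRUE the call is refused before any link is opened.
result <- tryCatch(browse_links(links, opener = record),
                   error = function(e) "refused")

# With multiple = TRUE both links are passed to the opener.
browse_links(links, multiple = TRUE, opener = record)

print(result)
print(opened)
```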

Documentation

For more features and usage examples, please read the online documentation on CRAN or the GitHub website.

Citation

Wang, Shixiang, et al. “APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer.” Oncogene (2018).

Acknowledgement

This package is based on XenaR; thanks to Martin Morgan for his work.

LICENSE

GPL-3

Please note that code from the XenaR package is under the Apache 2.0 license.

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

News

v1.2.1

Bug fixes

  • fix bug where API functions could not be called from outside

New features

  • Add option to XenaDownload function for downloading gene id mapping data
  • Add XenaQueryProbeMap() for querying probemap of datasets.

Minor changes

  • export XenaHub Class
  • improve XenaQuery function
  • update website and documentation

v1.2.0

  • reformat code files
  • update XenaData & XenaDataUpdate() function to obtain more info
  • add jsonlite as import package
  • add metadata for all datasets into package inst dir

v1.1.1

  • update doc for APIs
  • modify description for package

v1.1.0

  • this version changes many variable and function names. Updating and reading the documentation is highly recommended!
  • add API functions, inspired by the xenaPython package
  • remove unused query statements left over from the old XenaR code
  • rename all hubs
  • single cell hub is added
  • new XenaData contains much more information
  • improve internal code

v1.0.1

  • fix some grammar errors in documentation

v1.0.0

  • update README, documentation and vignette
  • fix some typos
  • new function: XenaBrowse - open the dataset link using web browser
  • open grel all parameters to XenaFilter function

v0.2.7

  • fix bug #2
  • add pipe operator
  • add doc & docs

v0.2.6

  • add new datahub ATACseq
  • enhance preparing data to R: provide options to select subset of original file data (see #1)

v0.2.5

  • add new datahub PCAWG
  • speed up XenaFilter function

v0.2.4

  • add new function getTCGAdata to help user download TCGA data by projects and biological data type

v0.2.3

  • add shiny app to show TCGA datasets information of Xena
  • add downloadTCGA function to help user quickly download TCGA datasets

v0.2.2

  • fix issue with using the temp directory

v0.2.1

  • Add two hosts: Treehouse and TCGA Pan-Cancer

1.2.10 by Shixiang Wang, 2 days ago


https://docs.ropensci.org/UCSCXenaTools, https://github.com/ropensci/UCSCXenaTools


Report a bug at https://github.com/ropensci/UCSCXenaTools/issues


Browse source code at https://github.com/cran/UCSCXenaTools


Authors: Shixiang Wang [aut, cre], Xue-Song Liu [aut], Martin Morgan [ctb], Christine Stawitz [rev] (reviewed the package for rOpenSci, see <https://github.com/ropensci/software-review/issues/315>), Carl Ganz [rev] (reviewed the package for rOpenSci, see <https://github.com/ropensci/software-review/issues/315>)


GPL-3 license


Imports: httr, readr, methods, utils, dplyr, magrittr, jsonlite, rlang

Suggests: DT, knitr, rmarkdown, prettydoc, shiny, shinydashboard, testthat, covr


Imported by UCSCXenaShiny.
