UCSCXenaTools is an R package for downloading and exploring data from UCSC Xena data hubs, which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.
Install the stable release from CRAN with:
install.packages("UCSCXenaTools")
You can also install the development version of UCSCXenaTools from GitHub with:
# install.packages("remotes")
remotes::install_github("ShixiangWang/UCSCXenaTools", build_vignettes = TRUE)
All datasets are available at https://xenabrowser.net/datapages/.
Currently, UCSCXenaTools supports all 9 data hubs of UCSC Xena.
If the URL of a data hub changes, please let me know by emailing [email protected] or by opening an issue on GitHub.
Downloading UCSC Xena datasets and loading them into R with UCSCXenaTools is a five-step workflow: generate, filter, query, download and prepare. These steps are implemented as the XenaGenerate, XenaFilter, XenaQuery, XenaDownload and XenaPrepare functions, respectively. They are straightforward to use and combine well with other packages such as dplyr.
To show the basic usage of UCSCXenaTools, we download clinical data for the LUNG, LUAD and LUSC cohorts from the TCGA (hg19 version) data hub.
UCSCXenaTools uses XenaData, a data.frame object built into the package, to generate an instance of the XenaHub class, which communicates with the API of the UCSC Xena data hubs. You can load XenaData after loading UCSCXenaTools into R.
library(UCSCXenaTools)
#> =========================================================================
#> UCSCXenaTools version 1.0.0
#> Github page: https://github.com/ShixiangWang/UCSCXenaTools
#> Documentation: https://github.com/ShixiangWang/UCSCXenaTools
#> If you use it in published research, please cite:
#> Wang, Shixiang, et al. "APOBEC3B and APOBEC mutational signature
#> as potential predictive markers for immunotherapy
#> response in non-small cell lung cancer." Oncogene (2018).
#> =========================================================================
#>
data(XenaData)
head(XenaData)
#>                         XenaHosts XenaHostNames
#> 1 https://ucscpublic.xenahubs.net   UCSC_Public
#> 2 https://ucscpublic.xenahubs.net   UCSC_Public
#> 3 https://ucscpublic.xenahubs.net   UCSC_Public
#> 4 https://ucscpublic.xenahubs.net   UCSC_Public
#> 5 https://ucscpublic.xenahubs.net   UCSC_Public
#> 6 https://ucscpublic.xenahubs.net   UCSC_Public
#>                                     XenaCohorts
#> 1 Acute lymphoblastic leukemia (Mullighan 2008)
#> 2 Acute lymphoblastic leukemia (Mullighan 2008)
#> 3 Acute lymphoblastic leukemia (Mullighan 2008)
#> 4                   Breast Cancer (Caldas 2007)
#> 5                   Breast Cancer (Caldas 2007)
#> 6                   Breast Cancer (Caldas 2007)
#>                                               XenaDatasets
#> 1    mullighan2008_public/mullighan2008_500K_genomicMatrix
#> 2 mullighan2008_public/mullighan2008_public_clinicalMatrix
#> 3    mullighan2008_public/mullighan2008_SNP6_genomicMatrix
#> 4              Caldas2007/chinSF2007_public_clinicalMatrix
#> 5             Caldas2007/chinSFGenomeBio2007_genomicMatrix
#> 6                   Caldas2007/naderi2007Exp_genomicMatrix
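Since XenaData is an ordinary data.frame, the usual base R tools can be used to explore it. The following is a minimal sketch that relies only on the four columns printed above. To stay runnable without the package, it rebuilds a small stand-in (xena_demo, a hypothetical name) from the six rows shown rather than using the real table.

```r
# Stand-in for the first six rows of XenaData, copied from the output above.
# xena_demo is a hypothetical name; the real object is XenaData.
xena_demo <- data.frame(
  XenaHosts     = rep("https://ucscpublic.xenahubs.net", 6),
  XenaHostNames = rep("UCSC_Public", 6),
  XenaCohorts   = rep(c("Acute lymphoblastic leukemia (Mullighan 2008)",
                        "Breast Cancer (Caldas 2007)"), each = 3),
  XenaDatasets  = c(
    "mullighan2008_public/mullighan2008_500K_genomicMatrix",
    "mullighan2008_public/mullighan2008_public_clinicalMatrix",
    "mullighan2008_public/mullighan2008_SNP6_genomicMatrix",
    "Caldas2007/chinSF2007_public_clinicalMatrix",
    "Caldas2007/chinSFGenomeBio2007_genomicMatrix",
    "Caldas2007/naderi2007Exp_genomicMatrix"
  ),
  stringsAsFactors = FALSE
)

# Count how many datasets belong to each cohort
datasets_per_cohort <- table(xena_demo$XenaCohorts)

# Keep only the rows whose dataset name mentions "clinical"
clinical_rows <- xena_demo[grepl("clinical", xena_demo$XenaDatasets), ]
clinical_rows$XenaDatasets
```

The same expressions should work on the real XenaData object, since they only touch the column names shown in the head() output.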
Select datasets.
# The options of the XenaFilter function support regular expressions
XenaGenerate(subset = XenaHostNames=="TCGA") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo
df_todo
#> class: XenaHub
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (3 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Cancer (LUNG)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (3 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
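The two XenaFilter() calls narrow the selection step by step: the first keeps datasets whose name matches clinical, the second keeps those that also match LUAD, LUSC or LUNG. As a rough base R analogy (not the package's actual implementation), the same chaining can be sketched with grepl() on a character vector. The vector below is illustrative and reuses dataset names that appear in the outputs of this document.

```r
# Illustrative dataset names taken from the outputs in this document
datasets <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "Caldas2007/chinSF2007_public_clinicalMatrix",
  "Caldas2007/naderi2007Exp_genomicMatrix"
)

# Step 1: keep names matching "clinical"
step1 <- datasets[grepl("clinical", datasets)]

# Step 2: "|" is regex alternation, so any one of the three codes matches
step2 <- step1[grepl("LUAD|LUSC|LUNG", step1)]
step2
```

Chaining two narrow patterns tends to be easier to read and adjust than a single combined expression.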
Query and download.
XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
#> This will check url status, please be patient.
#> We will download files to directory /var/folders/mx/rfkl27z90c96wbmn3_kjk8c80000gn/T//RtmpOTE23c.
#> Downloading TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz
#> Downloading TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz
#> Downloading TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz
#> Note file names inherit from names in datasets column
#> and '/' all changed to '__'.
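The note at the end of the download messages describes how local file names are derived from dataset names. A minimal sketch of that renaming rule in base R, assuming only what the messages show (each '/' becomes '__' and the downloaded files carry a .gz suffix). local_file_name() is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: reproduce the local file names shown in the download messages
local_file_name <- function(dataset) {
  # fixed = TRUE treats "/" as a literal string rather than a regex
  paste0(gsub("/", "__", dataset, fixed = TRUE), ".gz")
}

local_file_name("TCGA.LUAD.sampleMap/LUAD_clinicalMatrix")
```

Knowing the rule can help when locating a particular file in the download directory afterwards.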
Prepare data into R for analysis.
cli = XenaPrepare(xe_download)
class(cli)
#> [1] "list"
names(cli)
#> [1] "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz"
#> [2] "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"
#> [3] "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"
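Because the result is a named list, individual tables can be picked out by name, and the long file names can be shortened for convenience. The sketch below assumes a stand-in list (cli_demo, a hypothetical name) with the same three names as above. The sampleID column and its values are placeholders, not real clinical data.

```r
# Stand-in for the list returned by XenaPrepare(); contents are placeholders
cli_demo <- list(
  "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz" = data.frame(sampleID = "placeholder-LUAD"),
  "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz" = data.frame(sampleID = "placeholder-LUNG"),
  "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz" = data.frame(sampleID = "placeholder-LUSC")
)

# Extract the cohort code that sits between "TCGA." and ".sampleMap"
short_names <- sub("^TCGA\\.([A-Z]+)\\.sampleMap__.*$", "\\1", names(cli_demo))
names(cli_demo) <- short_names

# Elements can now be accessed by the short cohort code
names(cli_demo)
luad <- cli_demo[["LUAD"]]
```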
Create two XenaHub objects:
to_browse - a XenaHub object that contains one cohort and one dataset.
to_browse2 - a XenaHub object that contains 2 cohorts and 2 datasets.
XenaGenerate(subset = XenaHostNames=="TCGA") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD") -> to_browse
to_browse
#> class: XenaHub
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (1 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#> datasets() (1 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
XenaGenerate(subset = XenaHostNames=="TCGA") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2
to_browse2
#> class: XenaHub
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (2 total):
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (2 total):
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
The XenaBrowse() function opens dataset or cohort links in your default web browser. By default, it accepts only one dataset or cohort at a time, which prevents users from opening too many links at once.
# This will open your web browser
XenaBrowse(to_browse)
XenaBrowse(to_browse, type = "cohort")
# This will throw an error
XenaBrowse(to_browse2)
#> Error in XenaBrowse(to_browse2): This function limite 1 dataset to browse.
#> Set multiple to TRUE if you want to browse multiple links.
XenaBrowse(to_browse2, type = "cohort")
#> Error in XenaBrowse(to_browse2, type = "cohort"): This function limite 1 cohort to browse.
#> Set multiple to TRUE if you want to browse multiple links.
If you are sure you want to open multiple links, set the multiple option to TRUE.
XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
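Refusing more than one link unless multiple = TRUE is a simple guard pattern. A minimal sketch of such a guard follows, assuming a hypothetical helper check_link_count() that is not part of UCSCXenaTools and only imitates the behaviour seen in the error messages earlier in this section.

```r
# Hypothetical guard, not the package's code: allow several links only on request
check_link_count <- function(links, multiple = FALSE) {
  if (length(links) > 1 && !multiple) {
    stop("More than one link requested; set multiple = TRUE to open them all.")
  }
  invisible(links)
}

two_links <- c("TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
               "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix")

# The call fails by default and succeeds once multiple = TRUE is given
blocked <- inherits(try(check_link_count(two_links), silent = TRUE), "try-error")
allowed <- check_link_count(two_links, multiple = TRUE)
```

Making the caller opt in explicitly keeps an accidental broad filter from opening many browser tabs at once.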
For more features and usage examples, please read the online documentation on CRAN or the GitHub website.
Wang, Shixiang, et al. “APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer.” Oncogene (2018).
This package is based on XenaR; thanks to Martin Morgan for his work.
GPL-3
Please note that code from the XenaR package is under the Apache 2.0 license.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
XenaDownload function for downloading gene id mapping data
XenaQueryProbeMap() for querying probemap of datasets.
XenaHub Class
XenaQuery function
XenaData & XenaDataUpdate() function to obtain more info
jsonlite as import package
inst dir
XenaData contains much more information
XenaBrowse - open the dataset link using web browser
XenaFilter function
XenaFilter function
getTCGAdata to help user download TCGA data by projects and biological data type
downloadTCGA function to help user quickly download TCGA datasets