Automated Transcriptome Classifier Pipeline: Comprehensive Transcriptome Analysis

An unsupervised, fully automated pipeline for transcriptome analysis, with a supervised option to identify characteristic genes from predefined subclasses. The pipeline relies on the 'pamr' algorithm to cluster the data and then draws a heatmap of the clusters showing the most and the least significant genes according to 'pamr'. The resulting heatmaps are easy to grasp and show, for each cluster, which genes define it most strongly.

Transcriptomics is the large-scale measurement of gene expression across multiple samples. Gene expression mirrors functional aspects and carries important information about biological functions and pathway activation. Its analysis can uncover molecular functions on the one hand and improve the classification of large cohorts for better clinical understanding on the other. This tool aims to provide a standard pipeline that combines classification with functional analysis and generates a visual output integrating transcriptomic data, clinical information and Gene Set Enrichment Analysis.

The pipeline was designed to integrate the following aspects:

Reproducibility: Analysis needs to be easily reproduced by external researchers.

Easy-to-Use: The pipeline was designed to be user-friendly and applicable for non-expert users.

Compatible: The pipeline should work with array-based transcriptomic data as well as RNA sequencing outputs. For further clinical interpretation, external traits need to be easy to integrate and include in the analysis.

How to install the package from GitHub

Install with devtools


Unsupervised Cluster Analysis


A function for unsupervised Clustering of the data

# Load data with gene ENTREZ IDs in rownames and samples in colnames

# Optional: read in clinical information with samples in rownames
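The two comments above describe the expected input layout. A minimal sketch of that layout with simulated values follows; the sample names, clinical columns and all numbers are made up for illustration and do not come from the package.

```r
# Simulated expression matrix: genes (ENTREZ IDs) in rows, samples in columns.
# The IDs are examples only; the values are random numbers.
set.seed(1)
entrez_ids <- c("7157", "1956", "672", "5728")
samples <- paste0("S", 1:6)
me <- matrix(rnorm(length(entrez_ids) * length(samples)),
             nrow = length(entrez_ids),
             dimnames = list(entrez_ids, samples))

# Optional clinical table: one row per sample, sample names as row names.
# Column names and values are hypothetical.
clinical <- data.frame(age = c(54, 61, 47, 70, 58, 66),
                       sex = c("f", "m", "m", "f", "f", "m"),
                       row.names = samples)

# Sanity check: clinical rows must line up with the expression columns.
stopifnot(identical(rownames(clinical), colnames(me)))
```

Checking that sample names agree between the two tables before clustering avoids silently mismatched annotations in the heatmap.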


Produce a Heatmap using a Supervised Clustering Algorithm


This function produces a heatmap using a supervised clustering algorithm chosen by the user. The mean silhouette width is plotted in the top right corner, and the silhouette width of each sample is shown on top. On the right side of the plot, the n highest and lowest scoring genes for each cluster are added, with the corresponding pathways next to them (see Details).

## load the library
## load data

## calculate best number of clusters 
res<-TopPAM(me, max_clusters = 8, TOP=1000)


## Compute the top genes of each cluster; with "TRw", samples with negative silhouette widths can be cut off

File_genes=Groups_Sup(me_TOP, me=me, number_of_k,TRw=-1)


# groups_men contains information on each sample and its cluster; this can be adapted for a supervised analysis

o_g<-Supervised_Cluster_Heatmap(groups_men = groups_men, gene_matrix=me_x, method="PAMR",show_sil=TRUE,print_genes=TRUE, TOP = 1000,GSE=TRUE,plot_mean_sil=TRUE,sil_mean=res[[2]])

# Validate with consensus clustering or t-SNE

cons_clust(me_x,max_clust=8, TOP=1000)
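The idea behind the steps above (pick the number of clusters by mean silhouette width, then cut off samples below a silhouette threshold) can be sketched with the `cluster` package alone. This is not the package's internal implementation: the simulated data, the helper `mean_sil_for_k`, the variance-based gene filter and the Euclidean distance used by `pam()` here are all assumptions made for illustration.

```r
library(cluster)

# Simulated data: 50 genes x 30 samples with three sample groups that
# differ in the first 30 genes (illustrative only).
set.seed(42)
me <- matrix(rnorm(50 * 30), nrow = 50,
             dimnames = list(paste0("g", 1:50), paste0("S", 1:30)))
me[1:10, 1:10]   <- me[1:10, 1:10] + 4
me[11:20, 11:20] <- me[11:20, 11:20] + 4
me[21:30, 21:30] <- me[21:30, 21:30] + 4

# Keep the most variable genes (a hypothetical analogue of a TOP argument).
top <- 30
gene_var <- apply(me, 1, var)
me_top <- me[order(gene_var, decreasing = TRUE)[seq_len(top)], ]

# Hypothetical helper: mean silhouette width of a PAM clustering with k clusters.
mean_sil_for_k <- function(x, k) {
  fit <- pam(t(x), k = k)  # samples are clustered, so transpose
  fit$silinfo$avg.width
}

# Evaluate a range of cluster numbers and keep the one with the best mean width.
ks <- 2:8
mean_sil <- sapply(ks, function(k) mean_sil_for_k(me_top, k))
best_k <- ks[which.max(mean_sil)]

# Per-sample silhouette widths for the chosen k; samples below the
# threshold (here 0, i.e. negative widths) are dropped.
fit <- pam(t(me_top), k = best_k)
sil_width <- fit$silinfo$widths[, "sil_width"]
keep <- names(sil_width)[sil_width >= 0]
me_kept <- me_top[, keep, drop = FALSE]
```

In the pipeline's own call the threshold is passed as `TRw`; the sketch uses a fixed cutoff of 0 only to show the effect of removing samples with negative silhouette widths.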


D. H. Heiland & K. Daka, Translational Research Group, Medical-Center Freiburg, University of Freiburg




0.1.6 by Karam Daka, 3 years ago


Authors: Karam Daka [cre, aut], Dieter Henrik Heiland [aut]


GPL-3 license

Imports: cluster, pamr, siggenes, annotate, fgsea, RColorBrewer, ConsensusClusterPlus, Rtsne, clusterProfiler, msigdbr

See at CRAN