Evolutionary Transcriptomics

Investigate the evolution of biological processes by capturing evolutionary signatures in transcriptomes (Drost et al., 2017). The aim of this tool is to provide a transcriptome analysis environment to quantify the average evolutionary age of genes contributing to a transcriptome of interest (Drost et al., 2016).


Evolutionary Transcriptomics with R

Today, phenotypic phenomena such as morphological mutations, diseases or developmental processes are primarily investigated at the molecular level using transcriptomics approaches. A transcriptome denotes the complete set of quantifiable transcripts present at a specific stage of a biological process. In disease or developmental (defect) studies, transcriptomes are usually measured over several time points. In treatment studies, which aim to quantify differences in the transcriptome caused by biotic stimuli, abiotic stimuli, or diseases, treatment/disease transcriptomes are usually compared with non-treatment/non-disease transcriptomes. In either case, comparing changes in transcriptomes over time or between treatments allows us to identify genes and gene regulatory mechanisms that might be involved in governing the biological process under investigation. Although transcriptomics studies rely on a powerful methodology, little is known about the evolution of such transcriptomes. Understanding the evolutionary mechanisms that change transcriptomes over time, however, might give us a new perspective on how diseases emerge in the first place or how morphological changes are triggered by changes in developmental transcriptomes.

Evolutionary transcriptomics aims to capture and quantify the evolutionary conservation of genes that contribute to the transcriptome during a specific stage of the biological process of interest. The resulting temporal conservation pattern then makes it possible to detect stages of development or other biological processes that are evolutionarily constrained (Drost et al., 2018). At the highest level, this quantification is achieved through transcriptome indices (e.g. the Transcriptome Age Index or the Transcriptome Divergence Index), which quantify the average evolutionary age or sequence conservation of the genes that contribute to the transcriptome at a particular stage. More generally, evolutionary transcriptomics can be used to quantify the evolutionary conservation of transcriptomes and thereby investigate how the transcriptomes underlying biological processes are constrained or channeled by evolutionary history (Dollo's law) (Drost et al., 2017).
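For reference, the Transcriptome Age Index introduced by Domazet-Lošo and Tautz (2010) is the expression-weighted mean phylostratum of all genes at stage s, where ps_i is the phylostratum of gene i, e_is its expression level at stage s, and n the number of genes. The Transcriptome Divergence Index is defined analogously, with the divergence stratum ds_i in place of ps_i. Because phylostratum 1 denotes the oldest genes, a lower index means that evolutionarily older (or more conserved) genes dominate the transcriptome at that stage. The last line is a worked example with two made-up genes (ps = 1 with expression 10, ps = 3 with expression 30), included only for illustration.

```latex
\mathrm{TAI}_s = \frac{\sum_{i=1}^{n} ps_i \, e_{is}}{\sum_{i=1}^{n} e_{is}},
\qquad
\mathrm{TDI}_s = \frac{\sum_{i=1}^{n} ds_i \, e_{is}}{\sum_{i=1}^{n} e_{is}}

\mathrm{TAI}_s = \frac{1 \cdot 10 + 3 \cdot 30}{10 + 30} = \frac{100}{40} = 2.5
```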

In principle, any transcriptome dataset published so far can be combined with evolutionary information. Thus, myTAI can be used to study the evolution of any available transcriptome dataset once its genes have been assigned evolutionary ages or sequence divergence values.

For the purpose of performing large scale evolutionary transcriptomics studies, the myTAI package implements frameworks to allow researchers to study the evolution of biological processes and to detect stages or periods of evolutionary conservation or variability.

I hope that myTAI will become the community standard tool to perform evolutionary transcriptomics studies and I am happy to add required functionality upon request.

Please note that myTAI relies on gene age inference, and there has been an extensive debate in recent years about the best approaches for inferring gene ages. Please follow my updated discussion of the gene age inference literature.

The following tutorials provide use cases and detailed explanations of how to quantify transcriptome conservation with myTAI and how to interpret the results generated with this software tool.


Please cite the following paper when using myTAI for your own research. This will allow me to continue working on this software tool and will motivate me to extend its functionality and usability in the next years. Many thanks in advance :)


Users can download myTAI from CRAN:

# install myTAI from CRAN
install.packages("myTAI")

Install Developer Version

Some bug fixes or new functionality will not be available on CRAN yet, but in the developer version here on GitHub. To download and install the most recent version of myTAI run:

# install the developer version of myTAI on your system
# install.packages("devtools")
devtools::install_github("HajkD/myTAI")


The current status of the package as well as a detailed history of the functionality of each version of myTAI can be found in the NEWS section.


These tutorials introduce users to myTAI:


# load myTAI
library(myTAI)
# example dataset covering 7 stages of A. thaliana embryo development
data(PhyloExpressionSetExample)
# transform absolute expression levels to log2 expression levels
ExprExample <- tf(PhyloExpressionSetExample, log2)
# visualize global Transcriptome Age Index pattern
PlotSignature(ExpressionSet = ExprExample, measure = "TAI")
# plot expression level distributions for each age (=PS) category
# and each developmental stage
PlotCategoryExpr(ExprExample, "PS")
# plot median expression of each age category separated by old (PS1-3)
# versus young (PS4-12) genes
PlotMedians(ExprExample, Groups = list(1:3, 4:12))
# plot mean expression of each age category separated by old (PS1-3)
# versus young (PS4-12) genes
PlotMeans(ExprExample, Groups = list(1:3, 4:12))
# plot relative mean expression of each age category separated by old (PS1-3)
# versus young (PS4-12) genes
PlotRE(ExprExample, Groups = list(1:3, 4:12))
# plot the significant differences between gene expression distributions
# of old (=group1) versus young (=group2) genes
PlotGroupDiffs(ExpressionSet = ExprExample,
               Groups        = list(group_1 = 1:3, group_2 = 4:12),
               legendName    = "PS",
               plot.type     = "boxplot")

Package Dependencies

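# to perform differential gene expression analyses with myTAI
# please install the edgeR package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("edgeR")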

Getting started with myTAI

Users can also read the tutorials within R (or RStudio):

# load the myTAI package
library(myTAI)
# look for all tutorials (vignettes) available in the myTAI package
# this will open your web browser
browseVignettes("myTAI")
# or as single tutorials
# open tutorial: Introduction to Phylotranscriptomics and myTAI
vignette("Introduction", package = "myTAI")
# open tutorial: Intermediate Concepts of Phylotranscriptomics
vignette("Intermediate", package = "myTAI")
# open tutorial: Advanced Concepts of Phylotranscriptomics
vignette("Advanced", package = "myTAI")
# open tutorial: Age Enrichment Analyses
vignette("Enrichment", package = "myTAI")
# open tutorial: Gene Expression Analysis with myTAI
vignette("Expression", package = "myTAI")
# open tutorial: Taxonomic Information Retrieval with myTAI
vignette("Taxonomy", package = "myTAI")

In the myTAI framework users can find:

Phylotranscriptomics Measures:

  • TAI() : Function to compute the Transcriptome Age Index (TAI)
  • TDI() : Function to compute the Transcriptome Divergence Index (TDI)
  • TPI() : Function to compute the Transcriptome Polymorphism Index (TPI)
  • REMatrix() : Function to compute the relative expression profiles of all phylostrata or divergence-strata
  • RE() : Function to transform mean expression levels to relative expression levels
  • pTAI() : Compute the Phylostratum Contribution to the global TAI
  • pTDI() : Compute the Divergence Stratum Contribution to the global TDI
  • pMatrix() : Compute Partial TAI or TDI Values
  • pStrata() : Compute Partial Strata Values
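To make the measures above concrete, here is a minimal base-R sketch that computes a TAI profile, per-phylostratum partial contributions (and their cumulative sums), and relative expression profiles by hand on a hypothetical toy dataset. It assumes the ExpressionSet layout used throughout myTAI (phylostratum in column 1, gene ID in column 2, one expression column per stage); all variable names are illustrative. TAI(), pStrata(), pTAI() and REMatrix() remain the reference implementations, and their exact output format may differ from this sketch.

```r
# Minimal sketch (not myTAI's implementation): TAI, partial TAI and
# relative expression computed by hand on a hypothetical toy dataset.
# Assumed layout: column 1 = phylostratum (PS), column 2 = gene ID,
# remaining columns = expression levels per stage.
toy <- data.frame(
  PS     = c(1, 1, 2, 3),
  GeneID = c("g1", "g2", "g3", "g4"),
  Stage1 = c(10, 20, 5, 5),
  Stage2 = c(5, 5, 10, 20),
  Stage3 = c(8, 12, 6, 14)
)

ps   <- toy[[1]]
expr <- as.matrix(toy[, -(1:2)])

# TAI per stage: expression-weighted mean phylostratum,
# sum_i(ps_i * e_is) / sum_i(e_is); ps is recycled down each stage column
tai_sketch <- colSums(ps * expr) / colSums(expr)

# Partial contributions: the same numerator summed within each
# phylostratum (one row per PS), divided by the total expression per stage;
# the rows add up to the TAI profile
partial_sketch <- sweep(rowsum(ps * expr, group = ps), 2, colSums(expr), "/")

# Cumulative contributions across phylostrata (oldest first);
# the last row equals the TAI profile
cumulative_sketch <- apply(partial_sketch, 2, cumsum)

# Relative expression: mean expression per PS, rescaled per PS to [0, 1]
# across stages via (f - min f) / (max f - min f); a PS with constant
# expression across stages would give NaN here
mean_expr <- rowsum(expr, group = ps) / as.vector(table(ps))
re_sketch <- t(apply(mean_expr, 1, function(f) (f - min(f)) / (max(f) - min(f))))

print(round(tai_sketch, 3))
print(round(re_sketch, 2))
```

On real data, the first two columns of PhyloExpressionSetExample play the roles of PS and GeneID in this sketch.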

Visualization and Analytics Tools:

  • PlotSignature() : Main visualization function to plot evolutionary signatures across transcriptomes
  • PlotPattern() : Base graphics function to plot evolutionary signatures across transcriptomes
  • PlotContribution() : Plot Cumulative Transcriptome Index
  • PlotCorrelation() : Function to plot the correlation between phylostratum values and divergence-stratum values
  • PlotRE() : Function to plot the relative expression profiles
  • PlotBarRE() : Function to plot the mean relative expression levels of phylostratum or divergence-stratum classes as barplot
  • PlotMeans() : Function to plot the mean expression profiles of age categories
  • PlotMedians() : Function to plot the median expression profiles of age categories
  • PlotVars() : Function to plot the expression variance profiles of age categories
  • PlotDistribution() : Function to plot the frequency distribution of genes within the corresponding age categories
  • PlotCategoryExpr() : Plot the Expression Levels of each Age or Divergence Category as Barplot or Violinplot
  • PlotEnrichment() : Plot the Phylostratum or Divergence Stratum Enrichment of a given Gene Set
  • PlotGeneSet() : Plot the Expression Profiles of a Gene Set
  • PlotGroupDiffs() : Plot the significant differences between gene expression distributions of PS or DS groups
  • PlotSelectedAgeDistr() : Plot the PS or DS distribution of a selected set of genes

A Statistical Framework and Test Statistics:

  • FlatLineTest() : Function to perform the Flat Line Test that quantifies the statistical significance of an observed phylotranscriptomics pattern (significant deviation from a flat line = evolutionary signal)
  • ReductiveHourglassTest() : Function to perform the Reductive Hourglass Test that statistically evaluates the existence of a phylotranscriptomic hourglass pattern (hourglass model)
  • EarlyConservationTest() : Function to perform the Reductive Early Conservation Test that statistically evaluates the existence of a monotonically increasing phylotranscriptomic pattern (early conservation model)
  • ReverseHourglassTest() : Function to perform the Reverse Hourglass Test that statistically evaluates the existence of a reverse hourglass pattern (low-high-low)
  • EnrichmentTest() : Phylostratum or Divergence Stratum Enrichment of a given Gene Set based on Fisher's Test
  • bootMatrix() : Compute a Permutation Matrix for Test Statistics

All test functions also include visual analytics tools to assess the goodness of fit of the underlying test statistics.
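The idea behind the Flat Line Test can be illustrated with a permutation sketch in base R: if gene age and expression were unrelated, shuffling phylostratum labels across genes should produce TAI profiles that vary across stages about as much as the observed one. The sketch below uses simulated data with a planted signal (young genes up-regulated at one stage) and a plain empirical p-value. Our understanding is that FlatLineTest() instead fits a gamma distribution to the permuted variances (with bootMatrix() supplying the permuted profiles), so treat this as an illustration of the principle, not a re-implementation; the gene counts, rates and fold change are arbitrary assumptions.

```r
# Illustrative permutation sketch of the flat-line idea (not FlatLineTest()).
# Simulated data: 200 hypothetical genes, 12 phylostrata, 5 stages.
set.seed(1)
n_genes <- 200
ps   <- sample(1:12, n_genes, replace = TRUE)
expr <- matrix(rexp(n_genes * 5, rate = 0.01), ncol = 5,
               dimnames = list(NULL, paste0("Stage", 1:5)))

# Planted signal: young genes (PS >= 7) are five-fold up-regulated at Stage3
young <- ps >= 7
expr[young, "Stage3"] <- expr[young, "Stage3"] * 5

tai <- function(ps, expr) colSums(ps * expr) / colSums(expr)

# Test statistic: variance of the TAI profile across stages
observed_var <- var(tai(ps, expr))

# Null distribution: shuffle phylostratum labels across genes, which breaks
# any association between gene age and expression
n_perm   <- 1000
perm_var <- replicate(n_perm, var(tai(sample(ps), expr)))

# Empirical one-sided p-value with the usual +1 correction
p_value <- (sum(perm_var >= observed_var) + 1) / (n_perm + 1)
cat("observed variance:", round(observed_var, 3),
    " p-value:", signif(p_value, 3), "\n")
```

Removing the planted signal line should, in most runs, yield a p-value well above 0.05, which is the flat-line (no evolutionary signal) case.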

Differential Gene Expression Analysis

  • DiffGenes() : Implements Popular Methods for Differential Gene Expression Analysis
  • CollapseReplicates() : Combine Replicates in an ExpressionSet
  • CombinatorialSignificance() : Compute the Statistical Significance of Each Replicate Combination
  • Expressed() : Filter Expression Levels in Gene Expression Matrices (define expressed genes)
  • SelectGeneSet() : Select a Subset of Genes in an ExpressionSet
  • PlotReplicateQuality() : Plot the Quality of Biological Replicates
  • GroupDiffs() : Quantify the significant differences between gene expression distributions of PS or DS groups
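Two of the preprocessing steps listed above, combining replicates and keeping only expressed genes, amount to simple matrix operations. The sketch below illustrates them on a hypothetical toy ExpressionSet. collapse_replicates_sketch() and expressed_sketch() are hypothetical helpers written for this example, not the package's CollapseReplicates() and Expressed(), whose arguments and filtering rules may differ; in particular, the rule "a gene counts as expressed if it reaches the cut-off in at least one stage" is an assumption.

```r
# Hypothetical toy ExpressionSet: 3 genes, 2 stages x 2 replicates
es_reps <- data.frame(
  PS     = c(1, 2, 3),
  GeneID = c("g1", "g2", "g3"),
  S1_r1  = c(10, 0, 4),
  S1_r2  = c(12, 1, 6),
  S2_r1  = c(8, 0, 20),
  S2_r2  = c(6, 2, 22)
)

# collapse_replicates_sketch(): hypothetical helper (not CollapseReplicates()).
# Summarises each block of `nrep` consecutive replicate columns with FUN.
collapse_replicates_sketch <- function(es, nrep, stage_names, FUN = mean) {
  expr <- as.matrix(es[, -(1:2)])
  stopifnot(ncol(expr) == nrep * length(stage_names))
  stage_of_col <- rep(seq_along(stage_names), each = nrep)
  collapsed <- do.call(cbind, lapply(seq_along(stage_names), function(s)
    apply(expr[, stage_of_col == s, drop = FALSE], 1, FUN)))
  colnames(collapsed) <- stage_names
  data.frame(es[, 1:2], collapsed)
}

# expressed_sketch(): hypothetical helper (not Expressed()).
# Assumed rule: keep genes reaching `cut_off` in at least one stage.
expressed_sketch <- function(es, cut_off) {
  expr <- as.matrix(es[, -(1:2)])
  es[apply(expr >= cut_off, 1, any), , drop = FALSE]
}

collapsed_es <- collapse_replicates_sketch(es_reps, nrep = 2,
                                           stage_names = c("Stage1", "Stage2"))
expressed_es <- expressed_sketch(collapsed_es, cut_off = 2)
print(expressed_es)
```

Passing FUN = median instead of the default mean gives a replicate summary that is less sensitive to a single outlying replicate.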

Taxonomic Information Retrieval

  • taxonomy() : Retrieve Taxonomic Information for any Organism of Interest

Minor Functions for Better Usability and Additional Analyses

  • MatchMap() : Match a Phylostratigraphic Map or Divergence Map with an ExpressionMatrix
  • tf() : Transform Gene Expression Levels
  • age.apply() : Age Category Specific apply Function
  • ecScore() : Compute the Hourglass Score for the EarlyConservationTest
  • geom.mean() : Geometric Mean
  • harm.mean() : Harmonic Mean
  • omitMatrix() : Compute TAI or TDI Profiles Omitting a Given Gene
  • rhScore() : Compute the Hourglass Score for the Reductive Hourglass Test
  • reversehourglassScore(): Compute the Reverse Hourglass Score for the Reverse Hourglass Test
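Several of the helpers listed above follow standard definitions. As a rough sketch, matching an age map with an expression table is an inner join on the gene ID, a tf()-style transformation is applied to the expression columns only, and the geometric and harmonic means are exp(mean(log x)) and 1/mean(1/x). Every function ending in _sketch below is hypothetical (not MatchMap(), tf(), geom.mean() or harm.mean() themselves), and the gene IDs are made up.

```r
# Hypothetical phylostratigraphic map and expression table (made-up gene IDs)
ps_map <- data.frame(PS = c(1, 4, 2), GeneID = c("g1", "g2", "g3"))
expr_table <- data.frame(GeneID = c("g3", "g1", "g9"),
                         Stage1 = c(2, 8, 5),
                         Stage2 = c(4, 16, 5))

# match_map_sketch(): hypothetical helper (not MatchMap()).
# Inner join on GeneID, then reorder to the ExpressionSet layout
# (PS, GeneID, stage columns); genes missing from either table are dropped.
match_map_sketch <- function(map, expr) {
  joined <- merge(map, expr, by = "GeneID")
  joined[, c("PS", "GeneID", setdiff(names(expr), "GeneID"))]
}

# tf_sketch(): hypothetical helper applying FUN to the stage columns only
tf_sketch <- function(es, FUN) {
  es[, -(1:2)] <- FUN(es[, -(1:2)])
  es
}

# Textbook definitions of the geometric and harmonic mean
geom_mean_sketch <- function(x) exp(mean(log(x)))
harm_mean_sketch <- function(x) 1 / mean(1 / x)

es     <- match_map_sketch(ps_map, expr_table)
es_log <- tf_sketch(es, log2)
print(es_log)
print(apply(es[, -(1:2)], 1, geom_mean_sketch))
```

The inner join silently drops g2 (no expression data) and g9 (no age assignment); on real data it is worth checking how many genes are lost at this step.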

Developer Version of myTAI

The developer version of myTAI might include more functionality than the stable version on CRAN. Users can install the current developer version of myTAI by typing:

# The developer version can be installed directly from GitHub:
# install.packages("devtools")
# install developer version of myTAI
devtools::install_github("HajkD/myTAI", build_vignettes = TRUE, dependencies = TRUE)
# On Windows, this may not work out of the box (see ?build_github_devtools).
# When working with Windows, you first need to install the
# Rtools toolchain (not an R package)
# or consult: http://www.rstudio.com/products/rpackages/devtools/
# Afterwards you can install devtools -> install.packages("devtools")
# and then run the devtools::install_github() call above.
# Finally, load the package
library(myTAI)


Domazet-Lošo T, Tautz D. A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (2010) 468: 815-818.

Quint M, Drost HG, et al. A transcriptomic hourglass in plant embryogenesis. Nature (2012) 490: 98-101.

Drost HG, Gabel A, Grosse I, Quint M. Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol. Biol. Evol. (2015) 32 (5): 1221-1231.

Drost HG, Bellstädt J, Ó'Maoiléidigh DS, Silva AT, Gabel A, Weinholdt C, Ryan PT, Dekkers BJW, Bentsink L, Hilhorst H, Ligterink W, Wellmer F, Grosse I, and Quint M. Post-embryonic hourglass patterns mark ontogenetic transitions in plant development. Mol. Biol. Evol. (2016) doi:10.1093/molbev/msw039

Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:



I would like to thank several individuals for making this project possible.

First, I would like to thank Ivo Grosse and Marcel Quint for providing me with a place and an environment in which to work on fascinating topics of Evo-Devo research, and for the fruitful discussions that led to projects like this one.

Furthermore, I would like to thank Alexander Gabel and Jan Grau for valuable discussions on how to improve some methodological concepts of some analyses present in this package.

I would also like to thank the Master's students Sarah Scharfenberg, Anne Hoffmann, and Sebastian Wussow, who worked intensively with this package and helped me to improve the usability and logic of the package environment.


myTAI 0.9.1

  • fixing a unit test that used set.seed(123), which caused an error in R 3.6.0 due to the switch from the non-uniform "Rounding" sampler to the "Rejection" sampler; the affected unit test test-PlotEnrichment.R was adjusted accordingly. The corresponding CRAN statement reads:

Note that this ensures using the (old) non-uniform "Rounding" sampler for all 3.x versions of R, and does not add an R version dependency. Note also that the new "Rejection" sampler which R will use from 3.6.0 onwards by default is definitely preferable over the old one, so that the above should really only be used as a temporary measure for reproduction of the previous behavior (and the run time tests relying on it).

myTAI 0.9.0

New Functions

  • new function ReverseHourglassTest() to perform a Reverse Hourglass Test. The Reverse Hourglass Test aims to statistically evaluate the existence of a reverse hourglass pattern based on TAI or TDI computations. The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern) does not follow a reverse hourglass (low-high-low) shape. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does indeed follow a reverse hourglass shape.

  • new function reversehourglassScore() for computing the Reverse Hourglass Score for the Reverse Hourglass Test

Updated Functionality

  • function PlotSignature() receives a new TestStatistic (TestStatistic = "ReverseHourglassTest") to perform a reverse hourglass test (= testing the significance of a low-high-low pattern)

myTAI 0.8.0

Updated Functionality

  • fix remaining issues when input is a tibble

myTAI 0.7.0


New Functions

  • new function PlotCIRatio() to compute and visualize TAI/TDI etc. patterns using bootstrapping and confidence intervals (contributed by @ljljolinq1010)

Updated Functionality

  • all functions can now handle tibble data as input (previously, errors were thrown when the input data was not in strict data.frame format)

myTAI 0.6.0


  • is.ExpressionSet() now prints out more detailed error messages when ExpressionSet format is violated

  • adapt PlotContribution() to new version of dplyr where summarise_each() is deprecated.

The error message occurring after the new dplyr release was:

  1. Failure: PlotContribution() works properly with DivergenceExpressionSet input... (@test-PlotContribution.R#16) PlotContribution(DivergenceExpressionSetExample, legendName = "DS") produced messages.

summarise_each() is deprecated. Use summarise_all(), summarise_at() or summarise_if() instead. To map funs over all variables, use summarise_all() summarise_each() is deprecated. Use summarise_all(), summarise_at() or summarise_if() instead. To map funs over all variables, use summarise_all()

This is now fixed.

myTAI 0.5.0

New Functions

  • new function PlotSignature() allows users to plot evolutionary signatures across transcriptomes (based on ggplot2 -> new main visualization function aiming to replace the PlotPattern() function)

  • new function TPI() allows users to compute the Transcriptome Polymorphism Index introduced by Gossmann et al., 2015.

  • new function PlotMedians() allows users to compute and visualize the median expression of all age categories

  • new function PlotVars() allows users to compute and visualize the expression variance of all age categories


  • PlotContribution() is now based on ggplot2 and loses base graphics arguments

  • now R/RcppExports.R and src/rcpp_funcs.cpp are included in the package due to previous compilation problems (see also the Stack Overflow discussion)

  • MatchMap() is now based on dplyr::inner_join() to match age category table with a gene expression dataset

  • PlotCorrelation() has been extended and optimized for producing high publication quality plots

  • PlotMeans() is now based on ggplot2 and lost all base graphics arguments.

  • PlotRE() is now based on ggplot2 and lost all base graphics arguments.


  • In Introduction vignette: complete restructuring of the Introduction
  • In Introduction vignette: add new ggplot2 based examples

myTAI 0.4.0

New Functions

  • a new function PlotSelectedAgeDistr() allowing users to visualize the PS or DS gene distribution of a subset of genes stored in the input ExpressionSet object
  • a new function PlotGroupDiffs() allowing users to plot the significant differences between gene expression distributions of PS or DS groups
  • a new function GroupDiffs() allowing users to perform statistical tests to quantify the gene expression level differences between all genes of defined PS or DS groups


  • PlotDistribution() now uses ggplot2 to visualize the PS or DS distribution and is also based on the new function PlotSelectedAgeDistr(); furthermore it loses arguments plotText and ... and gains a new argument legendName

  • remove arguments 'main.text' and '...' from PlotCorrelation()

  • PlotCorrelation() is now based on ggplot2

  • PlotGroupDiffs() receives a new argument gene.set allowing users to statistically quantify the group specific PS/DS differences of a selected set of genes

  • analogously to PlotGroupDiffs() the function GroupDiffs() also receives a new argument gene.set allowing users to statistically quantify the group specific PS/DS differences of a selected set of genes

  • Fixing wrong x-axis labeling in PlotCategoryExpr() when type = "stage-centered" is specified

  • PlotCategoryExpr() now also prints out the PS/DS absolute frequency distribution of the selected gene.set

myTAI 0.3.0


  • adding examples for PlotCategoryExpr() to Advanced Vignette
  • adding examples for PlotReplicateQuality() to Expression vignette

New Functions

  • a new function PlotCategoryExpr() allowing users to plot the expression levels of each age or divergence category as boxplot, dot plot or violin plot
  • a new function PlotReplicateQuality() allowing users to visualize the quality of biological replicates

myTAI 0.2.1


  • fixed a wrong example in the Enrichment vignette (https://github.com/HajkD/myTAI/commit/8d52fd60c274361dc9028dec3409abf60a738d8a)


  • PlotGeneSet() and SelectGeneSet() now have a new argument use.only.map specifying whether a Phylostratigraphic Map or Divergence Map is passed to the function instead of a standard ExpressionSet.
  • a wrong version of the edgeR Bioconductor package was imported causing version 0.2.0 to fail R CMD Check on unix based systems

myTAI 0.2.0


  • adding new vignette Taxonomy providing step-by-step instructions on retrieving taxonomic information for any organism of interest

  • adding new vignette Expression Analysis providing use cases to perform gene expression data analysis with myTAI

  • adding new vignette Enrichment providing step-by-step instructions on how to perform PS and DS enrichment analyses with PlotEnrichment()

  • adding examples for pStrata(), pMatrix(), pTAI(), pTDI(), and PlotContribution() to the Introduction Vignette

New Functions

  • a new function taxonomy() allows users to retrieve taxonomic information for any organism of interest; this function has been taken from the biomartr package and was removed from biomartr afterwards. Please note that in myTAI version 0.1.0 the Introduction vignette referred to the taxonomy() function in biomartr. This is no longer the case (since myTAI version 0.2.0), because taxonomy() is now implemented in myTAI.

  • the new taxonomy() function is based on the powerful R package taxize.

  • a new function SelectGeneSet() allows users to quickly select a subset of genes in an ExpressionSet

  • a new function DiffGenes() allows users to perform differential gene expression analysis with ExpressionSet objects

  • a new function EnrichmentTest() allows users to perform a Fisher's exact test based enrichment analysis of over- or underrepresented Phylostrata or Divergence Strata within a given gene set without having to plot the result

  • a new function PlotGeneSet() allows users to visualize the expression profiles of a given gene set

  • a new function PlotEnrichment() allows users to visualize the Phylostratum or Divergence Stratum enrichment of a given Gene Set as well as computing Fisher's exact test to quantify the statistical significance of enrichment

  • a new function PlotContribution() allows users to visualize the Phylostratum or Divergence Stratum contribution to the global TAI/TDI pattern

  • a new function pTAI() allows users to compute the phylostratum contribution to the global TAI pattern

  • a new function pTDI() allows users to compute the divergence stratum contribution to the global TDI pattern


  • FilterRNASeqCT() has been renamed to Expressed() allowing users to apply this filter function to RNA-Seq data as well as to microarray data
  • PlotRE() and PlotMeans() are now based on colors from the RColorBrewer package (default)
  • PlotRE() and PlotMeans() now have a new argument colors allowing users to choose custom colors for the visualized relative or mean expression profiles
  • geom.mean() and harm.mean() now are external functions accessible to the myTAI user

myTAI 0.1.0

Main News

  • now all functions have unit tests

New Functions

  • a new function pStrata() allows users to compute partial TAI/TDI values for all Phylostrata or Divergence Strata

  • a new function CollapseReplicates() allows users to combine replicate expression levels in ExpressionSet objects

  • a new function FilterRNASeqCT() allows users to filter expression levels of ExpressionSet objects deriving from RNA-Seq count tables


  • function MatchMap() now receives a new argument remove.duplicates allowing users to delete duplicate gene ids (that might be stored in the input PhyloMap or DivergenceMap) during the process of matching a Map with an ExpressionSet

  • FlatLineTest(), ReductiveHourglassTest(), EarlyConservationTest(), and PlotPattern() implement a new argument custom.perm.matrix allowing users to pass their own (custom) permutation matrix to the corresponding function. All subsequent test statistics and p-value/std.dev computations are then based on this custom permutation matrix

  • EarlyConservationTest() and ReductiveHourglassTest() now have a new parameter gof.warning allowing users to choose whether or not non significant goodness of fit results should be printed as warning

  • now when specifying TestStatistic = NULL in PlotPattern() only the TAI/TDI profile is drawn (without performing any test statistics); this is equivalent to performing: plot(TAI(PhyloExpressionSetExample))

  • function combinatorialSignificance() is now named CombinatorialSignificance()

  • changing the title and description of the myTAI package

  • some minor changes in vignettes and within the documentation of functions

myTAI 0.0.2

New Features in v. 0.0.2

  • combinatorialSignificance(), FlatLineTest(), ReductiveHourglassTest(), and EarlyConservationTest() now support multicore processing

  • MatchMap() has been entirely rewritten and is now based on dplyr; additionally it now has a new argument accumulate that allows you to accumulate multiple expression levels to a unique expression level for a unique gene id


All three Vignettes: Introduction, Intermediate, and Advanced have been updated and extended.

Bug Fixes

  • two small bugs in ReductiveHourglassTest() and EarlyConservationTest() have been fixed that caused only 1 plot to be generated instead of 3 or 4 plots (par(mfrow=c(1,3)) or par(mfrow=c(2,2)))

  • a small bug in PlotMeans() that caused a wrong y-axis label when plotting only one group of Phylostrata or Divergence Strata has been fixed

myTAI 0.0.1

Introducing myTAI 0.0.1:

A framework to perform phylotranscriptomics analyses for Evolutionary Developmental Biology research.



myTAI 0.9.3 by Hajk-Georg Drost


Report a bug at https://github.com/drostlab/myTAI/issues

Browse source code at https://github.com/cran/myTAI

Authors: Hajk-Georg Drost [aut, cre]

Documentation:   PDF Manual  

GPL-3 license

Imports Rcpp, nortest, fitdistrplus, parallel, foreach, doParallel, dplyr, RColorBrewer, taxize, methods, graphics, stats, grDevices, utils, reshape2, ggplot2, readr, tibble, scales, gridExtra, edgeR

Suggests knitr, rmarkdown, devtools, testthat, mgcv

Linking to Rcpp, RcppArmadillo, cpp11

System requirements: C++11
