Calculating Proportionality Between Vectors of Compositional Data

The bioinformatic evaluation of gene co-expression often begins with correlation-based analyses. However, this approach lacks statistical validity when applied to relative count data. This includes, for example, biological data produced by high-throughput RNA-sequencing, chromatin immunoprecipitation (ChIP), ChIP-sequencing, Methyl-Capture sequencing, and other techniques. Two metrics of proportionality, phi [Lovell et al. (2015)] and rho [Erb and Notredame (2016)], represent novel alternatives to correlation. Both derive from compositional data analysis, a branch of mathematics dealing specifically with relative data. This package introduces a programmatic framework for calculating feature dependence through proportionality, as discussed in the cited publications.


Welcome to the propr GitHub page!

This package provides a set of functions for measuring dependence between relative features using compositional data analysis. Specifically, it implements two measures of proportionality, φ and ρ, introduced in Lovell 2015 and expounded in Erb 2016. You can get started with propr by installing the most up-to-date version of this package directly from GitHub.

library(devtools)
devtools::install_github("tpq/propr")
library(propr)

The principal functions in propr are (1) phit, for the calculation of φ, and (2) perb, for the calculation of ρ. In the example below, we calculate proportionality for a simulated dataset, print the results as a proportionality matrix, then index the pairs of interest within that matrix. We refer you to the official package vignette for a comprehensive discussion of compositional data, proportionality, and everything this package has to offer.

set.seed(12345)
N <- 10
data.absolute <- data.frame(a=(1:N), b=(1:N) * rnorm(N, 10, 0.1),
                            c=(N:1), d=(N:1) * rnorm(N, 10, 1.0))
data.relative <- data.absolute / colSums(data.absolute)
phi <- phit(data.relative)
## Calculating phi from "count matrix".
phi@matrix
##             [,1]        [,2]       [,3]       [,4]
## [1,] 0.000000000 0.001894476 3.95056338 4.02312199
## [2,] 0.001894476 0.000000000 3.97849497 4.05353543
## [3,] 3.950563382 3.978494970 0.00000000 0.01119647
## [4,] 4.023121991 4.053535432 0.01119647 0.00000000
phi05 <- phi["<", .05]
phi05@pairs
## [1]  2 12
rho <- perb(data.relative)
## Calculating rho from "count matrix".
rho@matrix
##            [,1]       [,2]       [,3]       [,4]
## [1,]  1.0000000  0.9990459 -0.9985539 -0.9982335
## [2,]  0.9990459  1.0000000 -0.9981875 -0.9985699
## [3,] -0.9985539 -0.9981875  1.0000000  0.9945048
## [4,] -0.9982335 -0.9985699  0.9945048  1.0000000
rho99 <- rho[">", .99]
rho99@pairs
## [1]  2 12
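
To make the two metrics concrete, here is a minimal base R sketch of φ and ρ computed from centered log-ratio (clr) transformed data, using the definitions in the cited papers. The helper names (clr, phi_matrix, rho_matrix) and the toy data are made up for illustration and are not part of propr. The package's own implementation may differ in details such as symmetrization, so this sketch is not expected to reproduce the output above.

```r
# Centered log-ratio transform: rows are samples, columns are features.
clr <- function(x) {
  logx <- log(as.matrix(x))
  logx - rowMeans(logx)  # subtract each sample's mean log value
}

# phi (after Lovell et al.): var(clr_i - clr_j) / var(clr_i).
# Illustrative helper only; note this raw form is not symmetric.
phi_matrix <- function(x) {
  z <- clr(x)
  p <- ncol(z)
  out <- matrix(0, p, p)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- var(z[, i] - z[, j]) / var(z[, i])
    }
  }
  out
}

# rho (after Erb and Notredame):
# 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j)).
rho_matrix <- function(x) {
  z <- clr(x)
  p <- ncol(z)
  out <- matrix(1, p, p)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- 1 - var(z[, i] - z[, j]) / (var(z[, i]) + var(z[, j]))
    }
  }
  out
}

# Toy data (an assumption for illustration): b is exactly proportional to a.
toy <- cbind(a = 1:10, b = 10 * (1:10), c = 10:1,
             d = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
phi_sketch <- phi_matrix(toy)
rho_sketch <- rho_matrix(toy)
```

Because b is a constant multiple of a, their clr values differ only by a constant, so φ for that pair is approximately 0 and ρ is approximately 1.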

References

  1. Erb, Ionas, and Cedric Notredame. “How Should We Measure Proportionality on Relative Gene Expression Data?” Theory in Biosciences = Theorie in Den Biowissenschaften 135, no. 1–2 (June 2016): 21–36.

  2. Lovell, David, Vera Pawlowsky-Glahn, Juan José Egozcue, Samuel Marguerat, and Jürg Bähler. “Proportionality: A Valid Alternative to Correlation for Relative Data.” PLoS Computational Biology 11, no. 3 (March 2015): e1004075.

News


  • Modified propr Class
    • Merged propr-class and propr documentation
  • Modified phit, perb functions
    • Merged phit and perb documentation
    • New phis function returns (1 - rho) / (1 + rho)
    • NAs in count matrix now throw error
    • 0s now replaced with 1s
  • Modified visualization tools
    • Merged documentation
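
The relationship behind phis, (1 - rho) / (1 + rho), can be checked numerically. The sketch below is base R only and does not call propr. The function name phis_from_rho and the simulated vectors, which stand in for two log-ratio transformed features, are assumptions for illustration.

```r
# Hypothetical helper mirroring the formula stated in the changelog entry.
phis_from_rho <- function(rho) (1 - rho) / (1 + rho)

set.seed(1)
a <- rnorm(20)                # stand-in for one log-ratio transformed feature
b <- a + rnorm(20, sd = 0.1)  # a second, nearly proportional feature

rho_ab <- 1 - var(a - b) / (var(a) + var(b))
phis_ab <- phis_from_rho(rho_ab)

# Algebraically, (1 - rho) / (1 + rho) reduces to var(a - b) / var(a + b).
same <- isTRUE(all.equal(phis_ab, var(a - b) / var(a + b)))
```

Unlike the raw form of phi, this quantity is symmetric in the two features, since swapping a and b changes neither variance.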

  • Modified [ method
    • Now joins newly indexed pairs with any existing index
  • New cytescape function
    • Uses @pairs slot to build an interaction network

  • Modified visualization tools
    • Courtesy prompt argument extended to smear and dendrogram
    • Improved error handling and documentation
  • Modified abstract function
    • New dt argument indexes significant results in @pairs
  • Modified simplify function
    • Now builds index of lower left triangle of matrix
  • New adjacent function
    • Uses @pairs slot to build an adjacency matrix
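
As a rough illustration of how an index of pairs can become an adjacency matrix, the base R sketch below marks each indexed cell and mirrors it across the diagonal. The helper pairs_to_adjacency is hypothetical and is not the package's adjacent function. It assumes the index holds column-major positions in an n-by-n matrix.

```r
# Hypothetical helper: turn linear (column-major) indices into a
# symmetric 0/1 adjacency matrix for n features.
pairs_to_adjacency <- function(pairs, n) {
  adj <- matrix(0L, n, n)
  adj[pairs] <- 1L                # mark each indexed cell
  adj <- adj | t(adj)             # mirror across the diagonal
  storage.mode(adj) <- "integer"  # convert logical back to 0/1
  adj
}

# Using the index values 2 and 12 for a 4-feature matrix.
adj <- pairs_to_adjacency(c(2L, 12L), 4L)
```

With these inputs, features 1 and 2 are connected, as are features 3 and 4.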

  • Modified visualization tools
    • bucket now depends on slate function
  • Modified backend code
    • New coordToIndex performs inverse of indexToCoord
  • Modified prop2prob function
    • Return p-values as a sorted data.table
    • Now lets user select method for p-value adjustment
    • New prompt argument turns off big data prompt
    • Fix pass by reference bug in linRcpp
  • New abstract function
    • Combines two propr objects into one

  • New lrmodel class
    • Use modelCLR to capture the clr-transformation rule
    • Use predict to deploy this rule to new data
  • Modified backend code
    • Added corRcpp function from correlateR package
    • Added linRcpp function for Z-transformation
    • Added lltRcpp and urtRcpp to retrieve a half-matrix
    • Added labRcpp to label a half-matrix
  • New prop2prob function
    • Allows hypothesis testing of rho equals naught
    • Tests differential proportionality

  • Modified visualization tools
    • plotCheck extended to all plot functions
    • plot method now calls smear function
    • dendrogram plot now rendered using ggplot2
    • snapshot plot now rendered using ggplot2
    • bokeh plot now on positive log scale
    • plotly support added
  • Modified backend code
    • Temporarily removed a_bool function
  • Modified [ method
    • Removed bool and copy arguments

  • Modified backend code
    • New a_bool function returns thresholded boolean matrix
  • Modified [ method
    • New bool argument toggles whether to use a_bool
    • New tiny argument toggles whether to use simplify
    • New copy argument toggles a_bool copy-on-modify

  • New visualization tools
    • slate returns a table of VLR, VLS, and rho
    • bokeh plots pairs by the individual variances
  • Modified index-naive plot functions
    • Now uses fastcluster::hclust implementation
    • New prompt argument turns off big data prompt
    • prism now depends on slate function
  • Modified dendrogram function
    • Now uses fastcluster::hclust implementation
    • Now returns an hclust object

  • Modified subset method
    • Argument select now correctly rearranges features
  • Modified rhoRcpp function
    • Now accommodates new perb function feature
  • Modified perb function
    • New select argument returns subsetted matrix
    • This subset does not alter values of rho

  • Modified perb function
    • User can now specify name of ivar reference
  • Altered image method
    • Now includes dendrogram with heatmap
    • No longer uses index pairs
    • Now called snapshot
  • New prism function
  • New bucket function
  • New mds function
  • New vignette

  • Modified phit, perb functions
    • These functions now force zero removal
  • New simplify function
    • Subsets propr object based on index in @pairs slot
    • Returns an updated index

  • Modified phit, perb functions
    • Permutation testing removed
    • Added lazyPairs construct
      • Slot @pairs not populated until after [
  • Modified propr Class
    • @pairs slot now integer vector
      • Populated with indexPairs function
        • Translate with indexToCoord function
    • show method updated for lazyPairs construct
    • [ method completely redesigned
      • First argument specifies operation
      • Second argument specifies reference
      • Indexes @matrix based on these
    • subset method revised but still copy-on-modify
      • Resets @pairs when called
    • $ method removed
  • Visualization tools revised
    • plot, image, dendrogram methods
      • Improved performance
      • Compatible with new @pairs indexing
      • No longer requires column names
  • Modified backend code
    • Rephrased code for proprPhit
    • Rephrased code for proprPerb
    • Rephrased code for proprVLR
    • All functions translated into C++
      • Estimated 80% reduction in RAM overhead
      • Estimated 100-fold performance increase
      • ALR methods no longer drop dimension
      • All have modify-in-place behavior
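
The integer index stored in @pairs can be read as column-major positions in the proportionality matrix. The base R sketch below shows one way to translate between such an index and (row, column) coordinates. The helpers index_to_coord and coord_to_index are hypothetical stand-ins written for illustration. They are not the package's indexToCoord and coordToIndex functions, whose signatures may differ.

```r
# Hypothetical helpers for an n-by-n matrix stored in column-major order.
index_to_coord <- function(index, n) {
  index <- as.integer(index)
  col <- (index - 1L) %/% n + 1L  # full columns that precede the index
  row <- (index - 1L) %% n + 1L   # position within that column
  cbind(row = row, col = col)
}

coord_to_index <- function(row, col, n) {
  (col - 1L) * n + row
}

# For a 4-feature matrix, index 2 maps to (row 2, col 1)
# and index 12 maps to (row 4, col 3).
coords <- index_to_coord(c(2L, 12L), 4L)
back <- coord_to_index(coords[, "row"], coords[, "col"], 4L)
```

Both coordinates fall in the lower-left triangle, which is consistent with the README example, where the pairs 2 and 12 correspond to the (2, 1) and (4, 3) entries of the printed matrices.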

  • New orientation expected for input data
    • Updated backend and vignette accordingly
    • Removed redundant transpositions
  • Fixed rare subsetting errors
  • Tweaked plot methods

  • Introduced phit function
    • Implements Lovell's phi proportionality metric
    • Returns object of class propr
  • Introduced perb function
    • Implements Erb's rho proportionality metric
    • Returns object of class propr
  • Introduced propr Class
    • show method
      • Subsets propr based on @pairs slot
    • subset method
      • Subsets propr based on @matrix slot
    • plot method
      • Plots pairwise *lr proportionality
    • dendrogram method
      • Plots clusters of *lr-transformed data
    • image method
      • Plots heatmap of *lr-transformed data


install.packages("propr")

2.1.8 by Thomas Quinn, 19 days ago


http://github.com/tpq/propr


Report a bug at http://github.com/tpq/propr/issues


Browse source code at https://github.com/cran/propr


Authors: Thomas Quinn [aut, cre], David Lovell [aut], Ionas Erb [ctb], Anders Bilgrau [ctb], Greg Gloor [ctb]




GPL-2 license


Imports fastcluster, ggplot2, igraph, methods, Rcpp, stats, utils

Suggests ALDEx2, cccrm, compositions, data.table, grid, ggdendro, knitr, plotly, reshape2, rgl, rmarkdown, testthat

Linking to Rcpp

