Calculating Proportionality Between Vectors of Compositional Data

The bioinformatic evaluation of gene co-expression often begins with correlation-based analyses. However, this approach lacks statistical validity when applied to relative data. This includes, for example, biological count data generated by high-throughput RNA-sequencing, chromatin immunoprecipitation (ChIP), ChIP-sequencing, Methyl-Capture sequencing, and other techniques. This package implements two metrics, phi [Lovell et al. (2015)] and rho [Erb and Notredame (2016)], to provide valid alternatives to correlation for relative data. Unlike correlation, these metrics give the same result for both relative and absolute data. Pairs that are strongly proportional in relative space are also strongly correlated in absolute space. Proportionality avoids the pitfall of spurious correlation.


Welcome to the propr GitHub page!

The bioinformatic evaluation of gene co-expression often begins with correlation-based analyses. However, this approach lacks statistical validity when applied to relative count data. This includes, for example, biological data produced by high-throughput RNA-sequencing, chromatin immunoprecipitation (ChIP), ChIP-sequencing, Methyl-Capture sequencing, and other techniques. This package provides a set of functions for measuring dependence between relative features using compositional data analysis. Specifically, this package implements two measures of proportionality, φ and ρ, introduced in Lovell 2015 and expounded in Erb 2016. You can get started with propr by installing the most up-to-date version of this package directly from GitHub.

# install.packages("devtools")  # run once if devtools is not yet installed
devtools::install_github("tpq/propr")
library(propr)

The principal functions in propr include: (1) phit, for the calculation of φ, and (2) perb, for the calculation of ρ. In the example below, we calculate proportionality for a simulated dataset, print the results as a proportionality matrix, and then index the pairs of interest within that matrix. We refer you to the official package vignette for a comprehensive discussion of compositional data, proportionality, and everything this package has to offer.

set.seed(12345)
N <- 10
data.absolute <- data.frame(a=(1:N), b=(1:N) * rnorm(N, 10, 0.1),
                            c=(N:1), d=(N:1) * rnorm(N, 10, 1.0))
data.relative <- data.absolute / colSums(data.absolute)
phi <- phit(data.relative)
## Calculating phi from "count matrix".
phi@matrix
##             [,1]        [,2]       [,3]       [,4]
## [1,] 0.000000000 0.001894476 3.95056338 4.02312199
## [2,] 0.001894476 0.000000000 3.97849497 4.05353543
## [3,] 3.950563382 3.978494970 0.00000000 0.01119647
## [4,] 4.023121991 4.053535432 0.01119647 0.00000000
phi05 <- phi["<", .05]
phi05@pairs
## [1]  2 12
rho <- perb(data.relative)
## Calculating rho from "count matrix".
rho@matrix
##            [,1]       [,2]       [,3]       [,4]
## [1,]  1.0000000  0.9990459 -0.9985539 -0.9982335
## [2,]  0.9990459  1.0000000 -0.9981875 -0.9985699
## [3,] -0.9985539 -0.9981875  1.0000000  0.9945048
## [4,] -0.9982335 -0.9985699  0.9945048  1.0000000
rho99 <- rho[">", .99]
rho99@pairs
## [1]  2 12
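
To show what φ and ρ measure, here is a minimal base R sketch of the two metrics as they are usually defined on centered log-ratio (clr) transformed data. It is not the package's implementation: the helper names `clr` and `vlr_matrix` and the simulated `counts` matrix are assumptions made for illustration, and φ is left in its asymmetric form. Its values are therefore not expected to match the output above.

```r
# Sketch only: base R, no propr required. All names here are illustrative.
set.seed(1)
n <- 20
# Absolute abundances: b tracks a closely, d is unrelated to both.
counts <- cbind(a = 1:n,
                b = (1:n) * rnorm(n, 10, 0.1),
                d = runif(n, 50, 100))

# Centered log-ratio (clr): log of each value minus the row mean of the logs.
clr <- function(x) {
  logx <- log(x)
  logx - rowMeans(logx)
}

# Variance of the log-ratio (VLR) for every pair of columns.
vlr_matrix <- function(z) {
  p <- ncol(z)
  out <- matrix(0, p, p, dimnames = list(colnames(z), colnames(z)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- var(z[, i] - z[, j])
    }
  }
  out
}

z <- clr(counts)
v <- apply(z, 2, var)               # per-feature variance of the clr values
vlr <- vlr_matrix(z)

phi <- sweep(vlr, 1, v, "/")        # phi[i, j] = var(z_i - z_j) / var(z_i)
rho <- 1 - vlr / outer(v, v, "+")   # rho[i, j] = 1 - var(z_i - z_j) / (var(z_i) + var(z_j))

# Closing each sample (row) to sum to 1 leaves the clr values unchanged.
relative <- counts / rowSums(counts)
same_clr <- isTRUE(all.equal(clr(counts), clr(relative)))
```

In this sketch, a φ near 0 or a ρ near 1 marks a strongly proportional pair. `same_clr` checks that the clr values, and hence both metrics, do not depend on whether the rows hold absolute or relative values.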

  1. Erb, Ionas, and Cedric Notredame. “How Should We Measure Proportionality on Relative Gene Expression Data?” Theory in Biosciences = Theorie in Den Biowissenschaften 135, no. 1–2 (June 2016): 21–36.

  2. Lovell, David, Vera Pawlowsky-Glahn, Juan José Egozcue, Samuel Marguerat, and Jürg Bähler. “Proportionality: A Valid Alternative to Correlation for Relative Data.” PLoS Computational Biology 11, no. 3 (March 2015): e1004075.

News


  • Modified propr Class
    • Merged propr-class and propr documentation
  • Modified phit, perb functions
    • Merged phit and perb documentation
    • New phis function returns (1 - rho) / (1 + rho)
    • NAs in count matrix now throw error
    • 0s now replaced with 1s
  • Modified visualization tools
    • Merged documentation
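
The relationship noted above for phis, (1 - rho) / (1 + rho), can be checked numerically. The base R sketch below is not the package's phis code. It assumes two hypothetical log-ratio transformed features, `x` and `y`, and uses the definition rho = 1 - var(x - y) / (var(x) + var(y)). Under that definition, (1 - rho) / (1 + rho) simplifies algebraically to var(x - y) / var(x + y).

```r
# Sketch only: illustrative data, not the propr implementation.
set.seed(2)
x <- rnorm(50)                       # hypothetical log-ratio transformed feature
y <- 0.8 * x + rnorm(50, sd = 0.3)   # a second feature that tracks x

# rho as 1 - var(x - y) / (var(x) + var(y))
rho <- 1 - var(x - y) / (var(x) + var(y))

# Two routes to the same quantity
phis_from_rho <- (1 - rho) / (1 + rho)
phis_from_var <- var(x - y) / var(x + y)

all.equal(phis_from_rho, phis_from_var)
```

Because this quantity is symmetric in `x` and `y`, it gives the same value for a pair regardless of which feature is listed first.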

  • Modified [ method
    • Now joins newly indexed pairs with any existing index
  • New cytescape function
    • Uses @pairs slot to build an interaction network

  • Modified visualization tools
    • Courtesy prompt argument extended to smear and dendrogram
    • Improved error handling and documentation
  • Modified abstract function
    • New dt argument indexes significant results in @pairs
  • Modified simplify function
    • Now builds index of lower left triangle of matrix
  • New adjacent function
    • Uses @pairs slot to build an adjacency matrix

  • Modified visualization tools
    • bucket now depends on slate function
  • Modified backend code
    • New coordToIndex performs inverse of indexToCoord
  • Modified prop2prob function
    • Return p-values as a sorted data.table
    • Now lets user select method for p-value adjustment
    • New prompt argument turns off big data prompt
    • Fix pass by reference bug in linRcpp
  • New abstract function
    • Combines two propr objects into one

  • New lrmodel class
    • Use modelCLR to capture the clr-transformation rule
    • Use predict to deploy this rule to new data
  • Modified backend code
    • Added corRcpp function from correlateR package
    • Added linRcpp function for Z-transformation
    • Added lltRcpp and urtRcpp to retrieve a half-matrix
    • Added labRcpp to label a half-matrix
  • New prop2prob function
    • Allows hypothesis testing of rho equals naught
    • Tests differential proportionality

  • Modified visualization tools
    • plotCheck extended to all plot functions
    • plot method now calls smear function
    • dendrogram plot now rendered using ggplot2
    • snapshot plot now rendered using ggplot2
    • bokeh plot now on positive log scale
    • plotly support added
  • Modified backend code
    • Temporarily removed a_bool function
  • Modified [ method
    • Removed bool and copy arguments

  • Modified backend code
    • New a_bool function returns thresholded boolean matrix
  • Modified [ method
    • New bool argument toggles whether to use a_bool
    • New tiny argument toggles whether to use simplify
    • New copy argument toggles a_bool copy-on-modify

  • New visualization tools
    • slate returns a table of VLR, VLS, and rho
    • bokeh plots pairs by the individual variances
  • Modified index-naive plot functions
    • Now uses fastcluster::hclust implementation
    • New prompt argument turns off big data prompt
    • prism now depends on slate function
  • Modified dendrogram function
    • Now uses fastcluster::hclust implementation
    • Now returns an hclust object

  • Modified subset method
    • Argument select now correctly rearranges features
  • Modified rhoRcpp function
    • Now accommodates new perb function feature
  • Modified perb function
    • New select argument returns subsetted matrix
    • This subset does not alter values of rho

  • Modified perb function
    • User can now specify name of ivar reference
  • Altered image method
    • Now includes dendrogram with heatmap
    • No longer uses index pairs
    • Now called snapshot
  • New prism function
  • New bucket function
  • New mds function
  • New vignette

  • Modified phit, perb functions
    • These functions now force zero removal
  • New simplify function
    • Subsets propr object based on index in @pairs slot
    • Returns an updated index

  • Modified phit, perb functions
    • Permutation testing removed
    • Added lazyPairs construct
      • Slot @pairs not populated until after [
  • Modified propr Class
    • @pairs slot now integer vector
      • Populated with indexPairs function
        • Translate with indexToCoord function
    • show method updated for lazyPairs construct
    • [ method completely redesigned
      • First argument specifies operation
      • Second argument specifies reference
      • Indexes @matrix based on these
    • subset method revised but still copy-on-modify
      • Resets @pairs when called
    • $ method removed
  • Visualization tools revised
    • plot, image, dendrogram methods
      • Improved performance
      • Compatible with new @pairs indexing
      • No longer requires column names
  • Modified backend code
    • Rephrased code for proprPhit
    • Rephrased code for proprPerb
    • Rephrased code for proprVLR
    • All functions translated into C++
      • Estimated 80% reduction in RAM overhead
      • Estimated 100-fold performance increase
      • ALR methods no longer drop dimension
      • All have modify-in-place behavior
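
The integer index stored in @pairs can be read as an ordinary column-major position in the proportionality matrix. In the README example, indices 2 and 12 of a 4 x 4 matrix point at entries [2, 1] and [4, 3], which matches the values printed there. The sketch below reproduces that translation with base R only. It illustrates the idea behind indexToCoord and coordToIndex but does not call them or mirror their code, and the variable names are assumptions.

```r
# Sketch only: base R equivalents, not the propr functions.
m <- matrix(0, nrow = 4, ncol = 4)   # stand-in for a 4 x 4 proportionality matrix
idx <- c(2L, 12L)                    # linear indices, like those held in @pairs

# Linear index -> (row, column); R stores matrices column by column.
coords <- arrayInd(idx, dim(m))
coords

# Inverse: (row, column) -> linear index.
back <- (coords[, 2] - 1L) * nrow(m) + coords[, 1]
back
```

Here `coords` has one row per index, with the row number in the first column and the column number in the second, and `back` recovers the original indices.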

  • New orientation expected for input data
    • Updated backend and vignette accordingly
    • Removed redundant transpositions
  • Fixed rare subsetting errors
  • Tweaked plot methods

  • Introduced phit function
    • Implements Lovell's phi proportionality metric
    • Returns object of class propr
  • Introduced perb function
    • Implements Erb's rho proportionality metric
    • Returns object of class propr
  • Introduced propr Class
    • show method
      • Subsets propr based on @pairs slot
    • subset method
      • Subsets propr based on @matrix slot
    • plot method
      • Plots pairwise *lr proportionality
    • dendrogram method
      • Plots clusters of *lr-transformed data
    • image method
      • Plots heatmap of *lr-transformed data


install.packages("propr")

2.2.0 by Thomas Quinn, a month ago


http://github.com/tpq/propr


Report a bug at http://github.com/tpq/propr/issues


Browse source code at https://github.com/cran/propr


Authors: Thomas Quinn [aut, cre], David Lovell [aut], Ionas Erb [ctb], Anders Bilgrau [ctb], Greg Gloor [ctb]




GPL-2 license


Imports fastcluster, ggplot2, igraph, Rcpp, stats, utils

Depends on methods

Suggests ALDEx2, cccrm, compositions, data.table, grid, ggdendro, knitr, plotly, reshape2, rgl, rmarkdown, testthat

Linking to Rcpp

