Compositional Analysis of Differentially Expressed Proteins in Cancer

Compositional analysis of differentially expressed proteins in cancer and cell culture proteomics experiments. The data include lists of up- and down-regulated proteins in different cancer types (breast, colorectal, liver, lung, pancreatic, prostate) and laboratory conditions (hypoxia, hyperosmotic stress, high glucose, 3D cell culture, and proteins secreted in hypoxia), together with amino acid compositions computed for protein sequences obtained from UniProt. Functions are provided to calculate compositional metrics including protein length, carbon oxidation state, and stoichiometric hydration state. In addition, phylostrata (evolutionary ages) of protein-coding genes are compiled using data from Liebeskind et al. (2016) or Trigos et al. (2017). The vignettes contain plots of compositional differences, phylostrata for human proteins, and references for all datasets.

Datasets are collected here for differentially (up- and down-) expressed proteins identified in proteomic studies of cancer and in cell culture experiments. Tables of amino acid compositions of proteins are used for calculations of chemical composition, projected into selected basis species. Plotting functions are used to visualize the compositional differences and thermodynamic potentials for proteomic transformations.
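As an illustration of one compositional metric, the average oxidation state of carbon (ZC) of a species with formula C_c H_h N_n O_o S_s and net charge Z can be written as (-h + 3n + 2o + 2s + Z) / c. The sketch below uses base R only. The helper ZC() and the small amino acid table are assumptions made for this example and are not the canprot API, which works from amino acid compositions of whole proteins.

```r
# Average oxidation state of carbon (ZC) from elemental counts.
# Hypothetical helper written for this example; not a canprot function.
ZC <- function(C, H, N, O, S = 0, Z = 0) {
  (-H + 3 * N + 2 * O + 2 * S + Z) / C
}

# Elemental formulas of three free amino acids
aa <- data.frame(
  row.names = c("Gly", "Ala", "Cys"),
  C = c(2, 3, 3),
  H = c(5, 7, 7),
  N = c(1, 1, 1),
  O = c(2, 2, 2),
  S = c(0, 0, 1)
)

# ZC of each amino acid
zc_aa <- with(aa, ZC(C, H, N, O, S))
names(zc_aa) <- rownames(aa)

# Formula of the tripeptide Gly-Ala-Cys: sum the amino acids, then
# remove one H2O for each of the two peptide bonds
pep <- colSums(aa)
pep["H"] <- pep["H"] - 2 * 2
pep["O"] <- pep["O"] - 2
zc_pep <- ZC(pep[["C"]], pep[["H"]], pep[["N"]], pep[["O"]], pep[["S"]])

print(zc_aa)
print(zc_pep)
```

Removing H2O changes the hydrogen and oxygen terms by equal and opposite amounts, so the ZC of the tripeptide (0.5) is the same as that of the summed amino acids. This is why ZC can be treated separately from a hydration metric.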

This package was developed to support a research project reported in two papers in PeerJ (2016 and 2017).

Installation from CRAN

install.packages("canprot")

Installation from GitHub

First install the devtools package from CRAN:

install.packages("devtools")

Then install canprot from GitHub:

devtools::install_github("jedick/canprot")


Building vignettes

To install the package including the vignettes:

devtools::install_github("jedick/canprot", build_vignettes = TRUE)

You may need to re-run this command one or more times. Note that building the vignettes pulls in additional R packages as dependencies and also requires pandoc.


CHANGES IN canprot 0.1.2 (2019-02-26)

  • Replace data(canprot) with automatic loading of data when package loads, into an environment that is now an exported object ('canprot').

  • Because of similar changes in CHNOSZ, we now need library(CHNOSZ) in more places in examples and vignettes.

CHANGES IN canprot 0.1.1 (2018-01-18)

  • New function get_comptab() merges and replaces ZC_nH2O() and CNS(), and adds capability to calculate standard molal volumes.

  • Add protein length ('nAA') as variable in get_comptab().

  • Add 'mfun' argument to get_comptab() to choose median or mean.

  • Add 'vars' argument to xsummary() to choose variables to tabulate.

  • In pdat_ functions, add =NT tag for datasets involving comparisons with normal tissue.

  • Use precomputed colors to remove colorspace dependency.

  • DESCRIPTION: Add KernSmooth to Suggests to avoid R CMD check error (it is needed for smoothScatter() in basis_comparison.Rmd).

CHANGES IN canprot 0.1.0 (2017-06-13)

  • Add basis_comparison.Rmd and potential_diagrams.Rmd.

  • New functions groupplots() to make potential diagrams for groups of datasets and mergedplot() to merge those diagrams.

  • First release on CRAN.

CHANGES IN canprot 0.0.5 (2017-05-04)

  • Remove internal setbasis(); use CHNOSZ's basis() instead.

CHANGES IN canprot 0.0.4 (2017-03-19)

  • New function CNS() calculates proteomic differences of elemental abundances per residue.

  • Modify diffplot() to accept output from either ZC_nH2O() or CNS().

  • Change "AA" and "AA4" in setbasis() to "QEC" and "QEC4"; add "QEC+" (basis including H+).

CHANGES IN canprot 0.0.3 (2017-01-01)

  • New export: get_colors().

  • Plot text labels in diffplot().

  • Return values in rankplot() and xsummary().

  • Change chemical activities in setbasis("AA") (use setbasis("AA4") for old ones).

  • Move protein expression data to extdata/expression/[condition name]/.

  • Add LXM+16 dataset for colorectal cancer.

  • Add datasets from 17 studies for pancreatic cancer.

  • Add datasets from 20 studies for hypoxia or 3D culture.

  • Add datasets from 13 studies for hyperosmotic stress.

CHANGES IN canprot 0.0.2 (2016-07-25)

  • Add 'updates_file' argument to check_ID() and protcomp().

  • Rename stabplot() to rankplot().

  • Initial upload to GitHub.

CHANGES IN canprot 0.0.1 (2016-07-16)

  • Package development began on 2016-07-03, based on code and data in Supplemental Information Dataset S1 of Dick, 2016.

  • Exported functions (in approximate order of development): "protcomp", "check_ID", "get_pdat", "ZC_nH2O", "CLES", "xsummary", "rankdiff", "stabplot", "Ehplot", "pdat_CRC", "remove_entries", "diffplot", "lapply_canprot".

  • Datasets in 'canprot' environment: human_base.Rdata (21006 proteins), human_additional.Rdata (71173 proteins), human_extra.csv (72 proteins), uniprot_updates.csv (26 proteins).

  • Datasets in inst/extdata: AKP+10.csv, BPV+11.csv, JCF+11.csv, JKMF10.csv, KKL+12.csv, KWA+14.csv, KYK+12.csv, LPL+16.csv, MCZ+13.csv, MRK+11.csv, PHL+16.csv, STK+15.csv, UNS+14.csv, WDO+15.csv, WKP+14.csv, WOD+12.csv, WTK+08.csv, XZC+10.csv, YLZ+12.csv, ZYS+10.csv.

  • Vignettes: data_sources.Rmd, summary_table.Rmd, stability_plots.Rmd.

1.1.0 by Jeffrey Dick, 9 days ago

Authors: Jeffrey Dick [aut, cre], Ben Bolker [ctb]

GPL (>= 2) license

Imports xtable, MASS, rmarkdown

Suggests CHNOSZ, knitr, testthat
