Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

A set of tools for parsing, manipulating, and graphing data classified by a hierarchy (e.g. a taxonomy).


An R package for metabarcoding research planning and analysis

Metacoder is an R package for reading, plotting, and manipulating large taxonomic data sets, such as those generated by modern high-throughput sequencing methods like metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called a "heat tree" that depicts statistics for every taxon in a taxonomy using color and size. It also provides functions for common tasks in microbiome bioinformatics on data in the taxmap format defined by the taxa package, such as:

  • Summing read counts/abundance per taxon
  • Converting counts to proportions and rarefaction of counts using vegan
  • Comparing the abundance (or other characteristics) of groups of samples (e.g., experimental treatments) per taxon
  • Combining data for groups of samples
  • Simulated PCR, via EMBOSS primersearch, for testing primer specificity and coverage of taxonomic groups
  • Converting common microbiome formats for data and reference databases into the objects defined by the taxa package
  • Converting between the phyloseq format and the taxa format
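
To make the first two tasks concrete, here is a minimal sketch in base R of what "summing counts per taxon" and "converting counts to proportions" mean for hierarchical data. It is not metacoder's implementation and does not use the taxmap format; the toy table, the column names, and the helper taxa_of are all made up for illustration.

```r
# Toy OTU table: one row per OTU, a lineage string, and counts in two samples.
# All names and numbers here are made up for illustration.
otus <- data.frame(
  lineage = c("Bacteria;Firmicutes;Bacilli",
              "Bacteria;Firmicutes;Clostridia",
              "Bacteria;Proteobacteria"),
  sample_a = c(10, 5, 85),
  sample_b = c(0, 20, 20),
  stringsAsFactors = FALSE
)
sample_cols <- c("sample_a", "sample_b")

# Convert counts to per-sample proportions (each column sums to 1).
props <- sweep(as.matrix(otus[, sample_cols]), 2, colSums(otus[, sample_cols]), "/")

# Hypothetical helper: list every taxon an OTU belongs to,
# i.e. its own lineage plus all of its supertaxa.
taxa_of <- function(lineage) {
  ranks <- strsplit(lineage, ";", fixed = TRUE)[[1]]
  vapply(seq_along(ranks),
         function(i) paste(ranks[seq_len(i)], collapse = ";"),
         character(1))
}
membership <- lapply(otus$lineage, taxa_of)

# Repeat each OTU's counts once per taxon it belongs to, then sum per taxon,
# so that supertaxa include the counts of all their subtaxa.
long <- data.frame(
  taxon = unlist(membership),
  otus[rep(seq_len(nrow(otus)), lengths(membership)), sample_cols],
  row.names = NULL
)
taxon_abund <- aggregate(long[, sample_cols], by = list(taxon = long$taxon), FUN = sum)
print(taxon_abund)
```

The point of the sketch is that every OTU contributes to each taxon along its lineage, which is why the hierarchy has to be taken into account when summarizing abundance.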

Installation

This project is available on CRAN and can be installed like so:

install.packages("metacoder")

You can also install the development version for the newest features, bugs, and bug fixes:

install.packages("devtools")
devtools::install_github("grunwaldlab/metacoder")

Documentation

All the documentation for metacoder can be found on our website here:

https://grunwaldlab.github.io/metacoder_documentation/

Dependencies

The function that simulates PCR requires primersearch from the EMBOSS tool kit to be installed. This is not an R package, so it is not automatically installed. Type ?primersearch after installing and loading metacoder for installation instructions.
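
As a quick check before running the simulated PCR, something like the following base R snippet can tell you whether a primersearch executable is visible from R. It assumes the program is installed under the name primersearch and is on the system PATH; an installation in a non-standard location would not be detected this way.

```r
# Look for the EMBOSS primersearch executable on the system PATH.
# Sys.which() returns an empty string when the program is not found.
primersearch_path <- Sys.which("primersearch")
has_primersearch <- nzchar(primersearch_path)[[1]]

if (has_primersearch) {
  message("primersearch found at: ", primersearch_path)
} else {
  message("primersearch not found; see ?primersearch in metacoder for installation instructions.")
}
```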

Relationship with other packages

Many of these operations can be done using other packages like phyloseq, which also provides tools for diversity analysis. The main strength of metacoder is that its functions use the flexible data types defined by taxa, which has powerful parsing and subsetting abilities that take into account the hierarchical relationship between taxa and user-defined data. In general, metacoder and taxa are more of an abstracted tool kit, whereas phyloseq has more specialized functions for community diversity data, but they both can do similar things. I encourage you to try both to see which fits your needs and style best. You can also combine the two in a single analysis by converting between the two data types when needed.

Citation

If you use metacoder in a publication, please cite our article in PLOS Computational Biology:

Foster ZSL, Sharpton TJ, Grünwald NJ (2017) Metacoder: An R package for visualization and manipulation of community taxonomic diversity data. PLOS Computational Biology 13(2): e1005404. https://doi.org/10.1371/journal.pcbi.1005404

Future development

Metacoder is under active development and many new features are planned. Some improvements that are being explored include:

  • Barcoding gap analysis and associated plotting functions
  • A function to aid in retrieving appropriate sequence data from NCBI for in silico PCR from whole genome sequences
  • Graphing of different node shapes in heat trees, possibly including pie graphs or PhyloPics.
  • Adding the ability to plot specific edge lengths in the heat trees so they can be used for phylogenetic trees.
  • Adding more data import and export functions to make parsing and writing common formats easier.

To see the details of what is being worked on, check out the issues tab of the Metacoder GitHub site.

License

This work is subject to the MIT License.

Acknowledgements

Metacoder's major dependencies are taxa, taxize, vegan, igraph, dplyr, and ggplot2.

This package includes code from the R package ggrepel to handle label overlap avoidance, with permission from the author of ggrepel, Kamil Slowikowski. We included the code instead of depending on ggrepel because we are using functions internal to ggrepel that might change in the future. We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.

Feedback and contributions

We would like to hear users' thoughts on the package and about any errors they run into. Please report errors, questions, or suggestions on the issues tab of the Metacoder GitHub site. We also welcome contributions via a GitHub pull request. You can also talk with us using our Google groups site.

News

metacoder 0.3.0

Bug fixes

  • Fixed bug in calc_n_samples where the message reported the number of taxa instead of the number of rows in the table.
  • Fixed bug in heat_tree_matrix that happened when factors were used for treatments (issue #240).
  • zero_low_counts now ignores NAs instead of producing an odd error.
  • compare_groups now ignores NAs instead of returning NaN.

Improvements

  • Added more_than option to calc_n_samples so that users can set the minimum threshold for whether a sample is counted or not, instead of it always being 1.
  • Added calc_prop_samples function for calculating the proportion of samples with a value greater than 0 (issue #233).
  • primersearch is faster and takes less memory by using ape::DNAbin objects internally.
  • Made calc_taxon_abund about 5x faster.
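
The quantities behind calc_n_samples and calc_prop_samples can be illustrated with a small base R sketch. This is only a conceptual illustration on a plain matrix, not the package's code or its call signature; the count values are made up, and the variable more_than merely borrows the option's name.

```r
# Toy count table: rows are taxa, columns are samples (made-up values).
counts <- matrix(c(0, 3, 12,
                   1, 0,  0,
                   7, 9,  2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("taxon_1", "taxon_2", "taxon_3"),
                                 c("s1", "s2", "s3")))

# Number of samples in which each taxon has a count greater than a threshold.
more_than <- 2  # illustrative threshold; 0 would count any non-zero value
n_samples <- rowSums(counts > more_than)

# Proportion of samples in which each taxon has a count greater than 0.
prop_samples <- rowMeans(counts > 0)

print(n_samples)
print(prop_samples)
```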

New features

  • taxmap objects can be converted to phyloseq objects using as_phyloseq.
  • Added parser for uBiome data.

Changes

  • primersearch now takes and returns a taxmap object with results added as tables. primersearch_raw is a new function that behaves like the old primersearch did, returning a table.
  • The dataset option of many functions has been renamed to data to match the option name in the taxa package.
  • Numerous spelling fixes.

metacoder 0.2.1

Bug fixes

  • Fixes numerous bugs in heat_tree_matrix that happen when the input data is not exactly like that produced by compare_groups (issues #195, #196, #197).
  • Fixed how output_file was used with heat_tree_matrix. Now the whole plot is saved instead of only the last subplot. (issue #203)
  • Fixed "unused argument" bug in parse_mothur_tax_summary when reading from a file path (issue #211).
  • Fixed bug in zero_low_counts when using use_total = TRUE (issue #227).
  • Numerous other small fixes.
  • Fixed parse_phyloseq error when arbitrary rank names were used.

Improvements

  • Node and edge legends can now be excluded individually (Thanks @grabear!) (issue #202).
  • The output of heat_tree_matrix always has a 1:1 aspect ratio. (issue #205)
  • Numerous calculation functions added, with more consistent behavior.

metacoder 0.2.0

Bug fixes

  • Fixed bug in subtaxa that caused an error when all of subset is FALSE. (issue #143)
  • Fixed bug in filter_taxa that caused an error when all taxa are filtered out. (issue #144)

Breaking changes

  • All taxmap-related manipulation functions have been moved to the taxa package.
  • heat_tree now uses the taxmap class defined in the taxa package.
  • Numerous changes (i.e. upgrades) to primersearch

Improvements

  • Upgraded primersearch output to be cleaner and have info like the amplicon sequence and primer binding sites.
  • Added functions to identify and remove taxa with ambiguous names like "unknown".
  • Code from the ggrepel package is now used to avoid overlapping labels. Thanks Kamil Slowikowski!
  • New function heat_tree_matrix for plotting a pairwise matrix of heat trees for comparing treatments.
  • New parser named parse_mothur_tax_summary for mothur *.tax.summary file made by classify.seqs.
  • New parser named parse_mothur_taxonomy for mothur *.taxonomy file made by classify.seqs.
  • New parser named parse_qiime_biom for the QIIME BIOM output.
  • New parser named parse_phyloseq to convert phyloseq objects.
  • New parser named parse_newick to parse newick files.
  • New parser named parse_unite_general for unite general FASTA release. (issue #154)
  • New parser named parse_rdp for RDP FASTA release. (issue #160)
  • New parser named parse_silva_fasta for SILVA FASTA release. (issue #162)
  • New function calc_obs_props to calculate proportions from observation counts. (issue #167)
  • New parser named parse_greengenes for the Greengenes database. (issue #?)
  • New writer named write_greengenes to create an imitation of the Greengenes database format.
  • New writer named write_rdp to create an imitation of the RDP database format.
  • New writer named write_mothur_taxonomy to create an imitation of the mothur taxonomy format.
  • New writer named write_unite_general to create an imitation of the UNITE general FASTA release.
  • New writer named write_silva_fasta to create an imitation of the SILVA FASTA release.
  • New function named compare_treatments to compare multiple samples in multiple treatments, applying a user-defined function.
  • New function named calc_taxon_abund to sum observation values for each taxon.
  • Added col_names option to calc_taxon_abund to set names of output columns.

metacoder 0.1.3

Improvements

  • Provided a helpful error message when the error "evaluation nested too deeply: infinite recursion / options(expressions=)?" occurs due to too many labels being printed.
  • heat_tree: improved how the predicted boundaries of text are calculated, so text with any rotation, justification, or newlines influences margins correctly (i.e. does not get cut off).
  • heat_tree: Can now save multiple file outputs in different formats at once

Minor changes

  • heat_tree now gives a warning if infinite values are given to it
  • extract_taxonomy: There is now a warning message if class regex does not match (issue #123)
  • heat_tree: Increased legend text size and reduced number of labels
  • extract_taxonomy: added batch_size option to help deal with invalid IDs better
  • Added CITATION file

Breaking changes

  • The heat_tree option margin_size now takes four values instead of two.

Bug fixes

  • heat_tree: Fixed bug when color is set explicitly (e.g. "grey") instead of raw numbers and the legend is not removed. Now a mixture of raw numbers and color names can be used.
  • Fixed bugs caused by dplyr version update
  • Fixed bug in heat_tree that made values not in the input taxmap object not associate with the right taxa. See this post.
  • extract_taxonomy: Fixed an error that occurred when not all inputs could be classified and sequences were supplied
  • Fixed bug in primersearch that caused the wrong primer sequence to be returned when primers match in the reverse direction
  • Fixed a bug in parse_mothur_summary where "unclassified" had been changed to "untaxmap" during a search and replace
  • Fixed outdated example code for extract_taxonomy
  • Fixed a bug in mutate_taxa and mutate_obs that made replacing columns result in new columns with duplicate names.

metacoder 0.1.2

Breaking changes

  • plot_taxonomy and the plot method have been renamed heat_tree.

New features

  • New introduction vignette
  • Various minor bug fixes

metacoder 0.1.1

Breaking changes

  • taxon_levels has been replaced with n_supertaxa to make names conceptually consistent. Note that this means what was 1 as taxon_levels is now 0 as n_supertaxa.
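
The off-by-one shift described above can be seen in a small base R sketch. It works on made-up lineage strings rather than on metacoder objects, and the variable names only echo the old and new option names; it is meant to show the relationship between the two numberings, not how the package computes them.

```r
# Made-up lineages, from a root taxon down to a third-level taxon.
lineages <- c("Bacteria",
              "Bacteria;Firmicutes",
              "Bacteria;Firmicutes;Bacilli")

# Depth in the hierarchy, counting the root as level 1 (the old numbering).
level <- lengths(strsplit(lineages, ";", fixed = TRUE))

# Number of supertaxa above each taxon: the root has none (the new numbering).
n_super <- level - 1L

print(data.frame(lineage = lineages, level = level, n_super = n_super))
```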

New features

  • Added n_subtaxa and n_subtaxa_1 functions
  • Added taxonomy parsing examples to vignettes

metacoder 0.1.0

Breaking changes

  • Many options and functions have been renamed (#115)

New features

  • dplyr functions for taxmap objects!
  • Added a print method for taxmap objects
  • new SILVA example data set
  • extract_taxonomy works on SeqFastadna class from seqinr
  • parse_mothur_summary function: parses the mothur summary table
  • remove_redundant_names function: removes components of names of taxa in subtaxa

Changes

  • Core functions are much faster
  • More tests
  • Updated vignettes
  • Many bug fixes and minor upgrades
  • Legend now moves into plot if there is room (#118)
