A set of tools for parsing, manipulating, and graphing data classified by a hierarchy (e.g. a taxonomy).
Metacoder is an R package for reading, plotting, and manipulating large taxonomic data sets, such as those generated by modern high-throughput sequencing methods like metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc.). It provides a tree-based visualization called "heat trees" that depicts statistics for every taxon in a taxonomy using color and size. It also provides various functions for common tasks in microbiome bioinformatics on data in the taxmap format defined by the taxa package, such as tasks that rely on vegan and converting between the phyloseq format and the taxa format.
This project is available on CRAN and can be installed like so:
install.packages("metacoder")
You can also install the development version for the newest features, bugs, and bug fixes:
install.packages("devtools")
devtools::install_github("grunwaldlab/metacoder")
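Once metacoder is installed, a first heat tree can be drawn in a few lines. This is a minimal sketch under stated assumptions: it uses the example data set hmp_otus, whose "lineage" column is assumed to hold ";"-separated taxonomy strings, and parse_tax_data (from the taxa package, assumed to be available after loading metacoder) to build the taxmap object. The block skips itself when metacoder is not installed.

```r
# Sketch: draw a first heat tree after installing metacoder.
# Assumes the example data set hmp_otus with ";"-separated taxonomy strings
# in its "lineage" column; skipped when metacoder is not installed.
has_metacoder <- requireNamespace("metacoder", quietly = TRUE)
if (has_metacoder) {
  library(metacoder)
  # Build a taxmap object from the classification strings
  x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";")
  # Node size and color both show how many rows (OTUs) fall in each taxon
  print(heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs))
}
```

Mapping the same statistic to both size and color is only for illustration. Any per-taxon value, such as summed read counts, can be used instead.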
All the documentation for metacoder can be found on our website:
https://grunwaldlab.github.io/metacoder_documentation/
The function that simulates PCR requires primersearch from the EMBOSS tool kit to be installed. This is not an R package, so it is not installed automatically. Type ?primersearch after installing and loading metacoder for installation instructions.
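Because primersearch is an external program, you can check that R can find it before running the PCR simulation. The sketch below uses only base R. It assumes metacoder calls the program from the system PATH, and the variable names are hypothetical.

```r
# Sketch: check whether EMBOSS primersearch can be found on the system PATH.
# Base R only; assumes metacoder looks the program up on the PATH.
primersearch_path <- Sys.which("primersearch")
has_primersearch <- nzchar(primersearch_path)
if (has_primersearch) {
  message("primersearch found at: ", primersearch_path)
} else {
  message("primersearch not found; load metacoder and see ?primersearch")
}
```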
Many of these operations can be done using other packages like phyloseq, which also provides tools for diversity analysis. The main strength of metacoder is that its functions use the flexible data types defined by taxa, which has powerful parsing and subsetting abilities that take into account the hierarchical relationship between taxa and user-defined data. In general, metacoder and taxa are more of an abstracted tool kit, whereas phyloseq has more specialized functions for community diversity data, but both can do similar things. I encourage you to try both to see which fits your needs and style best. You can also combine the two in a single analysis by converting between the two data types when needed.
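Converting between the two data types uses parse_phyloseq (phyloseq to taxmap) and as_phyloseq (taxmap to phyloseq). Both functions are listed in the changelog. The sketch below assumes that both packages are installed, that phyloseq ships the example data set GlobalPatterns, and that as_phyloseq's default arguments match the table names that parse_phyloseq creates. It skips itself if either package is missing.

```r
# Sketch: convert a phyloseq object to taxmap and back.
# Assumes metacoder and phyloseq are installed; skipped otherwise.
has_pkgs <- requireNamespace("metacoder", quietly = TRUE) &&
  requireNamespace("phyloseq", quietly = TRUE)
if (has_pkgs) {
  data("GlobalPatterns", package = "phyloseq")         # example phyloseq data
  tm_obj <- metacoder::parse_phyloseq(GlobalPatterns)  # phyloseq -> taxmap
  print(tm_obj)
  ps_obj <- metacoder::as_phyloseq(tm_obj)             # taxmap -> phyloseq
  print(ps_obj)
}
```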
If you use metacoder in a publication, please cite our article in PLOS Computational Biology:
Foster ZSL, Sharpton TJ, Grünwald NJ (2017) Metacoder: An R package for visualization and manipulation of community taxonomic diversity data. PLOS Computational Biology 13(2): e1005404. https://doi.org/10.1371/journal.pcbi.1005404
Metacoder is under active development and many new features are planned. Some improvements that are being explored include:
To see the details of what is being worked on, check out the issues tab of the Metacoder GitHub site.
This work is subject to the MIT License.
Metacoder's major dependencies are taxa, taxize, vegan, igraph, dplyr, and ggplot2.
This package includes code from the R package ggrepel to handle label overlap avoidance, used with permission from the author of ggrepel, Kamil Slowikowski. We included the code instead of depending on ggrepel because we are using functions internal to ggrepel that might change in the future. We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.
We would like to hear users' thoughts on the package and about any errors they run into. Please report errors, questions, or suggestions on the issues tab of the Metacoder GitHub site. We also welcome contributions via a GitHub pull request. You can also talk with us using our Google Groups site.
Fixed a bug in calc_n_samples where the message reported the number of taxa instead of the number of rows in the table.
Fixed a bug in heat_tree_matrix that happened when factors were used for treatments (issue #240).
zero_low_counts now ignores NAs instead of producing an odd error.
compare_groups now ignores NAs instead of returning NaN.
Added the more_than option to calc_n_samples so that users can set the minimum threshold for whether a sample is counted, instead of it always being 1.
Added the calc_prop_samples function for calculating the proportion of samples with a value greater than 0 (issue #233).
primersearch is faster and takes less memory by using ape::DNAbin objects internally.
calc_taxon_abund is about 5x faster.
taxmap objects can be converted to phyloseq objects using as_phyloseq.
primersearch now takes and returns a taxmap object with results added as tables. primersearch_raw is a new function that behaves like the old primersearch did, returning a table.
The dataset option of many functions has been renamed to data to match the option name in the taxa package.
Fixed bugs in heat_tree_matrix that happen when the input data is not exactly like that produced by compare_groups (issues #195, #196, #197).
Fixed a bug when output_file was used with heat_tree_matrix. Now the whole plot is saved instead of the last subplot. (issue #203)
Fixed a bug in parse_mothur_tax_summary when reading from a file path (issue #211).
Fixed a bug in zero_low_counts when using use_total = TRUE (issue #227).
Fixed a parse_phyloseq error when arbitrary rank names were used.
heat_tree_matrix always has a 1:1 aspect ratio. (issue #205)
Fixed a bug in subtaxa that caused an error when all of subset is FALSE. (issue #143)
Fixed a bug in filter_taxa that caused an error when all taxa are filtered out. (issue #144)
heat_tree now uses the taxmap class defined in the taxa package.
primersearch
primersearch output is now cleaner and includes info like the amplicon sequence and primer binding sites.
Added heat_tree_matrix for plotting a pairwise matrix of heat trees for comparing treatments.
Added parse_mothur_tax_summary for the mothur *.tax.summary file made by classify.seqs.
Added parse_mothur_taxonomy for the mothur *.taxonomy file made by classify.seqs.
Added parse_qiime_biom for the QIIME BIOM output.
Added parse_phyloseq to convert phyloseq objects.
Added parse_newick to parse Newick files.
Added parse_unite_general for the UNITE general FASTA release. (issue #154)
Added parse_rdp for the RDP FASTA release. (issue #160)
Added parse_silva_fasta for the SILVA FASTA release. (issue #162)
Added calc_obs_props to calculate proportions from observation counts (issue #167).
Added parse_greengenes for the Greengenes database. (issue #?)
Added write_greengenes to create an imitation of the Greengenes database format.
Added write_rdp to create an imitation of the RDP database format.
Added write_mothur_taxonomy to create an imitation of the mothur taxonomy format.
Added write_unite_general to create an imitation of the UNITE general FASTA release.
Added write_silva_fasta to create an imitation of the SILVA FASTA release.
Added compare_treatments to compare multiple samples in multiple treatments, applying a user-defined function.
Added calc_taxon_abund to sum observation values for each taxon.
Added the col_names option to calc_taxon_abund to set names of output columns.
Fixed a bug where the error "evaluation nested too deeply: infinite recursion / options(expressions=)?" occurred because too many labels were printed.
heat_tree: Improved how the predicted boundaries of text are calculated, so text with any rotation, justification, or newlines influences margins correctly (i.e. does not get cut off).
heat_tree: Can now save multiple file outputs in different formats at once.
heat_tree now gives a warning if infinite values are given to it.
extract_taxonomy: There is now a warning message if the class regex does not match (issue #123).
heat_tree: Increased legend text size and reduced the number of labels.
extract_taxonomy: Added the batch_size option to help deal with invalid IDs better.
heat_tree: The margin_size option now takes four values instead of two.
heat_tree: Fixed a bug when color is set explicitly (e.g. "grey") instead of raw numbers and the legend is not removed. Now a mixture of raw numbers and color names can be used.
Fixed a bug in heat_tree that made values not in the input taxmap object not associate with the right taxa. See this post.
extract_taxonomy: Fixed an error that occurred when not all inputs could be classified and sequences were supplied.
Fixed a bug in primersearch that caused the wrong primer sequence to be returned when primers match in the reverse direction.
Fixed a bug in parse_mothur_summary where "unclassified" had been changed to "untaxmap" during a search and replace.
extract_taxonomy
Fixed a bug in mutate_taxa and mutate_obs that made replacing columns result in new columns with duplicate names.
plot_taxonomy and the plot method have been renamed heat_tree.
taxon_levels has been replaced with n_supertaxa to make names conceptually consistent. Note that this means what was 1 as taxon_levels is now 0 as n_supertaxa.
Added n_subtaxa and n_subtaxa_1 functions.
taxmap objects!
Added a print method for taxmap objects.
extract_taxonomy works on the SeqFastadna class from seqinr.
parse_mothur_summary function: parses the mothur summary table.
remove_redundant_names function: removes components of names of taxa in subtaxa.
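The changelog describes calc_taxon_abund as summing observation values for each taxon. In a taxonomy, a taxon's sum also includes observations assigned to its subtaxa. The base-R sketch below illustrates that idea and is not metacoder's implementation. The taxonomy, OTU table, and function name are hypothetical, and each taxon's parent is stored in a plain named vector instead of a taxmap object.

```r
# Toy taxonomy (hypothetical): each taxon's parent, NA for the root
parent <- c(Bacteria = NA, Firmicutes = "Bacteria",
            Bacilli = "Firmicutes", Proteobacteria = "Bacteria")

# Toy OTU table (hypothetical): rows are OTUs, columns are samples
otu_taxon <- c("Bacilli", "Bacilli", "Proteobacteria", "Firmicutes")
counts <- cbind(s1 = c(5, 3, 10, 1), s2 = c(0, 2, 4, 6))

# Add each OTU's counts to its own taxon and to every supertaxon above it
calc_taxon_abund_sketch <- function(counts, otu_taxon, parent) {
  out <- matrix(0, nrow = length(parent), ncol = ncol(counts),
                dimnames = list(names(parent), colnames(counts)))
  for (i in seq_len(nrow(counts))) {
    taxon <- otu_taxon[i]
    while (!is.na(taxon)) {          # walk up the hierarchy to the root
      out[taxon, ] <- out[taxon, ] + counts[i, ]
      taxon <- parent[[taxon]]
    }
  }
  out
}

tax_abund <- calc_taxon_abund_sketch(counts, otu_taxon, parent)
print(tax_abund)
```

The root ends up with the column totals (19 and 12 here), and Firmicutes includes the counts of its subtaxon Bacilli.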
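The changelog notes that replacing taxon_levels with n_supertaxa shifts counting by one: a root taxon goes from 1 to 0. The base-R sketch below shows the counting rule using a hypothetical parent lookup rather than a taxmap object. The function name is made up for illustration.

```r
# Toy taxonomy (hypothetical): each taxon's parent, NA for the root
parent <- c(Bacteria = NA, Firmicutes = "Bacteria",
            Bacilli = "Firmicutes", Proteobacteria = "Bacteria")

# Count how many taxa lie above each taxon (0 for a root)
n_supertaxa_sketch <- function(parent) {
  vapply(names(parent), function(taxon) {
    n <- 0L
    up <- parent[[taxon]]
    while (!is.na(up)) {   # climb until the root's missing parent
      n <- n + 1L
      up <- parent[[up]]
    }
    n
  }, integer(1))
}

n_supertaxa_sketch(parent)       # Bacteria 0, Firmicutes 1, Bacilli 2, Proteobacteria 1
n_supertaxa_sketch(parent) + 1L  # the old taxon_levels-style numbering
```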
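The changelog entries for calc_n_samples (with its more_than threshold) and calc_prop_samples both count samples per taxon. The base-R sketch below assumes a strict "greater than" comparison, as the option name suggests, on a hypothetical taxon-by-sample matrix. It is not metacoder's code.

```r
# Hypothetical per-taxon abundances: rows are taxa, columns are samples
abund <- rbind(taxon_a = c(s1 = 0, s2 = 3, s3 = 1, s4 = 0),
               taxon_b = c(s1 = 2, s2 = 1, s3 = 0, s4 = 5))

# Number of samples in which each taxon's value exceeds more_than
n_samples_sketch <- function(x, more_than = 0) {
  rowSums(x > more_than)
}

# Proportion of samples in which each taxon's value is greater than 0
prop_samples_sketch <- function(x) {
  rowMeans(x > 0)
}

n_samples_sketch(abund)                 # taxon_a 2, taxon_b 3
n_samples_sketch(abund, more_than = 1)  # taxon_a 1, taxon_b 2
prop_samples_sketch(abund)              # taxon_a 0.5, taxon_b 0.75
```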
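zero_low_counts appears twice in the changelog: it now ignores NAs, and it supports a use_total = TRUE mode. The base-R sketch below shows one reading of those two behaviors. Only use_total comes from the changelog. The min_count argument, its default of 2, and the rule "zero a whole row when its total is below min_count" are assumptions for illustration, so check ?zero_low_counts for the real arguments.

```r
# Sketch of low-count zeroing; min_count and its default are assumptions
zero_low_counts_sketch <- function(counts, min_count = 2, use_total = FALSE) {
  if (use_total) {
    # Zero whole rows whose total across samples (ignoring NAs) is too low
    low_rows <- rowSums(counts, na.rm = TRUE) < min_count
    counts[low_rows, ] <- 0
  } else {
    # Zero individual cells below min_count, leaving NAs untouched
    low <- !is.na(counts) & counts < min_count
    counts[low] <- 0
  }
  counts
}

counts <- cbind(s1 = c(1, 5, NA), s2 = c(0, 1, 3))
zero_low_counts_sketch(counts)                    # s1: 0 5 NA, s2: 0 0 3
zero_low_counts_sketch(counts, use_total = TRUE)  # s1: 0 5 NA, s2: 0 1 3
```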
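compare_groups and compare_treatments apply a function to the values of one taxon in two groups of samples. The changelog also notes that NAs are now ignored rather than producing NaN. The sketch below is a hypothetical two-group comparison function of that kind. It is loosely modeled on statistics commonly reported for this purpose: a log2 ratio of medians, differences in medians and means, and a Wilcoxon rank-sum p-value. Treat those choices, and the rule that 0/0 counts as no difference, as assumptions rather than metacoder's defaults.

```r
# Hypothetical per-taxon comparison of two groups of samples
compare_two_groups_sketch <- function(abund_1, abund_2) {
  abund_1 <- abund_1[!is.na(abund_1)]  # drop NAs instead of propagating them
  abund_2 <- abund_2[!is.na(abund_2)]
  med_1 <- median(abund_1)
  med_2 <- median(abund_2)
  log_ratio <- log2(med_1 / med_2)
  if (is.nan(log_ratio)) log_ratio <- 0  # 0/0: assume no difference
  list(log2_median_ratio = log_ratio,
       median_diff = med_1 - med_2,
       mean_diff = mean(abund_1) - mean(abund_2),
       # exact = FALSE uses the normal approximation, which handles ties
       wilcox_p_value = wilcox.test(abund_1, abund_2, exact = FALSE)$p.value)
}

res <- compare_two_groups_sketch(c(4, 8, NA, 6), c(2, 3, 4))
str(res)  # log2_median_ratio 1, median_diff 3, mean_diff 3, plus a p-value
```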