Immunoglobulin Clonal Lineage and Diversity Analysis

Provides immunoglobulin (Ig) sequence lineage reconstruction, diversity profiling, and amino acid property analysis.


Alakazam is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq). It provides tools for investigating lymphocyte receptor clonal lineages, diversity, gene usage, and other repertoire-level properties, with a focus on high-throughput immunoglobulin (Ig) sequencing.

Alakazam serves five main purposes:

  1. Providing core functionality for other R packages in the Immcantation framework. This includes common tasks such as file I/O, basic DNA sequence manipulation, and interacting with V(D)J segment and gene annotations.
  2. Providing an R interface for interacting with the output of the pRESTO and Change-O tool suites.
  3. Performing lineage reconstruction on clonal populations of Ig sequences and analyzing the topology of the resultant lineage trees.
  4. Performing clonal abundance and diversity analysis on lymphocyte repertoires.
  5. Performing physicochemical property analyses of lymphocyte receptor sequences.

News

General:

  • Fixed a bug in seqDist() wherein distance was not properly calculated for some sequences containing gap characters.
  • Added stop and gap characters to getAAMatrix() return matrix.

General:

  • Added Rcpp and data.table dependencies.
  • Modified readChangeoDb() to wrap data.table::fread() instead of utils::read.table() if the input file is not compressed.
  • Ported testSeqEqual(), getSeqDistance() and getSeqMatrix() to C++ to improve performance of collapseDuplicates() and other dependent functions.
  • Renamed testSeqEqual(), getSeqDistance() and getSeqMatrix() to seqEqual(), seqDist() and pairwiseDist(), respectively.
  • Added pairwiseEqual(), which creates a logical sequence distance matrix: TRUE if two sequences are identical and FALSE if not, ignoring Ns and gaps.
  • Added translation of ambiguous and gap characters to X in translateDNA().
  • Fixed bug in collapseDuplicates() wherein the input data type sanity check would cause the vignette to fail to build under R 3.3.
  • Replaced the ExampleDb.gz file with a larger, more clonal, ExampleDb data object.
  • Replaced ExampleTrees with a larger set of trees.
  • Renamed multiggplot() to gridPlot().
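
The pairwise equality check described in the pairwiseEqual() entry above can be illustrated with a small sketch. This is Python written for illustration only, not alakazam's implementation (the package's versions are R functions backed by C++). The names seq_equal and pairwise_equal, the choice of N, - and . as ignored characters, and the decision to treat sequences of different length as unequal are all assumptions.

```python
# Hypothetical illustration; not the alakazam API.
IGNORED = frozenset("N-.")  # assumed set of characters that never count as mismatches


def seq_equal(a, b, ignored=IGNORED):
    """Return True if two sequences match at every informative position."""
    if len(a) != len(b):
        # Assumption: sequences of different length are never equal.
        return False
    return all(x == y or x in ignored or y in ignored
               for x, y in zip(a.upper(), b.upper()))


def pairwise_equal(seqs, ignored=IGNORED):
    """Build a symmetric logical matrix of seq_equal() results."""
    n = len(seqs)
    matrix = [[True] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = seq_equal(seqs[i], seqs[j], ignored)
    return matrix


example = ["ATGC", "ATNC", "AT-C", "ATGG"]
result = pairwise_equal(example)
```

Here the first three sequences are mutually equal because they differ only at a position holding N or a gap, while the fourth differs at an informative position.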

Amino Acid Analysis:

  • Set default to normalize=FALSE for charge calculations to be more consistent with previously published repertoire sequencing results.

Diversity Analysis:

  • Added a progress argument to rarefyDiversity() and testDiversity() to enable the (previously default) progress bar.
  • Fixed a bug in estimateAbundance() where the function would fail if there was only a single input sequence per group.
  • Changed column names in data and summary slots of DiversityTest to uppercase for consistency with other tools.
  • Added dispatching of plot() to plotDiversityCurve() for DiversityCurve objects.

Gene Usage:

  • Added sortGenes() function to sort V(D)J genes by name or locus position.
  • Added clone argument to countGenes() to allow restriction of gene abundance to one gene per clone.

Topology Analysis:

  • Added a set of functions for lineage tree topology analysis.
  • Added a vignette showing basic tree topology analysis.

General:

  • Fixed a bug wherein the package would not build on R < 3.2.0 due to changes in base::nchar().
  • Changed R dependency to R >= 3.1.2.

General:

  • Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.
  • Internal changes to conform to CRAN policies.

Amino Acid Analysis:

  • Fixed bug where arguments for the aliphatic() function were not being passed through the ellipsis argument of aminoAcidProperties().
  • Improved amino acid analysis vignette.
  • Added check for correctness of amino acid sequences to aminoAcidProperties().
  • Renamed AA_TRANS to ABBREV_AA.

Diversity:

  • Added evenness and bootstrap standard deviation to rarefyDiversity() output.

Lineage:

  • Added ExampleTrees data with example output from buildPhylipLineage().

General:

  • Removed plyr dependency.
  • Added dplyr, lazyeval and stringi dependencies.
  • Added strict requirement for igraph version >= 1.0.0.
  • Renamed getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and getAAMatrix(), respectively.
  • Added getSeqMatrix() which calculates a pairwise distance matrix for a set of sequences.
  • Modified default plot sizing to be more appropriate for export to PDF figures with 7-8 inch width.
  • Added multiggplot() function for performing multiple panel plots.

Amino Acid Analysis:

  • Migrated amino acid property analysis from Change-O CTL to alakazam. Includes the new functions gravy(), bulk(), aliphatic(), polar(), charge(), countPatterns() and aminoAcidProperties().

Annotation:

  • Added support for unusual TCR gene names, such as 'TRGVA*01'.
  • Added removal of 'D' label (gene duplication) from gene names when parsed with getSegment(), getAllele(), getGene() and getFamily(). May be disabled by providing the argument strip_d=FALSE.
  • Added countGenes() to tabulate V(D)J allele, gene and family usage.
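
As a rough illustration of the 'D' label removal described above, the following Python sketch splits a gene call into allele, gene and family. It is not alakazam's parser: the function name parse_gene, the simplified regular expression, and the rule that a family is everything before the first hyphen are assumptions, and real gene nomenclature has cases this sketch does not cover.

```python
# Hypothetical illustration; not the alakazam API.
import re


def parse_gene(call, strip_d=True):
    """Split a V(D)J call such as 'IGHV1-69D*01' into allele, gene and family."""
    allele = call.strip()
    if strip_d:
        # Assumption: a duplication label is a 'D' that follows a digit and
        # sits directly before the allele separator '*' or the end of the name.
        allele = re.sub(r"(?<=\d)D(?=\*|$)", "", allele)
    gene = allele.split("*")[0]    # drop the allele suffix
    family = gene.split("-")[0]    # assumed: family precedes the first hyphen
    return {"allele": allele, "gene": gene, "family": family}


stripped = parse_gene("IGHV1-69D*01")
kept = parse_gene("IGHV1-69D*01", strip_d=False)
unusual = parse_gene("TRGVA*01")
```

With the default setting the duplicated gene collapses onto IGHV1-69, while strip_d=False keeps IGHV1-69D as a distinct gene. The unusual name TRGVA*01 passes through unchanged.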

Diversity:

  • Added several functions related to analysis of clone size distributions, including countClones(), estimateAbundance() and plotAbundance().
  • Renamed resampleDiversity() to rarefyDiversity() and changed many of the internals. Bootstrapping is now performed on an inferred complete relative abundance distribution.
  • Added support for inclusion of copy number in clone size determination within rarefyDiversity() and testDiversity().
  • Diversity scores and confidence intervals within rarefyDiversity() and testDiversity() are now calculated using the mean and standard deviation of the bootstrap realizations, rather than the median and upper/lower quantiles.
  • Added ability to add counts to the legend in plotDiversityCurve().
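
The change from quantile-based to mean and standard deviation based summaries can be sketched as follows. This is a simplified Python illustration, not the package's method. It assumes a Hill diversity index of order q, resamples directly from the observed clone counts rather than from an inferred complete abundance distribution, and uses a normal approximation with z = 1.96 for the interval. All function and parameter names are hypothetical.

```python
# Hypothetical illustration; not the alakazam API.
import math
import random
from collections import Counter
from statistics import mean, stdev


def hill_diversity(counts, q):
    """Hill number of order q for a list of clone counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # The limit as q -> 1 is the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def bootstrap_diversity(counts, q, n_draws, n_boot=200, z=1.96, seed=1):
    """Summarize bootstrap realizations by their mean and standard deviation."""
    rng = random.Random(seed)
    clones = list(range(len(counts)))
    values = []
    for _ in range(n_boot):
        # Simplification: draw from the observed abundances only.
        sample = rng.choices(clones, weights=counts, k=n_draws)
        values.append(hill_diversity(list(Counter(sample).values()), q))
    center = mean(values)
    spread = stdev(values)
    # Assumed normal-approximation interval around the bootstrap mean.
    return center, spread, (center - z * spread, center + z * spread)


d, sd, (lower, upper) = bootstrap_diversity([50, 30, 15, 5], q=1, n_draws=100)
```

A quantile-based summary would instead report the median of the realizations and take the interval bounds from their lower and upper quantiles.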

Initial public release.

General:

  • Added citations for the citation("alakazam") command.

Lineage:

  • Added more error checking to buildPhylipLineage().

Lineage:

  • Fixed issue where buildPhylipLineage() would hang on R 3.2 due to R change request PR#15508.

Prerelease for review.


install.packages("alakazam")

0.2.7 by Jason Vander Heiden, 9 days ago


http://alakazam.readthedocs.org


Report a bug at https://bitbucket.org/kleinstein/alakazam/issues


Browse source code at https://github.com/cran/alakazam


Authors: Jason Vander Heiden [aut, cre], Namita Gupta [aut], Susanna Marquez [ctb], Daniel Gadala-Maria [ctb], Roy Jiang [ctb], Steven Kleinstein [aut, cph]


Documentation:   PDF Manual  


CC BY-SA 4.0 license


Imports dplyr, graphics, grid, igraph, lazyeval, methods, progress, Rcpp, readr, scales, seqinr, stats, stringi, utils

Depends on ggplot2

Suggests knitr, rmarkdown, testthat

Linking to Rcpp

System requirements: C++11


Imported by shazam, tigger.

