Provides methods for high-throughput adaptive immune
receptor repertoire sequencing (AIRR-Seq; Rep-Seq) analysis, in
particular immunoglobulin (Ig) sequence lineage reconstruction,
lineage topology analysis, diversity profiling, amino acid property
analysis and gene usage analysis.
Citations:
Gupta and Vander Heiden, et al. (2017)
Alakazam is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) and provides a set of tools to investigate lymphocyte receptor clonal lineages, diversity, gene usage, and other repertoire level properties, with a focus on high-throughput immunoglobulin (Ig) sequencing.
Alakazam serves five main purposes:
For help and questions please contact the Immcantation Group or use the issue tracker.
General:
nonsquareDist function to calculate the non-square distance matrix of sequences.
progressBar, baseTheme, checkColumns and cpuCount.
Diversity:
estimateAbundance and plotAbundanceCurve will now allow group=NULL to be specified to perform abundance calculations on ungrouped data.
Gene Usage:
fill argument to countGenes. When set to TRUE, this adds zeroes to the group pairs that do not exist in the data.
groupGenes to group sequences sharing the same V and J gene.
Topology Analysis:
indirect=TRUE.
makeChangeoClone will now issue an error and terminate, instead of continuing with a warning, when the sequences are not all the same length.
General:
IUPAC_AA wherein X was not properly matching against Q.
getAAMatrix to treat * (stop codon) as a mismatch.
General:
readChangeoDb.
padSeqEnds function which pads sequences with Ns to make them equal in length.
collapseDuplicates.
Diversity:
uniform argument to rarefyDiversity allowing users to toggle uniform vs non-uniform sampling.
plotAbundance to plotAbundanceCurve.
estimateAbundance return object from a data.frame to a new AbundanceCurve custom class.
plot call for AbundanceCurve to plotAbundanceCurve.
annotate argument from plotDiversityCurve to plotAbundanceCurve.
score argument to plotDiversityCurve to toggle between plotting diversity or evenness.
plotDiversityTest to generate a simple plot of DiversityTest object summaries.
Gene Usage:
omit_nl argument to getAllele, getGene and getFamily to allow optional filtering of non-localized (NL) genes.
Lineage:
makeChangeoClone preventing it from interpreting the id argument correctly.
pad_end argument to makeChangeoClone to allow automatic padding of ends to make sequences the same length.
General:
dry argument to collapseDuplicates which will annotate duplicate sequences but not remove them when set to TRUE.
collapseDuplicates was returning one sequence if all sequences were considered ambiguous.
Lineage:
makeChangeoClone and buildPhylipLineage for purposes of (optionally) treating indels as mismatches.
buildPhylipLineage when PHYLIP doesn't generate inferred sequences and has only one block.
General:
readChangeoDb causing the select argument to do nothing.
Gene Usage:
countGenes when the clone argument is specified to CLONE_COUNT/CLONE_FREQ.
General:
readChangeoDb and writeChangeoDb.
General:
seqDist() wherein distance was not properly calculated in some sequences containing gap characters.
getAAMatrix() return matrix.
General:
readChangeoDb() to wrap data.table::fread() instead of utils::read.table() if the input file is not compressed.
testSeqEqual(), getSeqDistance() and getSeqMatrix() to C++ to improve performance of collapseDuplicates() and other dependent functions.
testSeqEqual(), getSeqDistance() and getSeqMatrix() to seqEqual(), seqDist() and pairwiseDist(), respectively.
pairwiseEqual() which creates a logical sequence distance matrix; TRUE if sequences are identical, FALSE if not, excluding Ns and gaps.
X in translateDNA().
collapseDuplicates() wherein the input data type sanity check would cause the vignette to fail to build under R 3.3.
ExampleDb.gz file with a larger, more clonal, ExampleDb data object.
ExampleTrees with a larger set of trees.
multiggplot() to gridPlot().
Amino Acid Analysis:
normalize=FALSE for charge calculations to be more consistent with previously published repertoire sequencing results.
Diversity Analysis:
progress argument to rarefyDiversity() and testDiversity() to enable the (previously default) progress bar.
estimateAbundance() where the function would fail if there was only a single input sequence per group.
data and summary slots of DiversityTest to uppercase for consistency with other tools.
plot to plotDiversityCurve for DiversityCurve objects.
Gene Usage:
sortGenes() function to sort V(D)J genes by name or locus position.
clone argument to countGenes() to allow restriction of gene abundance to one gene per clone.
Topology Analysis:
General:
base::nchar().
General:
Amino Acid Analysis:
aliphatic() function were not being passed through the ellipsis argument of aminoAcidProperties().
aminoAcidProperties().
AA_TRANS to ABBREV_AA.
Diversity:
rarefyDiversity() output.
Lineage:
ExampleTrees data with example output from buildPhylipLineage().
General:
getDNADistMatrix() and getAADistMatrix() to getDNAMatrix() and getAAMatrix(), respectively.
getSeqMatrix() which calculates a pairwise distance matrix for a set of sequences.
multiggplot() function for performing multiple panel plots.
Amino Acid Analysis:
gravy(), bulk(), aliphatic(), polar(), charge(), countPatterns() and aminoAcidProperties().
Annotation:
getSegment(), getAllele(), getGene() and getFamily(). May be disabled by providing the argument strip_d=FALSE.
countGenes() to tabulate V(D)J allele, gene and family usage.
Diversity:
countClones(), estimateAbundance() and plotAbundance().
resampleDiversity() to rarefyDiversity() and changed many of the internals. Bootstrapping is now performed on an inferred complete relative abundance distribution.
rarefyDiversity() and testDiversity().
rarefyDiversity() and testDiversity() are now calculated using the mean and standard deviation of the bootstrap realizations, rather than the median and upper/lower quantiles.
plotDiversityCurve().
Initial public release.
General:
citation("alakazam") command.
Lineage:
buildPhylipLineage().
Lineage:
buildPhylipLineage() would hang on R 3.2 due to R change request PR#15508.
Prerelease for review.