Last updated on 2021-11-09
by Giovanni Montana
Great advances have been made in the field of genetic analysis over the last few years. The availability of millions
of single nucleotide polymorphisms (SNPs) in widely available databases, coupled with major advances in SNP genotyping
technology that reduce costs and increase throughput, is enabling a host of studies aimed at elucidating the genetic basis
of complex disease. The focus in this task view is on R packages implementing statistical methods and algorithms for the
analysis of genetic data and for related population genetics studies.
A number of R packages are already available and many more are likely to be developed in the near future.
Please send your comments and suggestions to the task view maintainer.

Population Genetics:
genetics
implements classes and methods for representing genotype and haplotype data, and has several
functions for population genetic analysis (e.g. functions for estimation and testing of
Hardy-Weinberg and linkage disequilibria).
A few population genetics functions are also implemented in
gap.
hwde
fits models for genotypic disequilibria, whilst
HardyWeinberg
provides graphical representations of disequilibria via ternary plots (also known as de Finetti diagrams).
The
Biodem
package provides functions for biodemographical analysis, e.g.
Fst()
calculates the Fst from the conditional kinship matrix. The
adegenet package implements a number of different methods for analysing population structure using multivariate
statistics, graphics and spatial statistics.
The hierfstat package allows the estimation of hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy.
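
As a rough illustration of the kind of test these packages provide, the chi-square test for Hardy-Weinberg proportions at a single biallelic marker can be sketched in a few lines. The snippet is plain Python rather than R and does not use or reproduce the API of any package named above; the function name hwe_chisq and its arguments are hypothetical, and the p-value uses the 1 degree of freedom chi-square approximation.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg proportions.

    Arguments are observed genotype counts at one biallelic marker.
    Returns (allele frequency of A, chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Classes with zero expectation (monomorphic marker) contribute nothing.
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return p, chisq, p_value
```

A call such as hwe_chisq(25, 50, 25) corresponds to a sample in exact Hardy-Weinberg proportions, while a sample with no heterozygotes gives a large statistic. For small samples or rare alleles an exact test is usually preferred over this approximation.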

Phylogenetics:
The Phylogenetics view has more detailed information;
the most important packages are also mentioned here.
Phylogenetic and evolutionary analyses can be performed via
ape. Package
ouch
provides Ornstein-Uhlenbeck models for phylogenetic comparative hypotheses.
phangorn
estimates phylogenetic trees and networks using maximum likelihood, maximum parsimony, distance
methods and Hadamard conjugation.

Linkage:
There are few native packages for performing parametric or non-parametric linkage analysis
from within R itself, so the calculations must usually be performed using external packages. However,
there are a number of ancillary R packages that facilitate interfacing with these standalone
programs and using the results for further analysis and presentation.
ibdreg
uses Identity By Descent (IBD) Non-Parametric Linkage (NPL) statistics for related pairs, calculated
externally, to test for genetic linkage with covariates by regression modelling.
Whilst not an official R package, one software suite in particular is worthy of mention.
PLINK
is a C++ program for genome-wide linkage analysis that supports R-based plug-ins via Rserve, allowing
users to utilise the rich suite of statistical functions in R for analysis.

QTL mapping:
Packages in this category develop methods for the analysis of experimental crosses
to identify markers contributing to variation in quantitative traits.
bqtl
implements both likelihood-based and Bayesian methods for inbred crosses and recombinant inbred
lines.
qtl
provides several functions and a data structure for QTL mapping, including a function
scanone()
for genome-wide scans.
wgaim
builds on the qtl package by including functions for the modelling and summary of QTL intervals from the
full linkage map.

Association:
Packages in this category provide statistical methods to test associations between individual genetic markers
and a phenotype.
gap
is a package for genetic data analysis of both population and family data; it contains functions for sample
size calculations, probability of familial disease aggregation, kinship calculation, and some tests for linkage
and association analyses. Among the other functions,
genecounting()
estimates haplotype frequencies from genotype data, and
gcontrol()
implements Bayesian genomic control statistics for association studies. For family data,
tdthap
offers an implementation of the Transmission/Disequilibrium Test (TDT) for extended marker haplotypes.
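
For orientation, the basic single-marker TDT reduces to a McNemar-type statistic computed over heterozygous parents. The sketch below is plain Python, not R, and covers only this simple biallelic case; it is not the extended-haplotype method implemented in tdthap, and the function name tdt_statistic and its arguments are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Basic biallelic TDT.

    transmitted:   number of heterozygous parents who passed the candidate
                   allele to an affected child
    untransmitted: number of heterozygous parents who passed the other allele
    Returns (chi-square statistic with 1 df, approximate p-value).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chisq = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, p_value
```

Under the null hypothesis of no linkage or no association, a heterozygous parent transmits either allele with probability one half, so the two counts are expected to be roughly equal.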

Linkage Disequilibrium and haplotype mapping:
A number of packages provide haplotype estimation for unrelated individuals with ambiguous haplotypes
(due to unknown linkage phase) and allow testing for associations between the estimated haplotypes and
phenotypes (including covariates) under a GLM framework.
hapassoc
performs likelihood inference of trait associations with haplotypes in GLMs.
tdthap
implements transmission/disequilibrium tests for extended marker haplotypes.
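
To make the phase ambiguity concrete: with two biallelic markers, only individuals heterozygous at both markers have an unresolved haplotype pair. The sketch below estimates the four haplotype frequencies with a simple EM iteration. It is plain Python rather than R, EM is assumed here as an illustrative technique, and the code does not reproduce the algorithm or API of any package named above. The function name em_haplotype_freqs and the 0/1/2 genotype coding (copies of allele 1 at each marker) are hypothetical choices.

```python
def em_haplotype_freqs(genotypes, tol=1e-10, max_iter=1000):
    """EM estimate of two-marker haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2) pairs, each g in {0, 1, 2}.
    Returns a dict mapping haplotypes (a, b) to estimated frequencies.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    base = dict.fromkeys(haps, 0.0)  # haplotype counts with known phase
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: 00/11 or 01/10
            continue
        a = (int(g1 == 2), int(g1 >= 1))  # the two alleles at marker 1
        b = (int(g2 == 2), int(g2 >= 1))  # the two alleles at marker 2
        base[(a[0], b[0])] += 1
        base[(a[1], b[1])] += 1
    total = 2.0 * len(genotypes)
    freqs = dict.fromkeys(haps, 0.25)
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00/11.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(base)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1.0 - w)
        counts[(1, 0)] += n_double_het * (1.0 - w)
        # M-step: re-estimate frequencies from the expected counts.
        new = {h: counts[h] / total for h in haps}
        delta = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if delta < tol:
            break
    return freqs
```

The estimated frequencies can then feed a linkage disequilibrium measure or an association model; real implementations handle more markers, missing data and multiple starting values.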

Genome-Wide Association Studies (GWAS):
With recent technical advances in high-throughput genotyping technologies, performing
Genome-Wide Association Studies is now a feasible strategy. A number of packages are available to facilitate
the analysis of these large data sets.
pbatR
provides a GUI to the powerful PBAT software, which performs family- and
population-based studies. The software has been implemented to take advantage of parallel processing, which
vastly reduces the computational time required for GWAS.
snpMatrix
implements classes and methods for large-scale SNP association studies.

Multiple testing:
The package
qvalue on Bioconductor
implements False Discovery Rate estimation; the main function
qvalue()
estimates the q-values from a list of p-values.
Package
multtest on Bioconductor
also offers several non-parametric bootstrap and permutation resampling-based multiple testing procedures.
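
As a point of reference for what a False Discovery Rate adjustment does, the sketch below computes Benjamini-Hochberg adjusted p-values. This is a simpler, related procedure swapped in for illustration, not the q-value estimator of the qvalue package, which additionally estimates the proportion of true null hypotheses. The code is plain Python rather than R, and the function name bh_adjust is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone in the original p-values and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Markers whose adjusted value falls below a chosen level, such as 0.05, would be declared significant while controlling the expected proportion of false discoveries at that level.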

Importing Sequence Data:
There are utilities in the
seqinr
package to import sequence data from various sources, including files of aligned sequences in mase, clustal,
phylip, fasta and msf format, which will be useful for some population genetic analyses. Users interested in
using R for sequence data and bioinformatics are also referred to the
Bioconductor
project.
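
To show what importing one of these formats involves, the sketch below reads FASTA-formatted text, in which each record starts with a header line beginning with ">" followed by one or more sequence lines. It is plain Python rather than R and is not the seqinr reader; the function name parse_fasta is hypothetical, and taking the first word of the header as the record name is an assumption of this sketch.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record names to sequences."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumption: the record name is the first word after ">".
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}
```

Gap characters in aligned sequences are kept as they appear, so aligned records retain equal lengths.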