Implements empirical Bayes approaches to genotype
polyploids from next generation sequencing data while
accounting for allele bias, overdispersion, and sequencing
error. The main functions are flexdog() and multidog(),
which allow the specification
of many different genotype distributions. Also provided are functions to
simulate genotypes, rgeno(), and read-counts, rflexdog(), as well as
functions to calculate oracle genotyping error rates, oracle_mis(), and
correlation with the true genotypes, oracle_cor(). These latter two
functions are useful for read depth calculations. Run
browseVignettes(package = "updog") in R for example usage. See
Gerard et al. (2018) for details.
Updog provides a suite of methods for genotyping polyploids from next-generation sequencing (NGS) data. It does this while accounting for many common features of NGS data: allelic bias, overdispersion, sequencing error, and (possibly) outlying observations. It is named updog for "Using Parental Data for Offspring Genotyping" because we originally developed the method for full-sib populations, but it works now for more general populations.
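A minimal base-R sketch of how sequencing error, allelic bias, and overdispersion can enter a read-count model, using the parameterization described in Gerard et al. (2018). The function names ref_prob() and sim_ref_counts() are hypothetical and are not part of updog; this is an illustration, not the package's internal code.

```r
# Illustrative only: ref_prob() and sim_ref_counts() are hypothetical helpers,
# not functions exported by updog.

# Probability that a read carries the reference allele.
#   dosage: number of reference alleles (0..ploidy)
#   seq:    sequencing error rate
#   bias:   allelic bias (1 = none; values below 1 favour reference reads)
ref_prob <- function(dosage, ploidy, seq = 0.005, bias = 1) {
  p <- dosage / ploidy
  f <- p * (1 - seq) + (1 - p) * seq  # sequencing error flips the observed allele
  f / (bias * (1 - f) + f)            # allelic bias reweights the two alleles
}

# Reference read counts from a beta-binomial with mean ref_prob() and
# overdispersion od (requires od > 0 and seq > 0 so both shapes are positive).
sim_ref_counts <- function(dosage, depth, ploidy,
                           seq = 0.005, bias = 1, od = 0.001) {
  xi <- ref_prob(dosage, ploidy, seq, bias)
  shape1 <- xi * (1 - od) / od
  shape2 <- (1 - xi) * (1 - od) / od
  prob <- rbeta(length(xi), shape1, shape2)
  rbinom(length(xi), size = depth, prob = prob)
}

set.seed(1)
dosage <- 0:4
ref_prob(dosage, ploidy = 4, bias = 0.7)
sim_ref_counts(dosage, depth = 100, ploidy = 4, bias = 0.7, od = 0.01)
```

With bias = 1 and seq = 0 the reference-read probability reduces to dosage / ploidy, which is a quick sanity check on the formula.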
The main function is flexdog(), which provides many options for the distribution of the genotypes in your sample.
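A minimal sketch of a flexdog() call on simulated data, assuming updog is installed. The ploidy, read depth, allele frequency, and the sequencing error, bias, and overdispersion values are arbitrary choices for illustration, not recommendations. The requireNamespace() guard only keeps the script from failing on machines without the package.

```r
prop_correct <- NA_real_  # stays NA if updog is not installed

if (requireNamespace("updog", quietly = TRUE)) {
  set.seed(1)
  ploidy  <- 4
  nind    <- 100
  sizevec <- rep(50, nind)  # total read count for each individual

  # True genotypes under Hardy-Weinberg with an assumed allele frequency of 0.3
  geno <- updog::rgeno(n = nind, ploidy = ploidy,
                       model = "hw", allele_freq = 0.3)

  # Reference read counts given those genotypes
  refvec <- updog::rflexdog(sizevec = sizevec, geno = geno, ploidy = ploidy,
                            seq = 0.005, bias = 0.8, od = 0.01)

  # Fit using the same genotype distribution family that generated the data
  fit <- updog::flexdog(refvec = refvec, sizevec = sizevec,
                        ploidy = ploidy, model = "hw", verbose = FALSE)

  prop_correct <- mean(fit$geno == geno)  # share of individuals genotyped correctly
  print(prop_correct)
  print(fit$prop_mis)  # posterior proportion of individuals mis-genotyped
}
```

Because the genotypes are simulated, the estimated prop_mis can be compared directly against one minus prop_correct.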
Also provided are:
- mupdog(), which allows for correlation between the individuals' genotypes while jointly estimating the genotypes of the individuals at all provided SNPs. The implementation uses a variational approximation. This is designed for samples where the individuals share a complex relatedness structure (e.g. siblings, cousins, uncles, half-siblings, etc.). Right now there are no guarantees about this function's performance.
- Functions to simulate genotypes (rgeno()) and read counts (rflexdog()). These support all of the models available in flexdog().
- Oracle functions: oracle_joint(), oracle_mis(), oracle_mis_vec(), and oracle_cor(). We mean "oracle" in the sense that we assume that the entire data generation process is known (i.e. the genotype distribution, sequencing error rate, allelic bias, and overdispersion are all known). These are good approximations when there are a lot of individuals (but not necessarily large read depth).

The original updog package is now named updogAlpha, where the old version may still be found.
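The oracle functions can be used for read-depth planning. A minimal sketch, assuming updog is installed: it evaluates oracle_mis() over a few candidate depths for one assumed data-generating process. The genotype distribution, the depths, and the sequencing error, bias, and overdispersion values are all illustrative assumptions.

```r
mis <- NULL  # stays NULL if updog is not installed

if (requireNamespace("updog", quietly = TRUE)) {
  ploidy <- 6
  # Assumed known genotype distribution: binomial with allele frequency 0.5
  dist <- stats::dbinom(0:ploidy, size = ploidy, prob = 0.5)
  depths <- c(10, 25, 50, 100, 200)

  # Oracle misclassification error rate at each candidate read depth
  mis <- sapply(depths, function(n) {
    updog::oracle_mis(n = n, ploidy = ploidy, seq = 0.005,
                      bias = 0.8, od = 0.01, dist = dist)
  })

  print(data.frame(depth = depths, oracle_mis = mis))
}
```

Swapping in oracle_cor() with the same arguments gives the oracle correlation with the true genotypes instead of the error rate.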
See also ebg, fitPoly, and TET. Our best "competitor" is probably fitPoly.
See NEWS for the latest updates on the package.
I've included many vignettes in updog, which you can access online or by running browseVignettes(package = "updog") in R.
If you find a bug or want an enhancement, please submit an issue here.
You can install updog from CRAN in the usual way:
install.packages("updog")
You can install the current (unstable) version of updog from Github with:
devtools::install_github("dcgerard/updog")
If you want to use the use_cvxr = TRUE option in flexdog() (not generally recommended), you will need to install the CVXR package. Before I could install CVXR in Ubuntu, I had to run in the terminal
sudo apt-get install libmpfr-dev
and then run in R
install.packages("Rmpfr")
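Please cite:

Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Harnessing Empirical Bayes and Mendelian Segregation for Genotyping Autopolyploids from Messy Sequencing Data. bioRxiv. doi: 10.1101/281550.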
Or, using BibTeX:
@article{gerard2018harnessing,
  author = {Gerard, David and Ferr{\~a}o, Luis Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew},
  title = {Harnessing Empirical Bayes and Mendelian Segregation for Genotyping Autopolyploids from Messy Sequencing Data},
  year = {2018},
  doi = {10.1101/281550},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2018/03/16/281550},
  eprint = {https://www.biorxiv.org/content/early/2018/03/16/281550.full.pdf},
  journal = {bioRxiv}
}
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Fixes a bug with option model = "s1pp" in flexdog(). I was originally not constraining the levels of preferential pairing to be the same in both segregations of the same parent. This is now fixed. But the downside is that model = "s1pp" is now only supported for ploidy = 4 or ploidy = 6. This is because the optimization becomes more difficult for larger ploidy levels.
I fixed some documentation. Perhaps the biggest error comes from this snippet from the original documentation of flexdog():
The value of prop_mis is a very intuitive measure for the quality of the SNP. prop_mis is the posterior proportion of individuals mis-genotyped. So if you want only SNPS that accurately genotype, say, 95% of the individuals, you could discard all SNPs with a prop_mis under 0.95.
This now says:
The value of prop_mis is a very intuitive measure for the quality of the SNP. prop_mis is the posterior proportion of individuals mis-genotyped. So if you want only SNPS that accurately genotype, say, 95% of the individuals, you could discard all SNPs with a prop_mis over 0.05.
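The corrected rule can be sketched in base R as follows. The snp_summary data frame and its values are hypothetical; in practice prop_mis would come from the fitted flexdog() output for each SNP.

```r
# Hypothetical per-SNP summary (values invented for illustration)
snp_summary <- data.frame(
  snp      = c("snp1", "snp2", "snp3", "snp4"),
  prop_mis = c(0.01, 0.12, 0.049, 0.30),
  stringsAsFactors = FALSE
)

# To accurately genotype about 95% of individuals, discard SNPs whose
# prop_mis is over 0.05 (not under 0.95).
max_prop_mis <- 0.05
keep <- snp_summary[snp_summary$prop_mis <= max_prop_mis, ]
keep$snp
```

Under the old, incorrect wording (discard SNPs with prop_mis under 0.95), every SNP in this toy table would have been discarded.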
I've now exported some C++ functions that I think are useful. You can call them in the usual way: http://r-pkgs.had.co.nz/src.html#cpp-import.
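- A new version of updog. The old version may be found in the updogAlpha package.
- The main function is flexdog().
- mupdog() is now live. We provide no guarantees about mupdog()'s performance.
- Oracle genotyping error rates: oracle_mis().
- Genotype simulation: rgeno().
- Read-count simulation: rflexdog().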