Generates realistic simulated genetic data in families or unrelated individuals.
R package for simulating genetic marker data.
The most stable version can be found at CRAN:
The following script will generate variants in the region of the example vcf file for 200 unrelated individuals:
library(sim1000G)# Read the example file included in sim1000Gexamples_dir = system.file("examples", package = "sim1000G")vcf_file = file.path(examples_dir,"region.vcf.gz")# Alternatively provide a vcf file here:# vcf_file = "~/fs/tmp/sim4/pop1/region-chr4-312-GABRB1.vcf.gz"vcf = readVCF( vcf_file, maxNumberOfVariants = 600 , min_maf = 0.01, max_maf = 1)startSimulation(vcf, totalNumberOfIndividuals = 1000)ids = generateUnrelatedIndividuals(200)genotype = retrieveGenotypes(ids)
With the genotypes we can compare the allele frequencies with the ones in the original vcf file:
# Compare MAF of simulataed data and vcfplot( apply(genotype,2,mean)/2 , apply(vcf$gt1+vcf$gt2,1,mean)/2 )abline(0,1,lty=1,lwd=9,col=rgb(0,0,1,0.3))
An image showing the generated genotypes:
# show the genotypes as an imagegplots::heatmap.2(genotype,col=c("white","orange","red"),Colv=F, trace="none")
We can also compute the correlation between the markers and show an LD plot of the region:
# LD plot of regiongplots::heatmap.2( cor(genotype)^2 , trace="none", col=rev(heat.colors(200)) ,Rowv=F,Colv=F )
A more detailed documentation and code examples can be found at: