Imports a Variant Call Format (VCF) file into R and detects whether the sample is contaminated by another sample from the same species. First, a change-point detection method identifies copy number variations so that they can be filtered out. Next, features are extracted from the remaining data for a support vector machine (SVM) model. For the log-likelihood calculation, the deviation parameter is estimated by maximum likelihood. Finally, an SVM with a radial basis function (RBF) kernel classifies whether the sample is contaminated.
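Two of these steps can be illustrated in base R. This is a minimal sketch, not the sssc implementation: it shows a generic single change-point search (a least-squares mean-shift model) and a maximum-likelihood estimate of a deviation parameter. The function names find_change_point and estimate_sigma are hypothetical, and the assumption that allele fractions at heterozygous sites follow a normal distribution centred at 0.5 is made only for illustration; the package's actual change-point method and likelihood model may differ.

```r
# Hypothetical helpers for illustration only; they are NOT part of the sssc API.

# Single change-point search: return the split index that minimises the total
# within-segment sum of squares (least-squares mean-shift model).
find_change_point <- function(x) {
  n <- length(x)
  stopifnot(n >= 2)
  cost <- sapply(seq_len(n - 1), function(k) {
    left <- x[1:k]
    right <- x[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  which.min(cost)  # last index of the first segment
}

# Maximum-likelihood estimate of a deviation parameter sigma, ASSUMING allele
# fractions at heterozygous sites are Normal(mu = 0.5, sd = sigma).
estimate_sigma <- function(af, mu = 0.5) {
  negloglik <- function(s) -sum(dnorm(af, mean = mu, sd = s, log = TRUE))
  optimize(negloglik, interval = c(1e-6, 1))$minimum
}

# Usage on hypothetical values (not real sssc data)
ratio <- c(rep(1.0, 50), rep(1.5, 30))
find_change_point(ratio)  # 50
af <- c(0.45, 0.52, 0.48, 0.55, 0.50)
estimate_sigma(af)        # close to sqrt(mean((af - 0.5)^2))
```

Under the normal assumption with a known mean, the numerical optimum agrees with the closed-form estimate sqrt(mean((af - mu)^2)); optimize() is used so the same pattern carries over to likelihoods without a closed form.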
The goal of sssc is to detect whether a sample with variant information is contaminated by another sample from the same species.
This basic example shows how to check whether vcf_example is contaminated:
```r
library('sssc')
data(vcf_example)
result <- sssc(file = vcf_example)
print(result$stat)
#>               Name       LOH       HomVar     HetVar  HomRate   HighRate
#> 1 sssc_test.vcf.gz 0.7248322 0.0001565125 0.02757586 0.536965 0.05350195
#>     HetRate    LowRate    AvgLL
#> 1 0.3608949 0.04669261 -2.01978
print(result$result)
#>               Name Class Regression
#> 1 sssc_test.vcf.gz     1  0.7131992
```
Because Class is 1, vcf_example is classified as contaminated.