Same Species Sample Contamination Detection

Imports a Variant Call Format (VCF) file into R and detects whether the sample is contaminated by another sample from the same species. In the first stage of the approach, a change-point detection method identifies copy number variations so that they can be filtered out. Next, features are extracted from the data as input to a support vector machine model. For the log-likelihood calculation, the deviation parameter is estimated by the maximum likelihood method. A support vector machine with a radial basis function kernel then classifies the sample as contaminated or not.
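The maximum likelihood step can be illustrated with a minimal sketch in base R. This is not the sssc implementation: the normal model centred at 0.5, the simulated allele frequencies, and the names het_af, neg_log_lik, sigma_hat, sigma_closed and avg_ll are all assumptions made for illustration.

```r
# Hypothetical sketch, not the sssc source code.
# Assumption: allele frequencies of heterozygous variants scatter around 0.5
# following a normal distribution with an unknown deviation parameter sigma.

set.seed(1)
het_af <- rnorm(200, mean = 0.5, sd = 0.05)  # simulated allele frequencies

# Negative log-likelihood as a function of sigma (mean fixed at 0.5)
neg_log_lik <- function(sigma, x, mu = 0.5) {
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Numerical maximum likelihood estimate of sigma
fit <- optimize(neg_log_lik, interval = c(1e-4, 1), x = het_af, tol = 1e-8)
sigma_hat <- fit$minimum

# Closed-form estimate for the same model, used as a cross-check
sigma_closed <- sqrt(mean((het_af - 0.5)^2))

# Average log-likelihood per variant at the estimated sigma
avg_ll <- mean(dnorm(het_af, mean = 0.5, sd = sigma_hat, log = TRUE))
```

Under these assumptions, the numerical estimate and the closed-form value should agree closely, and the average log-likelihood summarises how well the observed allele frequencies fit the model.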


The goal of sssc is to detect whether a sample with variant information is contaminated by another sample from the same species.


This basic example shows how to check whether vcf_example is contaminated:

library(sssc)

# Run contamination detection on the example VCF file
result <- sssc(file = vcf_example)
#>               Name       LOH       HomVar     HetVar  HomRate   HighRate
#> 1 sssc_test.vcf.gz 0.7248322 0.0001565125 0.02757586 0.536965 0.05350195
#>     HetRate    LowRate    AvgLL
#> 1 0.3608949 0.04669261 -2.01978
#>               Name Class Regression
#> 1 sssc_test.vcf.gz     1  0.7131992

Because Class is 1 in the second table, vcf_example is considered to be contaminated.


