Disk-Based Implementation of PeakSegFPOP

Disk-based implementation of Functional Pruning Optimal Partitioning with up-down constraints for single-sample peak calling (independently for each sample and genomic problem), can handle huge data sets (10^7 or more).


News

TODOs

Remove stderr from tail since that confuses some users: tail: cannot open .bedGraph file for reading no such file or directory Or remove head/tail entirely for portability.

C++/R function that computes total Poisson loss using bedGraph and bed file input -- this may be more numerically stable than the cost that results from PeakSegFPOP. For example

system("head labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=0_loss.tsv labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=5.83435542852498e-10_loss.tsv labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=0.000172220048971656_loss.tsv") ==> labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=0_loss.tsv <== 0 135835 67917 3661581 -6.0018530211077560921 -21976270.986880756915 infeasible 2.8921437994722953846 6

==> labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=5.83435542852498e-10_loss.tsv <== 5.8343554285249800245e-10 135831 67915 3661581 -6.0018530211010858721 -21976270.986895956099 infeasible 2.8940798153034301698 6

==> labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=0.000172220048971656_loss.tsv <== 0.00017222004897165599582 123445 61722 3661581 -6.0018501174476162063 -21976270.98465982452 infeasible 2.9924208443271766988 7

there seems to be a difference between PeakSegFPOP and manual computation of the cost, at either the fifth or sixth decimal place: penalty 0 5.8343554285249800245e-10 0.000172220048971656 PeakSegFPOP -21976270.9868 80756915 -21976270.9868 95956099 -21976270.98465 982452 manual -21976270.9868 76085401 -21976270.9868 76085401 -21976270.98465 505615

Note above there is no difference in the manual cost between 0 and 5.8e-10, but there is a difference between the PeakSegFPOP cost, at the fifth decimal place. This is a numerical error: the model with 0 penalty should always have at least as low of cost as any other models. But in the PeakSegFPOP numbers, the model with penalty=5.8e-10 has a slightly lower cost.

Test via R/data.table code: cov.dt <- fread("labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph") setnames(cov.dt, c("chrom", "chromStart", "chromEnd", "count")) cov.dt[, chromStart1 := chromStart +1L] setkey(cov.dt, chromStart1, chromEnd) segs.dt <- fread("labels/H3K4me3_XJ_immune/samples/tcell/McGill0107/problems/chr22:16847850-20509431/coverage.bedGraph_penalty=5.83435542852498e-10_segments.bed") setnames(segs.dt, c("chrom", "segStart", "segEnd", "status", "mean")) segs.dt[, segStart1 := segStart + 1L] setkey(segs.dt, segStart1, segEnd) over.dt <- foverlaps(cov.dt, segs.dt, nomatch=0L) sprintf("%.12f", over.dt[, PeakSegDP::PoissonLoss(count, mean, chromEnd-chromStart)])

2018.11.28 PR#2

Remove dependency on Berkeley DB.

2018.11.20 PR#3

problem.PeakSegFPOP deletes lock file before running algo.

PeakSegFPOP_disk errors (instead of crashing R) if there is a lock file.

2018.11.14

export fread.first/last

check for empty loss/timing files

2018.10.31

Use wc/nrows/skip in fread.first/last rather than fread("head/tail -1 file")

2018.10.30

PKG_LIBS in Makevars: remove -ldb_cxx -ldb, add -pthread

2018.10.26

add -ldb_cxx -ldb to Makevars for solaris.

in SystemRequirements, link to source, name debian packages.

cast log(int) to double, const DbException& e to avoid solaris warn/err.

2018.10.25

C++ code writes number of bedGraph lines to loss file.

problem.PeakSegFPOP outputs two data.tables rather than three (timings and loss are combined).

2018.09.28

Forked from PeakSegPipeline.

problem.PeakSegFPOP: before was checking start/end pos in segments file with start/end pos in directory name. Now checking first/last line in coverage.bedGraph. No longer import namedCapture.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("PeakSegDisk")

2018.11.28 by Toby Dylan Hocking, 6 months ago


Browse source code at https://github.com/cran/PeakSegDisk


Authors: Toby Dylan Hocking


Documentation:   PDF Manual  


GPL-3 license


Imports data.table, parallel

Suggests testthat, ggplot2, penaltyLearning


See at CRAN