Partitioned Symmetric Matrices

A matrix-like class to represent a symmetric matrix partitioned into file-backed blocks.



symDMatrix is an R package that provides symmetric matrices partitioned into file-backed blocks.

A symmetric matrix G is partitioned into blocks as follows:

+ --- + --- + --- +
| G11 | G12 | G13 |
+ --- + --- + --- +
| G21 | G22 | G23 |
+ --- + --- + --- +
| G31 | G32 | G33 |
+ --- + --- + --- +

Because the matrix is assumed to be symmetric (i.e., Gij equals the transpose of Gji), only the diagonal and upper-triangular blocks are stored; the lower-triangular blocks are virtual transposes of the corresponding upper-triangular blocks across the diagonal. Each block is a file-backed matrix of class ff_matrix from the ff package.
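The storage scheme described above can be sketched in base R on a small in-memory matrix. This is a hypothetical illustration, not the symDMatrix implementation: ordinary matrices stand in for the ff_matrix blocks, and the helper getBlock() is made up for this example. Only the q(q+1)/2 diagonal and upper-triangular blocks are kept, and a request for a lower-triangular block is answered by transposing its counterpart across the diagonal.

```r
# Hypothetical sketch (not the symDMatrix internals): keep only the diagonal
# and upper-triangular blocks of a symmetric matrix
set.seed(1)
n <- 9
A <- matrix(rnorm(n * n), n, n)
G <- crossprod(A)                      # symmetric n x n matrix
q <- 3                                 # number of row/column blocks
blockIdx <- split(seq_len(n), ceiling(seq_len(n) / ceiling(n / q)))

# Store blocks (r, s) with r <= s, keyed as "r_s"
stored <- list()
for (r in seq_len(q)) {
    for (s in r:q) {
        stored[[paste0(r, "_", s)]] <- G[blockIdx[[r]], blockIdx[[s]]]
    }
}

# Return block (r, s); lower-triangular requests are redirected to the
# stored block (s, r) and transposed
getBlock <- function(r, s) {
    if (r <= s) stored[[paste0(r, "_", s)]] else t(stored[[paste0(s, "_", r)]])
}

# Reassemble the full matrix from the stored blocks
full <- do.call(rbind, lapply(seq_len(q), function(r) {
    do.call(cbind, lapply(seq_len(q), function(s) getBlock(r, s)))
}))
all.equal(full, G)
```

In the real package the transpose is virtual (no data are copied), whereas t() here creates a copy; the sketch only shows which blocks need to exist.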

The package defines the class and multiple methods that allow treating this file-backed matrix as a standard RAM matrix.

Tutorial

Before we start, let's create a symmetric matrix in RAM.

library(BGLR)
 
# Load genotypes from a mice data set
data(mice)
X <- mice.X
rownames(X) <- paste0("ID_", 1:nrow(X))
 
# Compute a symmetric genetic relationship matrix (G matrix) in RAM
G1 <- tcrossprod(scale(X))
G1 <- G1 / mean(diag(G1))

(1) Converting a RAM matrix into a symDMatrix

In practice, if a matrix fits in RAM, there is little point in converting it to a symDMatrix object; however, doing so is a convenient way to get started.

library(symDMatrix)
 
G2 <- as.symDMatrix(G1, blockSize = 400, vmode = "double", folderOut = "mice")
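To get a feel for the blockSize argument, the following sketch shows one plausible way rows could be assigned to blocks: contiguous runs of at most blockSize rows, with the remainder in the last block. This layout is an assumption, and rowsToBlocks() is a hypothetical helper that is not part of symDMatrix; nBlocks() and blockSize() report the actual layout of an object.

```r
# Hypothetical helper: assign each of n rows to a contiguous block of at most
# blockSize rows (assumes the last block holds the remainder)
rowsToBlocks <- function(n, blockSize) {
    blockOf <- ceiling(seq_len(n) / blockSize)
    split(seq_len(n), blockOf)
}

blocks <- rowsToBlocks(n = 1000, blockSize = 400)
lengths(blocks)            # rows per block: 400, 400, 200
q <- length(blocks)
q * (q + 1) / 2            # diagonal + upper-triangular blocks: 6
```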

(2) Exploring operators

Now that we have a symDMatrix object, let's illustrate some operators.

# Basic operators applied to a matrix in RAM and to a symDMatrix
 
# Dimension operators
all.equal(dim(G1), dim(G2))
nrow(G1) == nrow(G2)
ncol(G1) == ncol(G2)
all.equal(diag(G1), diag(G2))
 
# Names operators
all.equal(dimnames(G1), dimnames(G2))
all(rownames(G1) == rownames(G2))
all(colnames(G1) == colnames(G2))
 
# Block operators
nBlocks(G2)
blockSize(G2)
 
# Indexing (can use booleans, integers or labels)
G2[1:2, 1:2]
G2[c("ID_1", "ID_2"), c("ID_1", "ID_2")]
tmp <- c(TRUE, TRUE, rep(FALSE, nrow(G2) - 2))
G2[tmp, tmp]
head(G2[tmp, ])
 
# Randomized check of indexing against the matrix in RAM
for (i in 1:100) {
    n1 <- sample(1:50, size = 1)
    n2 <- sample(1:50, size = 1)
    i1 <- sample(1:nrow(X), size = n1)
    i2 <- sample(1:nrow(X), size = n2)
    TMP1 <- G1[i1, i2, drop = FALSE]
    TMP2 <- G2[i1, i2, drop = FALSE]
    stopifnot(all.equal(TMP1, TMP2))
}
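Indexing by booleans, integers, or labels ultimately has to be resolved to integer positions, which are then mapped to a block and a position within that block. The sketch below illustrates both steps in base R; toPositions() is a hypothetical helper, and the contiguous, equal-sized block layout is an assumption rather than a description of the package internals.

```r
# Hypothetical helper: convert a logical, numeric, or character index into
# integer positions along a dimension of length n with the given labels
toPositions <- function(index, n, labels) {
    if (is.logical(index)) {
        which(rep_len(index, n))           # logical: positions of TRUE values
    } else if (is.character(index)) {
        pos <- match(index, labels)        # character: look up labels
        if (anyNA(pos)) stop("subscript out of bounds")
        pos
    } else {
        seq_len(n)[index]                  # numeric: also handles negative indices
    }
}

ids <- paste0("ID_", 1:5)
toPositions(c(TRUE, TRUE, FALSE, FALSE, FALSE), 5, ids)  # 1 2
toPositions(c("ID_2", "ID_4"), 5, ids)                    # 2 4
toPositions(c(3, 1), 5, ids)                              # 3 1

# Map global positions to (block, position within block), assuming
# contiguous blocks of blockSize rows
blockSize <- 2L
pos <- c(1L, 4L, 5L)
block <- (pos - 1L) %/% blockSize + 1L   # 1 2 3
local <- (pos - 1L) %% blockSize + 1L    # 1 2 1
```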

(3) Creating a symDMatrix from genotypes

The function getG_symDMatrix of the BGData package computes G = XX' (with options for centering and scaling) without ever holding the full G in RAM: it creates the symDMatrix object directly, block by block. In this example, X is a matrix in RAM; for large genotype data sets, X could be a file-backed matrix, e.g., a BEDMatrix or ff object.

library(BGData)
 
G3 <- getG_symDMatrix(X, blockSize = 400, vmode = "double", folderOut = "mice2")
class(G3)
all.equal(diag(G1), diag(G3))
 
for (i in 1:10) {
    n1 <- sample(1:25, size = 1)
    i1 <- sample(1:25, size = n1)
    for (j in 1:10) {
        n2 <- sample(1:nrow(G1), size = 1)
        i2 <- sample(1:nrow(G1), size = n2)
        # drop = FALSE keeps both results as matrices, even for a single row
        tmp1 <- G1[i1, i2, drop = FALSE]
        tmp2 <- G3[i1, i2, drop = FALSE]
        stopifnot(all.equal(tmp1, tmp2))
    }
}
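The idea behind computing G block by block can be sketched in base R on simulated data (the dimensions and blockSize below are arbitrary). Each stored block is G_rs = Z_r Z_s' / k, where Z is the centered and scaled X, Z_r holds the rows in block r, and k = mean(diag(G)) can be obtained from the row sums of Z^2 without forming G; this mirrors the scaling used for G1 in this tutorial. It is an illustration only, not the getG_symDMatrix implementation, whose scaling options may differ.

```r
set.seed(42)
X <- matrix(rnorm(30 * 50), nrow = 30)     # simulated stand-in for genotypes
Z <- scale(X)                              # center and scale the columns of X
k <- mean(rowSums(Z^2))                    # mean(diag(G)) without forming G

blockSize <- 12
blockIdx <- split(seq_len(nrow(Z)), ceiling(seq_len(nrow(Z)) / blockSize))
q <- length(blockIdx)

# Compute only the diagonal and upper-triangular blocks G_rs = Z_r Z_s' / k
blocks <- list()
for (r in seq_len(q)) {
    for (s in r:q) {
        blocks[[paste0(r, "_", s)]] <-
            tcrossprod(Z[blockIdx[[r]], , drop = FALSE],
                       Z[blockIdx[[s]], , drop = FALSE]) / k
    }
}

# Check against the in-RAM computation (feasible only for small examples)
G <- tcrossprod(Z) / mean(diag(tcrossprod(Z)))
stopifnot(all.equal(blocks[["1_3"]], G[blockIdx[[1]], blockIdx[[3]]]))
```

Only one pair of row blocks of Z is needed at a time, which is why each block can be computed independently.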

(4) Creating a symDMatrix from ff files containing the blocks

The function as.symDMatrix can also create a symDMatrix object from a character vector of .RData files, each containing an ff_matrix object. The files are assumed to be provided in order: G11, G12, ..., G1q, G22, G23, ..., G2q, ..., Gqq. This approach is useful for very large G matrices: if the number of rows n is large, it may make sense to compute the blocks of the symDMatrix object in parallel jobs (e.g., on an HPC cluster). The function getG of the BGData package is similar to getG_symDMatrix but accepts arguments i and i2, which select the rows of X that define a block of G.

library(BGLR)
library(BGData)
library(ff)
 
# Load genotypes from a wheat data set
data(wheat)
X <- wheat.X
rownames(X) <- paste0("ID_", 1:nrow(X))
 
# Compute G matrix in RAM
centers <- colMeans(X)
scales <- apply(X = X, MARGIN = 2, FUN = sd)
G1 <- tcrossprod(scale(X, center = centers, scale = scales))
G1 <- G1 / mean(diag(G1))
 
# Compute G matrix block by block (each block computation can be distributed)
nBlocks <- 3
blockSize <- ceiling(nrow(X) / nBlocks)
i <- 1:nrow(X)
blockIndices <- split(i, ceiling(i / blockSize))
for (r in 1:nBlocks) {
    for (s in r:nBlocks) {
        blockName <- paste0("wheat_", r, "_", s - r + 1)
        block <- getG(X, center = centers, scale = scales, scaleG = TRUE,
                      i = blockIndices[[r]], i2 = blockIndices[[s]])
        block <- ff::as.ff(block, filename = paste0(blockName, ".bin"), vmode = "double")
        save(block, file = paste0(blockName, ".RData"))
    }
}
G2 <- as.symDMatrix(list.files(pattern = "^wheat.*RData$"))
attr(G2, "centers") <- centers
attr(G2, "scales") <- scales
 
all.equal(diag(G1), diag(G2)) # there will be a slight numerical penalty
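Because the files must be supplied in the order G11, G12, ..., G1q, G22, ..., Gqq, the naming scheme wheat_r_(s - r + 1) in the loop above appears to be chosen so that alphabetical order matches this order. The sketch below, using a hypothetical helper blockFiles(), generates the names in the required order and shows that C-locale sorting (sort(method = "radix")) preserves it for 3 blocks but not for 12, where "wheat_10_1.RData" sorts before "wheat_1_1.RData". With many blocks it may therefore be safer to pass an explicitly ordered vector to as.symDMatrix() instead of relying on the order returned by list.files().

```r
# Hypothetical helper: file names for blocks G11, G12, ..., G1q, G22, ..., Gqq,
# following the naming scheme prefix_r_(s - r + 1) used above
blockFiles <- function(prefix, q) {
    files <- character(0)
    for (r in seq_len(q)) {
        for (s in r:q) {
            files <- c(files, paste0(prefix, "_", r, "_", s - r + 1, ".RData"))
        }
    }
    files
}

f3 <- blockFiles("wheat", 3)
# f3 contains: wheat_1_1, wheat_1_2, wheat_1_3, wheat_2_1, wheat_2_2, wheat_3_1
identical(sort(f3, method = "radix"), f3)    # TRUE: sorting keeps the order

f12 <- blockFiles("wheat", 12)
identical(sort(f12, method = "radix"), f12)  # FALSE: "wheat_10_1" sorts first
```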

Installation

Install the stable version from CRAN:

install.packages("symDMatrix")

Alternatively, install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("QuantGen/symDMatrix")

Contribute

Documentation

Further documentation can be found on RDocumentation.

News

symDMatrix 2.0.2

  • Load the example symDMatrix read-only in examples to pass CRAN checks.

symDMatrix 2.0.1

  • load.symDMatrix(): Add a readonly parameter and recommend using it when loading the example data set.

symDMatrix 2.0.0

The symDMatrix package is now based on the LinkedMatrix package. The internal structure of a symDMatrix object has changed; therefore, previous objects need to be regenerated. We apologize for the inconvenience, but assure you that this change will make the package as a whole more robust and efficient.

  • The symDMatrix class inherits from RowLinkedMatrix.
  • Storing only the upper-triangular blocks resulted in inefficient queries because requests to the lower triangle had to be redirected. We now store the whole matrix, but use virtual transposes for the lower-triangular blocks. Virtual transposes are very efficient because each such block shares the memory mapping of its counterpart across the diagonal, and only the indices are rewritten locally.
  • Support for matrix-like objects that do not support virtual transposes has been dropped (as far as we know, only ff objects currently qualify).
  • as.symDMatrix() has been kept the same, but the S4 constructor has changed.

symDMatrix 1.0.0

Initial release.


2.0.2 by Alexander Grueneberg, 6 months ago


https://github.com/QuantGen/symDMatrix


Report a bug at https://github.com/QuantGen/symDMatrix/issues


Browse source code at https://github.com/cran/symDMatrix


Authors: Gustavo de los Campos [aut] , Alexander Grueneberg [aut, cre]




MIT + file LICENSE license


Imports methods, LinkedMatrix, ff, bit

Suggests BGData, BEDMatrix, testthat, covr


Depended on by BGData.

