R Wrapper for Java Implementation of BiBit

A simple R wrapper for the Java BiBit algorithm from "A biclustering algorithm for extracting bit-patterns from binary datasets" by Domingo et al. (2011). The package also introduces a simple adaptation of the BiBit algorithm that allows noise in the biclusters, as well as a function to guide the algorithm towards given (sub)patterns. Further, it includes a workflow to derive noisy biclusters from discovered larger column patterns.


Installing the BiBitR Package

install.packages("devtools") # If not yet installed on your R Version
devtools::install_github("hadley/devtools") # Only run this if your currently installed 
                                            # devtools version is <= 1.12 (recursive dependencies bug)
 
devtools::install_github("ewouddt/BiBitR")

Should the installation of BiBitR or devtools::install_github("hadley/devtools") throw an error, please install the dependencies manually, then BiBitR:

install.packages(c("flexclust","biclust"))
devtools::install_github("ewouddt/BiBitR")

BiBitR is a simple R wrapper that directly calls the original Java code for the BiBit algorithm and then transforms the output into a Biclust S4 class object. The original Java code, by Domingo S. Rodriguez-Baena, Antonia J. Perez-Pulido and Jesus S. Aguilar-Ruiz, can be found at http://eps.upo.es/bigs/BiBit.html.

More details about the BiBit algorithm can be found in Domingo et al. (2011), "A biclustering algorithm for extracting bit-patterns from binary datasets".

The bibit function uses the original Java code directly (with the intended input and output). Because the Java code was not refactored, the rJava package could not be used.

The bibit function does the following:

  1. Convert the R matrix to a .arff file.
  2. Use the .arff file as input for the Java code, which is called through system().
  3. Read in the .txt file produced by the Java BiBit algorithm and transform it to a Biclust object.
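As a rough illustration of step 1, the sketch below writes a small binary matrix to a .arff file with write.arff() from the foreign package (which BiBitR imports) and reads it back. This is only an assumption about how such a conversion can be done; the package's internal conversion, the exact file layout expected by the Java code, and the system() call itself are not reproduced here. The matrix and the relation name are made up for the example.

```r
library(foreign)  # provides write.arff() and read.arff()

# Small binary matrix standing in for the real input data (illustrative only).
mat <- matrix(c(1, 0, 1,
                1, 1, 0), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("c1", "c2", "c3")))

# Step 1: write the matrix to a temporary .arff file.
arff_file <- tempfile(fileext = ".arff")
write.arff(as.data.frame(mat), file = arff_file, relation = "bibit_input")

# Steps 2 and 3 (calling the Java code through system() and parsing its
# .txt output) are package internals and are intentionally left out.

# Reading the file back shows that the values survive the conversion.
roundtrip <- as.matrix(read.arff(arff_file))
```

The extra write and read of an intermediate file is where the overhead on large datasets comes from.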

Because of this procedure, writing and reading the intermediate files adds overhead when the algorithm is applied to large datasets. Make sure your machine has enough RAM available when working with large data.

Note: If you want to circumvent the internal R function that converts the matrix to .arff format, look at the documentation of the arff_row_col parameter of the bibit function. With this parameter you can supply the original input files for the Java algorithm directly. These input files can also be generated with the make_arff_row_col function in the package.

The BiBit algorithm was also slightly adapted to allow some noise in the biclusters. This adaptation is available through the bibit2 function in the BiBitR package. It is the same function as bibit, but with an additional noise parameter which allows 0's in the discovered biclusters.

bibit2 follows the same steps as described for bibit above. Within the general steps of the BiBit algorithm, the allowance for noise in the biclusters is inserted as follows:

  1. Binary data is encoded in bit words.
  2. Take a pair of rows as your starting point.
  3. Find the maximal overlap of 1's between these two rows and save this as a pattern/motif. You now have a bicluster of 2 rows and N columns in which N is the number of 1's in the motif.
  4. Check whether each remaining row matches this motif, allowing a specific number of 0's in the match as defined by the noise parameter. Rows that match completely or fall within the allowed noise range are added to the bicluster.
  5. Go back to Step 2 and repeat for all possible row pairs.
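The steps above can be sketched in plain R as follows. This is a minimal illustration, not the package's Java implementation: it skips the bit-word encoding of step 1 and works on an ordinary 0/1 matrix, and grow_from_pair and zeros_allowed are hypothetical names introduced only for this example.

```r
# Hypothetical helper: grow one bicluster from a starting row pair (i, j).
# 'zeros_allowed' is the number of 0's tolerated in each extra row.
grow_from_pair <- function(mat, i, j, zeros_allowed = 0) {
  # Step 3: the pattern is the set of columns where both rows have a 1.
  pattern <- which(mat[i, ] == 1 & mat[j, ] == 1)
  if (length(pattern) == 0) return(NULL)

  # Step 4: count the 0's of every row inside the pattern columns and
  # keep the rows that stay within the allowed noise.
  zeros <- rowSums(mat[, pattern, drop = FALSE] == 0)
  rows <- which(zeros <= zeros_allowed)

  list(rows = rows, cols = pattern)
}

mat <- rbind(c(1, 1, 1, 0),
             c(1, 1, 1, 1),
             c(1, 0, 1, 0),
             c(0, 0, 1, 1))

exact <- grow_from_pair(mat, 1, 2, zeros_allowed = 0)  # rows 1 and 2 only
noisy <- grow_from_pair(mat, 1, 2, zeros_allowed = 1)  # row 3 joins with one 0

# Step 5: repeat for every possible row pair.
pairs <- combn(nrow(mat), 2)
all_bcs <- lapply(seq_len(ncol(pairs)), function(k) {
  grow_from_pair(mat, pairs[1, k], pairs[2, k], zeros_allowed = 1)
})
```

The starting pair always has zero 0's inside its own pattern, so it is part of every bicluster it generates.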

Note: Biclusters are only saved if they satisfy the minr and minc parameter settings and if the bicluster is not already contained completely within another bicluster.
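A minimal sketch of this saving rule, assuming a bicluster is represented as a list of row and column indices. The helpers is_contained and keep_bicluster are hypothetical and only illustrate the rule; they are not functions of the package.

```r
# Hypothetical helper: is bicluster 'a' completely contained in bicluster 'b'?
is_contained <- function(a, b) {
  all(a$rows %in% b$rows) && all(a$cols %in% b$cols)
}

# Hypothetical helper: should a candidate bicluster be saved?
keep_bicluster <- function(candidate, saved, minr = 2, minc = 2) {
  big_enough <- length(candidate$rows) >= minr &&
    length(candidate$cols) >= minc
  redundant <- any(vapply(saved, function(b) is_contained(candidate, b),
                          logical(1)))
  big_enough && !redundant
}

bc_large <- list(rows = 1:3, cols = 1:4)
bc_small <- list(rows = 1:2, cols = 2:3)

keep_small <- keep_bicluster(bc_small, saved = list(bc_large))  # inside bc_large
keep_large <- keep_bicluster(bc_large, saved = list(bc_small))  # not contained
keep_tiny  <- keep_bicluster(list(rows = 1, cols = 1:5), saved = list())
```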

What you end up with are biclusters that do not consist only of 1's: the 2 rows of the starting pair are all 1's, while the other rows may contain 0's (= noise).

Note: Because of the extra checks involved in the noise allowance, using noise might slightly increase the computation time.

The noise parameter determines the number of 0's allowed in the bicluster (i.e. in the rows added to the starting row pair) and can take on the following values:

  • noise=0: No noise allowed. This gives the same result as using the bibit function.
  • 0<noise<1: The noise parameter is interpreted as a noise percentage. The number of allowed 0's in an (extra) row in the bicluster depends on the column size of the bicluster. More specifically, zeros_allowed = ceiling(noise * columnsize). For example, for noise=0.10 and a bicluster column size of 5, the number of allowed 0's would be 1.
  • noise>=1: The noise parameter is the number of allowed 0's in an (extra) row in the bicluster, independent of the column size of the bicluster. With this option, the noise parameter should be an integer.
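The three cases can be written down as a small function. The name zeros_allowed follows the formula in the text, but the function itself is a hypothetical sketch and not part of the package.

```r
# Hypothetical helper mirroring the three cases described above.
zeros_allowed <- function(noise, columnsize) {
  stopifnot(noise >= 0)
  if (noise < 1) {
    # noise = 0 gives 0; 0 < noise < 1 is a fraction of the column size.
    ceiling(noise * columnsize)
  } else {
    # noise >= 1 is an absolute count, independent of the column size.
    noise
  }
}

no_noise   <- zeros_allowed(0, 5)     # 0
percentage <- zeros_allowed(0.10, 5)  # ceiling(0.5) = 1, the example above
absolute   <- zeros_allowed(3, 5)     # 3
```

Because of the ceiling, any percentage above 0 allows at least one 0 per extra row, however small the bicluster.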

Normally you would apply BiBit (with or without noise) directly to the binary data. However, because the algorithm starts an exhaustive search from each row pair, it is also possible to look for specific patterns of interest.

To do this, simply add 2 (artificial) identical rows which contain the pattern/motif of interest (e.g. 2 rows with 1's in specific columns and 0 everywhere else). You can do this multiple times if multiple patterns/motifs are of interest.

After applying BiBit (preferably with noise allowance) to the data with the extra artificial rows, search through all discovered biclusters to find those which contain the artificial rows, as well as those which started from the artificial row pair. See the functions rows_in_BC and rows_full1_in_BC.
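A minimal sketch of the data preparation, using base R only. The matrix size, the pattern columns and the row names are made up for the example. The commented lines at the end are an assumption: they reuse the bibit2 arguments from the example in this document and the RowxNumber slot of the Biclust class, and they need the package and Java to run.

```r
set.seed(1)
data <- matrix(sample(c(0, 1), 20 * 10, replace = TRUE, prob = c(0.8, 0.2)),
               nrow = 20, ncol = 10)

# Pattern of interest: 1's in columns 2, 5 and 7, 0's everywhere else.
pattern <- integer(ncol(data))
pattern[c(2, 5, 7)] <- 1

# Append the pattern twice so that it forms a valid starting row pair.
data_ext <- rbind(data, pattern, pattern)
rownames(data_ext) <- c(paste0("row", seq_len(nrow(data))),
                        "artificial1", "artificial2")

# Indices of the artificial rows, needed later to filter the biclusters.
artificial_idx <- which(rownames(data_ext) %in% c("artificial1", "artificial2"))

# Assumed follow-up (not run here):
# res  <- bibit2(data_ext, minr = 2, minc = 2, noise = 1)
# hits <- which(colSums(res@RowxNumber[artificial_idx, , drop = FALSE]) == 2)
```

The package functions rows_in_BC and rows_full1_in_BC are meant for this filtering step; their arguments are not shown here because they are not documented in this text.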

Future To Do: Adapt the BiBit algorithm to limit it to only start from the artificial rows.

Return value: a Biclust S4 class object.

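library(BiBitR)
 
# Simulate a sparse 100x100 binary matrix and plant three 10x10 biclusters of 1's
data <- matrix(sample(c(0,1),100*100,replace=TRUE,prob=c(0.9,0.1)),nrow=100,ncol=100)
data[1:10,1:10] <- 1   # BC1
data[11:20,11:20] <- 1 # BC2
data[21:30,21:30] <- 1 # BC3
# Shuffle rows and columns so the planted biclusters are no longer contiguous
data <- data[sample(1:nrow(data),nrow(data)),sample(1:ncol(data),ncol(data))]
 
# Original BiBit: biclusters consisting only of 1's
result1 <- bibit(data,minr=2,minc=2)
result1
MaxBC(result1)
 
# BiBit with noise as a percentage of the bicluster column size
result2 <- bibit2(data,minr=5,minc=5,noise=0.2)
result2
MaxBC(result2)
 
# BiBit with noise as an absolute number of 0's per extra row
result3 <- bibit2(data,minr=5,minc=5,noise=3)
result3
MaxBC(result3)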


The released version can also be installed from CRAN:

install.packages("BiBitR")

0.3.1 by De Troyer Ewoud, 2 months ago


Browse source code at https://github.com/cran/BiBitR


Authors: De Troyer Ewoud




GPL-3 license


Imports: stats, foreign, methods, utils, viridis, cluster, dendextend, lattice, grDevices, graphics, randomcoloR, biclust

System requirements: Java


Depended on by RcmdrPlugin.BiclustGUI.

