Single linkage clustering and connected component analyses are often performed on biological images. 'Bioi' provides a set of functions for performing these tasks. These functions operate on data of one or more dimensions: the single linkage clustering method implemented here can be used on n-dimensional data sets, while connected component analyses are limited to 3 or fewer dimensions.
Bioi is an R package that implements solutions to common cell biology image processing problems. In particular, Bioi provides functions to perform connected component labeling on 1-, 2-, or 3-dimensional arrays, single linkage clustering on n-dimensional arrays, and identification of the points in one data set that are closest to each point in a second data set.
The Bioi project can be found on its GitHub repository. It can be installed from that source using functionality available in the devtools package.
install.packages("devtools")
library(devtools)
install_github("zcolburn/Bioi")
library(Bioi)
The objective of single linkage clustering is to place all points into groups such that all points within a group can be reached from any other point in the group by crossing bridges between points that are less than a critical separation distance. A small critical separation distance may result in a larger number of groups being identified. In contrast, a large critical separation distance may result in fewer groups being identified.
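The effect of the critical separation distance can be illustrated with a minimal base R sketch. It uses `hclust` and `cutree` from the stats package rather than Bioi's own implementation, and the names `single_linkage_groups` and `crit_dist` are hypothetical, chosen only for this example. Note that `cutree` joins points whose separation is less than or equal to the cut height, which may differ at the boundary from a strict "less than" rule.

```r
# Five 2-D points: three spaced 1 unit apart, and a distant pair 1 unit apart.
pts <- matrix(c(0, 0,
                1, 0,
                2, 0,
                10, 10,
                11, 10),
              ncol = 2, byrow = TRUE)

# Hypothetical helper (not part of Bioi): build a single linkage tree and
# cut it at the critical separation distance to obtain group labels.
single_linkage_groups <- function(points, crit_dist) {
  tree <- hclust(dist(points), method = "single")
  cutree(tree, h = crit_dist)
}

# A small critical distance leaves every point in its own group.
small <- single_linkage_groups(pts, crit_dist = 0.5)

# A larger critical distance bridges neighbouring points into fewer groups.
large <- single_linkage_groups(pts, crit_dist = 1.5)

length(unique(small))
length(unique(large))
```

This approach computes the full pairwise distance matrix, so it is only practical for small data sets.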
Using the function euclidean_linker, single linkage clustering can be performed in 1 or more dimensions. The function works in three modes: unpartitioned, partitioned, and parallelized. For small sample sizes (number of points less than partition_req) the unpartitioned method is used. For larger sample sizes the partitioned method is used.
The partitioned method works by iteratively dividing the data into smaller and smaller subsections until each partition contains fewer points than partition_req. The unpartitioned method is then used on each subsection before combining the data from each subsection.
The parallelized method works similarly to the partitioned method but operates in parallel. The number of cores to use can be specified by num_cores. To prevent too many threads from being generated, parallel_call_depth can be specified. A higher depth will result in more threads being generated.
A common image processing task is to group all connected "object-positive" pixels in an image into single groups. The connected component labeling function implemented here can be used on 1-, 2-, or 3-dimensional arrays representing 1-, 2-, or 3-dimensional "images". This functionality can be accessed using the find_blobs function.
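To make the idea of connected component labeling concrete, here is a minimal base R sketch for a 2-dimensional logical matrix. It is an illustration, not the find_blobs implementation: the function name `label_components_2d` is hypothetical, and the choice of 4-connectivity (only pixels sharing an edge count as connected) is an assumption that may not match the rule Bioi uses.

```r
# Hypothetical helper (not part of Bioi): label connected TRUE pixels in a
# logical matrix using a stack-based flood fill.
label_components_2d <- function(img) {
  stopifnot(is.matrix(img), is.logical(img))
  labels <- matrix(0L, nrow(img), ncol(img))
  next_label <- 0L
  # Offsets to the four edge-sharing neighbours (assumed connectivity rule).
  offsets <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  for (row_i in seq_len(nrow(img))) {
    for (col_i in seq_len(ncol(img))) {
      # Skip background pixels and pixels that already carry a label.
      if (!img[row_i, col_i] || labels[row_i, col_i] != 0L) next
      next_label <- next_label + 1L
      labels[row_i, col_i] <- next_label
      stack <- list(c(row_i, col_i))
      while (length(stack) > 0) {
        # Pop the most recently added pixel.
        cur <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL
        for (k in seq_len(nrow(offsets))) {
          nr <- cur[1] + offsets[k, 1]
          nc <- cur[2] + offsets[k, 2]
          if (nr < 1 || nr > nrow(img) || nc < 1 || nc > ncol(img)) next
          if (img[nr, nc] && labels[nr, nc] == 0L) {
            labels[nr, nc] <- next_label
            stack[[length(stack) + 1]] <- c(nr, nc)
          }
        }
      }
    }
  }
  labels
}

# A small test image with three separate objects.
img <- matrix(c(1, 1, 0, 0,
                0, 1, 0, 1,
                0, 0, 0, 1,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE) == 1
lab <- label_components_2d(img)
lab
```

Background pixels keep the label 0, and each object receives its own positive integer label.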
Photoactivated localization microscopy (PALM) experiments identify large numbers of protein localizations. A common task when working with dual channel PALM data is to identify the distance separating points in one data set from points in a second data set. For each point in the first data set, the function find_min_dists identifies its nearest neighbor in the second data set and the distance between the two points.
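The nearest neighbor search between two data sets can be sketched in base R with a brute-force comparison. This is only an illustration of the task, not how find_min_dists is implemented: the function name `nearest_in_second`, its arguments, and the returned column names are all hypothetical, and each input is assumed to be a numeric matrix with one point per row.

```r
# Hypothetical helper (not part of Bioi): for each row of set_one, find the
# closest row of set_two and the Euclidean distance to it.
nearest_in_second <- function(set_one, set_two) {
  stopifnot(is.matrix(set_one), is.matrix(set_two),
            ncol(set_one) == ncol(set_two))
  n <- nrow(set_one)
  index <- integer(n)
  distance <- numeric(n)
  for (i in seq_len(n)) {
    # Each column of t(set_two) is one point; subtract the query point and
    # sum the squared coordinate differences per column.
    d <- sqrt(colSums((t(set_two) - set_one[i, ])^2))
    index[i] <- which.min(d)
    distance[i] <- d[index[i]]
  }
  data.frame(index = index, dist = distance)
}

channel_one <- matrix(c(0, 0,
                        5, 5),
                      ncol = 2, byrow = TRUE)
channel_two <- matrix(c(3, 4,
                        5, 6,
                        0, 1),
                      ncol = 2, byrow = TRUE)
res <- nearest_in_second(channel_one, channel_two)
res
```

The brute-force search compares every pair of points, so its cost grows with the product of the two data set sizes.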
Fixed an invalid memory read issue in euclidean_linker_cpp.cpp.
Fixed a namespace NOTE related to testthat that occurred on Fedora, Solaris, and OS X.
Changed 'abs' to 'fabs' in euclidean_linker_cpp.cpp to resolve an issue detected on Debian with the clang compiler.