Low Memory Use Trimmed K-Means

Performs the trimmed k-means clustering algorithm with lower memory use. It also provides a number of utility functions such as BIC calculations.


The tkmeans package attempts to implement the trimmed k-means algorithm of García-Escudero et al. (2008) using as little memory as possible. Data is edited in place, the trimming is implemented using a priority queue structure in C++ through Rcpp, and low memory use versions of utility functions are provided.
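To make the trimming idea concrete, here is a minimal base R sketch of a single assign-and-trim step. It is not the package's implementation: the function name `trim_step` and the starting centres are made up for illustration, and it ranks all distances with `order()` where the package is described as using a priority queue in C++.

```r
# Illustrative sketch only: one assign-and-trim step of trimmed k-means.
# trim_step() is a hypothetical helper, not a function from the package.
trim_step <- function(x, centres, alpha) {
  # Squared Euclidean distance from every row of x to every centre (n x k)
  d2 <- sapply(seq_len(nrow(centres)), function(k) {
    rowSums(sweep(x, 2, centres[k, ], "-")^2)
  })
  nearest <- max.col(-d2, ties.method = "first")  # index of the closest centre
  min_d2 <- d2[cbind(seq_len(nrow(x)), nearest)]  # distance to that centre
  n_trim <- floor(alpha * nrow(x))                # number of rows to discard
  trimmed <- rep(FALSE, nrow(x))
  if (n_trim > 0) {
    # The n_trim rows furthest from their nearest centre are trimmed
    trimmed[order(min_d2, decreasing = TRUE)[seq_len(n_trim)]] <- TRUE
  }
  list(cluster = nearest, trimmed = trimmed)
}

x <- as.matrix(iris[, 1:4])
centres <- x[c(1, 51, 101), , drop = FALSE]  # arbitrary starting centres
step <- trim_step(x, centres, alpha = 0.1)
sum(step$trimmed)  # 15 of the 150 rows are trimmed
```

In a full algorithm the centres would then be recomputed from the untrimmed rows only and the step repeated until the centres stop moving.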

An extremely simple example:

  1. Convert the iris dataset to a matrix and rescale matrix columns.

    iris_mat <- as.matrix(iris[,1:4])  
    scale_params <- scale_mat_inplace(iris_mat)

  2. Cluster with 2 and 3 clusters, 10% trimming

    iris_cluster_2 <- tkmeans(iris_mat, 2, 0.1, c(1,1,1,1), 1, 10, 0.001)  
    iris_cluster_3 <- tkmeans(iris_mat, 3, 0.1, c(1,1,1,1), 1, 10, 0.001)
    
  3. Calculate BIC

    BIC_2 <-cluster_BIC(iris_mat, iris_cluster_2)  
    BIC_3 <-cluster_BIC(iris_mat, iris_cluster_3)
    
  4. Allocate each observation to its nearest cluster using the 3-cluster solution

    clustering <- nearest_cluster(iris_mat, iris_cluster_3)
    
  5. Plot results using reconstructed matrix

    library(lattice) 
    orig_matrix <- sweep(sweep(iris_mat, 2, scale_params[2,], '*'), 2, scale_params[1,], '+')  
    xyplot(orig_matrix[,1]~orig_matrix[,2], groups=clustering) 
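
The reconstruction in step 5 undoes the column scaling by multiplying by one row of `scale_params` and adding the other. The round trip can be checked in base R alone. This sketch assumes a 2 x D layout (row 1 holding column means, row 2 holding column standard deviations), which mirrors how `scale_params` is indexed in step 5 but is not confirmed here; `params`, `scaled` and `restored` are illustrative names.

```r
# Sketch: standardise the columns, then undo it with sweep().
# Assumed layout: row 1 = column means, row 2 = column standard deviations.
x <- as.matrix(iris[, 1:4])
params <- rbind(colMeans(x), apply(x, 2, sd))

# Forward transform: subtract the mean, then divide by the standard deviation
scaled <- sweep(sweep(x, 2, params[1, ], "-"), 2, params[2, ], "/")

# Inverse transform: multiply by the standard deviation, then add the mean
restored <- sweep(sweep(scaled, 2, params[2, ], "*"), 2, params[1, ], "+")

# The restored matrix matches the original up to floating-point tolerance
stopifnot(isTRUE(all.equal(restored, x)))
```

The order of the two `sweep()` calls matters: the inverse must multiply before it adds, the reverse of the forward transform.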
    

To install the latest version:

    install.packages("devtools")  
    library(devtools)  
    install_github("andrewthomasjones/tkmeans")

News

lowmemtkmeans 0.1.2

Bug in initial cluster centre randomization fixed.


install.packages("lowmemtkmeans")

0.1.2 by Andrew Thomas Jones, 2 years ago


Browse source code at https://github.com/cran/lowmemtkmeans


Authors: Andrew Thomas Jones, Hien Duy Nguyen




GPL (>= 3) license


Imports Rcpp

Suggests testthat

Linking to Rcpp, RcppArmadillo

System requirements: C++11


See at CRAN