An evolutionary approach to hard partitional clustering. The algorithm uses genetic operators guided by information about the quality of individual partitions. The method searches for the configuration of barycenters/centroids (encoded as real values) that maximizes or minimizes one of the given clustering validation criteria: Silhouette, Dunn Index, C-Index or Calinski-Harabasz Index. Like many other clustering algorithms, 'gama' asks for k: a fixed number of partitions established a priori. If the user does not know the best value for k, the algorithm estimates it using one of two user-specified options: minimum or broad. The first method uses an approximation of the second derivative of a set of points to automatically detect the point of maximum curvature (the 'elbow') in the within-cluster sum of squares error (WCSSE) graph. The second method estimates the best k value through majority voting among 24 indices. One of the major advantages of 'gama' is that it introduces a bias toward partitions that satisfy a particular criterion. References: Scrucca, L. (2013)
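To make the real-valued encoding concrete, the following is a minimal sketch of how one candidate solution might be scored. It is written in Python for illustration only and is not the package's code: the function names (decode, assign, silhouette, fitness), the flat chromosome layout, and the handling of degenerate partitions are all assumptions. A chromosome of length k * d is split into k centroids, each point is assigned to its nearest centroid, and the mean Silhouette of the resulting hard partition serves as the fitness to maximize.

```python
from math import dist  # Euclidean distance, Python 3.8+


def decode(chromosome, k, d):
    """Split a flat real-valued chromosome into k centroids of dimension d."""
    if len(chromosome) != k * d:
        raise ValueError("chromosome length must equal k * d")
    return [tuple(chromosome[i * d:(i + 1) * d]) for i in range(k)]


def assign(points, centroids):
    """Hard assignment: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def silhouette(points, labels):
    """Mean Silhouette width of a hard partition (higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # assumption: a one-cluster partition gets the worst score
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a width of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def fitness(chromosome, points, k):
    """Score one candidate: decode centroids, partition the data, evaluate."""
    centroids = decode(chromosome, k, len(points[0]))
    return silhouette(points, assign(points, centroids))
```

For example, with points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], the chromosome [0.0, 0.5, 10.0, 0.5] places one centroid in each visible group and scores higher than [5.0, 0.0, 5.0, 1.0], which splits both groups. A genetic algorithm would apply selection, crossover and mutation to such chromosomes; swapping silhouette for another criterion changes the bias of the search.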
We presented an R package to perform hard partitional clustering guided by a user-specified cluster validation criterion. The algorithm obtains high cluster validation indices when applied to datasets that contain superellipsoid clusters. The algorithm can estimate the number of partitions for a given dataset either by automatically inferring the elbow in the WCSSE graph or by a broad search over 24 cluster validation criteria. The package ships with six built-in datasets for experimentation; two of them are in-house datasets collected from real executions of distributed machine learning algorithms on Spark clusters. The others are well-known datasets used to benchmark clustering algorithms.
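The elbow inference mentioned above can be sketched with a discrete second difference. This is an illustrative Python sketch under stated assumptions, not the package's implementation: the function name elbow_k is hypothetical, the WCSSE values are assumed to be given for evenly spaced, increasing values of k, and the elbow is taken to be the k where the second difference is largest.

```python
def elbow_k(wcsse, k_values=None):
    """Return the k with the largest discrete second difference of WCSSE.

    wcsse    -- WCSSE values for consecutive, evenly spaced k (assumption)
    k_values -- the k for each entry; defaults to 1, 2, ..., len(wcsse)
    """
    n = len(wcsse)
    if n < 3:
        raise ValueError("need at least three WCSSE values")
    if k_values is None:
        k_values = list(range(1, n + 1))
    if len(k_values) != n:
        raise ValueError("k_values and wcsse must have the same length")

    best_index, best_curvature = None, float("-inf")
    for i in range(1, n - 1):
        # Central second difference: approximates the second derivative at k_i
        curvature = wcsse[i - 1] - 2 * wcsse[i] + wcsse[i + 1]
        if curvature > best_curvature:
            best_index, best_curvature = i, curvature
    return k_values[best_index]
```

With the made-up curve [1000, 600, 150, 120, 100, 90] for k = 1 to 6, the second differences at k = 2, 3, 4, 5 are -50, 420, 10 and 10, so the sketch returns k = 3. The endpoints are never selected because the second difference is undefined there.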
Version 1.0.3 (2019-02)