A fast reimplementation of several density-based algorithms of
the DBSCAN family for spatial data. Includes the clustering algorithms
DBSCAN (density-based spatial clustering of applications with noise)
and HDBSCAN (hierarchical DBSCAN), the ordering algorithm
OPTICS (ordering points to identify the clustering structure),
and the outlier detection algorithm LOF (local outlier factor).
The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search.
An R interface to fast kNN and fixed-radius NN search is also provided.
Hahsler, Piekenbrock and Doran (2019)
This R package provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:
Clustering: DBSCAN, HDBSCAN, and the ordering algorithm OPTICS
Outlier Detection: LOF
Fast Nearest-Neighbor Search (using kd-trees): kNN and fixed-radius NN
The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Python's scikit-learn.
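As a rough illustration of the nearest-neighbor interface mentioned above, the following sketch runs a k-nearest-neighbor query and a fixed-radius query on the iris measurements. It assumes the package is already installed; the values k = 5 and eps = 0.5 are arbitrary choices for this example, not recommendations.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# k-nearest-neighbor search: one row per point, one column per neighbor
nn <- kNN(x, k = 5)
head(nn$id, 3)    # indices of the 5 nearest neighbors of the first 3 points
head(nn$dist, 3)  # the corresponding distances

# Fixed-radius search: all neighbors within distance eps of each point
# (eps = 0.5 is an arbitrary value chosen for illustration)
fr <- frNN(x, eps = 0.5)
lengths(fr$id)[1:10]  # number of neighbors found for the first 10 points
```

The kNN result holds fixed-size matrices, while the fixed-radius result holds one list entry per point because the number of neighbors within eps varies from point to point.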
Stable CRAN version: install from within R with
install.packages("dbscan")
Current development version: Download package from AppVeyor or install from GitHub (needs devtools).
library("devtools")
install_github("mhahsler/dbscan")
Load the package and use the numeric variables in the iris dataset
library("dbscan")
data("iris")
x <- as.matrix(iris[, 1:4])
Run DBSCAN
db <- dbscan(x, eps = .4, minPts = 4)
db
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.
0 1 2 3 4
25 47 38 36 4
Available fields: cluster, eps, minPts
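The eps value used above is given without derivation. One common heuristic, sketched here as an assumption rather than as the authors' procedure, is to sort the distance from each point to its k-th nearest neighbor with k = minPts - 1 and look for a knee in the curve. The package also provides kNNdistplot() for this kind of plot; the version below builds it by hand from kNN() so that each step is visible.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# Distance from each point to its 3rd nearest neighbor, sorted ascending.
# k = minPts - 1 is used here on the assumption that minPts counts the
# point itself.
kdist <- sort(kNN(x, k = 3)$dist[, 3])

# A knee in this curve is a candidate value for eps
plot(kdist, type = "l",
     xlab = "Points sorted by distance", ylab = "3-NN distance")
abline(h = 0.4, lty = 2)  # the eps used above, drawn for comparison
```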
Visualize results (noise is shown in black)
pairs(x, col = db$cluster + 1L)
Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)
lof <- lof(x, k = 4)
pairs(x, cex = lof)
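To turn the LOF scores into a short list of candidate outliers, one option is to rank the points by score. The sketch below reuses the lof() call from above (the parameter name k is taken from this document and may differ in other versions of the package); reporting the top five points is an arbitrary choice made for illustration.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# Scores near 1 indicate a local density similar to that of the neighbors;
# clearly larger scores indicate points in comparatively sparse regions.
scores <- lof(x, k = 4)

# Indices of the five points with the largest LOF (arbitrary cutoff)
top <- order(scores, decreasing = TRUE)[1:5]
data.frame(index = top, lof = scores[top])
```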
Run OPTICS
opt <- optics(x, eps = 1, minPts = 4)
opt
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi
Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)
opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
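One way to check how closely the extracted clustering follows a direct DBSCAN run is to cross-tabulate the two label vectors. This is a minimal sketch that repeats the calls from above so that it runs on its own. Cluster ids need not match between the two methods, so the table should be read by its pattern of counts rather than by its diagonal.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

db  <- dbscan(x, eps = .4, minPts = 4)
opt <- optics(x, eps = 1, minPts = 4)
opt <- extractDBSCAN(opt, eps_cl = .4)

# Rows: direct DBSCAN labels; columns: labels extracted from OPTICS.
# Label 0 marks noise points in both.
table(dbscan = db$cluster, optics = opt$cluster)
```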
Extract a hierarchical clustering using the Xi method (captures clusters of varying density)
opt <- extractXi(opt, xi = .05)
opt
plot(opt)
Run HDBSCAN (captures stable clusters)
hdb <- hdbscan(x, minPts = 4)
hdb
HDBSCAN clustering for 150 objects.
Parameters: minPts = 4
The clustering contains 2 cluster(s) and 0 noise points.
1 2
100 50
Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc
Visualize the results as a simplified tree
plot(hdb, show_flat = TRUE)
Show how strongly each point belongs to its assigned cluster by making points with a lower membership probability more transparent
colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
                 palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)
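The membership probabilities can also be summarized numerically. The sketch below averages them per cluster and lists the points that fall below a cutoff; the cutoff of 0.5 and the variable name weak are hypothetical choices for this example and do not come from the package.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

hdb <- hdbscan(x, minPts = 4)

# Mean membership probability within each cluster
tapply(hdb$membership_prob, hdb$cluster, mean)

# Points that are only weakly attached to their cluster
# (0.5 is an arbitrary, hypothetical cutoff)
weak <- which(hdb$membership_prob < 0.5)
weak
```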
The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.
Maintainer: Michael Hahsler