A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893 and NIH R21HG005912.
The package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package currently focuses on data stream clustering and provides implementations of BICO, BIRCH, D-Stream and DBSTREAM.
Additional packages in the stream family are:
Stable CRAN version: install from within R with install.packages("stream").
Current development version: download the package from AppVeyor or install it from GitHub (requires devtools).
Load the package and create micro-clusters via sampling.
    library("stream")
    stream <- DSD_Gaussians(k=3, noise=0)
    sample <- DSC_Sample(k=20)
    update(sample, stream, 500)
    sample
    Reservoir sampling
    Class: DSC_Sample, DSC_Micro, DSC_R, DSC
    Number of micro-clusters: 20
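The printed summary names reservoir sampling as the method behind DSC_Sample. To illustrate the idea only, here is a minimal base-R sketch of classic reservoir sampling (Algorithm R). It is not the package's implementation; the function name reservoir_sample and the matrix pts standing in for a stream are hypothetical.

```r
# Reservoir sampling (Algorithm R): keep a uniform random sample of k rows
# from a sequence of rows that is read once, in order.
# NOTE: illustrative sketch only, not the code used inside DSC_Sample.
reservoir_sample <- function(stream_points, k) {
  n <- nrow(stream_points)
  # Fill the reservoir with the first k rows (or all rows if fewer than k).
  reservoir <- stream_points[seq_len(min(k, n)), , drop = FALSE]
  if (n > k) {
    for (i in (k + 1):n) {
      j <- sample.int(i, 1)            # uniform draw from 1..i
      if (j <= k) {                    # happens with probability k / i
        reservoir[j, ] <- stream_points[i, ]
      }
    }
  }
  reservoir
}

set.seed(42)
# Hypothetical stand-in for a data stream: 500 two-dimensional points.
pts <- matrix(rnorm(1000), ncol = 2)
micro <- reservoir_sample(pts, k = 20)
dim(micro)   # 20 rows, 2 columns
```

Each retained row plays the role of one micro-cluster center, which matches the 20 micro-clusters reported above for k=20.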
Recluster the micro-clusters using k-means and plot the results.
    kmeans <- DSC_Kmeans(k=3)
    recluster(kmeans, sample)
    plot(kmeans, stream, type="both")
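Conceptually, reclustering treats the micro-cluster centers as the input points for a second, offline clustering step that produces the macro-clusters. The following base-R sketch shows that idea with stats::kmeans on made-up centers. It is a simplified illustration under assumptions: the data (true_centers, micro_centers) are hypothetical, the micro-clusters are treated as equally weighted, and nothing here claims to reproduce what DSC_Kmeans does internally.

```r
set.seed(1)
# Hypothetical ground truth: three well-separated cluster centers in 2-D.
true_centers <- matrix(c(0, 0,
                         5, 5,
                         0, 5), ncol = 2, byrow = TRUE)

# Hypothetical micro-cluster centers: 20 points scattered around the
# three true centers (a stand-in for the output of the sampling step).
idx <- sample(rep(1:3, length.out = 20))
micro_centers <- true_centers[idx, ] + matrix(rnorm(40, sd = 0.3), ncol = 2)

# Recluster: run k-means on the micro-cluster centers to get 3 macro-clusters.
# nstart = 10 restarts k-means and keeps the best solution.
macro <- kmeans(micro_centers, centers = 3, nstart = 10)

macro$centers   # one row per macro-cluster
macro$cluster   # macro-cluster label for each micro-cluster
```

Working on 20 centers instead of all 500 points is what makes this second step cheap: the stream is summarized once, and the macro-clustering can be rerun on the summary whenever it is needed.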