Infrastructure for Data Stream Mining

A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893 and NIH R21HG005912.


The package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package currently focuses on data stream clustering and provides implementations of BICO, BIRCH, D-Stream and DBSTREAM.

Additional packages in the stream family are:

  • streamMOA: Interface to clustering algorithms implemented in the MOA framework. Includes implementations of DenStream, ClusTree and CluStream.
  • subspaceMOA: Interface to Subspace MOA and its implementations of HDDStream and PreDeConStream.

Installation

Stable CRAN version: install from within R with

install.packages("stream")

Current development version: download the package from AppVeyor or install it from GitHub (requires the devtools package).

devtools::install_github("mhahsler/stream")

Usage

Load the package and create micro-clusters via sampling.

library("stream")
stream <- DSD_Gaussians(k=3, noise=0)
 
sample <- DSC_Sample(k=20)
update(sample, stream, 500)
sample
## Reservoir sampling
## Class: DSC_Sample, DSC_Micro, DSC_R, DSC
## Number of micro-clusters: 20
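
The micro-clusters held by the sampler can also be inspected directly. The following is a minimal sketch that repeats the setup above so it runs on its own. It assumes the accessors nclusters(), get_centers() and get_weights() from the package's DSC interface behave as in the 1.x series; the variable names centers and weights are chosen here for illustration.

```r
library("stream")

# Recreate the stream and the reservoir sampler from the example above.
stream <- DSD_Gaussians(k = 3, noise = 0)
sample <- DSC_Sample(k = 20)
update(sample, stream, 500)

# Number of micro-clusters currently held (the reservoir size).
nclusters(sample)

# Micro-cluster centers, one row per micro-cluster.
centers <- get_centers(sample)
head(centers)

# One weight per micro-cluster.
weights <- get_weights(sample)
length(weights)
```

Because the sampler keeps raw data points, each center here is simply one of the sampled observations.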

Recluster the micro-clusters using k-means and plot the results.

kmeans <- DSC_Kmeans(k=3)
recluster(kmeans, sample)
plot(kmeans, stream, type="both")
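
Besides plotting, the reclustered result can be scored against the generator's true cluster labels. This is a minimal sketch, assuming the evaluate() interface of the 1.x series referred to in the News entries below; the choice of the measures "purity" and "cRand" and of n = 500 is illustrative, and the resulting values vary from run to run because the stream is random.

```r
library("stream")

# Same setup as in the usage example above.
stream <- DSD_Gaussians(k = 3, noise = 0)
sample <- DSC_Sample(k = 20)
update(sample, stream, 500)

kmeans <- DSC_Kmeans(k = 3)
recluster(kmeans, sample)

# Score the clustering on 500 fresh points drawn from the same stream.
# Both measures compare the assignment with the true labels of the generator.
scores <- evaluate(kmeans, stream, measure = c("purity", "cRand"), n = 500)
scores
```

External measures such as these require a stream that carries cluster labels, which DSD_Gaussians provides.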

News

stream 1.3-0 (05/31/18)

  • Added DSC_BIRCH. Code and Interface by Dennis Assenmacher and Matthias Carnein.
  • Added DSC_BICO. Code by Hendrik Fichtenberger, Marc Gille, Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler and Interface provided by Matthias Carnein and Dennis Assenmacher.
  • DSD_ReadCSV: Fixed bug with streams that have no class/cluster label (reported by Matthias Carnein).
  • animate_cluster: noise now accepts "class" or "exclude" ("ignore" is deprecated).

stream 1.2-4 (02/25/17)

  • Use dbFetch in DSD_ReadDB (new version of RSQLite).
  • Register native C routines.

stream 1.2-3 (08/07/16)

  • fixed saveDSC for DBStream.
  • fixed handling of data with d=1 (reported by Ilana Lichtenstein).
  • plot now automatically determines if the data supports a class attribute.

stream 1.2-2 (10/28/15)

  • evaluate now reports noise information.

stream 1.2-1 (09/08/15)

  • fixed problem with failing test under Windows.

stream 1.2-0 (09/06/15)

  • generic and methods for description() added to extract descriptions from DSD, DSC and DSO objects.
  • write_stream() gained parameter append and now throws an error if it would overwrite a file.
  • DSC objects can now be saved and loaded using saveDSC and readDSC.
  • DBSCAN from package dbscan is now used.
  • DSC_DBSTREAM gained parameter metric and now also supports Manhattan and Maximum norm.
  • DSC_DBSTREAM gained parameter assignments and function get_cluster_assignments() to retrieve the MC assignment of the clustered data points.
  • cleaned up interface for animate_cluster() and animate_data().
  • DSD_ReadCSV was completely rewritten to be more reliable. The argument d was removed; the number of dimensions is now determined automatically.
  • write_stream now has an argument called header (formerly col.names) to be consistent with DSD_ReadCSV.

stream 1.1-5 (07/02/15)

  • NAMESPACE now imports non-standard packages correctly.
  • DSC_DBSTREAM now uses Cm instead of noise.
  • fixed iterator bug for DSC_DBSTREAM.
  • evaluate gains argument noise to control whether noise is ignored.

stream 1.1-4 (05/24/15)

  • evaluate checks if DSD has cluster labels for external evaluation measures.
  • DSD_mlbenchmarkGenerator now shuffles data points.
  • DSD_ReadCSV gains arguments skip and header.
  • DSC_DStream was reimplemented in C++ (Rcpp); the number of grids N can now be fixed by the user.
  • DSC_tNN was renamed DSC_DBSTREAM. It now uses a SOM-style micro-cluster update and was reimplemented in C++ (Rcpp).

stream 1.1-1 (01/15/15)

  • DSC_DStream: fixed bug with removing too many sporadic grids
  • DSD_ReadCSV now uses readLines so it can read properly from URLs
  • updated vignette

stream 1.1-0 (12/18/14)

  • update now directly dispatches
  • DSC_Memory replaces DSD_Wrapper
  • DSD_ReadCSV replaces DSD_ReadStream. Improved handling of blocking and end of stream.
  • added DSD_ReadDB (DBI interface)
  • get_points can now produce cluster and class information

stream 1.0-3 (07/14/14)

  • Fixed precision and recall calculation
  • Added DSC_TwoStage

stream 1.0-2 (06/16/14)

  • Warning for reclusterers removed.
  • plot can now show micro-cluster assignment areas using assignment=TRUE

stream 1.0-1 (06/12/14)

  • Improved documentation
  • Improved DSD_MG
  • plot gained a dim argument to plot only selected dimensions
  • get_assignment gained a threshold argument
  • DSC_Window added
  • DSC_Sample gained a biased argument for biased sampling
  • DSC_Wrapper can now wrap matrix-like objects (e.g., from package ff and bigmemory)

stream 1.0-0 (5/24/14)

  • added D-Stream (with attraction)
  • improved support for creating animations
  • tnn: new decay models, tNN without shared density now reclusters using density reachability
  • plot gained the type "both" that plots micro and macro-clusters
  • DSC_Hierarchical and DSC_Kmeans gained min_weight to filter low weight micro-clusters before reclustering
  • removed default radius, etc. for most clustering algorithms
  • Added DSD_MG for simulating streams with concept drift
  • moved MOA related code to streamMOA
  • suspended DSC_BIRCH because of memory issues
  • reset_stream gained a pos argument

stream 0.2-0 (2/21/14)

  • major restructuring

stream 0.1-1 (8/16/13)

  • initial version

1.3-1 by Michael Hahsler, a month ago


https://github.com/mhahsler/stream


Report a bug at https://github.com/mhahsler/stream/issues


Browse source code at https://github.com/cran/stream


Authors: Michael Hahsler [aut, cre, cph], Matthew Bolanos [aut, cph], John Forrest [ctb], Matthias Carnein [ctb], Dennis Assenmacher [ctb]



GPL-3 license


Imports clue, cluster, clusterGeneration, dbscan, fpc, graphics, grDevices, MASS, mlbench, Rcpp, stats, utils

Depends on methods, proxy

Suggests animation, DBI, rJava, RSQLite, testthat

Linking to Rcpp, BH


Depended on by streamMOA.

Suggested by otsad.

