Algorithms for detection of outliers based on frequent pattern
mining. Such algorithms follow the paradigm that the more frequent patterns an instance
contains, the less likely it is to be an anomaly (He Zengyou, Xu Xiaofei, Huang
Zhexue Joshua, Deng Shengchun, 2005).
R implementation of algorithms for detection of outliers based on frequent pattern mining.
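The paradigm can be illustrated with a small base-R sketch. It is a simplified toy, not the package's implementation: the data, the `min_support` threshold, the helper `patterns_of`, and the brute-force subset enumeration are all assumptions made for illustration. Each row is scored by the summed support of the frequent patterns it contains, so a row with few frequent patterns gets a low score.

```r
# Toy data: three similar observations and one unusual one.
toy <- data.frame(
  colour = c("red", "red", "red", "blue"),
  size   = c("S", "S", "S", "L"),
  stringsAsFactors = FALSE
)

# Represent each row as a set of "attribute=value" items.
items <- lapply(seq_len(nrow(toy)), function(i) {
  paste(names(toy), unlist(toy[i, ]), sep = "=")
})

# Hypothetical helper: enumerate every non-empty subset of a row's items
# (only viable for a handful of columns).
patterns_of <- function(row) {
  unlist(lapply(seq_along(row), function(k) combn(row, k, simplify = FALSE)),
         recursive = FALSE)
}
all_patterns <- unique(unlist(lapply(items, patterns_of), recursive = FALSE))

# Support = fraction of rows that contain the whole pattern.
support <- vapply(all_patterns, function(p) {
  mean(vapply(items, function(row) all(p %in% row), logical(1)))
}, numeric(1))

# Keep only patterns that reach the (assumed) minimum support.
min_support <- 0.5
frequent <- all_patterns[support >= min_support]
frequent_support <- support[support >= min_support]

# Score each row by the summed support of the frequent patterns it contains,
# normalised by the number of frequent patterns.
fpof <- vapply(items, function(row) {
  contained <- vapply(frequent, function(p) all(p %in% row), logical(1))
  sum(frequent_support[contained]) / length(frequent)
}, numeric(1))

print(fpof)  # lower values = fewer frequent patterns = more likely an anomaly
```

Note that this toy score measures how "normal" a row is, so the most suspicious row has the lowest value. The package examples instead sort `model$scores` in decreasing order and treat the first row as the instance with the highest anomaly score.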
If you would like to cite our work, please use:
@InProceedings{kuchar:2017:FPI,
  title = {Spotlighting Anomalies using Frequent Patterns},
  author = {Jaroslav Kuchař and Vojtěch Svátek},
  booktitle = {Proceedings of the KDD 2017 Workshop on Anomaly Detection in Finance},
  year = {2017},
  volume = {71},
  series = {Proceedings of Machine Learning Research},
  address = {Halifax, Nova Scotia, Canada},
  month = {14 Aug},
  publisher = {PMLR},
  issn = {1938-7228}
}
Available implementations:
Package installation from GitHub:
    library("devtools")
    devtools::install_github("jaroslav-kuchar/fpmoutliers")

    library(fpmoutliers)
    dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
    model <- FPI(dataFrame, minSupport = 0.001)
    # sort data by the anomaly score
    dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
    print(dataFrame[1,]) # instance with the highest anomaly score
    print(dataFrame[nrow(dataFrame),]) # instance with the lowest anomaly score
Currently not suitable for large datasets - the plot is limited by the number of rows and columns of the input data.
    library("fpmoutliers")
    dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
    model <- FPI(dataFrame, minSupport = 0.001)
    # sort data by the anomaly score
    dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
    visualizeInstance(dataFrame, 1) # instance with the highest anomaly score
    visualizeInstance(dataFrame, nrow(dataFrame)) # instance with the lowest anomaly score

    library("fpmoutliers")
    dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
    model <- FPI(dataFrame, minSupport = 0.001)
    # sort data by the anomaly score
    dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
    # instance with the highest anomaly score
    out <- describeInstance(dataFrame, model, 1)
    # instance with the lowest anomaly score
    out <- describeInstance(dataFrame, model, nrow(dataFrame))

    library("fpmoutliers")
    data("iris")
    model <- fpmoutliers::build(iris)

    library(fpmoutliers)
    library(XML)
    dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
    model <- FPI(dataFrame, minSupport = 0.001)
    saveXML(generatePMML(model, dataFrame), "example_out.xml")
All implemented methods return a list with the following parameters:
minSupport
- minimum support setting for frequent itemset mining
maxlen
- maximum length of frequent itemsets
model
- frequent itemset model represented as itemsets-class
scores
- outlier/anomaly scores for each observation/row of the input dataframe

Apache License Version 2.0
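
The list returned by the implemented methods (fields `minSupport`, `maxlen`, `model`, `scores`) can be consumed with a small helper such as the one sketched below. `top_anomalies` is a hypothetical function, not part of fpmoutliers, and the mock list only mimics the field names so that the snippet runs without the package; its values are invented for illustration.

```r
# Hypothetical helper (not part of fpmoutliers): return the k rows with the
# highest anomaly scores, together with their scores.
top_anomalies <- function(dataFrame, model, k = 5) {
  stopifnot(length(model$scores) == nrow(dataFrame))
  idx <- order(model$scores, decreasing = TRUE)[seq_len(min(k, nrow(dataFrame)))]
  cbind(dataFrame[idx, , drop = FALSE], score = model$scores[idx])
}

# Mock result with the same field names as the returned list; in practice it
# would come from a call such as FPI(dataFrame, minSupport = 0.001).
mock_data  <- data.frame(id = 1:4, city = c("A", "A", "B", "A"))
mock_model <- list(minSupport = 0.001, maxlen = 2, model = NULL,
                   scores = c(0.2, 0.1, 0.9, 0.3))

top <- top_anomalies(mock_data, mock_model, k = 2)
print(top)
```

Attaching the score as an extra column keeps the original row order of `dataFrame` untouched, which may be preferable to re-sorting the data frame in place when the model is reused later.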
Changes: