Learned Pattern Similarity (LPS) for time series. Implements a novel approach to model the dependency structure in time series that generalizes the concept of autoregression to local auto-patterns. Generates a pattern-based representation of time series along with a similarity measure called Learned Pattern Similarity (LPS). Introduces a generalized autoregressive kernel.This package is based on the 'randomForest' package by Andy Liaw.
TODO:
Speed-up tuneLearnPattern function by removing redundant computations
Current implementation grows full tree based on the minimum number of observations per tree leaf setting and generates representation based on the maximum depth which is not reasonable computationally. Depth level is planned to be dropped. This way tree size will be controlled by single parameter.
Patterns are learned using ensemble of regression trees. Approach is embarrassingly parallel. A combine routine that can combine multiple trees in an ensemble is required to be implemented to benefit from parallelism.
The probability of each observation being used by the model is not equal because of the sampling scheme. The observations towards the start and end of the time series are less likely to appear in the segments A fair sampling strategy is needed for segment selection.
Identifying common patterns and plotting them should be improved for interpretability purposes.
Currently the package requires UCR time series database format (each time series is a row and columns are the observations over time). Therefore, time series are assumed to be the same length although LPS can handle time series of different length which requires changes in input format.
LPS can work for multivariate time series similarity. Current version needs modifications for multivariate time series.
LPS can work for categorical time series (i.e. DNA sequences). Current version requires modification for categorical variables.
======================================================================== Changes in 1.0-4:
A bug in similarity computation, earlier similarity computation for single test time series was failing if single time series is provided as array (not matrix). Now type of time series (both test and reference) is controlled internally.
Codes are cleaned for readibility.
Changes in 1.0-3:
Error (mean square error) for each individual tree is generated. This is useful if some trees do not generate valuable information. Filtering based on the error rates may improve the results.
Trees to be used for representation generation and similarity computation can be selected by the modification of 'which.tree' argument in the related functions.
Introduced totally random splitting strategy when building trees through argument 'random.split' in function 'learnPattern'. This is computationally faster compared to regression splits. Totally random splitting generates a split from uniform distribution between minimum and maximum observations. Also kd-tree type split is introduced (random.split=2) for which split value is the median of the observations at each node.
Introduced prediction capabilities. When "nodes" argument of "predict.learnPattern" is set to FALSE, average predictions over all trees for each time point are returned. Maybe useful for denoising.
Changes in 1.0-2:
learnPattern 'replace' argument was returning a segmentation fault, fixed
learnPattern now uses segment.factor=c(0.1,0.9) as default for random segment generation and ntree=200 as default for number of trees in the ensemble
Prediction of observed values is now enabled. This can be used for different purposes such as time series modeling and denoising
If replace is set to TRUE, randomly selected time series are left out-of-bag during training of each tree and predictions are made over the segments. OOB predictions are now returned by learnPattern and also OOB error is computed over the trees. Mean square error (MSE) is returned. (added the option oob.pred to determine whether predictions are returned.
MSE based on OOB predictions can be plotted by simply running plot function on learnPattern objects (if sampling is done). This estimate can be used to set the number of trees.
getTreeInfo is introduced for extracting the tree structures from the ensemble (learnPattern object)
Transformation to matrix is done internally for representing single time series
plotMDS is introduced for transforming similarity information to latent variables using traditional multidimensional scaling for plotting purposes
Identifying common patterns and plotting them is enabled for interpretability purposes.
Change in 1.0-1: