Probabilistic Suffix Trees and Variable Length Markov Chains

Provides a framework for analysing state sequences with probabilistic suffix trees (PST), the construction that stores variable length Markov chains (VLMC). Besides functions for learning and optimizing VLMC models, the PST library includes many additional tools to analyse sequence data with these models: visualization tools, functions for sequence prediction and artificial sequence generation, as well as for context and pattern mining. The package is specifically adapted to the field of social sciences: it allows users to learn VLMC models from sets of individual sequences possibly containing missing values, and it accounts for case weights. The library also allows users to compute a probabilistic divergence between two models, and to fit segmented VLMC, where sub-models fitted to distinct strata of the learning sample are stored in a single PST. This software results from research work executed within the framework of the Swiss National Centre of Competence in Research LIVES, which is financed by the Swiss National Science Foundation. The authors are grateful to the Swiss National Science Foundation for its financial support.
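To make the PST/VLMC idea concrete: a VLMC predicts the next state from a context of variable length, and a PST stores one next-state distribution per context node. The minimal Python sketch below is not the package's R API; the function names `context_counts` and `next_symbol_probs` are hypothetical. It counts, with optional case weights, which state follows each context of up to `max_depth` preceding states across a set of sequences, then turns the counts into relative frequencies. Missing values, smoothing, and pruning are ignored.

```python
from collections import defaultdict


def context_counts(sequences, max_depth, weights=None):
    """Count (weighted) next-symbol occurrences for every context of length 0..max_depth.

    A context is the tuple of symbols immediately preceding a position;
    the empty tuple () plays the role of the root node of the suffix tree.
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    counts = defaultdict(lambda: defaultdict(float))
    for seq, w in zip(sequences, weights):
        for i, symbol in enumerate(seq):
            # Only contexts that fit inside the sequence (no padding before the start).
            for depth in range(0, min(i, max_depth) + 1):
                context = tuple(seq[i - depth:i])
                counts[context][symbol] += w
    return counts


def next_symbol_probs(counts, context):
    """Estimate P(next symbol | context) by relative frequencies (no smoothing)."""
    row = counts.get(tuple(context), {})
    total = sum(row.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in row.items()}
```

For the sequences `aab` and `abb`, the context `a` is followed once by `a` and twice by `b`, so the sketch estimates P(b | a) = 2/3.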



  • Fixed problem when computing likelihood of models fitted to sequences of unequal lengths (reported by Aron Lindberg)


  • Changed email address


  • added 'inst' directory and 'CITATION' file mentioning the JSS paper


  • included the 'seqgbar.R' file containing the seqgbar function from the TraMineR package.
  • fixed NOTES and WARNINGS during package check (requested by CRAN)


  • fixed bug when one of the states is named "n" (reported by Marios Iliofotou). Renamed the "n" column (number of occurrences of a context) in the output of cprob to "[n]" and added warnings in cprob() and pstree().
  • Suppressed the horizontal bar identifying a leaf in graphical representations of a PST; non-terminal nodes are still identified by a line and a grey circle.


  • fixed bug in generate() when providing the first state in a sequence (s1 argument)
  • generate(): added cnames argument to provide names for the state sequence object containing the generated sequences
  • Added RColorBrewer to the Depends field in DESCRIPTION; otherwise the example using brewer.pal() prevents the package from being built
  • Added an internal likelihood function that computes the (log-)likelihood of a PST, i.e., predicts, with the PST, the data to which it was fitted. This function is used in pstree() and prune()
  • Added a 'lik' option to prune() and pstree(). If TRUE (default), the likelihood of the newly created or pruned PST is computed
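The likelihood computation described above can be pictured as follows: each symbol of each learning sequence is predicted from the longest context, among its preceding symbols, that is a node of the tree, and the log-probabilities are summed. This is a Python sketch under assumptions, not the package's implementation: the model is a hypothetical dict mapping context tuples to next-symbol distributions, with the empty tuple as the root node. Since each sequence is scored separately, sequences of unequal lengths need no special handling.

```python
import math


def longest_context(model, history, max_depth):
    """Return the longest suffix of `history` (at most max_depth symbols) that is a node of `model`."""
    for depth in range(min(len(history), max_depth), -1, -1):
        context = tuple(history[len(history) - depth:])
        if context in model:
            return context
    raise KeyError("model has no root node ()")


def log_likelihood(model, sequences, max_depth):
    """Sum of log P(x_i | longest matching context) over all symbols of all sequences."""
    total = 0.0
    for seq in sequences:
        for i, symbol in enumerate(seq):
            context = longest_context(model, seq[:i], max_depth)
            total += math.log(model[context][symbol])
    return total
```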


  • changed NAMESPACE and DESCRIPTION to avoid NOTE in package check
  • changed welcome message in zzz.R
  • added 'base' argument in predict() function


  • updated 'pmine' method:
    • inappropriate error message when no pattern satisfies the criterion (reported by Aron Lindberg)
    • now both pmin and pmax can be specified
    • added 'prefix' option.


  • y-axis label now displayed properly in pplot and pqplot


  • modified examples in help pages to meet CRAN requirements ('Examples should run for no more than a few seconds each')


  • help pages prepared for submission to CRAN


  • added a 'logLik' slot to 'PSTf' objects so that the log-likelihood is computed once, when the model is learned. The logLik method now extracts the 'logLik' slot of a 'PSTf' object, instead of computing the value by predicting the learning sample with the object.


  • BIC generic function now works on PST (added the 'nobs' method to get the number of observations, i.e. number of symbols in the learning sample)
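BIC penalizes the log-likelihood by the number of free parameters k and the number of observations n, which here is the number of symbols in the learning sample: BIC = -2 log L + k log n. The small Python sketch below shows the arithmetic; the parameter count (one distribution with |A| - 1 free probabilities per node, for an alphabet A) is an assumption for illustration and may not match how the package counts degrees of freedom. Both function names are hypothetical.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 * logL + k * log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def n_free_params(n_nodes, alphabet_size):
    """Assumed parameter count: each node stores one distribution over the
    alphabet, which has alphabet_size - 1 free probabilities."""
    return n_nodes * (alphabet_size - 1)
```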


  • new version of the generate method, much faster
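Generating an artificial sequence from a VLMC amounts to repeatedly drawing the next state from the distribution stored at the longest context matching the states generated so far. The Python sketch below illustrates that loop under an assumed dict representation (context tuples mapped to next-state distributions, root `()`); `generate_sequence` and its `first_state` parameter, which plays a role similar to the `s1` argument mentioned earlier, are hypothetical. It says nothing about how the package's faster implementation works.

```python
import random


def longest_context(model, history, max_depth):
    """Longest suffix of `history` (at most max_depth symbols) that is a node of `model`."""
    for depth in range(min(len(history), max_depth), -1, -1):
        context = tuple(history[len(history) - depth:])
        if context in model:
            return context
    raise KeyError("model has no root node ()")


def generate_sequence(model, length, max_depth, first_state=None, seed=None):
    """Draw a sequence of `length` states, optionally starting from a fixed first state."""
    rng = random.Random(seed)
    seq = [] if first_state is None else [first_state]
    while len(seq) < length:
        dist = model[longest_context(model, seq, max_depth)]
        symbols = list(dist)
        # Sample the next state according to the context's conditional probabilities.
        seq.append(rng.choices(symbols, weights=[dist[s] for s in symbols], k=1)[0])
    return seq
```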


  • added the pdist method to compute probabilistic distance measure between models
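One way to compare two models probabilistically is a Kullback-Leibler-type divergence between the distributions they assign to sequences of a given length, normalized per symbol. The Python sketch below computes it exactly by enumerating every sequence of length `length` over the alphabet, which is only feasible for short lengths and small alphabets. It does not reproduce the measure or estimator that pdist actually uses; the model representation and function names are assumptions, and `model2` is assumed to give positive probability to every sequence `model1` can produce.

```python
import itertools
import math


def longest_context(model, history, max_depth):
    """Longest suffix of `history` (at most max_depth symbols) that is a node of `model`."""
    for depth in range(min(len(history), max_depth), -1, -1):
        context = tuple(history[len(history) - depth:])
        if context in model:
            return context
    raise KeyError("model has no root node ()")


def seq_prob(model, seq, max_depth):
    """Probability of a whole sequence under a VLMC stored as {context: {symbol: p}}."""
    p = 1.0
    for i, symbol in enumerate(seq):
        p *= model[longest_context(model, seq[:i], max_depth)].get(symbol, 0.0)
    return p


def kl_per_symbol(model1, model2, alphabet, length, max_depth):
    """Exact KL(P1 || P2) over all sequences of `length`, divided by `length`."""
    total = 0.0
    for seq in itertools.product(alphabet, repeat=length):
        p1 = seq_prob(model1, seq, max_depth)
        if p1 > 0.0:
            total += p1 * math.log(p1 / seq_prob(model2, seq, max_depth))
    return total / length
```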


  • changes in the print method for objects of class PSTr and PSTf
  • changes in the pstree method
  • cprob method: added 'to.list' option used by pstree


  • changed arguments for the prune function: "gain" defines the information gain function used for pruning decisions and "cutoff" is the cutoff value
  • changed arguments for the tune function: "gain" and "cutoff" (see the modification to the prune function)
  • pqplot: added uniform color plotting for the prediction quality measure when plotseq=TRUE


  • added gain method to check the result of a gain function and cutoff value when comparing a node to its parent
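A gain function decides whether a node carries enough information beyond its parent to be kept. The Python sketch below shows two criteria commonly used in the VLMC literature, a probability-ratio test and a likelihood-ratio (Kullback-Leibler) test; whether the package's gain options are defined exactly this way is an assumption, and `ratio_gain` and `lr_gain` are hypothetical names. Each returns True when the node should be kept for the given cutoff, and parent probabilities are assumed positive.

```python
import math


def ratio_gain(child, parent, cutoff):
    """Keep the child if some conditional probability differs from the parent's
    by a factor of at least `cutoff` (in either direction)."""
    for symbol, p_child in child.items():
        ratio = p_child / parent[symbol]
        if ratio >= cutoff or ratio <= 1.0 / cutoff:
            return True
    return False


def lr_gain(child, parent, n_child, cutoff):
    """Keep the child if its likelihood-ratio statistic n * KL(child || parent)
    reaches `cutoff`; n_child is the number of occurrences of the child context."""
    kl = sum(p * math.log(p / parent[s]) for s, p in child.items() if p > 0.0)
    return n_child * kl >= cutoff
```

With a parent distribution of (0.5, 0.5) and a child of (0.9, 0.1), the ratio test keeps the child for a cutoff of 1.2, since 0.9 / 0.5 = 1.8.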


  • changed definition of class PSTf (added "call" slot)
  • minor changes in the prune method


  • changes in the prune method


  • important changes in the plot method for PST objects: new main recursive plotting function called plotTree, and separate functions plotEdge and plotNode that will allow users to build their own functions for plotting the content of each node and the edges linking the nodes
  • many changes in documentation files
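The split between a recursive tree walker and separate node and edge drawing functions can be illustrated with a text-only Python sketch: the walker visits each node depth-first and delegates the actual rendering to callbacks that a user could replace. The names `plot_tree`, `default_node`, and `default_edge` and the nested-dict tree are hypothetical and only mirror the roles of plotTree, plotNode, and plotEdge.

```python
def default_node(node, depth):
    """Render a node as its label, indented by depth."""
    return "  " * depth + node["label"]


def default_edge(parent, child, depth):
    """Render the edge leading to a child as a vertical bar at the child's depth."""
    return "  " * depth + "|"


def plot_tree(node, depth=0, node_fn=default_node, edge_fn=default_edge, out=None):
    """Depth-first walk that collects the strings produced by the node and edge callbacks."""
    if out is None:
        out = []
    out.append(node_fn(node, depth))
    for child in node.get("children", []):
        out.append(edge_fn(node, child, depth + 1))
        plot_tree(child, depth + 1, node_fn, edge_fn, out)
    return out
```

Passing a different `node_fn`, for instance one that also prints the node's next-state probabilities, changes the output without touching the traversal.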


  • cleaning code in plotNode()
  • added and modified documentation files





0.94 by Alexis Gabadinho, 5 years ago


Authors: Alexis Gabadinho [aut, cre, cph]

Documentation:   PDF Manual  

Task views: Missing Data

GPL (>= 2) license

Imports methods, stats4

Depends on TraMineR, RColorBrewer
