Statistical Models for Word Frequency Distributions

Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf's law.)
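The three data structures mentioned above (type-frequency list, frequency spectrum, vocabulary growth curve) can be illustrated with a minimal Python sketch. This is not zipfR code and does not use the package's API; the function names and the toy sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter

def type_frequency_list(tokens):
    """Map each word type to its number of occurrences in the sample."""
    return Counter(tokens)

def frequency_spectrum(tfl):
    """Map each frequency class m to V_m, the number of types seen exactly m times."""
    return dict(sorted(Counter(tfl.values()).items()))

def vocabulary_growth(tokens, step=1):
    """Return (N, V) pairs: the vocabulary size V after the first N tokens."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Record a point every `step` tokens and always at the end of the sample.
        if n % step == 0 or n == len(tokens):
            curve.append((n, len(seen)))
    return curve

# Toy sample (hypothetical data, 13 tokens and 8 types).
tokens = "the cat sat on the mat and the dog sat on the log".split()
tfl = type_frequency_list(tokens)
spc = frequency_spectrum(tfl)
vgc = vocabulary_growth(tokens, step=4)
```

In this toy sample the spectrum sums to the vocabulary size, and the sum of m * V_m recovers the sample size, which is a useful consistency check on real data as well.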


RECENT CHANGES to the 'zipfR' package:

Work in progress:

  • posterior distribution with the LNRE model as Bayesian prior: density function postdlnre() and its log-transformed version postldlnre(), plus cumulative probability postplnre() and quantiles postqlnre() where available
  • Good-Turing estimates with gtlnre()
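For readers unfamiliar with Good-Turing estimation, here is a minimal Python sketch of the textbook empirical version, computed directly from an observed frequency spectrum. It is an assumption-laden illustration only: the changelog does not describe how gtlnre() works, and the function name good_turing and the toy spectrum below are hypothetical.

```python
def good_turing(spectrum, N):
    """Basic empirical Good-Turing estimates.

    spectrum maps m -> V_m (number of types occurring exactly m times),
    N is the sample size in tokens.
    Returns (p0, adjusted), where p0 = V_1 / N estimates the total
    probability mass of unseen types, and adjusted[m] = (m + 1) * V_{m+1} / V_m
    is the adjusted frequency for types observed m times.
    """
    p0 = spectrum.get(1, 0) / N
    adjusted = {}
    for m, vm in spectrum.items():
        v_next = spectrum.get(m + 1, 0)
        # Skip classes where V_{m+1} is zero: the raw estimate would collapse
        # to 0 there (practical implementations smooth the spectrum first).
        if v_next > 0:
            adjusted[m] = (m + 1) * v_next / vm
    return p0, adjusted

# Toy spectrum (hypothetical): 5 hapaxes, 2 types seen twice, 1 type seen 4 times.
spectrum = {1: 5, 2: 2, 4: 1}
N = 13
p0, adjusted = good_turing(spectrum, N)
```

The gaps in the raw spectrum (no types with m = 3 or m = 5 here) show why unsmoothed estimates are unreliable for small samples.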

Version 0.6-10

  • maintenance release for compatibility with new CRAN checks and restrictions
  • zipfR plotting utilities (zipfR.begin.plot, zipfR.end.plot, etc.) are deprecated; device="x11" and device="quartz" now both open the default device, which is not guaranteed to be an on-screen device
  • update stale URLs and maintainer e-mail addresses

Version 0.6-8

  • zipfR tutorial has been rewritten as a genuine package vignette
  • write.tfl() and read.tfl() now allow character encoding of the disk file to be declared; read.tfl() safely reads type strings containing quotes

Version 0.6-6

  • interim release for compliance with new, stricter CRAN requirements
  • Zipf-ranking plots for type-frequency lists (of class "tfl")
  • file I/O automatically compresses/decompresses ".gz" files in functions read.tfl(), read.spc(), read.vgc(), write.tfl(), write.spc(), write.vgc()
  • dropped support for very old R versions (before 2.3.1 or so), where read.delim() does not accept comment.char= option; now requires 2.10.1+
  • removed zipfR.legend() function since standard legend() now offers the same convenient placement options
  • added citation details (file inst/CITATION), which can be displayed with citation(package="zipfR")
  • upgraded to GPL v3
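The transparent handling of ".gz" files listed above follows a common pattern: pick the opener based on the file name suffix. The Python sketch below illustrates that pattern under stated assumptions; it is not how zipfR implements it, the helper names open_text, write_rows and read_rows are hypothetical, and the tab-delimited layout is only an example.

```python
import gzip

def open_text(path, mode="r", encoding="utf-8"):
    """Open a text file, (de)compressing transparently if the name ends in '.gz'."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if str(path).endswith(".gz"):
        # gzip.open needs an explicit 't' flag for text mode.
        return gzip.open(path, mode + "t", encoding=encoding)
    return open(path, mode, encoding=encoding)

def write_rows(path, rows, encoding="utf-8"):
    """Write rows as tab-delimited lines (illustrative layout only)."""
    with open_text(path, "w", encoding) as fh:
        for row in rows:
            fh.write("\t".join(str(x) for x in row) + "\n")

def read_rows(path, encoding="utf-8"):
    """Read tab-delimited lines back as lists of strings."""
    with open_text(path, "r", encoding) as fh:
        return [line.rstrip("\n").split("\t") for line in fh]
```

Callers use the same two functions for plain and compressed files, so the choice of compression is made purely by the file name, and the encoding can be declared explicitly in either case.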

Version 0.6-5

  • moved development to SVN repository on R-Forge

Version 0.6-4

  • various bug fixes and minor improvements
  • improved robustness and tolerance for rounding errors in expected frequency spectra

Version 0.6-2

  • added function read.multiple.objects() for reading multiple objects in a single call

Version 0.6-1

  • bugfix: read.vgc() died on input files without at least a V1 column

Version 0.6-0

  • first public release on CRAN
  • improved parameter estimation, can be fine-tuned by users with choice of cost functions and minimization algorithms
  • default settings work well for most data sets
  • minor bug fixes and improved documentation
  • updated version of tutorial

Version 0.5-0

  • first beta release of the zipfR toolkit

0.6-66 by Stefan Evert, a year ago

Authors: Stefan Evert, Marco Baroni

Task views: Probability Distributions, Natural Language Processing

GPL-3 license

Imports methods, utils, stats, graphics, grDevices

Imported by GeodRegr, MSCquartets, MetaLandSim, QBAsyDist, pvaluefunctions, qGaussian.

Depended on by stratifyR.

Suggested by languageR.
