Statistics and Data Sets for Corpus Frequency Data

Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course "Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures" ('SIGIL').


News

Version 0.5:

  • consolidated and revised example data sets included in the package
  • various small but convenient utility functions added
  • maintainer information updated

Version 0.4-9:

  • transitional version in which the data sets and utility functions for the SIGIL course had been moved into a separate package
  • since the new SIGIL package was not accepted by CRAN, the data sets have been moved back into corpora
  • the planned redesign of the corpora package was cancelled
  • in future, the corpora package will be used to collect miscellaneous utility functions for analyzing corpus frequency data

Version 0.4-3:

  • interim release to ensure compatibility with stricter CRAN checks
  • added data set with Biber register features for all BNC texts (from Gasthaus 2007)
  • some minor corrections

Version 0.4-1:

  • large simulated census data set for examples and illustrations in the SIGIL course
  • simulated type-token statistics from English Wikipedia (based on Wackypedia corpus)
  • convenience function for random samples of rows from a data frame (sample.df)

Version 0.4-0:

  • re-launch of the corpora package on R-Forge
  • first version 0.4-0 has only minor changes compared to the previous release 0.3-2

install.packages("corpora")

0.5 by Stefan Evert, 2 years ago


http://SIGIL.R-Forge.R-Project.org/


Browse source code at https://github.com/cran/corpora


Authors: Stefan Evert [http://www.stefan-evert.de/]


Task views: Natural Language Processing


GPL-3 license


Imports: methods, stats, utils, grDevices

