A Framework for Coalescent Simulation

Coalescent simulators can rapidly simulate biological sequences evolving according to a given model of evolution. You can use this package to specify such models, to conduct the simulations and to calculate additional statistics from the results. It relies on existing simulators for doing the simulation, and currently supports the programs 'ms', 'msms' and 'scrm'. It also supports finite-sites mutation models by combining the simulators with the program 'seq-gen'.

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

Coala is an R package for simulating biological sequences according to a given model of evolution. It can call a number of efficient simulators based on coalescent theory. All simulators can be combined with the program seq-gen to simulate finite-sites mutation models. Coala also imports the simulation results directly into R and can calculate various summary statistics from them.


The package can be installed from CRAN using

install.packages("coala")

If you want to use the simulation programs ms, msms or seq-gen, they need to be installed separately. This is described in the "Using External Simulators" vignette and in the wiki.

Usage & Help

Coala comes with a vignette that explains the package's concepts and is a good place to start. It also has a vignette containing a few example applications.

Detailed information about coala's functions is provided via R's help system. Call help(function_name) in R to view the help page of a function. The help pages usually also contain examples and further links.

The ABC vignette gives an example of how coala can be used to conduct the simulations for Approximate Bayesian Computation.

Also take a look at the project wiki for additional resources. You can ask questions on coala's mailing list.


In the following example, we create a simple panmictic model, simulate it and calculate the site frequency spectrum (SFS) of the simulation results:

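library(coala)
model <- coal_model(sample_size = 10, loci_number = 2) +
  feat_mutation(5) +
  sumstat_sfs()  # record the site frequency spectrum
result <- simulate(model)
result$sfs       # the SFS calculated from the simulated data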
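
To make the summary statistic concrete, here is a minimal base R sketch of what a site frequency spectrum counts: for each k, the number of segregating sites at which exactly k sampled haplotypes carry the derived allele. It does not use coala and is not coala's implementation. The toy haplotype matrix and the helper site_frequency_spectrum() are hypothetical and made up for illustration.

```r
# Toy data: 4 haplotypes (rows) x 5 sites (columns); 1 marks the derived
# allele. matrix() fills column by column, so each row of numbers below
# is one site. The values are invented for illustration.
haplotypes <- matrix(c(1, 0, 0, 0,
                       1, 1, 0, 0,
                       0, 0, 1, 0,
                       1, 1, 1, 0,
                       0, 1, 0, 0),
                     nrow = 4)

# Hypothetical helper: entry k of the result is the number of sites at
# which exactly k haplotypes carry the derived allele (k = 1, ..., n - 1).
site_frequency_spectrum <- function(haplotypes) {
  n <- nrow(haplotypes)
  derived_counts <- colSums(haplotypes)
  # tabulate() ignores counts of 0 and counts above nbins,
  # i.e. sites that are not segregating in the sample.
  tabulate(derived_counts, nbins = n - 1)
}

sfs <- site_frequency_spectrum(haplotypes)
print(sfs)
```

In this toy data set, three sites are singletons, one is a doubleton and one has the derived allele in three haplotypes.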

More examples can be found in the examples vignette.


If you encounter problems when using coala, please file a bug report or mail to coala-pkg (at) googlegroups.com.

Supported Simulators

The package supports the coalescent simulators ms, scrm and msms. All simulators can be combined with seq-gen to simulate finite-sites mutation models. The programs msms and seq-gen must be installed manually. The R version of scrm should be installed automatically, and the R version of ms is available if the package phyclust is installed.
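
As a rough sketch of how this looks from R: list_simulators() reports which simulators coala can currently use, and activate_msms() and activate_seqgen() register manually installed programs. The file paths below are placeholders, not real locations, and the argument names are given as best understood from coala's help pages, so please check help(activate_msms) and help(activate_seqgen) before relying on them.

```r
# Only run the coala calls if the package is installed.
coala_available <- requireNamespace("coala", quietly = TRUE)

if (coala_available) {
  library(coala)

  # Overview of the simulators coala can currently use.
  print(list_simulators())

  # Registering manually installed programs. The paths are placeholders
  # (assumptions for illustration), so these calls are left commented out:
  # activate_msms(jar = "/path/to/msms.jar", java = "/path/to/java")
  # activate_seqgen(binary = "/path/to/seq-gen")
}
```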


To follow or participate in the development of coala, please install the development version from GitHub using

devtools::install_github("statgenlmu/coala")

on Linux and OS X. This requires that you have devtools and a compiler or Xcode installed. Bug reports and pull requests on GitHub are highly appreciated. The extending coala vignette contains information on how to create new summary statistics and add simulators to coala. The wiki also contains a few resources for developers.


coala 0.5.2

  • Fix sumstat_file() with ms (#188). Thanks to @acottin for reporting this issue!

coala 0.5.1

  • This is a small maintenance release.
  • Fix a number of minor issues pointed out by hadley/strict (#186)
  • Register native routines to fix the new R CMD check NOTE (#187)

coala 0.5.0

  • Major internal refactoring on how simulators interface with coala (#174).
  • Support for calculating an expanded version of MCMF (#173, #179). This feature was contributed by Jorge E. Amaya Romero (@jorgeamaya).
  • Introduces the optional locus_group argument for features. Using it, features can be defined only for a subset of the loci in the model (#161, #181). Thanks to @andrewparkermorgan for suggesting this feature.

coala 0.4.1

  • The four gamete condition now respects unphased data. If the data is unphased, the four gamete condition is only counted as violated if it is violated for all possible phasings of the data (#162).
  • Skip unit tests if testthat is not available (#165).
  • Add compatibility with upcoming version 1.7.2-0 of scrm (#167).
  • Add a warning if symmetric is used together with pop_from or pop_to in feat_migration (#168).
  • Add citation information (#168).
  • Fix compatibility with rehh 2.0.0 (#172).

coala 0.4.0

  • Adds the create_abc_param and create_abc_sumstat functions for converting the simulation results into the format needed by the abc::abc function (#151).
  • Improves the documentation significantly and adds more examples and links to help pages (#150).
  • Changes name of get_population_indiviuals to get_population_individuals (#150).
  • Adds an option to activate_msms() to download msms' jar file (#153).
  • Adds support for partial models. Now, arbitrary sets of features, loci, parameters and summary statistics can be combined via + and then be added to one or more models later (#155).

coala 0.3.0

Major improvements

  • Support for more selection models, including ones for local adaptation (#137).
  • Adds as.segsites.GENOME function that converts genetic data imported with the package PopGenome to coala's format (#139).

Small Changes

  • Adds feat_ignore_singletons, which is a feature that makes coala ignore singletons when calculating the summary statistics (#138).
  • Use ms from package phyclust instead of requiring that the binary is installed on the system (#140).
  • Ensure that msms uses only one CPU core (#142).

coala 0.2.2

  • Fixes the broken nucleotide diversity and Tajima's D summary statistics (#133).
  • Adds support for calculating joint frequency spectra for more than two populations (#132).

coala 0.2.1

  • Fixes a test that failed on R 3.1.x due to a bug in the test code (#127).
  • Fixes version requirement for testthat (#127).
  • Adds the calc_sumstats_from_data function for calculating summary statistics from biological data (#124).
  • Exports the functions related to segregating sites (#122).

coala 0.2.0

Major improvements

  • Adds support for distributing independent repetitions across multiple CPU cores (#116).
  • Improves support for polyploid models. The ploidy parameter is now provided in the coal_model instead of in feat_unphased (#115).
  • Adds the MCMF summary statistic (#94).
  • Adds support for the omega statistic using OmegaPlus (#109).

Small Changes

  • Adds an option to calculate iHS in sumstat_ihh() and makes the statistic return a data.frame instead of a list.
  • Adds optional support for calculating the JSFS per locus instead of globally (#112).
  • Adds optional in-place transformation of summary statistics (#110).
  • Adds support for simulating a fixed number of mutations with ms and msms (#19).
  • Writes seq-gen output into memory instead of in files before it is parsed (#99).
  • Adds optional support for parameter zero inflation for a deterministic fraction of loci instead of a random number. Can be used by setting random = FALSE in par_zero_inflation (#97).
  • get_outgroup now returns NA if the model has no outgroup rather than throwing an error.

Bug Fixes

  • Fixes the simulation of size changes in one-population models with msms (#105).
  • Removes the broken implementation of the nSL statistic (sumstat_nsl()).
  • Fixes site frequency calculation when an outgroup is present (#96).
  • Fixes multiple errors that occurred in edge cases when calculating ihh (#98).

coala 0.1.1

  • Fixes a memory corruption that occurred only in tests (#90).
  • Updates README.md.
  • Corrects various typos.

coala 0.1.0

  • Initial release version
  • Thanks to Ann Kathrin Huylmans for suggesting the name 'coala' and to Soumya Ranganathan for proofreading.



0.6.0 by Paul Staab, a year ago


Report a bug at https://github.com/statgenlmu/coala/issues

Browse source code at https://github.com/cran/coala

Authors: Paul Staab [aut, cre] , Dirk Metzler [aut, ths] , Jorge E. Amaya Romero [ctb]


MIT + file LICENSE license

Imports assertthat, digest, methods, parallel, R6, Rcpp, rehh, scrm, stats, utils

Suggests abc, knitr, phyclust, rmarkdown, testthat

Linking to Rcpp, RcppArmadillo

Suggested by jaatha, jackalope.
