Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine Learning Framework

Use Monte-Carlo and K-fold cross-validation coupled with machine-learning classification algorithms to perform population assignment. The package can evaluate the discriminatory power of independent training samples, identify informative loci, reduce data dimensionality for genomic data, integrate genetic and non-genetic data, and visualize results.

This R package helps perform population assignment and infer population structure using a machine-learning framework. It employs supervised machine-learning methods to evaluate the discriminatory power of data collected from source populations, and it can analyze large genetic, non-genetic, or integrated (genetic plus non-genetic) data sets. The framework is designed to avoid the upward bias in estimated assignment accuracy discussed in previous studies. Its main features are:

  • Use principal component analysis (PCA) for dimensionality reduction (or data transformation)
  • Use Monte-Carlo cross-validation to estimate mean and variance of assignment accuracy
  • Use K-fold cross-validation to estimate membership probability
  • Resample training datasets of various sizes (proportions or fixed numbers of individuals, and proportions of loci)
  • Choose various proportions of training loci, either randomly or based on locus Fst values
  • Provide several machine-learning classification algorithms, including LDA, SVM, naive Bayes, decision tree, and random forest, to build tunable predictive models
  • Output results in publication-quality plots that can be modified using ggplot2 functions
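
The two cross-validation schemes listed above can be sketched in a few lines of R. This is a generic illustration only, not the code assignPOP runs internally: it uses the built-in iris data as a stand-in for individuals from three source populations, LDA from the MASS package as the classifier, and two hypothetical helper functions (mc_accuracy and kfold_membership) that do not exist in assignPOP. It assumes the population label column is a factor.

```r
library(MASS)  # provides lda()

set.seed(42)  # make the random resampling reproducible

# Hypothetical helper: one Monte-Carlo iteration. Draws train_prop of the
# individuals at random from each population, trains LDA on them, and
# returns assignment accuracy on the individuals that were held out.
mc_accuracy <- function(data, label_col, train_prop) {
  idx_by_pop <- split(seq_len(nrow(data)), data[[label_col]])
  train_idx <- unlist(lapply(idx_by_pop, function(i) {
    sample(i, floor(length(i) * train_prop))
  }), use.names = FALSE)
  model_formula <- as.formula(paste(label_col, "~ ."))
  fit <- lda(model_formula, data = data[train_idx, ])
  pred <- predict(fit, newdata = data[-train_idx, ])$class
  mean(pred == data[[label_col]][-train_idx])
}

# Repeat the random split many times to estimate mean and variance of accuracy.
mc_acc <- replicate(30, mc_accuracy(iris, "Species", train_prop = 0.7))
mc_summary <- c(mean = mean(mc_acc), variance = var(mc_acc))
print(mc_summary)

# Hypothetical helper: K-fold cross-validation. Every individual is held out
# exactly once, and its posterior membership probabilities come from a model
# that never saw it during training.
kfold_membership <- function(data, label_col, k) {
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  pops <- levels(data[[label_col]])
  probs <- matrix(NA_real_, nrow(data), length(pops),
                  dimnames = list(NULL, pops))
  model_formula <- as.formula(paste(label_col, "~ ."))
  for (f in seq_len(k)) {
    held_out <- fold_id == f
    fit <- lda(model_formula, data = data[!held_out, ])
    probs[held_out, ] <- predict(fit, newdata = data[held_out, ])$posterior
  }
  probs
}

membership <- kfold_membership(iris, "Species", k = 5)
head(round(membership, 3))
```

Because the test individuals never contribute to model fitting in either scheme, the accuracy and membership estimates are not inflated by reusing training data.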

Install assignPOP

You can install the released version from CRAN or the up-to-date version from this GitHub repository.

  • To install from CRAN

    • Enter install.packages("assignPOP") in your R console
  • To install from GitHub

    • Step 1. Install the devtools package by entering install.packages("devtools")
    • Step 2. Load the package with library(devtools)
    • Step 3. Enter install_github("alexkychen/assignPOP")

Note: When you install the package from GitHub, you may need to install additional packages before assignPOP can be installed successfully. Follow the hints that R provides and then re-run install_github("alexkychen/assignPOP").
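
For convenience, the installation commands above can be collected into one snippet to paste into the R console. The requireNamespace() check is an addition not found in the original steps; it only skips reinstalling devtools when it is already present. Run either the CRAN line or the GitHub lines, not both.

```r
# Option 1: released version from CRAN
install.packages("assignPOP")

# Option 2: up-to-date version from GitHub (needs the devtools package)
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
library(devtools)
install_github("alexkychen/assignPOP")
```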

Package tutorial

Please visit our tutorial website for more information.

What's new

Changes in ver. 1.1.4

  • 2018.3.8 Fix missing assign.matrix function

Changes in ver. 1.1.3

  • 2017.6.15 Add unit tests (using package testthat)

Changes in ver. 1.1.2

  • 2017.5.13 Change function name read.genpop to read.Genepop; Add function read.Structure.
  • 2017.5.2 Update read.genpop function so that it can read haploid data

Cite this package

Chen K-Y, Marschall EA, Sovic MG, Fries AC, Gibbs HL, Ludsin SA. assignPOP: An R package for population assignment using genetic, non-genetic, or integrated data in a machine-learning framework. Methods in Ecology and Evolution. 2018;9:439–446. https://doi.org/10.1111/2041-210X.12897

Previous versions

Previous versions of the package can be found and downloaded from the archive branch.



1.2.2 by Kuan-Yu (Alex) Chen, a year ago


Browse source code at https://github.com/cran/assignPOP

Authors: Kuan-Yu (Alex) Chen [aut, cre], Elizabeth A. Marschall [aut], Michael G. Sovic [aut], Anthony C. Fries [aut], H. Lisle Gibbs [aut], Stuart A. Ludsin [aut]

GPL (>= 2) license

Imports caret, doParallel, e1071, foreach, ggplot2, MASS, parallel, randomForest, reshape2, stringr, tree

Suggests gtable, iterators, klaR, stringi, knitr, rmarkdown, testthat
