Synthetic Population Generator

Generates high-entropy integer synthetic populations from marginal and (optionally) seed data using quasirandom sampling, in arbitrary dimensionality (Smith, Lovelace and Birkin, 2017). The package also provides an implementation of the Iterative Proportional Fitting (IPF) algorithm (Zaloznik, 2011).


This release:

  • adds new functionality for multidimensional integerisation.
  • removes the previously deprecated functions synthPop and synthPopG.

Multidimensional integerisation

Building on the prob2IntFreq function, which takes a discrete probability distribution and a count and returns the integer population closest to the distribution that sums to the count, a multidimensional equivalent, integerise, is introduced.

In one dimension, for example:

>>> import numpy as np
>>> import humanleague
>>> p=np.array([0.1, 0.2, 0.3, 0.4])
>>> humanleague.prob2IntFreq(p, 11)
{'freq': array([1, 2, 3, 5]), 'rmse': 0.3535533905932736}

produces the optimal (i.e. closest possible) integer population to the discrete distribution.
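To illustrate what "closest integer population" means here, the following is a minimal sketch using the largest-remainder method in plain Python. It is not the package's implementation: the name closest_integer_population is hypothetical, the probabilities are assumed to sum to 1, and the RMSE is assumed to be measured against the expected counts (probability times count), which is consistent with the value shown above.

```python
import math

def closest_integer_population(probs, total):
    """Hypothetical helper: round probs * total to integers that sum to total.

    Assumes probs are non-negative and sum to 1.
    """
    expected = [p * total for p in probs]
    freq = [math.floor(e) for e in expected]
    shortfall = total - sum(freq)
    # Hand the remaining individuals to the states with the largest fractional parts.
    by_remainder = sorted(range(len(probs)),
                          key=lambda i: expected[i] - freq[i],
                          reverse=True)
    for i in by_remainder[:shortfall]:
        freq[i] += 1
    # Root-mean-square error between the integer and expected counts.
    rmse = math.sqrt(sum((f - e) ** 2 for f, e in zip(freq, expected)) / len(probs))
    return freq, rmse

freq, rmse = closest_integer_population([0.1, 0.2, 0.3, 0.4], 11)
print(freq, rmse)
```

For the distribution and count used above, this sketch yields the same frequencies, [1, 2, 3, 5].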

The integerise function generalises this problem and applies it to higher dimensions: given an n-dimensional array of real numbers where the 1-d marginal sums in every dimension are integral (and thus the total population is too), it attempts to find an integral array that also satisfies these constraints.

The QISI algorithm is repurposed to this end. As it is a sampling algorithm, it cannot guarantee that a solution will be found, nor that any solution it finds is optimal. A failure does not prove that no solution exists for the given input.

>>> a = np.array([[ 0.3,  1.2,  2. ,  1.5],
...               [ 0.6,  2.4,  4. ,  3. ],
...               [ 1.5,  6. , 10. ,  7.5],
...               [ 0.6,  2.4,  4. ,  3. ]])
>>> # marginal sums
>>> sum(a)
array([ 3., 12., 20., 15.])
>>> sum(a.T)
array([ 5., 10., 25., 10.])
>>> # perform integerisation
>>> r = humanleague.integerise(a)
>>> r["conv"]
True
>>> r["result"]
array([[ 0,  2,  2,  1], 
       [ 0,  3,  4,  3], 
       [ 2,  6, 10,  7], 
       [ 1,  1,  4,  4]])
>>> r["rmse"]
0.5766281297335398
>>> # check marginals are preserved
>>> sum(r["result"]) == sum(a)
array([ True,  True,  True,  True])
>>> sum(r["result"].T) == sum(a.T)
array([ True,  True,  True,  True])
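The precondition described above (integral marginal sums in every dimension) can be checked before calling integerise. The sketch below does this for the two-dimensional case in plain Python; marginal_sums and has_integral_marginals are hypothetical helper names, not part of the package, and the tolerance is an arbitrary choice to absorb floating-point error.

```python
import math

def marginal_sums(table):
    """Return (row sums, column sums) of a 2-d table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return row_sums, col_sums

def has_integral_marginals(table, tol=1e-9):
    """True if every row and column sum is within tol of a whole number."""
    row_sums, col_sums = marginal_sums(table)
    return all(math.isclose(s, round(s), abs_tol=tol) for s in row_sums + col_sums)

# The input and result arrays from the example above, as nested lists.
a = [[0.3, 1.2, 2.0, 1.5],
     [0.6, 2.4, 4.0, 3.0],
     [1.5, 6.0, 10.0, 7.5],
     [0.6, 2.4, 4.0, 3.0]]
r = [[0, 2, 2, 1],
     [0, 3, 4, 3],
     [2, 6, 10, 7],
     [1, 1, 4, 4]]

print(has_integral_marginals(a))

# Compare the marginals of the input with those of the integerised result.
rows_a, cols_a = marginal_sums(a)
rows_r, cols_r = marginal_sums(r)
marginals_preserved = all(math.isclose(x, y, abs_tol=1e-9)
                          for x, y in zip(rows_a + cols_a, rows_r + cols_r))
print(marginals_preserved)
```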

Removed functions

The functions synthPop and synthPopG implement restricted versions of algorithms that are available in other functions.

Use qis in place of synthPop, and qisi in place of synthPopG.

Introduction

humanleague is a Python and R package for microsynthesising populations from marginal and (optionally) seed data. The package is implemented in C++ for performance.

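The package contains algorithms that use a number of different microsynthesis techniques:

  • Iterative Proportional Fitting (IPF)
  • Quasirandom Integer Sampling (QIS)
  • Quasirandom Integer Sampling of IPF (QISI)

The last of these, QISI, provides a bridge between deterministic reweighting and combinatorial optimisation, offering advantages of both techniques: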

  • generates high-entropy integral populations
  • can be used to generate multiple populations for sensitivity analysis
  • goes some way to address the 'empty cells' issues that can occur in straight IPF
  • relatively fast computation time
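For readers unfamiliar with IPF, the following is a minimal two-dimensional sketch in plain Python. It is a textbook illustration under stated assumptions, not the package's C++ implementation: the function name ipf_2d, the tolerance, and the iteration cap are all hypothetical choices. Rows or columns that sum to zero are left untouched, which is the 'empty cells' situation mentioned above.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Hypothetical helper: fit a 2-d seed table to row and column marginals."""
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so that it sums to its target.
        for i, row in enumerate(table):
            s = sum(row)
            if s > 0:
                table[i] = [x * row_targets[i] / s for x in row]
        # Scale each column so that it sums to its target.
        col_sums = [sum(col) for col in zip(*table)]
        for j, s in enumerate(col_sums):
            if s > 0:
                for row in table:
                    row[j] *= col_targets[j] / s
        # Converged once the row sums still match after the column step.
        err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if err < tol:
            break
    return table

uniform_seed = [[1.0] * 4 for _ in range(4)]
fitted = ipf_2d(uniform_seed, [5, 10, 25, 10], [3, 12, 20, 15])
for row in fitted:
    print([round(x, 6) for x in row])
```

With a uniform seed and these marginals, the fitted table is the real-valued array used as input in the integerisation example above.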

The algorithms:

  • support arbitrary dimensionality for both the marginals and the seed.
  • produce statistical data to ascertain the likelihood/degeneracy of the population (where appropriate).

The package also contains the following utility functions:

  • a Sobol sequence generator
  • a function that constructs the closest integer population from a discrete univariate probability distribution
  • an algorithm for sampling an integer population from a discrete multivariate probability distribution, constrained to the marginal sums in every dimension
  • a function that 'flattens' a multidimensional population into a table, converting a multidimensional array containing the population count for each state into a table listing individuals and their characteristics
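The 'flatten' operation in the list above can be pictured with a small two-dimensional sketch in plain Python. The function name flatten_counts and the output layout (one list of state indices per dimension) are assumptions made for illustration and may differ from what the package returns.

```python
def flatten_counts(table):
    """Hypothetical helper: expand a 2-d table of counts into one record per individual."""
    row_index, col_index = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Repeat the state (i, j) once for each individual in that cell.
            row_index.extend([i] * count)
            col_index.extend([j] * count)
    return row_index, col_index

population = [[0, 2],
              [1, 1]]
rows, cols = flatten_counts(population)
for individual in zip(rows, cols):
    print(individual)
```

Each printed pair is one individual, identified by its state in the first and second dimension; the number of pairs equals the total population.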

Version 1.0.1 reflects the work described in the Quasirandom Integer Sampling (QIS) paper.

R installation

Official release:

> install.packages("humanleague")

For the development version:

> devtools::install_github("virgesmith/humanleague")

Or, for the legacy version:

> devtools::install_github("virgesmith/humanleague@1.0.1")

Python installation

Requires Python 3 and numpy. PyPI package:

python3 -m pip install humanleague --user

[Conda-forge package is being worked on]

Build, install and test (from a locally cloned repo):

$ ./setup.py install --user
$ ./setup.py test

Examples

Consult the package documentation, e.g.

> library(humanleague)
> ?humanleague

in R, or for Python:

>>> import humanleague as hl
>>> help(hl)

2.1.0 by Andrew Smith, 5 months ago


Browse source code at https://github.com/cran/humanleague


Authors: Andrew Smith [aut, cre], Steven Johnson [ctb] (Sobol sequence generator implementation), Massachusetts Institute of Technology [cph] (Sobol sequence generator implementation), John Burkhardt [ctb, cph] (C++ implementation of incomplete gamma function), G Bhattacharjee [ctb] (Original FORTRAN implementation of incomplete gamma function)


MIT + file LICENCE license


Imports Rcpp

Suggests testthat

Linking to Rcpp

