Optimal Project Prioritization

A decision support tool for prioritizing conservation projects. Prioritizations can be developed by maximizing expected feature richness, expected phylogenetic diversity, the number of features that meet persistence targets, or identifying a set of projects that meet persistence targets for minimal cost. Constraints (e.g. lock in specific actions) and feature weights can also be specified to further customize prioritizations. After defining a project prioritization problem, solutions can be obtained using exact algorithms, heuristic algorithms, or random processes. In particular, it is recommended to install the 'Gurobi' optimizer (available from <https://www.gurobi.com>) because it can identify optimal solutions very quickly. Finally, methods are provided for comparing different prioritizations and evaluating their benefits.



The oppr R package is a decision support tool for prioritizing conservation projects. Prioritizations can be developed by maximizing expected feature richness, expected phylogenetic diversity, the number of features that meet persistence targets, or identifying a set of projects that meet persistence targets for minimal cost. Constraints (e.g. lock in specific actions) and feature weights can also be specified to further customize prioritizations. After defining a project prioritization problem, solutions can be obtained using exact algorithms, heuristic algorithms, or random processes. In particular, it is recommended to install the 'Gurobi' optimizer because it can identify optimal solutions very quickly. Finally, methods are provided for comparing different prioritizations and evaluating their benefits.

Installation

The latest official version of the oppr R package can be installed using the following R code. We also recommend installing the Gurobi optimization suite and gurobi R package to obtain solutions very quickly. For instructions on installing these software packages, please refer to this installation guide.

install.packages("oppr", repos = "https://cran.rstudio.com/")

If you wish to plot phylogenetic trees, you will need to install the ggtree package from Bioconductor since it is not available on The Comprehensive R Archive Network.

if (!require(devtools))
  install.packages("devtools")
if (!require(ggtree))
  devtools::install_bioc("ggtree")

Alternatively, the latest developmental version can be installed using the following code. Please note that while developmental versions may contain additional features not present in the official version, they may also contain coding errors.

if (!require(devtools))
  install.packages("devtools")
devtools::install_github("prioritizr/oppr")
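
Since the available solver determines how solutions are obtained, it can help to check which solver packages are installed before building a problem. The following is a minimal sketch using only base R. The package names are taken from the Imports and Suggests fields of this package, and the object names are illustrative.

```r
# solver-related packages named in the Imports and Suggests fields
solver_pkgs <- c("gurobi", "Rsymphony", "lpsymphony", "lpSolveAPI")

# TRUE where the package can be loaded, FALSE otherwise
solver_available <- vapply(solver_pkgs, requireNamespace, logical(1),
                           quietly = TRUE)

# print result
print(solver_available)
```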

Usage

Here we will provide a short example showing how the oppr R package can be used to prioritize funding for conservation projects. To start off, we will set the seed for the random number generator to ensure you get the same results as shown here, and load the oppr R package.

set.seed(500)
library(oppr)

Now we will load some data sets that are distributed with the package. First, we will load the sim_features object. This table contains information on the conservation features (e.g. species). Specifically, each row corresponds to a different feature, and each column contains information associated with the features. In this table, the "name" column contains the name of each feature, and the "weight" column denotes the relative importance for each feature.

data(sim_features)
 
# print table
print(sim_features)
## # A tibble: 5 x 2
##   name  weight
##   <chr>  <dbl>
## 1 F1     0.211
## 2 F2     0.211
## 3 F3     0.221
## 4 F4     0.630
## 5 F5     1.59

Next, we will load the sim_actions object. This table (i.e. tibble) stores information about the various management actions. Each row corresponds to a different action, and each column describes a different property of the actions. These actions correspond to specific management actions that have known costs. For example, they may relate to pest eradication activities (e.g. trapping) in sites of conservation importance. In this table, the "name" column contains the name of each action, and the "cost" column denotes the cost of each action. It also contains additional columns for customizing the solutions, but we will ignore them for now. Note that the last action, the "baseline_action", has a zero cost and is used with the baseline project (see below).

# load data
data(sim_actions)
 
# print table
print(sim_actions)
## # A tibble: 6 x 4
##   name             cost locked_in locked_out
##   <chr>           <dbl> <lgl>     <lgl>     
## 1 F1_action        94.4 FALSE     FALSE     
## 2 F2_action       101.  FALSE     FALSE     
## 3 F3_action       103.  TRUE      FALSE     
## 4 F4_action        99.2 FALSE     FALSE     
## 5 F5_action        99.9 FALSE     TRUE      
## 6 baseline_action   0   FALSE     FALSE

Additionally, we will load the sim_projects object. This table stores information about various conservation projects. Each row corresponds to a different project, and each column describes a different property of the projects. These projects correspond to groups of conservation actions. For example, a conservation project may pertain to a set of conservation actions that relate to a single feature or single geographic locality. In this table, the "name" column contains the name of each project, the "success" column denotes the probability of each project succeeding if it is funded, the "F1"--"F5" columns show the probability that each feature is expected to persist if each project is funded (NA values mean that a feature does not benefit from a project), and the "F1_action"--"F5_action" and "baseline_action" columns indicate which actions are associated with which projects. Note that the last project, the "baseline_project", is associated with the "baseline_action" action. This project has a zero cost and represents the baseline probability of each feature persisting if no other project is funded. This is important because we can't find a cost-effective solution if we don't know how much each project improves a feature's chance of persistence relative to doing nothing. Finally, although most projects in this example directly relate to a single feature, you can input projects that directly affect the persistence of multiple features.

# load data
data(sim_projects)
 
# print table
print(sim_projects, width = Inf)
## # A tibble: 6 x 13
##   name             success     F1     F2      F3     F4     F5 F1_action
##   <chr>              <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl> <lgl>    
## 1 F1_project         0.919  0.791 NA     NA      NA     NA     TRUE     
## 2 F2_project         0.923 NA      0.888 NA      NA     NA     FALSE    
## 3 F3_project         0.829 NA     NA      0.502  NA     NA     FALSE    
## 4 F4_project         0.848 NA     NA     NA       0.690 NA     FALSE    
## 5 F5_project         0.814 NA     NA     NA      NA      0.617 FALSE    
## 6 baseline_project   1      0.298  0.250  0.0865  0.249  0.182 FALSE    
##   F2_action F3_action F4_action F5_action baseline_action
##   <lgl>     <lgl>     <lgl>     <lgl>     <lgl>          
## 1 FALSE     FALSE     FALSE     FALSE     FALSE          
## 2 TRUE      FALSE     FALSE     FALSE     FALSE          
## 3 FALSE     TRUE      FALSE     FALSE     FALSE          
## 4 FALSE     FALSE     TRUE      FALSE     FALSE          
## 5 FALSE     FALSE     FALSE     TRUE      FALSE          
## 6 FALSE     FALSE     FALSE     FALSE     TRUE
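
To make the link between the actions table and the projects table concrete, the sketch below derives an approximate cost for each project by summing the costs of its associated actions. It uses only base R and hard-codes the rounded values printed above rather than the exact data. It also assumes that the cost of a project is simply the sum of the costs of its actions, and the object names are illustrative.

```r
# action costs, rounded as printed in the sim_actions table
action_cost <- c(F1_action = 94.4, F2_action = 101, F3_action = 103,
                 F4_action = 99.2, F5_action = 99.9, baseline_action = 0)

# project-by-action membership, as printed in the sim_projects table
# (each project uses exactly one action here, giving an identity matrix)
membership <- diag(6)
dimnames(membership) <- list(
  c(paste0("F", 1:5, "_project"), "baseline_project"),
  names(action_cost))

# approximate project costs (sum of the costs of each project's actions)
project_cost <- drop(membership %*% action_cost)
print(project_cost)

# approximate cost of funding every project
print(sum(project_cost))
```

With these rounded values, funding every project would cost roughly $497.5, so any budget below this amount forces a choice among projects.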

After loading the data, we can begin formulating the project prioritization problem. Here our goal is to maximize the overall probability that each feature is expected to persist into the future (i.e. the feature richness), whilst also accounting for the relative importance of each feature and the fact that our resources are limited such that we can only spend at most $400 on funding management actions. Now, let's build a project prioritization problem object that represents our goal.

# build problem
p <- problem(projects = sim_projects, actions = sim_actions,
             features =  sim_features, project_name_column = "name",
             project_success_column = "success", action_name_column = "name",
             action_cost_column = "cost", feature_name_column = "name") %>%
     add_max_richness_objective(budget = 400) %>%
     add_feature_weights(weight = "weight") %>%
     add_binary_decisions() %>%
     add_default_solver(verbose = FALSE)
 
# print problem
print(p)
## Project Prioritization Problem
##   actions          F1_action, F2_action, F3_action, ... (6 actions)
##   projects         F1_project, F2_project, F3_project, ... (6 projects)
##   features         F1, F2, F3, ... (5 features)
##   action costs:    min: 0, max: 103.22583
##   project success: min: 0.81379, max: 1
##   objective:       Maximum richness objective [budget (400)]
##   targets:         none
##   weights:         min: 0.21136, max: 1.59167
##   decisions        Binary decision 
##   constraints:     <none>
##   solver:          Gurobi [first_feasible (0), gap (0), number_solutions (1), presolve (2), solution_pool_method (2), threads (1), time_limit (2147483647), time_limit (2147483647), verbose (0)]
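
The printed problem reports no constraints. As a hedged sketch of how the "locked_in" and "locked_out" columns of the sim_actions table might be used, the code below builds a second problem with the add_locked_in_constraints and add_locked_out_constraints functions. It assumes that the oppr R package and at least one exact solver are installed (it does nothing otherwise), and the object names p_locked and s_locked are illustrative.

```r
# run only if the oppr R package is installed
if (requireNamespace("oppr", quietly = TRUE)) {
  library(oppr)
  data(sim_projects, sim_actions, sim_features)

  # build problem that uses the lock columns in the sim_actions table
  p_locked <- problem(projects = sim_projects, actions = sim_actions,
                      features = sim_features, project_name_column = "name",
                      project_success_column = "success",
                      action_name_column = "name",
                      action_cost_column = "cost",
                      feature_name_column = "name") %>%
              add_max_richness_objective(budget = 400) %>%
              add_feature_weights("weight") %>%
              add_locked_in_constraints("locked_in") %>%
              add_locked_out_constraints("locked_out") %>%
              add_binary_decisions() %>%
              add_default_solver(verbose = FALSE)

  # solve problem and show the cost and the status of each action
  s_locked <- solve(p_locked)
  print(s_locked[, c("cost", sim_actions$name)])
}
```

The remainder of this example continues with the unconstrained problem p.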

Next, we can solve this problem to obtain a solution. By default, we will obtain the optimal solution to our problem using an exact algorithm solver (e.g. using Gurobi or lpSolveAPI).

# solve problem
s <- solve(p)
# print solution
print(s, width = Inf)
## # A tibble: 1 x 21
##   solution status    obj  cost F1_action F2_action F3_action F4_action
##      <int> <chr>   <dbl> <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1        1 OPTIMAL  1.75  395.         1         1         0         1
##   F5_action baseline_action F1_project F2_project F3_project F4_project
##       <dbl>           <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
## 1         1               1          1          1          0          1
##   F5_project baseline_project    F1    F2     F3    F4    F5
##        <dbl>            <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1          1                1 0.808 0.865 0.0865 0.688 0.592

The s table contains the solution and also various statistics associated with the solution. Here, each row corresponds to a different solution. Specifically, the "solution" column contains an identifier for the solution (which may be useful for methods that output multiple solutions), the "obj" column contains the objective value (i.e. the expected feature richness for this problem), the "cost" column stores the cost of the solution, and the "status" column contains information from the solver about the solution. Additionally, it contains columns for each action ("F1_action", "F2_action", "F3_action", ..., "baseline_action") which indicate if each action was prioritized for funding in the solution. It also contains columns for each project ("F1_project", "F2_project", "F3_project", ..., "baseline_project") that indicate if the project was completely funded or not. Finally, it contains columns for each feature ("F1", "F2", "F3", ...) which indicate the probability that each feature is expected to persist into the future under each solution (for information on how this is calculated see ?add_max_richness_objective). Since tabular data can be difficult to understand, let's visualize how well this solution would conserve the features. Note that features which benefit from fully funded projects, excepting the baseline project, are denoted with an asterisk.

# visualize solution
plot(p, s)
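
To illustrate where the numbers in the solution table come from, the sketch below recomputes them in base R from the rounded values printed above. It assumes that the persistence probability of a feature is one minus the product, over funded projects, of one minus the project's success probability multiplied by the feature's persistence probability under that project, with the baseline project always funded. This assumption reproduces the printed values to within rounding, but ?add_max_richness_objective remains the authoritative reference. All object names are illustrative.

```r
# values copied from the printed tables (rounded, so results are approximate)
feature_names   <- paste0("F", 1:5)
project_success <- setNames(c(0.919, 0.923, 0.829, 0.848, 0.814), feature_names)
project_persist <- setNames(c(0.791, 0.888, 0.502, 0.690, 0.617), feature_names)
baseline        <- setNames(c(0.298, 0.250, 0.0865, 0.249, 0.182), feature_names)
weight          <- setNames(c(0.211, 0.211, 0.221, 0.630, 1.59), feature_names)
action_cost     <- setNames(c(94.4, 101, 103, 99.2, 99.9), feature_names)

# projects funded in the printed solution (all except the F3 project)
funded <- setNames(c(TRUE, TRUE, FALSE, TRUE, TRUE), feature_names)

# assumed persistence formula: a feature fails to persist only if both its
# own project (when funded) and the baseline project fail to secure it
persistence <- 1 - (1 - funded * project_success * project_persist) *
                   (1 - baseline)

# weighted expected richness (compare with the "obj" column)
objective <- sum(weight * persistence)

# total cost of the funded actions (compare with the "cost" column)
total_cost <- sum(action_cost[funded])

print(round(persistence, 3))
print(round(objective, 2))
print(total_cost)
```

Because the inputs are rounded, the results agree with the "F1"--"F5", "obj", and "cost" columns only approximately.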

This has just been a taster of the oppr R package. For more information, see the package vignette.

Citation

To cite the oppr package in publications, please use:

  Hanson JO, Schuster R, Strimas-Mackey M, Bennett J (2019) oppr:
  Optimal Project Prioritization. R package version 0.0.3.
  https://CRAN.R-project.org/package=oppr

A BibTeX entry for LaTeX users is

  @Manual{,
    author = {Jeffrey O Hanson and Richard Schuster and Matthew Strimas-Mackey and Joseph Bennett},
    title = {oppr: Optimal Project Prioritization},
    year = {2019},
    note = {R package version 0.0.3},
    url = {https://CRAN.R-project.org/package=oppr},
  }

News

oppr 0.0.3

  • CRAN release.

oppr 0.0.2.1

  • Retain debugging symbols to conform with CRAN policies.

oppr 0.0.2

  • CRAN release.

oppr 0.0.1.1

  • Fix address sanitizer issues causing CRAN checks to fail.
  • Tests successfully complete when the shiny R package is not installed.

oppr 0.0.1

  • CRAN release.

oppr 0.0.0.19

  • Add argument to add_heuristic_solver to skip initial step for removing projects and actions that exceed the budget. While this initial step improves solution quality, it is not conventionally used in project prioritization algorithms and so should be omitted to provide accurate benchmarks.

oppr 0.0.0.18

  • Reduce precision of extinction probability calculations when formulating a problem with a maximum expected phylogenetic diversity objective (i.e. add_max_phylo_div_objective). Specifically, 1'000 points instead of 10'000 points are now used for piece-wise linear components. It appears that reducing the precision in this manner does not affect the correctness of results, but substantially reduces the time needed to solve problems to optimality in certain situations.

oppr 0.0.0.17

  • Update add_heuristic_solver algorithm so that cost-effectiveness values are calculated with projects sharing costs (e.g. if two projects share an action that costs $100, then this action contributes $50 to the cost of each project). This update makes the algorithm similar to backwards heuristics conventionally used in prioritizing species recovery projects (i.e. https://github.com/p-robot/ppp; #14).

oppr 0.0.0.16

  • Fix bug in add_heuristic_solver function introduced in version 0.0.0.15.

oppr 0.0.0.15

  • Update add_heuristic_solver algorithm so that it removes projects, and not actions, in an iterative fashion. This update (i) makes the algorithm comparable to the backwards heuristics conventionally used in prioritizing species recovery projects (i.e. https://github.com/p-robot/ppp) and (ii) substantially reduces run time (#14).

oppr 0.0.0.14

  • Fix bugs in add_heuristic_solver and add_random_solver arising from floating point comparison issue. These were causing infeasible solutions to be returned in R version 3.4.4.

oppr 0.0.0.13

  • Fix bug in project_cost_effectiveness reporting incorrect costs, and cost-effectiveness values.

oppr 0.0.0.12

  • Assorted documentation tweaks.

oppr 0.0.0.11

  • Update add_heuristic_solver algorithm so that all actions and projects which exceed the budget are automatically removed prior to the iterative action removal.
  • Update add_random_solver algorithms so that projects are selected instead of individual actions. This means that solutions from this solver are (i) similar to those in previous project prioritization studies and (ii) more likely to deliver better solutions (#13).

oppr 0.0.0.10

  • Rename package to oppr since ppr is already on CRAN.
  • Fix issue with replacement_costs yielding incorrect results for baseline projects when used with SYMPHONY solvers.

oppr 0.0.0.9

  • Add new project_cost_effectiveness function to calculate the cost-effectiveness for each conservation project in a problem.

oppr 0.0.0.8

  • Fix typos in documentation (#8).
  • The solution_statistics function outputs which projects are completely funded in each solution (#9).
  • Add example for saving tabular data to vignette (#10).
  • Add examples to vignette for working with the solution output (#11).

oppr 0.0.0.7

  • Fix annoying "Found more than one class "tbl_df" in cache; using the first, from namespace 'tibble'" text.

oppr 0.0.0.6

  • Actually fix bug when solving problems with a phylogenetic objective and branches that have a constant probability of persistence (#6).
  • Fix bug in add_max_phylo_div_objective yielding incorrect solutions when features are ordered differently in the phylogenetic and tabular input data.
  • Fix bug in solution_statistics yielding incorrect objective values for phylogenetic problems when features are ordered differently in the phylogenetic and tabular input data.
  • Fix bug when handling phylogenetic data when a species is associated with two tip branches. Although such data probably indicate errors in the phylogenetic data, this functionality could be useful when combining multiple datasets.

oppr 0.0.0.5

  • Add return_data argument to plot_feature_persistence and plot_phylo_persistence so that plotting data can be obtained for creating custom plots.

oppr 0.0.0.4

  • Fix bug in add_relative_targets and add_manual_targets (when relative targets are supplied) that resulted in incorrect calculations.
  • Fix issue with expected persistence probabilities not accounting for the "do nothing" scenario (#7).

oppr 0.0.0.3

  • The gurobi solver (i.e. add_gurobi_solver function) now uses NumericFocus=3 to help avoid numerical issues.
  • The compile function now throws a warning if problems are likely to have numerical issues.

oppr 0.0.0.2

  • Fix bug when solving problems with a phylogenetic objective and branches that have a constant probability of persistence (#6). Hindsight shows this attempt did not cover all edge cases.
  • Add additional data sanity checks to problem. It will now throw descriptive error messages if features are missing baseline probabilities, or are associated with baseline probabilities below 1e-11.
  • Fix unit test for simulate_ptm_data that had a very small chance of failing due to simulating a data set where an action is not associated with any project.
  • Feature columns in simulated data produced using simulate_ppp_data and simulate_ptm_data are now sorted.

oppr 0.0.0.1

  • Initial commit.


0.0.4 by Jeffrey O Hanson, 4 months ago


https://prioritizr.github.io/oppr, https://github.com/prioritizr/oppr


Report a bug at https://github.com/prioritizr/oppr/issues


Browse source code at https://github.com/cran/oppr


Authors: Jeffrey O Hanson [aut, cre], Richard Schuster [aut], Matthew Strimas-Mackey [aut], Joseph Bennett [aut]




GPL-3 license


Imports utils, methods, stats, Matrix, magrittr, uuid, proto, cli, assertthat, tibble, ape, tidytree, ggplot2, viridisLite, lpSolveAPI

Suggests testthat, knitr, roxygen2, rmarkdown, gurobi, ggtree, Rsymphony, lpsymphony, shiny, rhandsontable, tidyr

Linking to Rcpp, RcppArmadillo, RcppProgress

System requirements: C++11

