RcppNumerical: a collection of open source libraries for numerical computing (numerical integration, optimization, etc.) and their integration with Rcpp.

Rcpp is a powerful tool for writing fast C++ code to speed up R programs. However, it is not easy, or at least not straightforward, to perform numerical integration or optimization using pure C++ code inside Rcpp. RcppNumerical integrates a number of open source numerical computing libraries into Rcpp, so that users can call convenient functions to accomplish such tasks.
To use RcppNumerical with `Rcpp::sourceCpp()`, add

```cpp
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
```

in the C++ source file. To use RcppNumerical in an R package, add

```
Imports: RcppNumerical
LinkingTo: Rcpp, RcppEigen, RcppNumerical
```

to the DESCRIPTION file, and

```
import(RcppNumerical)
```

to the NAMESPACE file.

The one-dimensional numerical integration code contained in RcppNumerical is based on the NumericalIntegration library developed by Sreekumar Thaithara Balan, Mark Sauder, and Matt Beall.
To compute the integral of a function, first define a functor derived from the `Func` class (under the namespace `Numer`).
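A minimal sketch of the class, following the interface in the package headers (exact signatures may vary slightly across versions):

```cpp
// Sketch of the Numer::Func interface: derive from this class and
// implement operator() for a single point; eval() has a default
// implementation that simply loops over the array.
class Func
{
public:
    // Return the value of the integrand at a single point x
    virtual double operator()(const double& x) const = 0;

    // Overwrite each element of x[0..n-1] with the function value at that point
    virtual void eval(double* x, const int n) const
    {
        for(int i = 0; i < n; i++)
            x[i] = this->operator()(x[i]);
    }

    virtual ~Func() {}
};
```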
The first function evaluates one point at a time, and the second overwrites each point in the array with the corresponding function value. Only the second function is used by the integration code, but it is usually easier to implement the first one.
RcppNumerical provides a wrapper function for the NumericalIntegration library with the following arguments (a sketch of the declaration is given after the list):

- `f`: The functor of the integrand.
- `lower`, `upper`: Limits of the integral.
- `err_est`: Estimate of the error (output).
- `err_code`: Error code (output). See `inst/include/integration/Integrator.h`, lines 676-704.
- `subdiv`: Maximum number of subintervals.
- `eps_abs`, `eps_rel`: Absolute and relative tolerance.
- `rule`: Integration rule. Possible values are `GaussKronrod{15, 21, 31, 41, 51, 61, 71, 81, 91, 101, 121, 201}`. Rules with larger values have better accuracy, but may involve more function calls.
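The wrapper is the `integrate()` function in the `Numer` namespace. A sketch of its declaration, with indicative default argument values, is:

```cpp
// Sketch of the one-dimensional integration wrapper (namespace Numer).
// Default argument values are indicative; check the installed headers.
inline double integrate(
    const Func& f, const double& lower, const double& upper,
    double& err_est, int& err_code,
    const int subdiv = 100,
    const double& eps_abs = 1e-8, const double& eps_rel = 1e-6,
    const Integrator<double>::QuadratureRule rule = Integrator<double>::GaussKronrod41
);
```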
See a full example below, which can be compiled using the `Rcpp::sourceCpp()` function in Rcpp. It computes P(0.3 < X < 0.8) for X ~ Beta(a, b) by defining a functor for the Beta density and an exported `integrate_test()` function that returns the true value, the approximation, the error estimate, and the error code in an `Rcpp::List`.
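The listing below is a sketch of such an example (the functor name `BetaPDF` and the shape parameters a = 3, b = 10 are illustrative choices, consistent with the output shown afterwards):

```cpp
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>

using namespace Numer;

// P(0.3 < X < 0.8), X ~ Beta(a, b)
class BetaPDF: public Func
{
private:
    double a;
    double b;
public:
    BetaPDF(double a_, double b_) : a(a_), b(b_) {}

    double operator()(const double& x) const
    {
        return R::dbeta(x, a, b, 0);
    }
};

// [[Rcpp::export]]
Rcpp::List integrate_test()
{
    const double a = 3, b = 10;
    const double lower = 0.3, upper = 0.8;
    // Exact value from the Beta CDF, for comparison
    const double true_val = R::pbeta(upper, a, b, 1, 0) -
                            R::pbeta(lower, a, b, 1, 0);

    BetaPDF f(a, b);
    double err_est;
    int err_code;
    const double res = integrate(f, lower, upper, err_est, err_code);
    return Rcpp::List::create(
        Rcpp::Named("true")           = true_val,
        Rcpp::Named("approximate")    = res,
        Rcpp::Named("error_estimate") = err_est,
        Rcpp::Named("error_code")     = err_code
    );
}
```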
Running the `integrate_test()` function in R gives:
```r
integrate_test()
## $true
## [1] 0.2528108
##
## $approximate
## [1] 0.2528108
##
## $error_estimate
## [1] 2.806764e-15
##
## $error_code
## [1] 0
```
Multi-dimensional integration in RcppNumerical is done by the Cuba library developed by Thomas Hahn.
To calculate the integral of a multivariate function, one needs to define a functor that inherits from the `MFunc` class.
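A minimal sketch of this interface, following the package headers:

```cpp
// Sketch of the Numer::MFunc interface for multivariate integrands:
// derive from this class and implement operator() for a vector argument.
class MFunc
{
public:
    // Return the value of the integrand at the point x
    virtual double operator()(Constvec& x) = 0;

    virtual ~MFunc() {}
};
```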
Here `Constvec` represents a read-only vector with the definition

```cpp
// Constant reference to a vector
typedef const Eigen::Ref<const Eigen::VectorXd> Constvec;
```
(Basically you can treat `Constvec` as a `const Eigen::VectorXd`. Using `Eigen::Ref` is mainly to avoid memory copies; see the Eigen documentation on `Eigen::Ref` for the explanation.)
The function provided by RcppNumerical for multi-dimensional integration takes the following arguments (a sketch of the declaration is given after the list):

- `f`: The functor of the integrand.
- `lower`, `upper`: Limits of the integral. Both are vectors whose dimension equals that of the domain of `f`.
- `err_est`: Estimate of the error (output).
- `err_code`: Error code (output). Non-zero values indicate failure of convergence.
- `maxeval`: Maximum number of function evaluations.
- `eps_abs`, `eps_rel`: Absolute and relative tolerance.
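The multi-dimensional wrapper is again an `integrate()` function in the `Numer` namespace. A sketch of its declaration, with indicative default argument values, is:

```cpp
// Sketch of the multi-dimensional integration wrapper (namespace Numer).
// Default argument values are indicative; check the installed headers.
inline double integrate(
    MFunc& f, Constvec& lower, Constvec& upper,
    double& err_est, int& err_code,
    const int maxeval = 1000,
    const double& eps_abs = 1e-6, const double& eps_rel = 1e-6
);
```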
See the example below, which computes P(a1 < X1 < b1, a2 < X2 < b2) for a bivariate normal vector (X1, X2) with zero means, unit variances, and correlation rho.
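The listing below is a sketch of such an example (the functor name `BiNormal` and rho = 0.5 are illustrative choices, consistent with the R test that follows):

```cpp
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
#include <cmath>

using namespace Numer;

// P(a1 < X1 < b1, a2 < X2 < b2), (X1, X2) ~ N([0], [1   rho])
//                                            ([0], [rho   1])
class BiNormal: public MFunc
{
private:
    const double rho;
    double const1;  // 2 * (1 - rho^2)
    double const2;  // 1 / (2 * PI * sqrt(1 - rho^2))
public:
    BiNormal(const double& rho_) : rho(rho_)
    {
        const1 = 2.0 * (1.0 - rho * rho);
        const2 = 1.0 / (2.0 * M_PI) / std::sqrt(1.0 - rho * rho);
    }

    // Density of the bivariate normal distribution
    double operator()(Constvec& x)
    {
        const double z = x[0] * x[0] - 2.0 * rho * x[0] * x[1] + x[1] * x[1];
        return const2 * std::exp(-z / const1);
    }
};

// [[Rcpp::export]]
Rcpp::List integrate_test2()
{
    BiNormal f(0.5);  // rho = 0.5
    Eigen::VectorXd lower(2);
    lower << -1, -1;
    Eigen::VectorXd upper(2);
    upper << 1, 1;
    double err_est;
    int err_code;
    const double res = integrate(f, lower, upper, err_est, err_code);
    return Rcpp::List::create(
        Rcpp::Named("approximate")    = res,
        Rcpp::Named("error_estimate") = err_est,
        Rcpp::Named("error_code")     = err_code
    );
}
```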
We can test the result in R:
```r
library(mvtnorm)
trueval = pmvnorm(c(-1, -1), c(1, 1), sigma = matrix(c(1, 0.5, 0.5, 1), 2))

integrate_test2()
## $approximate
## [1] 0.4979718
##
## $error_estimate
## [1] 4.612333e-09
##
## $error_code
## [1] 0

trueval - integrate_test2()$approximate
## [1] 2.893336e-11
```
Currently RcppNumerical contains the L-BFGS algorithm for unconstrained minimization problems, based on the LBFGS++ library.
Again, one needs to first define a functor to represent the multivariate function to be minimized.
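In the package headers this base class is called `MFuncGrad`; a minimal sketch of its interface is:

```cpp
// Sketch of the Numer::MFuncGrad interface for optimization:
// f_grad() must return the objective value at x and write the
// gradient into grad.
class MFuncGrad
{
public:
    virtual double f_grad(Constvec& x, Refvec grad) = 0;

    virtual ~MFuncGrad() {}
};
```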
As in the multi-dimensional integration case, `Constvec` represents a read-only vector and `Refvec` a writable vector. Their definitions are

```cpp
// Reference to a vector
typedef Eigen::Ref<Eigen::VectorXd>             Refvec;
typedef const Eigen::Ref<const Eigen::VectorXd> Constvec;
```
The `f_grad()` member function returns the value of the function at the vector `x`, and overwrites `grad` with the gradient.
The wrapper function for LBFGS++ takes the following arguments (a sketch of the declaration is given after the list):

- `f`: The function to be minimized.
- `x`: In: the initial guess. Out: the best values of the variables found.
- `fx_opt`: Out: the function value at the output `x`.
- `maxit`: Maximum number of iterations.
- `eps_f`: Algorithm stops if `|f_{k+1} - f_k| < eps_f * |f_k|`.
- `eps_g`: Algorithm stops if `||g|| < eps_g * max(1, ||x||)`.
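The wrapper is the `optim_lbfgs()` function in the `Numer` namespace. A sketch of its declaration, with indicative default argument values, is:

```cpp
// Sketch of the L-BFGS wrapper (namespace Numer). The return value is a
// status code, reported as "status" in the example below.
// Default argument values are indicative; check the installed headers.
inline int optim_lbfgs(
    MFuncGrad& f, Refvec x, double& fx_opt,
    const int maxit = 300,
    const double& eps_f = 1e-6, const double& eps_g = 1e-5
);
```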
Below is an example that illustrates the optimization of the Rosenbrock function `f(x1, x2) = 100 * (x2 - x1^2)^2 + (1 - x1)^2`, whose true minimum is at `x1 = x2 = 1`. The exported `optim_test()` function returns the located minimum, the objective value, and the convergence status in an `Rcpp::List`.
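The listing below is a sketch of such an example (the functor name `Rosenbrock` and the zero initial guess are illustrative choices):

```cpp
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>

using namespace Numer;

// f = 100 * (x2 - x1^2)^2 + (1 - x1)^2
// True minimum: x1 = x2 = 1
class Rosenbrock: public MFuncGrad
{
public:
    double f_grad(Constvec& x, Refvec grad)
    {
        const double t1 = x[1] - x[0] * x[0];
        const double t2 = 1.0 - x[0];
        // Gradient of the Rosenbrock function
        grad[0] = -400.0 * x[0] * t1 - 2.0 * t2;
        grad[1] = 200.0 * t1;
        return 100.0 * t1 * t1 + t2 * t2;
    }
};

// [[Rcpp::export]]
Rcpp::List optim_test()
{
    Eigen::VectorXd x = Eigen::VectorXd::Zero(2);  // initial guess
    double fopt;
    Rosenbrock f;
    const int status = optim_lbfgs(f, x, fopt);
    return Rcpp::List::create(
        Rcpp::Named("xopt")   = x,
        Rcpp::Named("fopt")   = fopt,
        Rcpp::Named("status") = status
    );
}
```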
Calling the generated R function optim_test()
gives
```r
optim_test()
## $xopt
## [1] 1 1
##
## $fopt
## [1] 3.12499e-15
##
## $status
## [1] 0
```
It may be more meaningful to look at a real application of the RcppNumerical package. Below is an example that fits a logistic regression model using the L-BFGS algorithm; it also demonstrates the performance of the library.
The C++ code maps the R matrix and response vector to Eigen objects via `Eigen::Map` (`MapMat` and `MapVec` typedefs), defines a functor that evaluates the negative log-likelihood and its gradient, and exports a `logistic_reg()` function that returns the estimated coefficients as an `Rcpp::NumericVector`.
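The listing below is a sketch of such an implementation (the class name `LogisticReg` and the zero initial guess are illustrative choices; the negative-status check reflects an assumption about the wrapper's return convention):

```cpp
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>

using namespace Numer;

typedef Eigen::Map<Eigen::MatrixXd> MapMat;
typedef Eigen::Map<Eigen::VectorXd> MapVec;

class LogisticReg: public MFuncGrad
{
private:
    const MapMat X;
    const MapVec Y;
public:
    LogisticReg(const MapMat x_, const MapVec y_) : X(x_), Y(y_) {}

    double f_grad(Constvec& beta, Refvec grad)
    {
        // Negative log likelihood:
        //   sum(log(1 + exp(X * beta))) - y' * X * beta
        Eigen::VectorXd xbeta = X * beta;
        const double yxbeta = Y.dot(xbeta);
        xbeta = xbeta.array().exp();                        // exp(X * beta)
        const double f = (xbeta.array() + 1.0).log().sum() - yxbeta;

        // Gradient: X' * (p - y), with p = exp(X * beta) / (1 + exp(X * beta))
        xbeta.array() /= (xbeta.array() + 1.0);             // now holds p
        grad = X.transpose() * (xbeta - Y);

        return f;
    }
};

// [[Rcpp::export]]
Rcpp::NumericVector logistic_reg(Rcpp::NumericMatrix x, Rcpp::NumericVector y)
{
    const MapMat xx = Rcpp::as<MapMat>(x);
    const MapVec yy = Rcpp::as<MapVec>(y);
    LogisticReg nll(xx, yy);                                  // negative log likelihood
    Eigen::VectorXd beta = Eigen::VectorXd::Zero(xx.cols());  // initial guess

    double fopt;
    const int status = optim_lbfgs(nll, beta, fopt);
    // Assumption: a negative status indicates failure to converge
    if(status < 0)
        Rcpp::warning("algorithm did not converge");

    return Rcpp::wrap(beta);
}
```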
Here is the R code to test the function:
```r
set.seed(123)
n = 5000
p = 100
x = matrix(rnorm(n * p), n)
beta = runif(p)
xb = c(x %*% beta)
p = exp(xb) / (1 + exp(xb))
y = rbinom(n, 1, p)

system.time(res1 <- glm.fit(x, y, family = binomial())$coefficients)
##    user  system elapsed
##   0.229   0.006   0.234

system.time(res2 <- logistic_reg(x, y))
##    user  system elapsed
##   0.005   0.000   0.006

max(abs(res1 - res2))
## [1] 0.0001873564
```
It is much faster than the standard glm.fit()
function in R! (Although
glm.fit()
calculates some other quantities besides beta.)
RcppNumerical also provides the fastLR()
function to run fast logistic
regression, which is a modified and more stable version of the code above.
```r
system.time(res3 <- fastLR(x, y)$coefficients)
##    user  system elapsed
##   0.007   0.001   0.008

max(abs(res1 - res3))
## [1] 7.066969e-06
```