Provides optimized functions and flexible combinatorial iterators implemented in C++ with 'Rcpp' for solving problems in combinatorics and computational mathematics. Utilizes parallel programming via 'RcppThread' for maximal performance. Also makes use of the RMatrix class from the 'RcppParallel' library. There are combination/permutation functions with constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. generating integer partitions/compositions or finding all combinations such that the sum is between two bounds). Capable of generating specific combinations/permutations (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch.
A collection of high performance functions implemented in C++ with Rcpp for solving problems in combinatorics and computational mathematics. Utilizes the library RcppThread where multithreading is needed. We also make use of the RMatrix.h header file from RcppParallel for thread safe accessors for Rcpp matrices. Featured functions:
primeSieve
- Generates all primes less than a billion in under 0.5 seconds.system.time(b <- primeSieve(10^9, nThreads = 8))user system elapsed2.339 0.047 0.418
primeCount
- Counts the number of primes below a 1e12 in ~130 milliseconds and 1e15 in under 50 seconds.system.time(primeCount(1e12, nThreads = 8)) user system elapsed 0.804 0.005 0.127
comboGeneral
/permuteGeneral
- Generate all combinations/permutations of a vector (including multisets) meeting specific criteria.
Parallel
argument. You can also apply each of the five compiled functions given by the argument constraintFun
in parallel as well. E.g. Obtaining the row sums of all combinations:
comboGeneral(20, 10, constraintFun = "sum", Parallel = TRUE)
lower
and upper
make it possible to generate combinations/permutations in chunks allowing for parallelization via the package parallel
. This is convenient when you want to apply a custom function to the output in parallel as well (see this stackoverflow post for a use case).comboSample
/permuteSample
- Easily generate random samples of combinations/permutations in parallel.
sample
when the total number of results is small and for larger cases, the sampling is done in a very similar fashion to urand.bigz
from the gmp
package.The primeSieve
function and the primeCount
function are both based off of the excellent work by Kim Walisch. The respective repos can be found here: kimwalisch/primesieve; kimwalisch/primecount
Additionally, many of the sieving functions make use of the fast integer division library libdivide by ridiculousfish.
install.packages("RcppAlgos")## Or install the development versiondevtools::install_github("jwood000/RcppAlgos")
Easily executed with a very simple interface. Output is in lexicographical order.
## Find all 3-tuples combinations without## repetition of the numbers c(1, 2, 3, 4).comboGeneral(4, 3)[,1] [,2] [,3][1,] 1 2 3[2,] 1 2 4[3,] 1 3 4[4,] 2 3 4## Find all 3-tuples permutations without## repetition of the numbers c(1, 2, 3, 4).permuteGeneral(4, 3)[,1] [,2] [,3][1,] 1 2 3[2,] 1 2 4[3,] 1 3 2[4,] 1 3 4. . . .. . . .[21,] 4 2 1[22,] 4 2 3[23,] 4 3 1[24,] 4 3 2## For combinations/permutations with repetition, simply## set the repetition argument to TRUEcomboGeneral(4, 3, repetition = TRUE)[,1] [,2] [,3][1,] 1 1 1[2,] 1 1 2[3,] 1 1 3. . . .. . . .## They are very efficientsystem.time(comboGeneral(25, 13))user system elapsed0.116 0.062 0.178system.time(comboGeneral(25, 13, nThreads = 8))user system elapsed0.192 0.228 0.058nrow(comboGeneral(25,13))[1] 5200300
Oftentimes, one needs to find combinations/permutations that meet certain requirements.
## Generate some random dataset.seed(101)s <- sample(500, 20)## Find all 5-tuples permutations without repetition## of s (defined above) such that the sum is equal to 1176.p <- permuteGeneral(v = s,m = 5,constraintFun = "sum",comparisonFun = "==",limitConstraints = 1176,keepResults = TRUE)tail(p)[,1] [,2] [,3] [,4] [,5] [,6][3955,] 354 287 149 187 199 1176[3956,] 354 287 149 199 187 1176[3957,] 354 287 187 149 199 1176[3958,] 354 287 187 199 149 1176[3959,] 354 287 199 149 187 1176[3960,] 354 287 199 187 149 1176## Get combinations such that the product is between## 3600 and 4000 (including 3600 but not 4000)comboGeneral(5, 7, TRUE, constraintFun = "prod",comparisonFun = c(">=","<"),limitConstraints = c(3600, 4000),keepResults = TRUE)[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 1 2 3 5 5 5 5 3750[2,] 1 3 3 4 4 5 5 3600[3,] 1 3 4 4 4 4 5 3840[4,] 2 2 3 3 4 5 5 3600[5,] 2 2 3 4 4 4 5 3840[6,] 3 3 3 3 3 3 5 3645[7,] 3 3 3 3 3 4 4 3888
Sometimes, the standard combination/permutation functions don't quite get us to our desired goals. For
example, one may need all permutations of a vector with some of the elements repeated a specific
amount of times (i.e. a multiset). Consider the following vector a <- c(1,1,1,1,2,2,2,7,7,7,7,7)
and one
would like to find permutations of a
of length 6. Using traditional methods, we would need to generate all
permutations, then eliminate duplicate values. Even considering that permuteGeneral
is very efficient,
this approach is clunky and not as fast as it could be. Observe:
getPermsWithSpecificRepetition <- function(z, n) {b <- permuteGeneral(z, n)myDupes <- duplicated(apply(b, 1, paste, collapse=""))b[!myDupes, ]}a <- c(1,1,1,1,2,2,2,7,7,7,7,7)system.time(test <- getPermsWithSpecificRepetition(a, 6))user system elapsed4.168 0.046 4.230
Situations like this call for the use of the freqs
argument. Simply, enter the number
of times each unique element is repeated and Voila!
system.time(test2 <- permuteGeneral(unique(a), 6, freqs = rle(a)$lengths))user system elapsed0 0 0identical(test, test2)[1] TRUE
Here are some more general examples with multisets:
## Generate all permutations of a vector with specific## length of repetition for each element (i.e. multiset)permuteGeneral(3, freqs = c(1,2,2))[,1] [,2] [,3] [,4] [,5][1,] 1 2 2 3 3[2,] 1 2 3 2 3[3,] 1 2 3 3 2[4,] 1 3 2 2 3[5,] 1 3 2 3 2. . . . . .. . . . . .[26,] 3 2 3 1 2[27,] 3 2 3 2 1[28,] 3 3 1 2 2[29,] 3 3 2 1 2[30,] 3 3 2 2 1## or combinations of a certain lengthcomboGeneral(3, 2, freqs = c(1,2,2), constraintFun = "prod")[,1] [,2] [,3][1,] 1 2 2[2,] 1 3 3[3,] 2 2 4[4,] 2 3 6[5,] 3 3 9
facPerms <- permuteGeneral(factor(c("low", "med", "high"),levels = c("low", "med", "high"),ordered = TRUE),freqs = c(1,4,2))[,1] [,2] [,3] [,4] [,5] [,6] [,7][1,] low med med med med high high[2,] low med med med high med high[3,] low med med med high high med[4,] low med med high med med high[5,] low med med high med high medLevels: low < med < high
Using the parameter Parallel
, we can easily generate combinations/permutations with great efficiency.
## RcppAlgos uses the number of cores available minus oneRcppAlgos::stdThreadMax()[1] 8 identical(comboGeneral(20, 10, freqs = rep(1:4, 5)), comboGeneral(20, 10, freqs = rep(1:4, 5), Parallel = TRUE))[1] TRUE ## Using 7 coreslibrary(microbenchmark)microbenchmark(serial = comboGeneral(20, 10, freqs = rep(1:4, 5)), parallel = comboGeneral(20, 10, freqs = rep(1:4, 5), Parallel = TRUE))Unit: milliseconds expr min lq mean median uq max neval serial 237.80421 244.06242 253.76806 249.45499 263.65091 284.7350 100 parallel 82.11218 85.88929 88.78034 87.72153 90.01558 126.3167 100
And applying any of the constraint functions in parallel is highly efficient as well. Consider obtaining the row sums of all combinations:
## base R using combn and FUNcombnSum <- combn(20, 10, sum)algosSum <- comboGeneral(20, 10, constraintFun = "sum") identical(as.integer(combnSum), algosSum[,11])[1] TRUE ## Using parallelparalSum <- comboGeneral(20, 10, constraintFun = "sum", Parallel = TRUE)identical(paralSum, algosSum)[1] TRUE microbenchmark(serial = comboGeneral(20, 10, constraintFun = "sum"), parallel = comboGeneral(20, 10, constraintFun = "sum", Parallel = TRUE), combnSum = combn(20, 10, sum))Unit: milliseconds expr min lq mean median uq max neval serial 3.426027 3.736279 4.285025 3.908142 4.312603 10.247652 100 parallel 1.233491 1.371532 1.742455 1.450557 1.970033 6.102456 100 combnSum 206.819174 222.254082 227.762687 224.331548 227.135461 293.065110 100
rowSums
and rowMeans
In fact, finding row sums or row means is even faster than simply applying the highly efficient rowSums
/rowMeans
after the combinations have already been generated:
## Pre-generate combinationscombs <- comboGeneral(25, 10) ## Testing rowSums alone against generating combinations as well as summingmicrobenchmark(serial = comboGeneral(25, 10, constraintFun = "sum"), parallel = comboGeneral(25, 10, constraintFun = "sum", Parallel = TRUE), rowsums = rowSums(combs))Unit: milliseconds expr min lq mean median uq max neval serial 113.74535 124.91470 127.85210 127.15044 130.47335 172.37043 100 parallel 41.23766 49.78379 51.46558 51.07019 52.58048 88.31258 100 rowsums 95.92139 98.21449 104.66987 101.32041 105.46913 152.69522 100 all.equal(rowSums(combs), comboGeneral(25, 10, constraintFun = "sum", Parallel = TRUE)[,11])[1] TRUE ## Testing rowMeans alone against generating combinations as well as obtain row meansmicrobenchmark(serial = comboGeneral(25, 10, constraintFun = "mean"), parallel = comboGeneral(25, 10, constraintFun = "mean", Parallel = TRUE), rowmeans = rowMeans(combs))Unit: milliseconds expr min lq mean median uq max neval serial 173.55130 183.94174 203.09916 202.17554 217.77389 274.7437 100 parallel 58.11018 61.17694 79.59311 79.21929 97.96688 127.3256 100 rowmeans 102.56019 103.72126 113.12717 106.89584 117.95641 165.1638 100 all.equal(rowMeans(combs), comboGeneral(25, 10, constraintFun = "mean", Parallel = TRUE)[,11])[1] TRUE
In both cases above, RcppAlgos
is doing double the work nearly twice as fast!!!
lower
and upper
There are arguments lower
and upper
that can be utilized to generate chunks of combinations/permutations without having to generate all of them followed by subsetting. As the output is in lexicographical order, these arguments specify where to start and stop generating. For example, comboGeneral(5, 3)
outputs 10 combinations of the vector 1:5
chosen 3 at a time. We can set lower
to 5 in order to start generation from the 5th lexicographical combination. Similarly, we can set upper
to 4 in order only generate the first 4 combinations. We can also use them together to produce only a certain chunk of combinations. For example, setting lower
to 4 and upper
to 6 only produces the 4th, 5th, and 6th lexicographical combinations. Observe:
comboGeneral(5, 3, lower = 4, upper = 6)[,1] [,2] [,3][1,] 1 3 4[2,] 1 3 5[3,] 1 4 5## is equivalent to the following:comboGeneral(5, 3)[4:6, ][,1] [,2] [,3][1,] 1 3 4[2,] 1 3 5[3,] 1 4 5
In addition to being useful by avoiding the unnecessary overhead of generating all combination/permutations followed by subsetting just to see a few specific results, lower
and upper
can be utilized to generate large number of combinations/permutations in parallel. Observe:
## Over 3 billion resultscomboCount(35, 15)[1] 3247943160## 10086780 evenly divides 3247943160system.time(lapply(seq(1, 3247943160, 10086780), function(x) {temp <- comboGeneral(35, 15, lower = x, upper = x + 10086779)## do somethingx}))user system elapsed99.495 49.993 149.467## Enter parallellibrary(parallel)system.time(mclapply(seq(1, 3247943160, 10086780), function(x) {temp <- comboGeneral(35, 15, lower = x, upper = x + 10086779)## do somethingx}, mc.cores = 6))user system elapsed125.361 85.916 35.641
These arguments are also useful when one needs to explore combinations/permutations of really large vectors:
set.seed(222)myVec <- rnorm(1000) ## HUGE number of combinationscomboCount(myVec, 50, repetition = TRUE)Big Integer ('bigz') :[1] 109740941767310814894854141592555528130828577427079559745647393417766593803205094888320 ## Let's look at one hundred thousand combinations in the range (1e15 + 1, 1e15 + 1e5)system.time(b <- comboGeneral(myVec, 50, TRUE, lower = 1e15 + 1, upper = 1e15 + 1e5)) user system elapsed 0.008 0.001 0.010 b[1:5, 45:50] [,1] [,2] [,3] [,4] [,5] [,6][1,] 0.5454861 0.4787456 0.7797122 2.004614 -1.257629 -0.7740501[2,] 0.5454861 0.4787456 0.7797122 2.004614 -1.257629 0.1224679[3,] 0.5454861 0.4787456 0.7797122 2.004614 -1.257629 -0.2033493[4,] 0.5454861 0.4787456 0.7797122 2.004614 -1.257629 1.5511027[5,] 0.5454861 0.4787456 0.7797122 2.004614 -1.257629 1.0792094
We can also produce random samples of combinations/permutations with comboSample
and permuteSample
. This is really useful when we need a reproducible set of random combinations/permutations. Many of the traditional ways of doing this involved relying on heavy use of sample
and hoping that we don't generate duplicate results. Both functions have a similar interface to their respective General
functions. Observe:
comboSample(10, 8, TRUE, n = 5, seed = 84)[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 3 3 3 6 6 10 10 10[2,] 1 3 3 4 4 7 9 10[3,] 3 7 7 7 9 10 10 10[4,] 3 3 3 9 10 10 10 10[5,] 1 2 2 3 3 4 4 7## We can also use sampleVec to generate specific results## E.g. the below generates the 1st, 5th, 25th, 125th, and## 625th lexicographical combinationscomboSample(10, 8, TRUE, sampleVec = c(1, 5, 25, 125, 625))[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 1 1 1 1 1 1 1 1[2,] 1 1 1 1 1 1 1 5[3,] 1 1 1 1 1 1 3 8[4,] 1 1 1 1 1 3 6 9[5,] 1 1 1 1 5 6 10 10## Is the same as:comboGeneral(10, 8, TRUE)[5^(0:4), ][,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 1 1 1 1 1 1 1 1[2,] 1 1 1 1 1 1 1 5[3,] 1 1 1 1 1 1 3 8[4,] 1 1 1 1 1 3 6 9[5,] 1 1 1 1 5 6 10 10
Just like the General
counterparts (i.e. combo/permuteGeneral
), we can easily explore combinations/permutations of large vectors where the total number of results is enormous in parallel (using Parallel
or nThreads
).
## Uses stdThreadMax() - 1 = 7 thread (in this case)permuteSample(500, 10, TRUE, n = 5, seed = 123, Parallel = TRUE) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10][1,] 55 435 274 324 200 152 6 313 121 377[2,] 196 166 331 154 443 329 155 233 354 442[3,] 235 325 94 27 370 117 302 86 229 126[4,] 284 104 464 104 207 127 117 9 390 414[5,] 456 76 381 456 219 23 376 187 11 123 permuteSample(factor(state.abb), 15, n = 3, seed = 50, nThreads = 4) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15][1,] ME FL DE OK ND CA PA AL ID MO NM HI KY MT NJ [2,] AZ CA AL CT ME SD ID SC OK NH HI TN ND IA MT [3,] MD MO NC MT NH AL VA MA VT WV NJ NE MN MS MI 50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN ... WY permuteCount(factor(state.abb), 15)Big Integer ('bigz') :[1] 2943352142120754524160000
You can also pass user defined functions by utilizing the argument FUN
. This feature's main purpose is for convenience, however it is somewhat more efficient than generating all combinations/permutations and then using a function from the apply
family (N.B. the argument Parallel
has no effect when FUN
is employed).
funCustomComb = function(n, r) { combs = comboGeneral(n, r) lapply(1:nrow(combs), function(x) cumprod(combs[x,]))} identical(funCustomComb(15, 8), comboGeneral(15, 8, FUN = cumprod))[1] TRUE library(microbenchmark)microbenchmark(f1 = funCustomComb(15, 8), f2 = comboGeneral(15, 8, FUN = cumprod), unit = "relative")unit: relative expr min lq mean median uq max neval f1 6.946481 6.891553 6.334866 6.821221 6.934111 2.686777 100 f2 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 comboGeneral(15, 8, FUN = cumprod, upper = 3)[[1]][1] 1 2 6 24 120 720 5040 40320 [[2]][1] 1 2 6 24 120 720 5040 45360 [[3]][1] 1 2 6 24 120 720 5040 50400 ## Works with the sampling functions as wellpermuteSample(5000, 1000, n = 3, seed = 101, FUN = sd)[[1]][1] 1431.949 [[2]][1] 1446.859 [[3]][1] 1449.272
RcppAlgos
comes equipped with several functions for quickly generating essential components
for problems common in computational mathematics. All functions below can be executed in parallel by using the argument nThreads
.
The following sieving functions (primeFactorizeSieve
, divisorsSieve
, numDivisorSieve
, & eulerPhiSieve
) are very useful and flexible. Generate components up to a number or between two bounds.
## get the number of divisors for every number from 1 to nnumDivisorSieve(20)[1] 1 2 2 3 2 4 2 4 3 4 2 6 2 4 4 5 2 6 2 6## If you want the complete factorization from 1 to n, use divisorsListsystem.time(allFacs <- divisorsSieve(10^5, namedList = TRUE))user system elapsed0.045 0.004 0.049allFacs[c(4339, 15613, 22080)]$`4339`[1] 1 4339$`15613`[1] 1 13 1201 15613$`22080`[1] 1 2 3 4 5 6 8 10 12 15[11] 16 20 23 24 30 32 40 46 48 60[21] 64 69 80 92 96 115 120 138 160 184[31] 192 230 240 276 320 345 368 460 480 552[41] 690 736 920 960 1104 1380 1472 1840 2208 2760[51] 3680 4416 5520 7360 11040 22080## Between two boundsprimeFactorizeSieve(10^12, 10^12 + 5)[[1]][1] 2 2 2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5[[2]][1] 73 137 99990001[[3]][1] 2 3 166666666667[[4]][1] 61 14221 1152763[[5]][1] 2 2 17 149 197 501001[[6]][1] 3 5 66666666667## Creating a named objecteulerPhiSieve(20, namedVector = TRUE)1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 201 1 2 2 4 2 6 4 6 4 10 4 12 6 8 8 16 6 18 8system.time(a <- eulerPhiSieve(1e12, 1e12 + 1e7))user system elapsed1.092 0.044 1.143## Using nThreads for greater efficiencysystem.time(b <- eulerPhiSieve(1e12, 1e12 + 1e7, nThreads = 8))user system elapsed3.784 0.014 0.518identical(a, b)[1] TRUE
There are three very fast vectorized functions for general factoring (e.g. all divisors of number), primality testing, as well as prime factoring (divisorsRcpp
, isPrimeRcpp
, primeFactorize
)
## get result for individual numbersprimeFactorize(123456789)[1] 3 3 3607 3803## or for an entire vectorset.seed(100)myVec <- sample(-100000000:100000000, 5)divisorsRcpp(myVec, namedList = TRUE)$`-38446778`[1] -38446778 -19223389 -2 -1 1[6] 2 19223389 38446778$`-48465500`[1] -48465500 -24232750 -12116375 -9693100 -4846550[6] -2423275 -1938620 -969310 -484655 -387724[11] -193862 -96931 -500 -250 -125[16] -100 -50 -25 -20 -10[21] -5 -4 -2 -1 1[26] 2 4 5 10 20[31] 25 50 100 125 250[36] 500 96931 193862 387724 484655[41] 969310 1938620 2423275 4846550 9693100[46] 12116375 24232750 48465500$`10464487`[1] 1 11 317 3001 3487 33011[7] 951317 10464487$`-88723370`[1] -88723370 -44361685 -17744674 -8872337 -10[6] -5 -2 -1 1 2[11] 5 10 8872337 17744674 44361685[16] 88723370$`-6290143`[1] -6290143 -1 1 6290143## Creating a named objectisPrimeRcpp(995:1000, namedVector = TRUE)995 996 997 998 999 1000FALSE FALSE TRUE FALSE FALSE FALSEsystem.time(a <- primeFactorize(1e12:(1e12 + 3e4)))user system elapsed1.202 0.002 1.207## Using nThreads for greater efficiencysystem.time(b <- primeFactorize(1e12:(1e12 + 3e4), nThreads = 8))user system elapsed1.674 0.003 0.220identical(a, b)[1] TRUE
Both of these functions are based on the excellent algorithms developed by Kim Walisch. First primeSieve
:
## Quickly generate large primes over small intervaloptions(scipen = 50)system.time(myPs <- primeSieve(10^13+10^3, 10^13))user system elapsed0.035 0.002 0.036myPs[1] 10000000000037 10000000000051 10000000000099 10000000000129[5] 10000000000183 10000000000259 10000000000267 10000000000273[9] 10000000000279 10000000000283 10000000000313 10000000000343[13] 10000000000391 10000000000411 10000000000433 10000000000453[17] 10000000000591 10000000000609 10000000000643 10000000000649[21] 10000000000657 10000000000687 10000000000691 10000000000717[25] 10000000000729 10000000000751 10000000000759 10000000000777[29] 10000000000853 10000000000883 10000000000943 10000000000957[33] 10000000000987 10000000000993## Object created is smallobject.size(myPs)312 bytes## primes under a billion!!!system.time(a <- primeSieve(10^9))user system elapsed1.263 0.095 1.368## Using nThreadssystem.time(b <- primeSieve(10^9, nThreads = 8))user system elapsed2.339 0.047 0.418
Since version 2.3.0
, we are implementing the cache-friendly improvements for larger primes originally developed by Tomás Oliveira.
## Version <= 2.2.0.. i.e. older versionssystem.time(old <- RcppAlgos2.2::primeSieve(1e15, 1e15 + 1e9))user system elapsed7.615 0.140 7.792## v2.3.0 is over 3x faster!system.time(a <- primeSieve(1e15, 1e15 + 1e9))user system elapsed2.530 0.197 2.744## And using nThreads we are ~8x fastersystem.time(b <- primeSieve(1e15, 1e15 + 1e9, nThreads = 8))user system elapsed5.087 0.778 0.997identical(a, b)[1] TRUEidentical(a, old)[1] TRUE
The library by Kim Walisch relies on OpenMP for parallel computation with Legendre's Formula. Currently, the default compiler on macOS
is clang
, which does not support OpenMP
. James Balamuta (a.k.a. TheCoatlessProfessor... well at least we think so) has written a great article on this topic, which you can find here: https://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/. One of the goals of RcppAlgos
is to be accessible by all users. With this in mind, we set out to count primes in parallel without OpenMP
.
At first glance, this seems trivial as we have a function in Primes.cpp
called phiWorker
that counts the primes up to x
. If you look in phi.cpp in the primecount
library by Kim Walisch, we see that OpenMP
does its magic on a for loop that makes repeated calls to phi
(which is what phiWorker
is based on). All we need to do is break this loop into n intervals where n is the number of threads. Simple, right?
We can certainly do this, but what you will find is that n - 1 threads will complete very quickly and the nth thread will be left with a heavy computation. In order to alleviate this unbalanced load, we take advantage of thread pooling provided by RcppThread
which allows us to reuse threads efficiently as well as breaking up the loop mentioned above into smaller intervals. The idea is to completely calculate phi
up to a limit m
using all n threads and then gradually increase m. The advantage here is that we are benefiting greatly from the caching done by the work of the previous n threads.
With this is mind, here are some results:
## Enumerate the number of primes below trillionsystem.time(underOneTrillion <- primeCount(10^12))user system elapsed0.481 0.000 0.483underOneTrillion[1] 37607912018## Enumerate the number of primes below a billion in 2 millisecondslibrary(microbenchmark)microbenchmark(primeCount(10^9))Unit: millisecondsexpr min lq mean median uq max nevalprimeCount(10^9) 1.958806 1.960901 2.05824 1.994066 2.099495 3.610174 100primeCount(10^9)[1] 50847534system.time(underOneHundredTrillion <- primeCount(1e14, nThreads = 8))user system elapsed49.894 0.102 6.774underOneHundredTrillion[1] 3204941750802## From Kim Walisch's primecount library:## Josephs-MBP:primecount-4 josephwood$ ./primecount 1e14 --legendre --time## 3204941750802## Seconds: 4.441
I welcome any and all feedback. If you would like to report a bug, have a question, or have suggestions for possible improvements, please contact me here: [email protected]
RcppAlgos 0.2.5 (Release date: 2018-01-04)
RcppAlgos 0.2.4 (Release date: 2017-12-18)
RcppAlgos 0.2.3 (Release date: 2017-12-18)
RcppAlgos 0.2.2 (Release date: 2017-12-15)
RcppAlgos 0.2.1 (Release date: 2017-11-29)
RcppAlgos 0.2.0 (Release date: 2017-11-28)
RcppAlgos 0.1.2 (Release date: 2017-11-03)
RcppAlgos 0.1.1 (Release date: 2017-11-03)
RcppAlgos 0.1.0 (Release date: 2017-10-26)