High Performance Tools for Combinatorics and Computational Mathematics

Provides optimized functions and flexible combinatorial iterators implemented in C++ with 'Rcpp' for solving problems in combinatorics and computational mathematics. Utilizes parallel programming via 'RcppThread' for maximal performance. Also makes use of the RMatrix class from the 'RcppParallel' library. There are combination/permutation functions with constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. generating integer partitions/compositions or finding all combinations such that the sum is between two bounds). Capable of generating specific combinations/permutations (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch. Overview

A collection of high performance functions implemented in C++ with Rcpp for solving problems in combinatorics and computational mathematics. Utilizes the library RcppThread where multithreading is needed. We also make use of the RMatrix.h header file from RcppParallel for thread safe accessors for Rcpp matrices. Featured functions:

• primeSieve - Generates all primes less than a billion in under 0.5 seconds.
• primeCount - Counts the number of primes below a 1e12 in ~130 milliseconds and 1e15 in under 50 seconds.
• comboGeneral/permuteGeneral - Generate all combinations/permutations of a vector (including multisets) meeting specific criteria.
• Produce results in parallel using the Parallel argument. You can also apply each of the five compiled functions given by the argument constraintFun in parallel as well. E.g. Obtaining the row sums of all combinations:
• comboGeneral(20, 10, constraintFun = "sum", Parallel = TRUE)
• Alternatively, the arguments lower and upper make it possible to generate combinations/permutations in chunks allowing for parallelization via the package parallel. This is convenient when you want to apply a custom function to the output in parallel as well (see this stackoverflow post for a use case).
• GMP support allows for exploration of combinations/permutations of vectors with many elements.
• comboSample/permuteSample - Easily generate random samples of combinations/permutations in parallel.
• You can pass a vector of specific indices or rely on the internal sampling functions. We call sample when the total number of results is small and for larger cases, the sampling is done in a very similar fashion to urand.bigz from the gmp package.

The primeSieve function and the primeCount function are both based off of the excellent work by Kim Walisch. The respective repos can be found here: kimwalisch/primesieve; kimwalisch/primecount

Additionally, many of the sieving functions make use of the fast integer division library libdivide by ridiculousfish.

Usage

Common Combinatorial Functions

Easily executed with a very simple interface. Output is in lexicographical order.

Using Constraints

Oftentimes, one needs to find combinations/permutations that meet certain requirements.

Working with Multisets

Sometimes, the standard combination/permutation functions don't quite get us to our desired goals. For example, one may need all permutations of a vector with some of the elements repeated a specific amount of times (i.e. a multiset). Consider the following vector a <- c(1,1,1,1,2,2,2,7,7,7,7,7) and one would like to find permutations of a of length 6. Using traditional methods, we would need to generate all permutations, then eliminate duplicate values. Even considering that permuteGeneral is very efficient, this approach is clunky and not as fast as it could be. Observe:

Enter freqs

Situations like this call for the use of the freqs argument. Simply, enter the number of times each unique element is repeated and Voila!

Here are some more general examples with multisets:

Parallel Computing

Using the parameter Parallel, we can easily generate combinations/permutations with great efficiency.

And applying any of the constraint functions in parallel is highly efficient as well. Consider obtaining the row sums of all combinations:

Faster than rowSums and rowMeans

In fact, finding row sums or row means is even faster than simply applying the highly efficient rowSums/rowMeans after the combinations have already been generated:

In both cases above, RcppAlgos is doing double the work nearly twice as fast!!!

Using arguments lower and upper

There are arguments lower and upper that can be utilized to generate chunks of combinations/permutations without having to generate all of them followed by subsetting. As the output is in lexicographical order, these arguments specify where to start and stop generating. For example, comboGeneral(5, 3) outputs 10 combinations of the vector 1:5 chosen 3 at a time. We can set lower to 5 in order to start generation from the 5th lexicographical combination. Similarly, we can set upper to 4 in order only generate the first 4 combinations. We can also use them together to produce only a certain chunk of combinations. For example, setting lower to 4 and upper to 6 only produces the 4th, 5th, and 6th lexicographical combinations. Observe:

In addition to being useful by avoiding the unnecessary overhead of generating all combination/permutations followed by subsetting just to see a few specific results, lower and upper can be utilized to generate large number of combinations/permutations in parallel. Observe:

These arguments are also useful when one needs to explore combinations/permutations of really large vectors:

Sampling

We can also produce random samples of combinations/permutations with comboSample and permuteSample. This is really useful when we need a reproducible set of random combinations/permutations. Many of the traditional ways of doing this involved relying on heavy use of sample and hoping that we don't generate duplicate results. Both functions have a similar interface to their respective General functions. Observe:

Just like the General counterparts (i.e. combo/permuteGeneral), we can easily explore combinations/permutations of large vectors where the total number of results is enormous in parallel (using Parallel or nThreads).

User Defined Functions

You can also pass user defined functions by utilizing the argument FUN. This feature's main purpose is for convenience, however it is somewhat more efficient than generating all combinations/permutations and then using a function from the apply family (N.B. the argument Parallel has no effect when FUN is employed).

Mathematical Computation

RcppAlgos comes equipped with several functions for quickly generating essential components for problems common in computational mathematics. All functions below can be executed in parallel by using the argument nThreads.

The following sieving functions (primeFactorizeSieve, divisorsSieve, numDivisorSieve, & eulerPhiSieve) are very useful and flexible. Generate components up to a number or between two bounds.

Vectorized Functions

There are three very fast vectorized functions for general factoring (e.g. all divisors of number), primality testing, as well as prime factoring (divisorsRcpp, isPrimeRcpp, primeFactorize)

primeSieve & primeCount

Both of these functions are based on the excellent algorithms developed by Kim Walisch. First primeSieve:

Larger primes

Since version 2.3.0, we are implementing the cache-friendly improvements for larger primes originally developed by Tomás Oliveira.

primeCount

The library by Kim Walisch relies on OpenMP for parallel computation with Legendre's Formula. Currently, the default compiler on macOS is clang, which does not support OpenMP. James Balamuta (a.k.a. TheCoatlessProfessor... well at least we think so) has written a great article on this topic, which you can find here: https://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/. One of the goals of RcppAlgos is to be accessible by all users. With this in mind, we set out to count primes in parallel without OpenMP.

At first glance, this seems trivial as we have a function in Primes.cpp called phiWorker that counts the primes up to x. If you look in phi.cpp in the primecount library by Kim Walisch, we see that OpenMP does its magic on a for loop that makes repeated calls to phi (which is what phiWorker is based on). All we need to do is break this loop into n intervals where n is the number of threads. Simple, right?

With this is mind, here are some results:

Contact

I welcome any and all feedback. If you would like to report a bug, have a question, or have suggestions for possible improvements, please contact me here: [email protected]

News

RcppAlgos 0.2.5 (Release date: 2018-01-04)

• Removed unnecessary files.
• Fixed bug in primeSieve that occurred when a number with a decimal was passed (e.g. 2.01).
• Adjusted accepted lower bound for numDivisorSieve, eulerPhiSieve, divisorsList, primeSieve, and primeFactorizationList.
• Fixed bug when non-unique elements are present with factors.

RcppAlgos 0.2.4 (Release date: 2017-12-18)

• Fixed bug that occurs when non-unique elements are present for combinations with replacement.

RcppAlgos 0.2.3 (Release date: 2017-12-18)

• Fixed segmentation fault error highlighted by valgrind check in version 0.2.2.
• Updated DESCRIPTION file.

RcppAlgos 0.2.2 (Release date: 2017-12-15)

• Fixed bug in constraint functions that occurred when m = 1 and the constraint limit was equal to the last element in v. It was returning a 2x1 matrix with the same value twice. It is now correctly returning a 1x1 matrix with the correct value 1 time.
• Reorganized source code such that all utility functions for the combinatoric functions are now in their own file. Additionally added header for this file.
• All combinatoric functions can now utilize the rowCap argument. Before, rowCap only applied to the combinatorial functions with constraints.
• comboGeneral can now find all combinations of multisets.
• Both comboGeneral and permuteGeneral can utilize the argument m when dealing with multisets. Before, permuteGeneral would simply return all permutations of a multiset. Now you can specify the lengths of the output.

RcppAlgos 0.2.1 (Release date: 2017-11-29)

• Fixed bug that would occur in two edge cases involving the constraint functions.
• Slightly modified formatting for primeSieve.Rd

RcppAlgos 0.2.0 (Release date: 2017-11-28)

• Updated combination algorithms. They are now more than twice as fast.
• Updated constraint functions so that memory access is always within container bounds
• Consolidated redundant code for outputting different Rcpp types (e.g. IntegerMatrix, CharacterMatrix, etc.) via a templated approach.
• Added the function permuteGeneral that is analogous to comboGeneral only instead of combinations, it gives all permutations. It has an additional argument (i.e. 'freqs') that is used to generate permutations of multisets.
• All combinatoric functions now support factor types.

RcppAlgos 0.1.2 (Release date: 2017-11-03)

• Corrected minor typo in README file.
• Fixed minor error regarding explicitly comparing variables to large numbers that are typed out. Simply adding a decimal along with a zero remedies the situation.

RcppAlgos 0.1.1 (Release date: 2017-11-03)

• Improved ComboConstraint function by removing unnecessary subsetting.
• Improved PrimeSieve internal C++ algorithm.
• Corrected the errors with respect to the math functions in C++. Explicitly overloaded the parameters of these functions by casting them to the double type.

RcppAlgos 0.1.0 (Release date: 2017-10-26)

• Initial Release