Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
3.3
-
None
-
None
Description
org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a linear search to find the appropriate element in the probability space (called singletons here) given a random double value. For large probability spaces, this is not effective. Instead, we should cache the cumulative probabilities and do a binary search.
Rough implementation:
EnumeratedDistribution.java
void computeCumulative() { cumulative = new double[size]; double sum = 0; for (int i = 1; i < weights.length - 1; i++) { cumulative[i] = cumulative[i-1] + weights[i-1]; } cumulative[size - 1] = 1; }
and then
EnumeratedDistribution.java
int sampleIndex() { double randomValue = random.nextDouble(); int result = Arrays.binarySearch(cumulative, randomValue); if (result >= 0) return result; int insertionPoint = -result-1; return insertionPoint; }