Uploaded image for project: 'Commons Math'
  1. Commons Math
  2. MATH-1152

Suboptimal implementation of EnumeratedDistribution.sample()

Rank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersConvert to sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.3
    • 3.4
    • None
    • None

    Description

      org.apache.commons.math3.distribution.EnumeratedDistribution.sample() performs a linear search to find the appropriate element in the probability space (called singletons here) given a random double value. For large probability spaces, this is not effective. Instead, we should cache the cumulative probabilities and do a binary search.

      Rough implementation:

      EnumeratedDistribution.java
      void computeCumulative() {
        cumulative = new double[size]; 
        double sum = 0;
        for (int i = 1; i < weights.length - 1; i++) {
            cumulative[i] = cumulative[i-1] + weights[i-1];
         }
        cumulative[size - 1] = 1;
      }
      

      and then

      EnumeratedDistribution.java
      int sampleIndex() {
       double randomValue = random.nextDouble();
       int result = Arrays.binarySearch(cumulative, randomValue);
       if (result >= 0) return result;
       int insertionPoint = -result-1;
       return insertionPoint;
      }
      
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            sandris Andras Sereny
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment