Uploaded image for project: 'Commons RNG'
  1. Commons RNG
  2. RNG-101

MarsagliaTsangWang discrete probability sampler

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Implemented
    • 1.3
    • 1.3
    • sampling
    • None

    Description

      The following paper provides a method for sampling from a discrete probability distribution that requires a single 30-bit unisigned integer per sample:

      George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004) 
      Fast Generation of Discrete Random Variables. 
      Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
      

      JSS Vol 11, Issue 3

      The paper gives a general method that requires input probabilities expressed as unsigned 30 bit integers. The probabilities thus have an assumed divisor of 2^30^. The probabilities must sum to 2^30^. These are then efficiently tabulated for fast look-up using 5 base-64 digits from the 30-bit number. Sampling then takes 1 random integer per sample. No samples can be obtained for any observation where the probability is < 1^-32^.

      The paper provides software to compute the probability distributions for Poisson and Binomial and the look-up tables.

      It is applicable to any distribution by normalising to 2^30^:

      // Compute the normalisation: 2^30 / sum
      // (assuming sum has been precomputed)
      final double normalisation = (1 << 30) / sum;
      final int[] prob = new int[probabilities.length];
      int sum = 0;
      int max = 0;
      int mode = 0;
      for (int i = 0; i < prob.length; i++) {
          // Add 0.5 for rounding
          final int p = (int) (probabilities[i] * normalisation + 0.5);
          sum += p;
          // Find the mode (maximum probability)
          if (max < p) {
              max = p;
              mode = i;
          }
          prob[i] = p;
      }
      
      // The sum must be >= 2^30.
      // Here just compensate the difference onto the highest probability.
      prob[mode] += (1 << 30) - sum;
      

      Attachments

        Issue Links

          Activity

            People

              aherbert Alex Herbert
              aherbert Alex Herbert
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h
                  1h