 # MarsagliaTsangWang discrete probability sampler

XMLWordPrintableJSON

#### Details

• New Feature
• Status: Closed
• Major
• Resolution: Implemented
• 1.3
• None

#### Description

The following paper provides a method for sampling from a discrete probability distribution that requires a single 30-bit unisigned integer per sample:

```George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
Fast Generation of Discrete Random Variables.
Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
```

JSS Vol 11, Issue 3

The paper gives a general method that requires input probabilities expressed as unsigned 30 bit integers. The probabilities thus have an assumed divisor of 2^30^. The probabilities must sum to 2^30^. These are then efficiently tabulated for fast look-up using 5 base-64 digits from the 30-bit number. Sampling then takes 1 random integer per sample. No samples can be obtained for any observation where the probability is < 1^-32^.

The paper provides software to compute the probability distributions for Poisson and Binomial and the look-up tables.

It is applicable to any distribution by normalising to 2^30^:

```// Compute the normalisation: 2^30 / sum
// (assuming sum has been precomputed)
final double normalisation = (1 << 30) / sum;
final int[] prob = new int[probabilities.length];
int sum = 0;
int max = 0;
int mode = 0;
for (int i = 0; i < prob.length; i++) {
final int p = (int) (probabilities[i] * normalisation + 0.5);
sum += p;
// Find the mode (maximum probability)
if (max < p) {
max = p;
mode = i;
}
prob[i] = p;
}

// The sum must be >= 2^30.
// Here just compensate the difference onto the highest probability.
prob[mode] += (1 << 30) - sum;
```

#### People Alex Herbert Alex Herbert
0 Vote for this issue
Watchers:
1 Start watching this issue

#### Dates

Created:
Updated:
Resolved:

#### Time Tracking

Estimated: Not Specified
Remaining: 0h
Logged: 1h