Uploaded image for project: 'Commons RNG'
  1. Commons RNG
  2. RNG-181

LXM generators to implement SplittableUniformRandomProvider

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Implemented
    • 1.5
    • 1.5
    • core
    • None
    • Easy

    Description

      The LXM family of generators use a linear congruential generator (LCG) mixed with a xor-based generator. The paper introducing these generators indicates that these are suitable for splitting as the streams are demonstrated to be independent when the addition parameter for the LCG is different. The authors provide results for outputting the combined (interleaved) sequence from up to 2^24 generators with different LCG add parameters through Test U01 BigCrush with no failures.

      These generators are splittable in the JDK 17 java.util.random package. Splitting functionality should be added to the family in Commons RNG.

      A single split of the generator is made by randomly seeding a new generator using the existing one. This is no different than simply creating a new generator from any source of randomness.

      When the generator is used to make a Stream<SplittableUniformRandomProvider> then each stream instance can be created using a different addition parameter. This can be done by exploiting the fact that the spliterator for the stream maintains the stream position as a count in [0, streamSize). The addition parameter can be created by using this count. To ensure a second stream will have different seeding the count can be combined with random bits that change for each stream instance.

      JDK Implementation

      The exact mechanism used in the JDK is not documented (in the public javadoc) so may be subject to change. The present source code adds the stream position bits to random bits to create the addition. The seed random bits are generated with the some of the lowest bits set to 1. The addition parameter is created by shifting the seed random addition left until the lowest 1-bit is above the highest 1-bit of the position. The two can be combined (with xor) to create an addition which will be unique among the stream. 

      The present implementation in the JDK creates the initial seed as a series of n-bit characters using a seed generation algorithm; the lowest n-bits are set to 1 to act as a marker for the end of the seed. The algorithm to create the initial seed characters uses the overflow from a multiply congruential generator (MCG) with m=15 to create the digits as 4-bits. The source code indicates that the character size could be any supported number in [2, 63]. However the more bits that are set to mark the lower position of the seed impacts the amount of randomness left in the remaining bits.

      The initial seed characters are created using Math.multiplyHigh which is signed. The result is that the digits in the seed have a non-uniform distribution as the overflow character is subject to sign extension correction in the multiply. If the same algorithm uses JDK 18s Math.unsignedMultiplyHigh then the digits have a uniform distribution.

      Since the intrinsic functions Math.multiplyHigh and Math.unsignedMultiplyHigh are not available in JDK 8 then a similar approach using a MCG would be slow.

      Details to take away are that the counter and initial random seed are combined. Overlap of the seed and the counter are avoided by shifting the seed left. Some bits are used as a marker in the seed which reduces the randomness. Ideally the initial seed should have randomly distributed bits; this is not the case in the JDK 17 implementation due to the generation algorithm used.

      Proposed Implementation

      To create a random addition combine the stream position with random bits.

      Notes:

      1. The stream position will be a positive long. However the addition for the LCG must be odd. This is done in the LXM family by setting the lowest 1 bit to 1. Thus the count must be shifted left 1-bit to avoid duplications in the seed sequence. This removes the ability to add extra bits to seed additions when the stream size exceeds 2^63. Such large streams are not expected in common use.
      2. A counter that is enumerated from 0 to 2^n - 1 will be uniformly distributed for each bit in the first n-bits. However when used as an addition to a sum in an LCG all of these possible additions may not trickle up changes very far into higher bits. 1-bits must be present in higher positions of the addition to effect higher bits in the sum. If all the higher bits are the same then the trickle up effect will be similar. So it is an advantage to have the seed counter combined with different random bits.

      A random seed could be xor'd with the counter. However the upper bits for all additions would be the same above the highest 1-bit in the sequence size. This reduces the variation in the addition parameter for use by the LXM generator.

      Thus is makes sense to create a random seed, then shift it left as the counter increases in magnitude. Note that the shift to avoid overlap with the counter could be maintained using a separate shift counter. However if the initial random seed bits have many zeros in the low bits it may be possible to left shift this and have nothing to combine with a large value counter. This may lead to duplications of previous ((seed << shift) | count) values as the counter increases in magnitude. If there is always at least 1 bit to prepend to the counter then no duplications will occur as any larger counts that match a previous ((seed << shift) | count) will also have at least 1 bit prepended. This limits the seed addition to 63-bits of randomness.

      Since the lowest 1-bit is enabled for the LCG add parameter this leaves 62-bits of randomness to seed the addition parameter. A stream with more than 2^62 elements would not be able to ensure a unique addition parameter for each RNG in the stream. Such streams are unlikely to be used. A more realistic parallel stream size of less than 1024 generators would only consume 9 bits for the counter and the seed addition will contain at least 53 bits of randomness accounting for the two bits lost to the marker bit and the odd LCG add parameter.

      Code changes

      This functionality can be introduced in a new abstract base class for splittable generators that can split using a seed value. The seed value would be ensured to be unique within a stream (subject to the size limitations above). The current LXM generators extend either IntProvider or LongProvider. The LongProviders all use a 64 or 128-bit LCG and can thus use the entire seed value when splitting. The IntProvider is the L32X64Mix generator which uses a 32-bit LCG. This can only use 31-bits from the seed for the LCG add parameter. It limits the number of unique add parameters within a stream to 2^30 before duplications may be observed. This generator is splittable in the JDK. It is used as the default generator in JDK 17 for RandomGenerator.getDefault(). Supporting splitting in Commons RNG provides parity with the modern JDK random implementation.

      Attachments

        Activity

          People

            aherbert Alex Herbert
            aherbert Alex Herbert
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 0.5h
                0.5h