Commons Math
  1. Commons Math
  2. MATH-826

RFE: low discrepancy sequence generator

    Details

    • Type: Wish Wish
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3
    • Labels:

      Description

      Please provide a low discrepancy sequence random number generator.

      http://en.wikipedia.org/wiki/Low-discrepancy_sequence

      Support for such a generator is poor in Java and existing implementations tend to be standalone and do not provide a user friendly API. Ideally, such a generator would implement Random, but using the Apache Commons Math random API would be acceptable.

      1. MATH-826.zip
        24 kB
        Thomas Neidhart

        Activity

        Hide
        Gilles added a comment -

        Thanks for the suggestion.
        You are warmly welcome to help bring this into Commons Math by providing an implementation.

        Show
        Gilles added a comment - Thanks for the suggestion. You are warmly welcome to help bring this into Commons Math by providing an implementation.
        Show
        Thomas Neidhart added a comment - - edited Some pointers to information on Sobol sequences: http://www.deltaquants.com/sobol-sequence.html http://web.maths.unsw.edu.au/~fkuo/sobol/joe-kuo-notes.pdf http://web.maths.unsw.edu.au/~fkuo/sobol/
        Hide
        Thomas Neidhart added a comment -

        First draft of a random generator based on a Sobol sequence.

        The user can select the dimension when creating the generator. Thus when a user is interested in 3d points, 3 generators are needed.

        The generator includes direction numbers from http://web.maths.unsw.edu.au/~fkuo/sobol/ for the first 100 dimensions (they are under BSD license), but custom ones can be provided.

        Does this make sense?

        Show
        Thomas Neidhart added a comment - First draft of a random generator based on a Sobol sequence. The user can select the dimension when creating the generator. Thus when a user is interested in 3d points, 3 generators are needed. The generator includes direction numbers from http://web.maths.unsw.edu.au/~fkuo/sobol/ for the first 100 dimensions (they are under BSD license), but custom ones can be provided. Does this make sense?
        Hide
        Luc Maisonobe added a comment -

        This seems very interesting.

        I have two remarks. The first one (less important) is that we could put the complete list of direction numbers in the original text file and upload it from within the jar as a resource in the assets directory.

        The second remark (more important) is that I don't think having Sobol sequences implement AbstractRandomGenerator is good. I would rather have it implement RandomVectorGenerator as they are clearly oriented towards multi-dimensional numbers generation.

        Show
        Luc Maisonobe added a comment - This seems very interesting. I have two remarks. The first one (less important) is that we could put the complete list of direction numbers in the original text file and upload it from within the jar as a resource in the assets directory. The second remark (more important) is that I don't think having Sobol sequences implement AbstractRandomGenerator is good. I would rather have it implement RandomVectorGenerator as they are clearly oriented towards multi-dimensional numbers generation.
        Hide
        Thomas Neidhart added a comment -

        Hmm, I totally missed the RandomVectorGenerator, which makes indeed much more sense.

        Regarding the assets: the big file with 21000 dimensions has ~1.8 MB, which is too much I guess. We should limit to a reasonable size and provide an option for users to load other data files.

        Show
        Thomas Neidhart added a comment - Hmm, I totally missed the RandomVectorGenerator, which makes indeed much more sense. Regarding the assets: the big file with 21000 dimensions has ~1.8 MB, which is too much I guess. We should limit to a reasonable size and provide an option for users to load other data files.
        Hide
        Luc Maisonobe added a comment -

        You are right, we could reduce the file to say dimension 1000 which is slightly above 60 kilobytes. Letting the user provide custom directions is a good idea.

        I also wonder if we should not use more bits for double generation. 32 bits seems small to me. In the BitStreamGenerator we use 52 bits as follows:

        public double nextDouble() {
            final long high = ((long) next(26)) << 26;
            final int  low  = next(26);
            return (high | low) * 0x1.0p-52d;
        }
        

        It seemed to me Sobol sequences can generate almost arbitrary number of bits. Could we set it up so it generates 52 bits and then multiply by the same constant 0x1.0p-52d?

        Another point is that there are two ways to generate the sequence. The fast one you used which is based on Gray code and hence is a simple recursive operation (x = x ^ direction[c]), and a direct one which can generate any number in the sequence from its index. We could perhaps propose another method skipTo(int index) that would use the longer method and reinitialize x from this point. Users could then use skipTo to go anywhere they want in the sequence, and then they would retrieve consecutive vectors starting from there. The use cases for such a method are:

        • drop the first 2^n points, as some people think it is better (but Joe Kuo does not seem to agree as shown in his notes)
        • allow user to generate different sequences, by changing the start point
        • implement a poor-man seed setting, the pseudo-seed being the start index
        Show
        Luc Maisonobe added a comment - You are right, we could reduce the file to say dimension 1000 which is slightly above 60 kilobytes. Letting the user provide custom directions is a good idea. I also wonder if we should not use more bits for double generation. 32 bits seems small to me. In the BitStreamGenerator we use 52 bits as follows: public double nextDouble() { final long high = (( long ) next(26)) << 26; final int low = next(26); return (high | low) * 0x1.0p-52d; } It seemed to me Sobol sequences can generate almost arbitrary number of bits. Could we set it up so it generates 52 bits and then multiply by the same constant 0x1.0p-52d? Another point is that there are two ways to generate the sequence. The fast one you used which is based on Gray code and hence is a simple recursive operation (x = x ^ direction [c] ), and a direct one which can generate any number in the sequence from its index. We could perhaps propose another method skipTo(int index) that would use the longer method and reinitialize x from this point. Users could then use skipTo to go anywhere they want in the sequence, and then they would retrieve consecutive vectors starting from there. The use cases for such a method are: drop the first 2^n points, as some people think it is better (but Joe Kuo does not seem to agree as shown in his notes) allow user to generate different sequences, by changing the start point implement a poor-man seed setting, the pseudo-seed being the start index
        Hide
        Thomas Neidhart added a comment -

        Yeah, that's a good idea, I will implement the changes and attach a new version.

        Show
        Thomas Neidhart added a comment - Yeah, that's a good idea, I will implement the changes and attach a new version.
        Hide
        Thomas Neidhart added a comment -

        Updated patch with suggested changes, unit tests and asset file containing the direction numbers for the first 1000 dimensions.

        Still tbd: provide a way for users to use custom direction numbers.

        Show
        Thomas Neidhart added a comment - Updated patch with suggested changes, unit tests and asset file containing the direction numbers for the first 1000 dimensions. Still tbd: provide a way for users to use custom direction numbers.
        Hide
        Luc Maisonobe added a comment -

        This seems good to me.
        +1 to commit.

        I just wonder if something like http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel should be used to find the rightmost significant bit instead of the explicit loop.

        Show
        Luc Maisonobe added a comment - This seems good to me. +1 to commit. I just wonder if something like http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel should be used to find the rightmost significant bit instead of the explicit loop.
        Hide
        Thomas Neidhart added a comment -

        Added in r1481558 together with a new constructor that allows users to provide their own input stream containing the direction numbers.

        Thats a very nice link, thanks for that! But if I understand the section about zeros on the right correctly, it only works for trailing zeros, but in this case we want to find the first zero from the right (there can be 1's before), so I am not sure if we can use these methods.

        Show
        Thomas Neidhart added a comment - Added in r1481558 together with a new constructor that allows users to provide their own input stream containing the direction numbers. Thats a very nice link, thanks for that! But if I understand the section about zeros on the right correctly, it only works for trailing zeros, but in this case we want to find the first zero from the right (there can be 1's before), so I am not sure if we can use these methods.
        Hide
        Thomas Neidhart added a comment -

        Resolving for now, if somebody knows how to improve the bit searching, please go ahead and commit directly.

        Show
        Thomas Neidhart added a comment - Resolving for now, if somebody knows how to improve the bit searching, please go ahead and commit directly.

          People

          • Assignee:
            Unassigned
            Reporter:
            Sam Halliday
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development