  1. Commons RNG
  2. RNG-185

ArraySampler to have factory methods to sample from arrays

Details

    • Type: Wish
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.6
    • Fix Version/s: None
    • Component/s: sample
    • Labels: None
    • Easy

    Description

      The ArraySampler currently offers shuffle support for arrays, similar to the ListSampler which shuffles a List.

      It does not offer an equivalent of the ListSampler method that samples a subset of a list. The ListSampler API is:

      // Sample a List of size k from the input list
      public static <T> List<T> sample(UniformRandomProvider rng,
                                       List<T> collection,
                                       int k)

      The subset is chosen using a permutation from the PermutationSampler. The method is static, so each invocation creates a new PermutationSampler. That class maintains an array of indices for all elements of the list, so every repeat invocation must recreate this array.
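
      The cost described above can be illustrated with a small stand-alone sketch. It is not the library code: java.util.Random stands in for UniformRandomProvider, and the class ReusableListSamplerSketch and its members are hypothetical names. The index array is built once in the constructor and partially shuffled on each call, so repeated sampling does not reallocate it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: a reusable sampler of k items from a list. */
final class ReusableListSamplerSketch<T> {
    private final Random rng;      // stand-in for UniformRandomProvider
    private final List<T> items;
    private final int[] indices;   // allocated once, reused by every sample
    private final int k;

    ReusableListSamplerSketch(Random rng, List<T> list, int k) {
        if (k < 0 || k > list.size()) {
            throw new IllegalArgumentException("k must be in [0, " + list.size() + "]");
        }
        this.rng = rng;
        this.items = new ArrayList<>(list); // defensive copy of the input
        this.k = k;
        this.indices = new int[list.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
    }

    /** Returns k distinct items in random order. */
    List<T> sample() {
        final int n = indices.length;
        // Partial Fisher-Yates shuffle from the end: only k positions are filled.
        // Starting from the previous (already shuffled) state is still uniform.
        int steps = Math.min(k, n - 1);
        for (int i = n - 1; steps > 0; i--, steps--) {
            final int j = rng.nextInt(i + 1);
            final int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        final List<T> result = new ArrayList<>(k);
        for (int i = n - k; i < n; i++) {
            result.add(items.get(indices[i]));
        }
        return result;
    }
}
```

      A caller would construct the sketch once and call sample() repeatedly, in contrast to the static method, which rebuilds its index array on every call.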

       

      An improvement would be:

      • Return a reusable sampler, e.g. an ObjectSampler<double[]>
      • Allow a choice between a permutation (the order of the sample matters) and a combination (the order of the sample does not matter)

      A suggested API would be:

       

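      // Sampler of k-element permutations of the array (order matters)
      public static ObjectSampler<double[]>
          permutationSampler(UniformRandomProvider rng,
                             double[] array,
                             int k)
      // Sampler of k-element combinations of the array (order does not matter)
      public static ObjectSampler<double[]>
          combinationSampler(UniformRandomProvider rng,
                             double[] array,
                             int k)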

      Implementing this for all array types requires a lot of repeated boilerplate code, and there is currently no use case to merit its inclusion. Note that sampling of this type can already be performed for any array using e.g.:

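      // Create the index sampler once and reuse it for every sample
      final PermutationSampler s = new PermutationSampler(rng, array.length, k);

      ObjectSampler<double[]> sampler = () -> {
          // Draw k distinct indices in random order
          final int[] indices = s.sample();
          // Copy the selected elements into a new array
          final double[] sample = new double[indices.length];
          for (int i = 0; i < sample.length; i++) {
              sample[i] = array[indices[i]];
          }
          return sample;
      };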
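
      As an aside on the boilerplate concern, the copy step in the snippet above can be written once for every array type using java.lang.reflect.Array. This is only a sketch under assumptions: the class ArrayGatherSketch and the method gather are hypothetical names, not part of Commons RNG, and the reflective access boxes primitive values, so it trades performance for brevity:

```java
import java.lang.reflect.Array;

/** Hypothetical sketch: gather elements of any array type by index. */
final class ArrayGatherSketch {
    private ArrayGatherSketch() {}

    /**
     * Returns a new array of the same type as the input containing
     * array[indices[0]], array[indices[1]], ... in that order.
     */
    @SuppressWarnings("unchecked")
    static <A> A gather(A array, int[] indices) {
        final Class<?> componentType = array.getClass().getComponentType();
        if (componentType == null) {
            throw new IllegalArgumentException("Not an array: " + array.getClass());
        }
        final Object sample = Array.newInstance(componentType, indices.length);
        for (int i = 0; i < indices.length; i++) {
            // Array.get boxes primitives; Array.set unboxes them again
            Array.set(sample, i, Array.get(array, indices[i]));
        }
        return (A) sample;
    }
}
```

      With this helper the lambda body reduces to a single call such as gather(array, s.sample()), for double[], int[] or object arrays alike. Dedicated per-type loops would avoid the boxing overhead.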

      One advantage of a direct implementation is that the sample can be created as a subset of the input array using the same partial shuffle that the PermutationSampler applies to its indices array. This avoids generating an intermediate int[] for each sample. It would effectively extend the package-private method in SubsetSamplerUtils, which performs a partial shuffle of an array, to all array types:

      static int[] partialSample(int[] domain,
                                 int steps,
                                 UniformRandomProvider rng,
                                 boolean upper)

      That method is used by both the PermutationSampler and CombinationSampler to partially shuffle the indices. The choice to return the upper or lower portion of the partially shuffled array is an optimisation for the CombinationSampler.
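
      A direct implementation along these lines might look as follows for double[]. This is a sketch only: java.util.Random and Supplier<double[]> stand in for UniformRandomProvider and ObjectSampler<double[]>, the class name PartialShuffleSketch is hypothetical, and the meaning given to the upper flag (return the shuffled upper end, or the remaining lower end) is an assumption based on the description above rather than a copy of SubsetSamplerUtils:

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical sketch: sample double[] subsets by partial shuffle. */
final class PartialShuffleSketch {
    private PartialShuffleSketch() {}

    /** Shuffles 'steps' positions from the end, then copies one end of the array. */
    static double[] partialSample(double[] domain, int steps, Random rng, boolean upper) {
        // A full shuffle needs only length - 1 swaps
        int swapCount = Math.min(steps, domain.length - 1);
        for (int i = domain.length - 1; swapCount > 0; i--, swapCount--) {
            final int j = rng.nextInt(i + 1);
            final double tmp = domain[i];
            domain[i] = domain[j];
            domain[j] = tmp;
        }
        // upper: the 'steps' shuffled elements; lower: the remaining elements
        final int size = upper ? steps : domain.length - steps;
        final int from = upper ? domain.length - steps : 0;
        final double[] result = new double[size];
        System.arraycopy(domain, from, result, 0, size);
        return result;
    }

    /** Samples k elements where the order matters. */
    static Supplier<double[]> permutationSampler(Random rng, double[] array, int k) {
        checkSize(array.length, k);
        final double[] domain = array.clone(); // the input is never modified
        return () -> partialSample(domain, k, rng, true);
    }

    /** Samples k elements where the order does not matter. */
    static Supplier<double[]> combinationSampler(Random rng, double[] array, int k) {
        checkSize(array.length, k);
        final double[] domain = array.clone();
        final int n = domain.length;
        if (k <= n / 2) {
            return () -> partialSample(domain, k, rng, true);
        }
        // Fewer swaps: shuffle the n - k elements to exclude and return the rest
        return () -> partialSample(domain, n - k, rng, false);
    }

    private static void checkSize(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("k must be in [0, " + n + "]: " + k);
        }
    }
}
```

      Each call to get() returns a new double[] of length k and no int[] of indices is created. The combination variant performs at most half the array length in swaps per sample, which is how the upper or lower choice pays off.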

      This ticket is a placeholder for discussion on this type of functionality and possible use cases.

       

            People

              Assignee: Unassigned
              Reporter: Alex Herbert (aherbert)