  Commons Statistics / STATISTICS-63

Port o.a.c.math.stat.ranking to a commons-statistics-ranking module


Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Implemented
    • 1.0
    • 1.1
    • ranking
    • None
    • Easy

    Description

      The o.a.c.math4.legacy.stat.ranking package contains:

      NaNStrategy.java
      NaturalRanking.java
      RankingAlgorithm.java
      TiesStrategy.java

      There are no dependencies on other math packages.

      The TiesStrategy enum contains a RANDOM option:

      "Ties get a random integral value from among applicable ranks."

      I would suggest this be changed to:

      "Ties get a randomly assigned unique value from among applicable ranks."

      This is a minor change, but it ensures ties can always be distinguished, which seems to be the purpose of a tie strategy. The current implementation in Commons Math draws an independent random rank for each tied value, so several tied points can be assigned the same rank (thus not resolving the tie).

      For example, the input:

      [0, 1, 1, 1, 2]

      can currently produce any of the following (0-based) rankings:

      [0, 1, 2, 3, 4]
      [0, 1, 1, 1, 4]
      [0, 3, 3, 3, 4]
      etc.

      The suggested change would enumerate the ranks for the ties and then shuffle them. All ranks would then be unique:

      [0, 1, 2, 3, 4]
      [0, 1, 3, 2, 4]
      [0, 3, 2, 1, 4]
      etc.
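      The difference between the two behaviours can be sketched as follows. This is a minimal illustration, not the Commons Math or Commons Statistics code: the class and method names are hypothetical, ranks are 0-based to match the example, and the tied values are assumed to occupy a contiguous range of an already sorted array. The proposed variant enumerates the applicable ranks and applies a Fisher-Yates shuffle.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

/** Hypothetical sketch comparing two ways to assign RANDOM ranks to a run of tied values. */
final class RandomTiesSketch {

    /**
     * Current behaviour (as described): each tied element independently draws a rank
     * from lo .. lo + count - 1, so several elements may receive the same rank.
     */
    static void independentRanks(int[] ranks, int start, int end, int lo, SplittableRandom rng) {
        int count = end - start; // number of tied elements
        for (int i = start; i < end; i++) {
            ranks[i] = lo + rng.nextInt(count);
        }
    }

    /**
     * Proposed behaviour: enumerate the ranks lo .. lo + count - 1, then shuffle them
     * (Fisher-Yates) so every tied element receives a unique rank.
     */
    static void shuffledRanks(int[] ranks, int start, int end, int lo, SplittableRandom rng) {
        for (int i = start; i < end; i++) {
            ranks[i] = lo + (i - start);
        }
        for (int i = end - 1; i > start; i--) {
            // Choose j uniformly from [start, i] and swap it into position i.
            int j = start + rng.nextInt(i - start + 1);
            int tmp = ranks[i];
            ranks[i] = ranks[j];
            ranks[j] = tmp;
        }
    }

    public static void main(String[] args) {
        // Data [0, 1, 1, 1, 2]: the tied 1s occupy indices 1..3 and share ranks 1..3.
        SplittableRandom rng = new SplittableRandom();
        int[] a = {0, -1, -1, -1, 4};
        independentRanks(a, 1, 4, 1, rng);
        int[] b = {0, -1, -1, -1, 4};
        shuffledRanks(b, 1, 4, 1, rng);
        System.out.println("independent: " + Arrays.toString(a)); // middle ranks may repeat
        System.out.println("shuffled:    " + Arrays.toString(b)); // middle ranks are a permutation of 1, 2, 3
    }
}
```

      The shuffle costs one random draw per tied element, the same as the independent draws, so uniqueness comes at no extra cost in randomness.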

      A second issue with the ranking package is that it brings in a dependency on UniformRandomProvider (from Commons RNG). If one is not supplied, an instance is created even though it may never be used.

      Given that we now support JDK 8, I suggest the default uses an instance of SplittableRandom. The user can override this by supplying a source of random bits as a LongSupplier. A UniformRandomProvider from Commons RNG can still serve as this source (e.g. via the method reference rng::nextLong). LongSupplier is a JDK functional interface, and the long bits it supplies can be used to create random rank indices as required. The package then does not expose non-JDK interfaces in its public API.
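      A minimal sketch of deriving rank indices from a LongSupplier, assuming a hypothetical helper named nextIndex. The default source wraps a java.util.SplittableRandom; a Commons RNG UniformRandomProvider could be adapted the same way with rng::nextLong (not shown, to keep the example free of non-JDK dependencies). The bounded index reuses the rejection loop of java.util.Random.nextInt(int), applied to the upper 31 bits of each long, to avoid modulo bias.

```java
import java.util.SplittableRandom;
import java.util.function.LongSupplier;

/** Hypothetical sketch: random rank indices derived from a JDK LongSupplier. */
final class LongSupplierSketch {

    /** Default source of random bits (JDK only, no Commons RNG dependency). */
    static LongSupplier defaultSource() {
        SplittableRandom rng = new SplittableRandom();
        return rng::nextLong;
    }

    /**
     * Returns a uniform index in [0, n) using the upper 31 bits of each long.
     * Values falling in the incomplete final block of size n are rejected,
     * as in the rejection loop of java.util.Random.nextInt(int).
     */
    static int nextIndex(LongSupplier source, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        while (true) {
            int bits = (int) (source.getAsLong() >>> 33); // in [0, 2^31)
            int value = bits % n;
            // The sum overflows to negative only when bits lies in the incomplete final block.
            if (bits - value + (n - 1) >= 0) {
                return value;
            }
        }
    }

    public static void main(String[] args) {
        LongSupplier source = defaultSource();
        // A user-supplied source can replace the default, e.g. a seeded generator.
        LongSupplier seeded = new SplittableRandom(42L)::nextLong;
        System.out.println(nextIndex(source, 3) + " " + nextIndex(seeded, 3));
    }
}
```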

      Currently the NaturalRanking class has 6 constructors to set combinations of three properties: the TiesStrategy, the NaNStrategy and the source of randomness. The current API is:

      public NaturalRanking()
      public NaturalRanking(TiesStrategy)
      public NaturalRanking(NaNStrategy)
      public NaturalRanking(NaNStrategy, TiesStrategy)
      public NaturalRanking(UniformRandomProvider)
      public NaturalRanking(NaNStrategy, UniformRandomProvider)

      The constructors that accept a TiesStrategy create a generator even when the strategy does not require one (i.e. it is not RANDOM). The generator should instead be created on demand, only when ties occur in the data.
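      On-demand creation can be sketched with a factory that is only invoked the first time a tie is found. The class and method names are hypothetical, the default factory assumes a SplittableRandom as proposed above, and processSorted is a stand-in that draws one value per tied run where a real implementation would shuffle the tied ranks.

```java
import java.util.SplittableRandom;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: the random source is created on demand, only when ties are found. */
final class LazySourceSketch {
    private final Supplier<LongSupplier> factory;
    private LongSupplier source; // null until first needed (not thread-safe)

    /** Default: a SplittableRandom, created lazily. */
    LazySourceSketch() {
        this(() -> {
            SplittableRandom rng = new SplittableRandom();
            return rng::nextLong;
        });
    }

    LazySourceSketch(Supplier<LongSupplier> factory) {
        this.factory = factory;
    }

    /** Returns the source, creating it on first use. */
    private LongSupplier source() {
        if (source == null) {
            source = factory.get();
        }
        return source;
    }

    /** Returns true if the source has been created. */
    boolean isSourceCreated() {
        return source != null;
    }

    /**
     * Scans sorted data for runs of equal values and returns the number of tied runs.
     * The source is requested only for a run of length greater than 1.
     */
    int processSorted(double[] sorted) {
        int runs = 0;
        int i = 0;
        while (i < sorted.length) {
            int j = i + 1;
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            if (j - i > 1) {
                runs++;
                source().getAsLong(); // placeholder for shuffling the ranks of i..j-1
            }
            i = j;
        }
        return runs;
    }
}
```

      Data without ties never triggers the factory, so a non-RANDOM strategy or tie-free input pays nothing for the generator.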

      Note: The set of constructors could be replaced by a builder pattern. This would add the overhead of creating a builder object for each new ranking configuration. A builder would also not allow the TiesStrategy to be implicitly set to RANDOM when a source of randomness is supplied, as the constructors do. An initial port can keep the current 6 constructors; these can be changed before the first release.
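      A sketch of how the 6 constructors might look with UniformRandomProvider replaced by LongSupplier, including the implicit RANDOM setting when a source of randomness is supplied. The class name is hypothetical, the enum constants are assumed to mirror the Commons Math enums, and the FAILED/AVERAGE defaults are assumed for illustration. No constructor creates a generator.

```java
import java.util.Objects;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the six constructors using a JDK LongSupplier. */
final class NaturalRankingSketch {
    // Assumed to mirror the Commons Math enums.
    enum NaNStrategy { MINIMAL, MAXIMAL, REMOVED, FIXED, FAILED }
    enum TiesStrategy { SEQUENTIAL, MINIMUM, MAXIMUM, AVERAGE, RANDOM }

    // Defaults assumed for illustration.
    static final NaNStrategy DEFAULT_NAN_STRATEGY = NaNStrategy.FAILED;
    static final TiesStrategy DEFAULT_TIES_STRATEGY = TiesStrategy.AVERAGE;

    private final NaNStrategy nanStrategy;
    private final TiesStrategy tiesStrategy;
    /** Caller-supplied source of random bits; null means a default is created only if needed. */
    private final LongSupplier source;

    NaturalRankingSketch() {
        this(DEFAULT_NAN_STRATEGY, DEFAULT_TIES_STRATEGY, null);
    }

    NaturalRankingSketch(TiesStrategy ties) {
        this(DEFAULT_NAN_STRATEGY, ties, null);
    }

    NaturalRankingSketch(NaNStrategy nan) {
        this(nan, DEFAULT_TIES_STRATEGY, null);
    }

    NaturalRankingSketch(NaNStrategy nan, TiesStrategy ties) {
        this(nan, ties, null);
    }

    // Supplying a source of randomness implicitly selects TiesStrategy.RANDOM.
    NaturalRankingSketch(LongSupplier source) {
        this(DEFAULT_NAN_STRATEGY, TiesStrategy.RANDOM, Objects.requireNonNull(source, "source"));
    }

    NaturalRankingSketch(NaNStrategy nan, LongSupplier source) {
        this(nan, TiesStrategy.RANDOM, Objects.requireNonNull(source, "source"));
    }

    private NaturalRankingSketch(NaNStrategy nan, TiesStrategy ties, LongSupplier source) {
        this.nanStrategy = Objects.requireNonNull(nan, "nanStrategy");
        this.tiesStrategy = Objects.requireNonNull(ties, "tiesStrategy");
        this.source = source;
    }

    NaNStrategy getNanStrategy() {
        return nanStrategy;
    }

    TiesStrategy getTiesStrategy() {
        return tiesStrategy;
    }

    boolean hasUserSource() {
        return source != null;
    }
}
```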


            People

              Assignee: Unassigned
              Reporter: Alex Herbert (aherbert)
