Commons Math
  1. Commons Math
  2. MATH-1065

EnumeratedRealDistribution.inverseCumulativeProbability returns values not in the samples set

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.2
    • Fix Version/s: 3.3
    • Labels:
      None

      Description

      The method EnumeratedRealDistribution.inverseCumulativeProbability() sometimes returns values that are not in the initial samples domain...
      I will attach a test to exploit this bug.

        Activity

        Hide
        matteodg added a comment -

        Failing tests for numerical issues are:

        • testFailing1
        • testFailing2
        • testFailing3
        • testFailing4

        Failing tests with very wrong results are:

        • testFailing5
        • testFailing6
        Show
        matteodg added a comment - Failing tests for numerical issues are: testFailing1 testFailing2 testFailing3 testFailing4 Failing tests with very wrong results are: testFailing5 testFailing6
        Hide
        Thomas Neidhart added a comment -

        Hi Matteo,

        regarding the numerical issues:

        the distribution works internally with an absolute accuracy of 1e-6. The results should be compared with an epsilon value like this.

        There is currently no easy way to set another accuracy, but you can instantiate an EnumeratedRealDistribution like this, and override the getSolverAbsoluteAccuracy method:

                DISTRIBUTION = new EnumeratedRealDistribution(new double[]{
                    14.0, 18.0, 21.0, 28.0, 31.0, 33.0
                }, new double[]{
                    4.0 / 16.0, 5.0 / 16.0, 0.0 / 16.0, 3.0 / 16.0, 1.0 / 16.0, 3.0 / 16.0
                }) {
        
                    @Override
                    protected double getSolverAbsoluteAccuracy() {
                        return 1e-9;
                    }
                    
                };
        
        Show
        Thomas Neidhart added a comment - Hi Matteo, regarding the numerical issues: the distribution works internally with an absolute accuracy of 1e-6. The results should be compared with an epsilon value like this. There is currently no easy way to set another accuracy, but you can instantiate an EnumeratedRealDistribution like this, and override the getSolverAbsoluteAccuracy method: DISTRIBUTION = new EnumeratedRealDistribution(new double[]{ 14.0, 18.0, 21.0, 28.0, 31.0, 33.0 }, new double[]{ 4.0 / 16.0, 5.0 / 16.0, 0.0 / 16.0, 3.0 / 16.0, 1.0 / 16.0, 3.0 / 16.0 }) { @Override protected double getSolverAbsoluteAccuracy() { return 1e-9; } };
        Hide
        Thomas Neidhart added a comment -

        For the other failing tests: I am not sure what the enumerated distribution tries to achieve:

        • a discrete probability distribution
        • a distribution with continuous and discrete parts

        in case of a discrete probability distribution the inverseCumulativeProbability method is clearly wrong, and would have to fixed like this:

            public double inverseCumulativeProbability(final double p) throws OutOfRangeException {
                double probability = 0;
                double x = getSupportLowerBound();
                for (final Pair<Double, Double> sample : innerDistribution.getPmf()) {
                    if (sample.getValue() == 0.0) {
                        continue;
                    }
                    probability += sample.getValue();
                    if (probability <= p) {
                        x = sample.getKey();
                    }
        
                    if (probability >= p) {
                        break;
                    }
                }
        
                return x;
            }
        

        But then your other test cases are wrong imho:

          inverseCumulativeProbability(0.5) = 14 instead of 18
          inverseCumulativeProbability(0.5624) = 14 instead of 18
          inverseCumulativeProbability(0.5626) = 18 instead of 28
          inverseCumulativeProbability(0.7600) = 28 instead of 31
        
        Show
        Thomas Neidhart added a comment - For the other failing tests: I am not sure what the enumerated distribution tries to achieve: a discrete probability distribution a distribution with continuous and discrete parts in case of a discrete probability distribution the inverseCumulativeProbability method is clearly wrong, and would have to fixed like this: public double inverseCumulativeProbability(final double p) throws OutOfRangeException { double probability = 0; double x = getSupportLowerBound(); for (final Pair<Double, Double> sample : innerDistribution.getPmf()) { if (sample.getValue() == 0.0) { continue; } probability += sample.getValue(); if (probability <= p) { x = sample.getKey(); } if (probability >= p) { break; } } return x; } But then your other test cases are wrong imho: inverseCumulativeProbability(0.5) = 14 instead of 18 inverseCumulativeProbability(0.5624) = 14 instead of 18 inverseCumulativeProbability(0.5626) = 18 instead of 28 inverseCumulativeProbability(0.7600) = 28 instead of 31
        Hide
        Phil Steitz added a comment - - edited

        I think the intent of this class is to represent a discrete, but real-valued distribution. EnumeratedDistributions are discrete distributions with a finite, "enumerated" set of values. The problem here (as Thomas notes) is that EnumeratedRealDistribution extends AbstractRealDistribution and does not override the default inverse cum method that basically assumes a continuous distribution. What it should implement is a discrete algorithm like what Thomas has suggested.

        Show
        Phil Steitz added a comment - - edited I think the intent of this class is to represent a discrete, but real-valued distribution. EnumeratedDistributions are discrete distributions with a finite, "enumerated" set of values. The problem here (as Thomas notes) is that EnumeratedRealDistribution extends AbstractRealDistribution and does not override the default inverse cum method that basically assumes a continuous distribution. What it should implement is a discrete algorithm like what Thomas has suggested.
        Hide
        Thomas Neidhart added a comment -

        Thanks for the better explanation of the problem.
        I am still searching for a mathematical definition of such a distribution as I mainly looked at the referenced wikipedia page and derived the above algorithm from the cdf graph that is available there, although I am of course unsure if it is correct.

        Show
        Thomas Neidhart added a comment - Thanks for the better explanation of the problem. I am still searching for a mathematical definition of such a distribution as I mainly looked at the referenced wikipedia page and derived the above algorithm from the cdf graph that is available there, although I am of course unsure if it is correct.
        Hide
        Phil Steitz added a comment - - edited

        By contract in RealDistribution (same actually in IntegerDistribution), what the inverse cum needs to return is

         
        inf{x in R | P(X<=x) >= p} for 0 < p <= 1}
        inf{x in R | P(X<=x) > 0} for p = 0.
        

        Looks to me like your algorithm returns

         sup{x in R | P(X <= x) < p} 

        which is not the same. The key is to be consistent with the way we have defined the inverse cum for both discrete ("Integer") and continuous ("Real") distributions.

        Show
        Phil Steitz added a comment - - edited By contract in RealDistribution (same actually in IntegerDistribution), what the inverse cum needs to return is inf{x in R | P(X<=x) >= p} for 0 < p <= 1} inf{x in R | P(X<=x) > 0} for p = 0. Looks to me like your algorithm returns sup{x in R | P(X <= x) < p} which is not the same. The key is to be consistent with the way we have defined the inverse cum for both discrete ("Integer") and continuous ("Real") distributions.
        Hide
        Thomas Neidhart added a comment -

        Hi Phil,

        thanks for looking into this. I updated the code to this:

            public double inverseCumulativeProbability(final double p) throws OutOfRangeException {
        
                if (p < 0.0 || p > 1.0) {
                    throw new OutOfRangeException(p, 0, 1);
                }
        
                double probability = 0;
                double x = getSupportLowerBound();
                for (final Pair<Double, Double> sample : innerDistribution.getPmf()) {
                    if (sample.getValue() == 0.0) {
                        continue;
                    }
                    probability += sample.getValue();
                    x = sample.getKey();
        
                    if (probability >= p) {
                        break;
                    }
                }
        
                return x;
            }
        

        Which returns the same results as the EnumeratedIntegerDistribution for the same input values and also succeeds in running all the test cases from the attached unit test.

        Show
        Thomas Neidhart added a comment - Hi Phil, thanks for looking into this. I updated the code to this: public double inverseCumulativeProbability(final double p) throws OutOfRangeException { if (p < 0.0 || p > 1.0) { throw new OutOfRangeException(p, 0, 1); } double probability = 0; double x = getSupportLowerBound(); for (final Pair<Double, Double> sample : innerDistribution.getPmf()) { if (sample.getValue() == 0.0) { continue; } probability += sample.getValue(); x = sample.getKey(); if (probability >= p) { break; } } return x; } Which returns the same results as the EnumeratedIntegerDistribution for the same input values and also succeeds in running all the test cases from the attached unit test.
        Hide
        Phil Steitz added a comment -

        Code and tests look correct to me.

        Show
        Phil Steitz added a comment - Code and tests look correct to me.
        Hide
        Thomas Neidhart added a comment -

        Applied changes in r1566274.

        Thanks for the report and the testcase!

        Show
        Thomas Neidhart added a comment - Applied changes in r1566274. Thanks for the report and the testcase!
        Hide
        Luc Maisonobe added a comment -

        Closing all resolved issue now available in released 3.3 version.

        Show
        Luc Maisonobe added a comment - Closing all resolved issue now available in released 3.3 version.

          People

          • Assignee:
            Unassigned
            Reporter:
            matteodg
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development