Commons Math - MATH-692

Cumulative probability and inverse cumulative probability inconsistencies

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.0
    • Fix Version/s: 3.0
    • Labels:
      None

      Description

      There are some inconsistencies in the documentation and implementation of functions regarding cumulative probabilities and inverse cumulative probabilities. More precisely, '<' and '<=' are not used in a consistent way.

      Besides, I would move the function inverseCumulativeProbability(double) to the interface Distribution. A true inverse of the distribution function exists neither for Distribution nor for ContinuousDistribution. Thus we need to define the inverse in terms of quantiles anyway, and this can already be done for Distribution.

      On the whole I would declare the (inverse) cumulative probability functions in the basic distribution interfaces as follows:

      Distribution:

      • cumulativeProbability(double x): returns P(X <= x)
      • cumulativeProbability(double x0, double x1): returns P(x0 < X <= x1) [see also 1)]
      • inverseCumulativeProbability(double p):
        returns the quantile function inf {x in R | P(X<=x) >= p}

        [see also 2), 3), and 4)]

      1) An alternative definition could be P(x0 <= X <= x1). But this would require putting the function probability(double x) or another cumulative probability function into the interface Distribution in order to be able to calculate P(x0 <= X <= x1) in AbstractDistribution.
      2) This definition is stricter than the definition in ContinuousDistribution, because the definition there does not specify what to do if there are multiple x satisfying P(X<=x) = p.
      3) A modification could be defined for p = 0: returning sup {x in R | P(X<=x) = 0} would yield the infimum of the distribution's support instead of a mandatory -infinity.
      4) This affects issue MATH-540. I'd prefer the definition from above for the following reasons:

      • This definition simplifies inverse transform sampling (as mentioned in the other issue).
      • It is the standard textbook definition for the quantile function.
      • For integer distributions it has the advantage that the result doesn't change when switching to "x in Z", i.e. the result is independent of whether the integers are considered as a set of their own or as part of the reals.
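As a minimal illustration of the proposed definition inf {x in R | P(X<=x) >= p} (this is an editor's sketch using a fair six-sided die as a made-up stand-in, not Commons Math code):

```java
// Sketch of the proposed quantile definition for a fair six-sided die.
// The die and the scan over its support are illustrative assumptions,
// not part of the Commons Math API.
public class DieQuantile {

    // CDF of a fair die: F(k) = k/6 for k = 1..6.
    static double cdf(int k) {
        if (k < 1) return 0.0;
        if (k > 6) return 1.0;
        return k / 6.0;
    }

    // Smallest x with F(x) >= p, i.e. inf {x | P(X <= x) >= p}.
    static int quantile(double p) {
        for (int x = 1; x <= 6; x++) {
            if (cdf(x) >= p) {
                return x;
            }
        }
        return 6;
    }

    public static void main(String[] args) {
        // F(2) = 1/3 exactly: the '>=' in the definition resolves the
        // plateau at p = 1/3 to 2, while any larger p yields 3. The
        // result is the same whether the support is viewed in Z or in R.
        System.out.println(quantile(1.0 / 3.0)); // 2
        System.out.println(quantile(0.5));       // 3
    }
}
```

Note how the '>=' makes the result well-defined even where the cdf has a jump or a plateau, which is exactly the ambiguity point 2) complains about.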

      ContinuousDistribution:
      nothing to be added regarding (inverse) cumulative probability functions

      IntegerDistribution:

      • cumulativeProbability(int x): returns P(X <= x)
      • cumulativeProbability(int x0, int x1): returns P(x0 < X <= x1) [see also 1) above]
      Attachments

      1. MATH-692_integerDomain_patch1.patch
        55 kB
        Christian Winter
      2. Math-692_realDomain_patch1.patch
        26 kB
        Christian Winter

        Activity

        Sébastien Brisard added a comment -

        The issue about selection of an appropriate seed has been raised elsewhere. No definitive answer has been provided so far, so I suggest we consider this issue as solved for the time being.

        Sébastien Brisard added a comment -

        Thus difficulties in finding a suitable seed are an indicator for problems in the code.

        That's exactly the point I've raised on the mailing-list: out of three seeds (100, 1000 and 1001), only one works. Of course, I would not dare to call that representative statistics, but I'm wondering whether or not we should be worried...

        Christian Winter added a comment -

        I guess there is no alternative to this way of making probabilistic test cases pass. However, I understand your bad feeling about this kind of failure fixing. The problem is that probabilistic tests are quite fuzzy: neither a passed test nor a failed test provides a clear answer whether something is right or wrong in the implementation. There is just a high chance of passing such a test with a correct implementation. The chance of failure increases with an erroneous implementation due to systematic deviations in the generated data. These chances determine whether it is easy to find a seed which passes the tests. Thus difficulties in finding a suitable seed are an indicator for problems in the code.

        Sébastien Brisard added a comment -

        I see you already changed the seed in Well1024aTest.

        Yes I did, but is this really how we want Well1024aTest to pass?

        Christian Winter added a comment -

        Thanks for committing the patch, Sébastien. I see you already changed the seed in Well1024aTest. This hopefully removes the failure.

        I'll have a look into Maven to prepare a better patch next time.

        Sébastien Brisard added a comment -

        The committed patch actually causes failure of Well1024aTest in o.a.c.m.random.

        Sébastien Brisard added a comment -

        Well, we've recently run into some troubles with SVN, but it seems everything is working fine again. Patch MATH-692_integerDomain_patch1.patch (with minor checkstyle changes) committed in revision 1226041.

        Please do not forget to run mvn clean; mvn site:site and check the reports (in particular, checkstyle) prior to submitting a patch!

        Thanks for this contribution.

        Sébastien Brisard added a comment -

        Hi Christian,
        thanks for this contribution. I am away for a few days, but am very happy to commit this patch as soon as I am back, if you are not in too much of a hurry.
        Thanks again,
        Sébastien

        Christian Winter added a comment -

        This is the patch which adjusts the integer distributions to the agreements above.

        The changes to the test cases for the random generators may be unexpected. But these changes initially were triggered by adjusting RandomDataTest.checkNextPoissonConsistency(double) to the new contract for integer distributions. Then some random generator tests failed due to chance. While adjusting their seeds, I found some other tests with a high failure probability. Thus I also set some failure probabilities to 0.01 in order to find suitable seeds more quickly.

        My next task on this issue is to adjust the user guide.

        Christian Winter added a comment -

        Hi Sébastien,

        my changes in the integer distributions don't solve MATH-718. Instead I found a probably related problem with the Pascal distribution.

        The integer distribution patch for this issue still isn't ready. I will provide it next week.

        Christian

        Sébastien Brisard added a comment -

        Hi Christian,
        If you agree with that, I suggest that you also take care of MATH-718, as the two issues seem to be very much connected.
        Sébastien

        Christian Winter added a comment -

        As mentioned by Sébastien in MATH-699, the implementation of IntegerDistribution.inverseCumulativeProbability(double p) can benefit from the ideas which came up for RealDistribution.inverseCumulativeProbability(double p) in that thread.

        Thus I will remove getDomainLowerBound(double p) and getDomainUpperBound(double p) from the integer distributions. I checked that all current implementations of the lower/upper bound methods provide the whole support of the distribution as starting bracket. This means that using getSupportLowerBound() and getSupportUpperBound() for the starting bracket won't degrade the performance of the current distribution implementations. However, a user might want to improve the performance of his distribution implementations by providing a more targeted starting bracket for probability p. Thus I will move the solving step into a protected function solveInverseCumulativeProbability(double p, int lower, int upper), so that it gets easy to override inverseCumulativeProbability with an implementation which finds a better starting bracket.

        Furthermore, Phil's idea with Chebyshev's inequality can be applied to the generic implementation of inverseCumulativeProbability in order to get a better starting bracket.
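A rough sketch of this plan (an editor's illustration under a simplified interface; everything except the names solveInverseCumulativeProbability, getSupportLowerBound, and getSupportUpperBound is invented for the example): the generic implementation narrows the starting bracket with Chebyshev's inequality, then delegates the actual search to an overridable bisection step.

```java
// Hedged sketch of the proposed structure; not the actual Commons Math
// implementation. The abstract accessors stand in for whatever the real
// distribution classes provide.
public class InverseCdfSketch {

    abstract static class SketchIntegerDistribution {
        abstract double cumulativeProbability(int x); // P(X <= x)
        abstract double getMean();
        abstract double getVariance();
        abstract int getSupportLowerBound();
        abstract int getSupportUpperBound();

        int inverseCumulativeProbability(double p) {
            if (p <= 0.0) return getSupportLowerBound();
            if (p >= 1.0) return getSupportUpperBound();
            int lower = getSupportLowerBound();
            int upper = getSupportUpperBound();
            double mu = getMean();
            double sigma = Math.sqrt(getVariance());
            if (sigma > 0 && !Double.isInfinite(sigma) && !Double.isNaN(mu)) {
                // Chebyshev: P(X <= mu - k*sigma) <= 1/k^2. With
                // k = sqrt(2/p) that bound is p/2 < p, so the quantile
                // lies above mu - k*sigma; symmetrically for the upper end.
                int cheLo = (int) Math.floor(mu - sigma * Math.sqrt(2.0 / p));
                int cheHi = (int) Math.ceil(mu + sigma * Math.sqrt(2.0 / (1.0 - p)));
                if (cheLo > lower) lower = cheLo;
                if (cheHi < upper) upper = cheHi;
            }
            return solveInverseCumulativeProbability(p, lower, upper);
        }

        // Protected hook in the proposal: a subclass can override
        // inverseCumulativeProbability with a tighter bracket and still
        // reuse this bisection step.
        int solveInverseCumulativeProbability(double p, int lower, int upper) {
            while (lower < upper) {
                int mid = lower + (upper - lower) / 2; // avoids int overflow
                if (cumulativeProbability(mid) >= p) {
                    upper = mid;
                } else {
                    lower = mid + 1;
                }
            }
            return upper; // smallest x in the bracket with F(x) >= p
        }
    }

    // Fair six-sided die as a concrete stand-in distribution.
    static class Die extends SketchIntegerDistribution {
        double cumulativeProbability(int x) {
            return x < 1 ? 0.0 : x > 6 ? 1.0 : x / 6.0;
        }
        double getMean() { return 3.5; }
        double getVariance() { return 35.0 / 12.0; }
        int getSupportLowerBound() { return 1; }
        int getSupportUpperBound() { return 6; }
    }

    public static void main(String[] args) {
        SketchIntegerDistribution die = new Die();
        System.out.println(die.inverseCumulativeProbability(0.5));  // 3
        System.out.println(die.inverseCumulativeProbability(0.95)); // 6
    }
}
```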

        Sébastien Brisard added a comment -

        Patch Math-692_realDomain_patch1.patch (20111108) applied in rev 1200179, with minor modifications (mostly checkstyle fixes).
        Thanks Christian!

        Phil Steitz added a comment -

        Hey, that's how it always works

        I don't know about "wise" but I certainly qualify as "old" by any standard, so will have a look once you have reviewed and committed.

        Thanks!

        Sébastien Brisard added a comment -

        All,
        since I'm already working on this package, I'm happy to commit the patch on behalf of Christian. However, since I'm a relatively new committer, I would feel more confident if one of the "old, wise committers" could double check the svn log afterwards.

        Best,
        Sébastien

        Christian Winter added a comment -

        Here is the first patch for this issue (unfortunately with some delay). It adjusts the distributions with real domain to the definitions in this issue, and it mainly changes documentation.

        I could not move inverseCumulativeProbability(double) up to Distribution because there would be a conflict with IntegerDistribution.inverseCumulativeProbability(double): This method returns int. This problem will be removed by solving issue MATH-703.

        The implementation of inverseCumulativeProbability(double) is not changed as Sébastien is working on this.

        I will provide the patch for the integer distributions as soon as I have adjusted the test data to the new inequalities and reverified the adjusted test data.

        Sébastien Brisard added a comment -

        Please note that MATH-699 has been created specifically to handle plateaux.

        Sébastien

        Sébastien Brisard added a comment -

        Hi Christian,

        Hi Sébastien,

        the problem with the plateau is indeed one issue which needs to be solved.

        I'm working on it...

        Additionally, AbstractDistribution will need an implementation of inverseCumulativeProbability. In fact both implementations should be the same except for the solver to be used. Thus inverseCumulativeProbability should be implemented just once in AbstractDistribution, and invoking the solver should be put into a separate procedure so that it can be overridden in AbstractContinuousDistribution.

        OK, for now, I'm concentrating on making the current impl in AbstractContinuousDistribution more robust. The other impl should be easier.

        A third point is the choice of the solvers. For AbstractDistribution we need a solver which works even for discontinuous cdfs (BisectionSolver can do the job, but maybe the implementations of the faster IllinoisSolver, PegasusSolver, BrentSolver, or another solver can cope with discontinuities, too). For AbstractContinuousDistribution it would be beneficial to use a DifferentiableUnivariateRealSolver. However, the NewtonSolver cannot be used due to uncertainty of convergence, and an alternative doesn't seem to exist yet. So we have to choose one of the other solvers for now.

        The current implementation uses a Brent solver. I think the solver itself is only one side of the issue. The other point is the algorithm used to bracket the solution, in order to ensure that the result is consistent with the definition of the cumulative probability. As for the DifferentiableUnivariateRealSolver, I'm not too sure. I guess it depends on what is meant by "continuous distribution". For me, it means that the random variable takes values in a continuous set, and possibly its distribution is defined by a density. However, in my view, nothing prevents occurrences of Dirac functions, in which case the cdf is only piecewise C1. It's all a matter of definition, of course, and I'll ask the question on the forum to check whether or not people want to allow for such a situation.

        As all these points are interdependent, I guess it's best to solve them as a whole. If you like, you can do this.

        Best Regards,
        Christian

        Yes, I'm very interested.

        Best regards,
        Sébastien

        Christian Winter added a comment -

        Another point for discussion:
        I'd like to introduce
        getDomainBracket(double p): returns double[]
        to AbstractDistribution as a helper function for inverseCumulativeProbability. This makes it possible to avoid searching for a bracket where one can be specified directly.
        The function getDomainBracket could be made abstract (which means to remove getInitialDomain, getDomainLowerBound, and getDomainUpperBound as these functions aren't needed any more), or it could have a default implementation (according to the corresponding part of the current implementation of inverseCumulativeProbability) which uses getInitialDomain, getDomainLowerBound, and getDomainUpperBound. However, getInitialDomain, getDomainLowerBound, and getDomainUpperBound should not be abstract in the latter case. Otherwise a derived class would be forced to implement something it potentially doesn't use. Thus the functions getInitialDomain, getDomainLowerBound, and getDomainUpperBound should have default implementations which either return default values (0, -infinity, +infinity) or throw an exception saying something like "has to be implemented".
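The second variant described above can be sketched as follows (an editor's illustration; the class names and the UnitInterval example are invented, only the method names come from the comment):

```java
// Sketch of getDomainBracket with a default implementation built on the
// existing bound methods, which in turn have non-abstract defaults so that
// a subclass overriding getDomainBracket is not forced to implement
// methods it never uses. Names and bodies are illustrative only.
public class BracketSketch {

    abstract static class BracketSketchDistribution {

        // Default bracket: [getDomainLowerBound(p), getDomainUpperBound(p)].
        double[] getDomainBracket(double p) {
            return new double[] { getDomainLowerBound(p), getDomainUpperBound(p) };
        }

        // Defaults instead of abstract methods: they return the trivial
        // bounds (an alternative would be to throw an exception saying
        // "has to be implemented").
        double getDomainLowerBound(double p) {
            return Double.NEGATIVE_INFINITY;
        }

        double getDomainUpperBound(double p) {
            return Double.POSITIVE_INFINITY;
        }
    }

    // A distribution that knows its support directly just overrides the
    // bracket and ignores the bound methods entirely.
    static class UnitInterval extends BracketSketchDistribution {
        @Override
        double[] getDomainBracket(double p) {
            return new double[] { 0.0, 1.0 };
        }
    }

    public static void main(String[] args) {
        double[] b = new UnitInterval().getDomainBracket(0.5);
        System.out.println(b[0] + " " + b[1]); // 0.0 1.0
    }
}
```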

        Christian Winter added a comment -

        Hi Sébastien,

        the problem with the plateau is indeed one issue which needs to be solved.

        Additionally, AbstractDistribution will need an implementation of inverseCumulativeProbability. In fact both implementations should be the same except for the solver to be used. Thus inverseCumulativeProbability should be implemented just once in AbstractDistribution, and invoking the solver should be put into a separate procedure so that it can be overridden in AbstractContinuousDistribution.

        A third point is the choice of the solvers. For AbstractDistribution we need a solver which works even for discontinuous cdfs (BisectionSolver can do the job, but maybe the implementations of the faster IllinoisSolver, PegasusSolver, BrentSolver, or another solver can cope with discontinuities, too). For AbstractContinuousDistribution it would be beneficial to use a DifferentiableUnivariateRealSolver. However, the NewtonSolver cannot be used due to uncertainty of convergence, and an alternative doesn't seem to exist yet. So we have to choose one of the other solvers for now.

        As all these points are interdependent, I guess it's best to solve them as a whole. If you like, you can do this.

        Best Regards,
        Christian

        Sébastien Brisard added a comment -

        Hi Christian,
        I've started looking into this issue. As I said, you will have to be patient with me.
        I can see there already is a default implementation of AbstractContinuousDistribution.inverseCumulativeProbability. So what exactly would you like me to do? Is this implementation fragile? Would you like me to improve robustness? Provide full testing?

        I think there might be issues when the PDF falls down to zero in a range (in which case the cum exhibits a plateau). The returned value might differ from the mathematical definition you proposed. Is this what you want me to work on? Have you already identified other issues?

        Best regards,
        Sébastien

        Phil Steitz added a comment -

        I guess I am OK with pushing p up. See related post to follow in commons-dev.

        Christian Winter added a comment -

        Good, the definition of cum(x0,x1) will be P(x0 < X <= x1). Phil, you are right: cum(x0,x1) in AbstractDistribution was a reason for raising this issue. Another reason was cum(int x0, int x1) in AbstractIntegerDistribution.

        The idea behind probability(double x) is in fact to define it as P(X = x) and to return 0 for continuous distributions. This function would be useful for discrete distributions not inheriting from IntegerDistribution and for distributions being composed of discrete and continuous parts.

        Sébastien Brisard added a comment -

        OK, Phil, it makes perfect sense.

        Phil Steitz added a comment - edited

        Have a look at the default implementation of cum(x0,x1) now in AbstractDistribution. I think the incorrectness in the documentation there may have been what triggered Christian to raise this issue. The equation cum(a,b) = F(b) - F(a) where F is the distribution function is natural and what the impl there is trying to do. In the discrete case, this equation fails, however, unless you define the cum to exclude the lower endpoint. That's why P(x0 < X <= x1) is a better definition.
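A tiny numerical check of this point, again using a fair die as a stand-in (an editor's illustration, not CM code): F(b) - F(a) matches the half-open interval, not the closed one.

```java
// Fair die: F(k) = k/6. Then F(3) - F(2) = 1/6 = P(2 < X <= 3),
// whereas P(2 <= X <= 3) = 2/6. So cum(a,b) = F(b) - F(a) only holds
// if the lower endpoint is excluded.
public class EndpointDemo {

    static double cdf(int k) {
        return k < 1 ? 0.0 : k > 6 ? 1.0 : k / 6.0;
    }

    public static void main(String[] args) {
        double diff = cdf(3) - cdf(2);   // F(3) - F(2)
        double halfOpen = 1.0 / 6.0;     // P(2 < X <= 3) = P(X = 3)
        double closed = 2.0 / 6.0;       // P(2 <= X <= 3) = P(X = 2) + P(X = 3)
        System.out.println(Math.abs(diff - halfOpen) < 1e-15); // true
        System.out.println(Math.abs(diff - closed) < 1e-15);   // false
    }
}
```

For a continuous distribution the two definitions coincide, which is why the discrepancy only surfaces in the discrete case.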

        Sébastien Brisard added a comment -

        Happy to help on the inverse cumulative probability. You will have to be patient and forgiving with me, though, as I discover this part of CM.

        As for the definition, I think that one of the bounds should be excluded, so that these cumulative probabilities can be summed
        P(a < X <= c) = P(a < X <= b) + P(b < X <= c),
        even in the case of discrete PDFs.

        Whether the lower or upper bound should be excluded is another matter. I usually work with continuous pdfs, so I don't know if there is a common practice in the probability community. If there is none, I would tend to choose the following definition
        P(x0 <= X < x1)
        (sorry Phil!), because it would be consistent with the way things are usually indexed in Java (a[0] .. a[a.length-1]). See also org.apache.commons.math.util.MultidimensionalCounter. Although this type of consistency is not an absolute requirement, I think it is nice for the user to have such a simple principle: "lower bound always included, upper bound always excluded". Apart from this small point, I really have no objection to any choice.

        Phil Steitz added a comment -

        I am not sure it really makes sense to add probability(double x) to the Distribution interface. It would have to be defined as density (referring to the distribution function) to make sense in the continuous case, since defined as P(X = x) it would in most cases be identically 0 for continuous distributions.

        Regarding the cum definition, I am fine with P(x0 < X <= x1).
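The point about continuous distributions can be made concrete with a toy sketch (hypothetical code, not from Commons Math): for a continuous CDF, the point probability P(X = x), computed as the jump of the CDF at x, is zero everywhere, whereas the density carries usable information.

```java
// Sketch (hypothetical, not Commons Math code): for a continuous uniform
// distribution on [0, 1], the point probability P(X = x) vanishes, while
// the density f(x) = 1 on (0, 1) is informative.
public class PointMass {

    static double cdf(double x) {      // CDF of Uniform(0, 1)
        if (x < 0) return 0.0;
        if (x > 1) return 1.0;
        return x;
    }

    static double density(double x) {  // pdf of Uniform(0, 1)
        return (x > 0 && x < 1) ? 1.0 : 0.0;
    }

    // P(X = x) as the jump of the CDF at x: zero (up to rounding) for a
    // continuous CDF, since there is no point mass to jump over
    static double pointProbability(double x) {
        return cdf(x) - cdf(Math.nextDown(x));
    }

    public static void main(String[] args) {
        System.out.println(pointProbability(0.5) < 1e-15); // no point mass
        System.out.println(density(0.5));                  // 1.0
    }
}
```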

        Christian Winter added a comment -

        Thanks to all for the feedback. Sébastien, thanks for offering your help. If you like and find time for it, you could implement AbstractDistribution.inverseCumulativeProbability(double p).

        I will provide some patches next week, but adjusting AbstractContinuousDistribution.inverseCumulativeProbability(double p) will take some more time.

        After thinking a little more about the structure of the interfaces, I'd like to put the function probability(double x) into Distribution anyway (independently of the thought in point 1) above).

        Are there any preferences on P(x0 <= X <= x1) or P(x0 < X <= x1) for cumulativeProbability(double x0, double x1)?
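The quantile definition under discussion, inf {x in R | P(X <= x) >= p}, can be sketched for a discrete distribution by scanning the support from below. This is a hypothetical illustration, not the AbstractDistribution implementation being requested:

```java
// Sketch (hypothetical, not the Commons Math implementation): the quantile
// definition inf { x : P(X <= x) >= p } applied to a fair six-sided die by
// scanning its finite support from below.
public class QuantileSketch {

    // CDF of a fair six-sided die, evaluated on integer support points
    static double cdf(int x) {
        if (x < 1) return 0.0;
        if (x > 6) return 1.0;
        return x / 6.0;
    }

    // Smallest support point x with P(X <= x) >= p. Note that for p = 0 this
    // returns the infimum of the support (here 1); the unrestricted infimum
    // over all of R would be -infinity, as noted in point 3) of the description.
    static int inverseCumulativeProbability(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        for (int x = 1; x <= 6; x++) {
            if (cdf(x) >= p) {
                return x;
            }
        }
        return 6; // p == 1 is reached at the top of the support
    }

    public static void main(String[] args) {
        System.out.println(inverseCumulativeProbability(0.5)); // median: 3
        System.out.println(inverseCumulativeProbability(0.0)); // support infimum: 1
    }
}
```

A production version for a continuous distribution would replace the scan with a bracketing root search on the CDF, but the inf-based definition stays the same.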

        Mikkel Meyer Andersen added a comment -

        +1

        Sébastien Brisard added a comment -

        I have neither used nor developed this part of CM, so my view on this is of but little value. Having said that, anything improving consistency can only be desirable, especially at this stage. So I'm all for it, and will soon be available (once I'm done with SYMMLQ) to help, as a novice on these issues.

        Sébastien

        Phil Steitz added a comment -

        Thanks for raising this issue, Christian - especially now as we finalize the 3.0 API.

        I am +1 for these changes. I agree that the inf-based definition of inverse cum is more standard, and we are in a position now to make the change, so I say let's do it. I am also +1 on moving this up to the distribution interface. The reason we did not include it there originally was that we thought we might implement distributions for which we could not define inverses. That has not happened in the last 8 years, so I think it's safe enough to push it up.

        The code, test, user guide and doc changes for this have to be done carefully. Patches most welcome.

        Is everyone else OK with this change?


          People

          • Assignee:
            Unassigned
            Reporter:
            Christian Winter