Uploaded image for project: 'Commons Math'
  1. Commons Math
  2. MATH-984

Incorrect (bugged) generating function getNextValue() in .random.EmpiricalDistribution

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.2, 3.1.1
    • 3.4
    • None
    • None
    • all

    Description

      The generating function getNextValue() in org.apache.commons.math3.random.EmpiricalDistribution
      will generate wrong values for all Distributions that are single tailed or limited. For example Data which are resembling Exponential or Lognormal distributions.

      The problem could be easily seen in code and tested.

      In last version code
      ...
      490 return getKernel(stats).sample();
      ...
      it samples from Gaussian distribution to "smooth" in_the_bin. Obviously Gaussian Distribution is not limited and sometimes it does generates numbers outside the bin. In the case when it is the last bin it will generate wrong numbers.

      For example for empirical non-negative data it will generate negative rubbish.

      Additionally the proposed algorithm boldly returns only the mean value of the bin in case of one value! This last makes the generating function unusable for heavy tailed distributions with small number of values. (for example computer network traffic)

      On the last place usage of Gaussian soothing in the bin will change greatly some empirical distribution properties.

      The proposed method should be reworked to be applicable for real data which have often limited ranges. (either non-negative or both sides limited)

      Attachments

        Activity

          People

            psteitz Phil Steitz
            rtsvet Radoslav Tsvetkov
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: