Uploaded image for project: 'Commons Math'
  1. Commons Math
  2. MATH-100

[Math] HypergeometricDistributionImpl cumulativeProbability calculation overflown


    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Environment:

      Operating System: Windows XP
      Platform: PC

    • Bugzilla Id:


      Hi, for all that might use this class:

      several things I found when using this class to calculate the
      cumulative probability. I attached my code FYI. three things:

      1. when I used my code to calculate the cumulativeProbability(50) of
      5000 200 100 (Population size, number of successes, sample size),
      result was greater than 1 (1.0000000000134985);
      2. when I calculated cumulativeProbability(50) and
      cumulativeProbability(51) for the distribution 5000 200 100, I got the
      same results, but it should have been different;
      2. the cumulativeProbability returns "for this distribution, X,
      P(X<=x)", but most of the time (at least in my case) what I care about
      is the upper tail (X>=x). based on the above findings, I can't simply
      use 1-cumulativeProbability(x-1) to get what I want.

      here's what I think might be related to the problem: since the
      cumulativeProbability is calculating the lower tail (X<=x), a
      distribution like above often has this probability very close to 1;
      thus it's difficult to record a number that = 1-1E-50 'cause all you
      can do is record sth like 0.9999..... and further digits will be
      rounded. to avoid this, I suggest adding a new function to calculate
      upper tail or change this to calculate x in a range like (n<=x<=m), in
      addition to fix the overflow of the current function.

      thank you for your patience to get here. I'm a newbie but I've asked
      Java experts in our lab about this. looking into the source code really
      isn't up for me......hope someone can fix it, BTW I'm using cygwin under
      WinXP pro SP2, with Java SDK 1.4.2_09 build b05, and the commons-math I used is
      both the 1.0 and the nightly build of 8-15-05.

      the code:
      import org.apache.commons.math.distribution.HypergeometricDistributionImpl;

      class HyperGeometricProbability {

      public static void main(String args[]) {

      if(args.length != 4)

      { System.out.println("USAGE: java HyperGeometricProbabilityCalc [population] [numsuccess] [sample] [overlap]"); }

      else {

      String population = args[0];
      String numsuccess = args[1];
      String sample = args[2];
      String overlap = args[3];

      int populationI = Integer.parseInt(population);
      int numsuccessI = Integer.parseInt(numsuccess);
      int sampleI = Integer.parseInt(sample);
      int overlapI = Integer.parseInt(overlap);

      HypergeometricDistributionImpl hDist = new
      HypergeometricDistributionImpl(populationI, numsuccessI, sampleI);

      double raw_probability = 1.0;
      double cumPro = 1.0;
      double real_cumPro = 1.0;

      try {
      if (0 < overlapI && 0 < numsuccessI && 0 < sampleI)

      { raw_probability = hDist.probability(overlapI); cumPro = hDist.cumulativeProbability(overlapI - 1.0); real_cumPro = 1.0 - cumPro; System.out.println("cumulative probability=" + cumPro + "\t" + "raw probability=" + raw_probability + "\t" + "real cumulative probability=" + "\t" + real_cumPro); }

      catch (Exception e)

      { System.err.println("BAD PROBABILITY CALCULATION"); }







            • Assignee:
              microarray@gmail.com mike hu
            • Votes:
              0 Vote for this issue
              0 Start watching this issue


              • Created: