Commons Math
  1. Commons Math
  2. MATH-805

Percentile calculation is very slow when input data are constants

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Duplicate
    • Affects Version/s: 3.0
    • Fix Version/s: 3.1
    • Labels:

      Description

      I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

        Issue Links

          Activity

          Hide
          Thomas Neidhart added a comment -

          Duplicate of MATH-578

          Show
          Thomas Neidhart added a comment - Duplicate of MATH-578
          Hide
          Thomas Neidhart added a comment -

          Fixed in r1364318.

          Show
          Thomas Neidhart added a comment - Fixed in r1364318.
          Hide
          Thomas Neidhart added a comment -

          The evaluation of the percentile uses a partition method to sort the data around a pivot element (element that are smaller are before the pivot element, and elements that are larger are behind it).

          Now in case of an array with lots of equal values, this can lead to some kind of staircase situation, where the pivot element is just slowly increased from the bottom index and the full array of values has to be evaluated in each iteration. The problem is in the partition method of Percentile.java:

              while (i < j) {
                 while ((i < j) && (work[j] >= value)) {
                     --j;
                 }
                 while ((i < j) && (work[i] <= value)) {
                     ++i;
                 }
              ...
          

          The comparisons here use >= and <= while it should be > and <.

          Show
          Thomas Neidhart added a comment - The evaluation of the percentile uses a partition method to sort the data around a pivot element (element that are smaller are before the pivot element, and elements that are larger are behind it). Now in case of an array with lots of equal values, this can lead to some kind of staircase situation, where the pivot element is just slowly increased from the bottom index and the full array of values has to be evaluated in each iteration. The problem is in the partition method of Percentile.java: while (i < j) { while ((i < j) && (work[j] >= value)) { --j; } while ((i < j) && (work[i] <= value)) { ++i; } ... The comparisons here use >= and <= while it should be > and <.
          Hide
          Benoit de Rancourt added a comment - - edited

          Hello,

          Here is a simple test case :

          testPercentile.java
          /**
           * Test the Percentile calculation
           */
          public static void testPercentile() {
          	
          	final double CONST_NUMBER = 18.;
          	final double PERCENT = 5.;
          	final int DISTRIBUTION_SIZE = (int) 1e5;
          
          	double[] distribution = new double[DISTRIBUTION_SIZE];
          	Percentile percentile = new Percentile(PERCENT);
          	Random random = new Random(System.nanoTime());
          	
          	// filling the array with random number
          	for (int i = 0; i < distribution.length; i++) {
          		distribution[i] = random.nextDouble() * 100.;
          	}
          	
          	System.out.println("Start the calculation with random array");
          	long begin = System.currentTimeMillis();
          	double result = percentile.evaluate(distribution);
          	long end = System.currentTimeMillis();
          	System.out.println("duration : " + (end - begin) + "ms");
          	System.out.println("result : " + result);
          	
          	// filling the array with a constant number
          	for (int i = 0; i < distribution.length; i++) {
          		distribution[i] = CONST_NUMBER;
          	}
          	
          	System.out.println("Start the calculation with constant array");
          	begin = System.currentTimeMillis();
          	result = percentile.evaluate(distribution);
          	end = System.currentTimeMillis();
          	System.out.println("duration : " + (end - begin) + "ms");
          	System.out.println("result : " + result);
          }
          

          Thanks,
          Benoit.

          Show
          Benoit de Rancourt added a comment - - edited Hello, Here is a simple test case : testPercentile.java /** * Test the Percentile calculation */ public static void testPercentile() { final double CONST_NUMBER = 18.; final double PERCENT = 5.; final int DISTRIBUTION_SIZE = ( int ) 1e5; double [] distribution = new double [DISTRIBUTION_SIZE]; Percentile percentile = new Percentile(PERCENT); Random random = new Random( System .nanoTime()); // filling the array with random number for ( int i = 0; i < distribution.length; i++) { distribution[i] = random.nextDouble() * 100.; } System .out.println( "Start the calculation with random array" ); long begin = System .currentTimeMillis(); double result = percentile.evaluate(distribution); long end = System .currentTimeMillis(); System .out.println( "duration : " + (end - begin) + "ms" ); System .out.println( "result : " + result); // filling the array with a constant number for ( int i = 0; i < distribution.length; i++) { distribution[i] = CONST_NUMBER; } System .out.println( "Start the calculation with constant array" ); begin = System .currentTimeMillis(); result = percentile.evaluate(distribution); end = System .currentTimeMillis(); System .out.println( "duration : " + (end - begin) + "ms" ); System .out.println( "result : " + result); } Thanks, Benoit.
          Hide
          Thomas Neidhart added a comment -

          Hi Benoit,

          could you please provide a (simple) test case to demonstrate the behaviour you describe?

          Thanks,

          Thomas

          Show
          Thomas Neidhart added a comment - Hi Benoit, could you please provide a (simple) test case to demonstrate the behaviour you describe? Thanks, Thomas

            People

            • Assignee:
              Thomas Neidhart
              Reporter:
              Benoit de Rancourt
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development