# Percentile calculation is very slow when input data are constants

## Details

• Type: Improvement
• Status: Closed
• Priority: Minor
• Resolution: Duplicate
• Affects Version/s: 3.0
• Fix Version/s:
• Labels:

## Description

I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

## Activity

Hide
Thomas Neidhart added a comment -

Hi Benoit,

could you please provide a (simple) test case to demonstrate the behaviour you describe?

Thanks,

Thomas

Show
Thomas Neidhart added a comment - Hi Benoit, could you please provide a (simple) test case to demonstrate the behaviour you describe? Thanks, Thomas
Hide
Benoit de Rancourt added a comment - - edited

Hello,

Here is a simple test case :

testPercentile.java
```/**
* Test the Percentile calculation
*/
public static void testPercentile() {

final double CONST_NUMBER = 18.;
final double PERCENT = 5.;
final int DISTRIBUTION_SIZE = (int) 1e5;

double[] distribution = new double[DISTRIBUTION_SIZE];
Percentile percentile = new Percentile(PERCENT);
Random random = new Random(System.nanoTime());

// filling the array with random number
for (int i = 0; i < distribution.length; i++) {
distribution[i] = random.nextDouble() * 100.;
}

System.out.println("Start the calculation with random array");
long begin = System.currentTimeMillis();
double result = percentile.evaluate(distribution);
long end = System.currentTimeMillis();
System.out.println("duration : " + (end - begin) + "ms");
System.out.println("result : " + result);

// filling the array with a constant number
for (int i = 0; i < distribution.length; i++) {
distribution[i] = CONST_NUMBER;
}

System.out.println("Start the calculation with constant array");
begin = System.currentTimeMillis();
result = percentile.evaluate(distribution);
end = System.currentTimeMillis();
System.out.println("duration : " + (end - begin) + "ms");
System.out.println("result : " + result);
}
```

Thanks,
Benoit.

Show
Benoit de Rancourt added a comment - - edited Hello, Here is a simple test case : testPercentile.java /** * Test the Percentile calculation */ public static void testPercentile() { final double CONST_NUMBER = 18.; final double PERCENT = 5.; final int DISTRIBUTION_SIZE = ( int ) 1e5; double [] distribution = new double [DISTRIBUTION_SIZE]; Percentile percentile = new Percentile(PERCENT); Random random = new Random( System .nanoTime()); // filling the array with random number for ( int i = 0; i < distribution.length; i++) { distribution[i] = random.nextDouble() * 100.; } System .out.println( "Start the calculation with random array" ); long begin = System .currentTimeMillis(); double result = percentile.evaluate(distribution); long end = System .currentTimeMillis(); System .out.println( "duration : " + (end - begin) + "ms" ); System .out.println( "result : " + result); // filling the array with a constant number for ( int i = 0; i < distribution.length; i++) { distribution[i] = CONST_NUMBER; } System .out.println( "Start the calculation with constant array" ); begin = System .currentTimeMillis(); result = percentile.evaluate(distribution); end = System .currentTimeMillis(); System .out.println( "duration : " + (end - begin) + "ms" ); System .out.println( "result : " + result); } Thanks, Benoit.
Hide
Thomas Neidhart added a comment -

The evaluation of the percentile uses a partition method to sort the data around a pivot element (element that are smaller are before the pivot element, and elements that are larger are behind it).

Now in case of an array with lots of equal values, this can lead to some kind of staircase situation, where the pivot element is just slowly increased from the bottom index and the full array of values has to be evaluated in each iteration. The problem is in the partition method of Percentile.java:

```    while (i < j) {
while ((i < j) && (work[j] >= value)) {
--j;
}
while ((i < j) && (work[i] <= value)) {
++i;
}
...
```

The comparisons here use >= and <= while it should be > and <.

Show
Thomas Neidhart added a comment - The evaluation of the percentile uses a partition method to sort the data around a pivot element (element that are smaller are before the pivot element, and elements that are larger are behind it). Now in case of an array with lots of equal values, this can lead to some kind of staircase situation, where the pivot element is just slowly increased from the bottom index and the full array of values has to be evaluated in each iteration. The problem is in the partition method of Percentile.java: while (i < j) { while ((i < j) && (work[j] >= value)) { --j; } while ((i < j) && (work[i] <= value)) { ++i; } ... The comparisons here use >= and <= while it should be > and <.
Hide
Thomas Neidhart added a comment -

Fixed in r1364318.

Show
Thomas Neidhart added a comment - Fixed in r1364318.
Hide
Thomas Neidhart added a comment -

Duplicate of MATH-578

Show
Thomas Neidhart added a comment - Duplicate of MATH-578

## People

• Assignee:
Thomas Neidhart
Reporter:
Benoit de Rancourt