Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Percentile class is now using Hoare selection algorithm (aka Quickselect) with median of 3 for choosing a pivot and caching pivots. Quickselect expected runtime is about 3.4N + O(N).
The constant can be improved to 1.5N by different pivot strategy involving sampling, yielding the Floyd–Rivest algorithm, which has average complexity of 1.5N + O(N) for median, with other position statistics even faster.
For proof of concept I implemented Floyd–Rivest algorithm without caching pivots following Wikipedia and benchmarked it with Caliper.
Array size - runtime for Floyd-Rivest - runtime for current Percentile - % faster
100000 - 6.907 - 7.083 - 2.5% link
1000000 - 70.3 - 75.4 - 6.8% link
10000000 - 708 - 868 - 18.4% link
In attachments:
Main.java should be run to repeat experiments, it needs Caliper JAR: https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
Every call to evaluate() I'm filling the input double[] array with different random numbers.
FloydRivest.java contatins implementation of algorithm basing on pseudocode from its Wikipedia page.
PercentileFloydRivest.java is Percentile.java with modified evaluate(), so it calls FloydRivest.select() instead of typical select(). pivotsHeap was removed for simplicity (maybe it can improve efficiency even more).
Attachments
Attachments
Issue Links
- relates to
-
STATISTICS-18 Port Percentile class from Commons-Maths
- Reopened