Commons Math
  1. Commons Math
  2. MATH-578

Decrease DescriptiveStatistics performance from 2.0 to 2.2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 3.1
    • Labels:
      None
    • Environment:

      Linux

      Description

      Switching between commons-math 2.0 to 2.2 we note how the
      DescriptiveStatistics.addValue(double) has decrease the performance.

      I tested with 2 million values.

      DescriptiveStatistics ds = new DescriptiveStatistics();
      for(int i = 0; i<1000*1000*2; i++)

      { //2 million values ds.addValue(v); }

      ds.getPercentile(50);

      Seems that depending by the values inserted in the DescriptiveStatistics it takes different time:

      • with a single value (0)
        • 2.0 -> take ~500 ms
        • 2.2 -> take more than 10 minutes
      • with 50% fixed value (0) and 50% Math.random()
        • 2.0 -> take ~500 ms
        • 2.2 -> take ~250000 ms -> ~250 second
      • with 100% Math.random()
        • 2.0 -> take ~500 ms
        • 2.2 -> take ~70 ms
      1. percentile.png
        28 kB
        Paolo Repele

        Issue Links

          Activity

          Hide
          Mikkel Meyer Andersen added a comment -

          Have you tried more detailed profiling? E.g. in Eclipse to see which methods are using the majority of time?

          Show
          Mikkel Meyer Andersen added a comment - Have you tried more detailed profiling? E.g. in Eclipse to see which methods are using the majority of time?
          Hide
          Phil Steitz added a comment -

          Thanks for reporting this. I assume the timings include the percentile calculation, right?

          This could be related to the changes in the Percentile implementation in 2.2. If isolating the timing to just the percentile calculation shows that is where the latency difference is, we should reopen MATH-417. The changes there were to improve Percentile performance, which in most cases they do. The first two results above are disturbing, however. If your data is largely constant and this creates a problem in your application, as a workaround, you can provide an alternative Percentile implementation to DescriptiveStatistics using setPercentileImpl.

          Show
          Phil Steitz added a comment - Thanks for reporting this. I assume the timings include the percentile calculation, right? This could be related to the changes in the Percentile implementation in 2.2. If isolating the timing to just the percentile calculation shows that is where the latency difference is, we should reopen MATH-417 . The changes there were to improve Percentile performance, which in most cases they do. The first two results above are disturbing, however. If your data is largely constant and this creates a problem in your application, as a workaround, you can provide an alternative Percentile implementation to DescriptiveStatistics using setPercentileImpl.
          Hide
          Paolo Repele added a comment -

          Image file to show the profile snapshot

          Show
          Paolo Repele added a comment - Image file to show the profile snapshot
          Hide
          Mikkel Meyer Andersen added a comment -

          Sorry for my (too) short first answer. Thanks for your proper introduction, Phil.

          I'll try a more detailed profiling to see what's causing the performance problems.

          Show
          Mikkel Meyer Andersen added a comment - Sorry for my (too) short first answer. Thanks for your proper introduction, Phil. I'll try a more detailed profiling to see what's causing the performance problems.
          Hide
          Paolo Repele added a comment - - edited

          No Problem

          • yep, the time was only for the getPercentile() method.
          • I added an image where you can see the profile snapshot

          Usually we use this library to analyze some grids. These grids can be very huge and can be generated using the same values for all the cells or a continue function around the grid or any combination of both.
          Then we have really no idea how these grids can be generated.

          Show
          Paolo Repele added a comment - - edited No Problem yep, the time was only for the getPercentile() method. I added an image where you can see the profile snapshot Usually we use this library to analyze some grids. These grids can be very huge and can be generated using the same values for all the cells or a continue function around the grid or any combination of both. Then we have really no idea how these grids can be generated.
          Hide
          Mikkel Meyer Andersen added a comment -

          Also, it seems like FastMath is new to 2.2. I'll try to investigate what causes this.

          Show
          Mikkel Meyer Andersen added a comment - Also, it seems like FastMath is new to 2.2. I'll try to investigate what causes this.
          Hide
          Mikkel Meyer Andersen added a comment -

          As far as I can see, Percentile contributes a lot to the longer execution time, so reopening MATH-417 for datasets of this type might be the right thing to do.

          Show
          Mikkel Meyer Andersen added a comment - As far as I can see, Percentile contributes a lot to the longer execution time, so reopening MATH-417 for datasets of this type might be the right thing to do.
          Hide
          Phil Steitz added a comment -

          Not sure this is in fact a bug, but rather a feature resulting from overall performance improvements in Percentile (poorer performance for a relatively small number of problem instances). I do not see it as showstopper for 3.0, so moving to 3.1.

          Show
          Phil Steitz added a comment - Not sure this is in fact a bug, but rather a feature resulting from overall performance improvements in Percentile (poorer performance for a relatively small number of problem instances). I do not see it as showstopper for 3.0, so moving to 3.1.
          Hide
          Gilles added a comment -

          Please rerun your test when MATH-805 has been solved, as it seems that this issue might be caused by the same bug.

          Show
          Gilles added a comment - Please rerun your test when MATH-805 has been solved, as it seems that this issue might be caused by the same bug.
          Hide
          Thomas Neidhart added a comment -

          Fixed in r1364318.
          See also MATH-805 with a description of the problem.

          Show
          Thomas Neidhart added a comment - Fixed in r1364318. See also MATH-805 with a description of the problem.
          Hide
          Thomas Neidhart added a comment -

          I did the provided test myself and indeed it is the same problem and is fixed by the suggested changes.

          Show
          Thomas Neidhart added a comment - I did the provided test myself and indeed it is the same problem and is fixed by the suggested changes.

            People

            • Assignee:
              Mikkel Meyer Andersen
              Reporter:
              Paolo Repele
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development