Commons Math
  1. Commons Math
  2. MATH-578

Decrease DescriptiveStatistics performance from 2.0 to 2.2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 3.1
    • Labels:
      None
    • Environment:

      Linux

      Description

      Switching between commons-math 2.0 to 2.2 we note how the
      DescriptiveStatistics.addValue(double) has decrease the performance.

      I tested with 2 million values.

      DescriptiveStatistics ds = new DescriptiveStatistics();
      for(int i = 0; i<1000*1000*2; i++)

      { //2 million values ds.addValue(v); }

      ds.getPercentile(50);

      Seems that depending by the values inserted in the DescriptiveStatistics it takes different time:

      • with a single value (0)
        • 2.0 -> take ~500 ms
        • 2.2 -> take more than 10 minutes
      • with 50% fixed value (0) and 50% Math.random()
        • 2.0 -> take ~500 ms
        • 2.2 -> take ~250000 ms -> ~250 second
      • with 100% Math.random()
        • 2.0 -> take ~500 ms
        • 2.2 -> take ~70 ms
      1. percentile.png
        28 kB
        Paolo Repele

        Issue Links

          Activity

          Paolo Repele created issue -
          Hide
          Mikkel Meyer Andersen added a comment -

          Have you tried more detailed profiling? E.g. in Eclipse to see which methods are using the majority of time?

          Show
          Mikkel Meyer Andersen added a comment - Have you tried more detailed profiling? E.g. in Eclipse to see which methods are using the majority of time?
          Mikkel Meyer Andersen made changes -
          Field Original Value New Value
          Assignee Mikkel Meyer Andersen [ mikl ]
          Hide
          Phil Steitz added a comment -

          Thanks for reporting this. I assume the timings include the percentile calculation, right?

          This could be related to the changes in the Percentile implementation in 2.2. If isolating the timing to just the percentile calculation shows that is where the latency difference is, we should reopen MATH-417. The changes there were to improve Percentile performance, which in most cases they do. The first two results above are disturbing, however. If your data is largely constant and this creates a problem in your application, as a workaround, you can provide an alternative Percentile implementation to DescriptiveStatistics using setPercentileImpl.

          Show
          Phil Steitz added a comment - Thanks for reporting this. I assume the timings include the percentile calculation, right? This could be related to the changes in the Percentile implementation in 2.2. If isolating the timing to just the percentile calculation shows that is where the latency difference is, we should reopen MATH-417 . The changes there were to improve Percentile performance, which in most cases they do. The first two results above are disturbing, however. If your data is largely constant and this creates a problem in your application, as a workaround, you can provide an alternative Percentile implementation to DescriptiveStatistics using setPercentileImpl.
          Hide
          Paolo Repele added a comment -

          Image file to show the profile snapshot

          Show
          Paolo Repele added a comment - Image file to show the profile snapshot
          Paolo Repele made changes -
          Attachment percentile.png [ 12479338 ]
          Hide
          Mikkel Meyer Andersen added a comment -

          Sorry for my (too) short first answer. Thanks for your proper introduction, Phil.

          I'll try a more detailed profiling to see what's causing the performance problems.

          Show
          Mikkel Meyer Andersen added a comment - Sorry for my (too) short first answer. Thanks for your proper introduction, Phil. I'll try a more detailed profiling to see what's causing the performance problems.
          Hide
          Paolo Repele added a comment - - edited

          No Problem

          • yep, the time was only for the getPercentile() method.
          • I added an image where you can see the profile snapshot

          Usually we use this library to analyze some grids. These grids can be very huge and can be generated using the same values for all the cells or a continue function around the grid or any combination of both.
          Then we have really no idea how these grids can be generated.

          Show
          Paolo Repele added a comment - - edited No Problem yep, the time was only for the getPercentile() method. I added an image where you can see the profile snapshot Usually we use this library to analyze some grids. These grids can be very huge and can be generated using the same values for all the cells or a continue function around the grid or any combination of both. Then we have really no idea how these grids can be generated.
          Hide
          Mikkel Meyer Andersen added a comment -

          Also, it seems like FastMath is new to 2.2. I'll try to investigate what causes this.

          Show
          Mikkel Meyer Andersen added a comment - Also, it seems like FastMath is new to 2.2. I'll try to investigate what causes this.
          Hide
          Mikkel Meyer Andersen added a comment -

          As far as I can see, Percentile contributes a lot to the longer execution time, so reopening MATH-417 for datasets of this type might be the right thing to do.

          Show
          Mikkel Meyer Andersen added a comment - As far as I can see, Percentile contributes a lot to the longer execution time, so reopening MATH-417 for datasets of this type might be the right thing to do.
          Hide
          Phil Steitz added a comment -

          Not sure this is in fact a bug, but rather a feature resulting from overall performance improvements in Percentile (poorer performance for a relatively small number of problem instances). I do not see it as showstopper for 3.0, so moving to 3.1.

          Show
          Phil Steitz added a comment - Not sure this is in fact a bug, but rather a feature resulting from overall performance improvements in Percentile (poorer performance for a relatively small number of problem instances). I do not see it as showstopper for 3.0, so moving to 3.1.
          Phil Steitz made changes -
          Fix Version/s 3.1 [ 12317576 ]
          Gilles made changes -
          Link This issue is related to MATH-805 [ MATH-805 ]
          Hide
          Gilles added a comment -

          Please rerun your test when MATH-805 has been solved, as it seems that this issue might be caused by the same bug.

          Show
          Gilles added a comment - Please rerun your test when MATH-805 has been solved, as it seems that this issue might be caused by the same bug.
          Thomas Neidhart made changes -
          Link This issue is duplicated by MATH-805 [ MATH-805 ]
          Thomas Neidhart made changes -
          Link This issue is related to MATH-805 [ MATH-805 ]
          Hide
          Thomas Neidhart added a comment -

          Fixed in r1364318.
          See also MATH-805 with a description of the problem.

          Show
          Thomas Neidhart added a comment - Fixed in r1364318. See also MATH-805 with a description of the problem.
          Thomas Neidhart made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Thomas Neidhart added a comment -

          I did the provided test myself and indeed it is the same problem and is fixed by the suggested changes.

          Show
          Thomas Neidhart added a comment - I did the provided test myself and indeed it is the same problem and is fixed by the suggested changes.
          Thomas Neidhart made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          433d 26m 1 Thomas Neidhart 22/Jul/12 16:01
          Resolved Resolved Closed Closed
          225d 3h 51m 1 Thomas Neidhart 04/Mar/13 18:53

            People

            • Assignee:
              Mikkel Meyer Andersen
              Reporter:
              Paolo Repele
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development