Commons Math
  1. Commons Math
  2. MATH-384

DescriptiveStatistics based on double[]

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2, 3.0
    • Labels:
      None

      Description

      Right now the DescriptiveStatistics is for ResizableDoubleArray, but if the user has the double[] data she wishes to get DescriptiveStatistics for there is no way of doing this (besides the inefficient way of creating a ResizableDoubleArray from the double[]).

      It would be possible to use StatUtils, but it does not contain near as much functionality as DescriptiveStatistics.

      Idea: Create a DescriptiveStatistics for double[] also. I'm not sure which way of implementing it would be the best, but one way would be to make DescriptiveStatistics abstract with all but the apply-method implemented as now. The apply-method should be made as abstract which then has to implemented in the extensions together with a field containing the values. (addValue, windowSize etc. has to go in the ResizableDoubleArray-extension.) The double[]-extension should then have a constructor taking double[] as a parameter.

      1. MATH384-2_2
        5 kB
        Mikkel Meyer Andersen
      2. MATH384-3_0
        6 kB
        Mikkel Meyer Andersen
      3. patch-MATH-384-2
        7 kB
        Mikkel Meyer Andersen
      4. ResizableArrayInstantiatedByArray
        2 kB
        Mikkel Meyer Andersen

        Activity

        Hide
        Luc Maisonobe added a comment -

        Closing issue as it was included in version 2.2, which has been released

        Show
        Luc Maisonobe added a comment - Closing issue as it was included in version 2.2, which has been released
        Hide
        Mikkel Meyer Andersen added a comment -

        Fixed for 2.2 in revision 1053282 and for 3.0 in revision 1053283.

        Show
        Mikkel Meyer Andersen added a comment - Fixed for 2.2 in revision 1053282 and for 3.0 in revision 1053283.
        Hide
        Phil Steitz added a comment -

        +1 for both patches.

        Show
        Phil Steitz added a comment - +1 for both patches.
        Hide
        Mikkel Meyer Andersen added a comment -

        Proposed patches with comments taken into account. The difference for 2.2 and 3.0 is that in 3.0 the DoubleArray interface has been extended to include addElements(double[])

        Show
        Mikkel Meyer Andersen added a comment - Proposed patches with comments taken into account. The difference for 2.2 and 3.0 is that in 3.0 the DoubleArray interface has been extended to include addElements(double[])
        Hide
        Phil Steitz added a comment -

        Looks good. That would have the effect of a contract() cleanup and is simpler.

        Show
        Phil Steitz added a comment - Looks good. That would have the effect of a contract() cleanup and is simpler.
        Hide
        Phil Steitz added a comment -

        Looks good. I have a couple of comments, which I will address in the commit if we agree on them:

        1. The addition to DoubleArray interface should only be applied to the 3.0 branch

        2. I am not sure we want/need the constructor, DescriptiveStatistics(ResizeableDoubleArray); the version just taking a double[] is probably sufficient and better from maintainability and API simplicity standpoint (in case we ever dump the current internal store).

        3. I need to add some tests to confirm, but I don't think the expansion criteria newSize >= numElements + 1 is correct, unless you first force a contract(). Remember that there could be unused elements at the beginning of the array if "rolling" has occurred since the last contraction. I am also wondering whether you really need to mimic the behavior of expand() in this method. There is no invariant that says expansion needs to always happen in lumps as defined by the expansion mode. This machinery is there to eliminate the need to expand on every single value addition. Seems to me that we could just shove the new values to the end of the array and update numElements. Am I missing something?

        Show
        Phil Steitz added a comment - Looks good. I have a couple of comments, which I will address in the commit if we agree on them: 1. The addition to DoubleArray interface should only be applied to the 3.0 branch 2. I am not sure we want/need the constructor, DescriptiveStatistics(ResizeableDoubleArray); the version just taking a double[] is probably sufficient and better from maintainability and API simplicity standpoint (in case we ever dump the current internal store). 3. I need to add some tests to confirm, but I don't think the expansion criteria newSize >= numElements + 1 is correct, unless you first force a contract(). Remember that there could be unused elements at the beginning of the array if "rolling" has occurred since the last contraction. I am also wondering whether you really need to mimic the behavior of expand() in this method. There is no invariant that says expansion needs to always happen in lumps as defined by the expansion mode. This machinery is there to eliminate the need to expand on every single value addition. Seems to me that we could just shove the new values to the end of the array and update numElements. Am I missing something?
        Hide
        Mikkel Meyer Andersen added a comment -

        Lines 158-163 in the patch:
        + final int oldNumElements = numElements;
        +
        + numElements += values.length;
        +
        + int newSize = 0;
        + int i = 1;
        should possibly be changed to
        + final int oldNumElements = numElements;
        +
        + numElements += values.length;
        +
        + int newSize = internalArray.length;
        + int i = 1;
        to avoid looping if not necessary.

        Show
        Mikkel Meyer Andersen added a comment - Lines 158-163 in the patch: + final int oldNumElements = numElements; + + numElements += values.length; + + int newSize = 0; + int i = 1; should possibly be changed to + final int oldNumElements = numElements; + + numElements += values.length; + + int newSize = internalArray.length; + int i = 1; to avoid looping if not necessary.
        Hide
        Mikkel Meyer Andersen added a comment -

        An updated patch with the requested changes:
        1. Lets add DescriptiveStatistics(double[]) as well.
        2. We need test cases for the new code.
        3. Another useful extension to ResizeableDoubleArray that would lift to DescriptiveStatistics would be addValues(double[]).

        Sorry for the huge time-delay. I'm not sure if the tests are exhaustive enough. If not, please propose some additional ones.

        The patch includes the earlier one, so the earlier one should be disregarded.

        Show
        Mikkel Meyer Andersen added a comment - An updated patch with the requested changes: 1. Lets add DescriptiveStatistics(double[]) as well. 2. We need test cases for the new code. 3. Another useful extension to ResizeableDoubleArray that would lift to DescriptiveStatistics would be addValues(double[]). Sorry for the huge time-delay. I'm not sure if the tests are exhaustive enough. If not, please propose some additional ones. The patch includes the earlier one, so the earlier one should be disregarded.
        Hide
        Mikkel Meyer Andersen added a comment -

        No problem. I agree on all three items. I'll see to it but it will last a couple of weeks before something happens because of I'm leaving for holiday.

        Show
        Mikkel Meyer Andersen added a comment - No problem. I agree on all three items. I'll see to it but it will last a couple of weeks before something happens because of I'm leaving for holiday.
        Hide
        Phil Steitz added a comment -

        Sorry for the delay reviewing this. The change to ResizeableDoubleArray looks good. I have a couple of comments / suggestions for improvement on the patch:
        .
        1. Lets add DescriptiveStatistics(double[]) as well.

        2. We need test cases for the new code.

        3. Another useful extension to ResizeableDoubleArray that would lift to DescriptiveStatistics would be addValues(double[]).

        I am fine skipping 3 for now.

        Show
        Phil Steitz added a comment - Sorry for the delay reviewing this. The change to ResizeableDoubleArray looks good. I have a couple of comments / suggestions for improvement on the patch: . 1. Lets add DescriptiveStatistics(double[]) as well. 2. We need test cases for the new code. 3. Another useful extension to ResizeableDoubleArray that would lift to DescriptiveStatistics would be addValues(double[]). I am fine skipping 3 for now.
        Hide
        Mikkel Meyer Andersen added a comment -

        Patch proposal

        Show
        Mikkel Meyer Andersen added a comment - Patch proposal
        Hide
        Mikkel Meyer Andersen added a comment -

        I'll work on it rather sooner than later .

        Show
        Mikkel Meyer Andersen added a comment - I'll work on it rather sooner than later .
        Hide
        Phil Steitz added a comment -

        +1 - patches welcome!

        Show
        Phil Steitz added a comment - +1 - patches welcome!
        Hide
        Mikkel Meyer Andersen added a comment -

        Agree. And add a constructor public DescriptiveStatistics(ResizableDoubleArray eDA) to DescriptiveStatistics so that one can construct the eDA first, and then the DescriptiveStatistics. And maybe a constructor public DescriptiveStatistics(double[] data) doing this automatically, too?

        Show
        Mikkel Meyer Andersen added a comment - Agree. And add a constructor public DescriptiveStatistics(ResizableDoubleArray eDA) to DescriptiveStatistics so that one can construct the eDA first, and then the DescriptiveStatistics. And maybe a constructor public DescriptiveStatistics(double[] data) doing this automatically, too?
        Hide
        Phil Steitz added a comment -

        Yes, that should work, but in the constructor you should also set initialSize and numElements to the length of the array. That way, values can subsequently be added.

        Show
        Phil Steitz added a comment - Yes, that should work, but in the constructor you should also set initialSize and numElements to the length of the array. That way, values can subsequently be added.
        Hide
        Mikkel Meyer Andersen added a comment -

        You're probably right. What about just adding the constructor
        public ResizableDoubleArray(double[] initialArray)

        { internalArray = initialArray; }

        to ResizableArray (possibly with checking if initialArray == null). It would give a sufficiently efficient ResizableArray (in the case given it is not going to be expanded or anything), would it not?

        Show
        Mikkel Meyer Andersen added a comment - You're probably right. What about just adding the constructor public ResizableDoubleArray(double[] initialArray) { internalArray = initialArray; } to ResizableArray (possibly with checking if initialArray == null). It would give a sufficiently efficient ResizableArray (in the case given it is not going to be expanded or anything), would it not?
        Hide
        Phil Steitz added a comment -

        +1 to add the functionality. Another approach would be to simply add a method addValues(double[]) to DescriptiveStatistics. This would be a little tricky to implement efficiently (possibly requiring modifications to ResizeableDoubleArray) but maybe a little simpler from an interface perspective.

        Yet another idea to consider is to create a value interface (or possibly better a class) similar to StatisticalSummaryValues including all of the statistics maintained by DescriptiveStatistics and then just add a static method to StatUtils that takes a double[] and returns one of these. The same could be done for StatisticalSummaryValues - methods named descriptiveStatistics(double[]) and statisticalSummary(double[]).

        Any other ideas or reasons that any of these are not good?

        Show
        Phil Steitz added a comment - +1 to add the functionality. Another approach would be to simply add a method addValues(double[]) to DescriptiveStatistics. This would be a little tricky to implement efficiently (possibly requiring modifications to ResizeableDoubleArray) but maybe a little simpler from an interface perspective. Yet another idea to consider is to create a value interface (or possibly better a class) similar to StatisticalSummaryValues including all of the statistics maintained by DescriptiveStatistics and then just add a static method to StatUtils that takes a double[] and returns one of these. The same could be done for StatisticalSummaryValues - methods named descriptiveStatistics(double[]) and statisticalSummary(double[]). Any other ideas or reasons that any of these are not good?

          People

          • Assignee:
            Mikkel Meyer Andersen
            Reporter:
            Mikkel Meyer Andersen
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2h
              2h
              Remaining:
              Remaining Estimate - 2h
              2h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development