Mahout / MAHOUT-692

OnlineSummarizer does not tolerate fewer than 100 samples

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels:
      None

      Description

      If fewer than 100 samples are add()ed to an instance of org.apache.mahout.math.stats.OnlineSummarizer, an exception is thrown during a sort when getQuartile() is called:

      Caused by: java.lang.IndexOutOfBoundsException: from: 0, to: 99, size=89

      at org.apache.mahout.math.list.AbstractList.checkRangeFromTo(AbstractList.java:87)
      at org.apache.mahout.math.list.DoubleArrayList.sortFromTo(DoubleArrayList.java:573)
      at org.apache.mahout.math.stats.OnlineSummarizer.sort(OnlineSummarizer.java:116)
      at org.apache.mahout.math.stats.OnlineSummarizer.getQuartile(OnlineSummarizer.java:129)

      The problem is that the sort is applied to the index range 0..99, but 0..n-1 should be used, where n is the number of samples added so far.
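
      The range problem described above can be illustrated with a small standalone sketch. It does not use Mahout's classes: StarterSortSketch, sortFromTo, buggySort, and fixedSort are hypothetical names, and the range check only imitates the kind of check that produces the exception in the stack trace. The assumed fix is simply to clamp the upper index to the number of samples actually stored.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a fixed-size "starter" buffer; this is not Mahout code. */
public class StarterSortSketch {
    static final int STARTER_SIZE = 100;

    /** Sorts data[from..to] (inclusive) after checking the range against the logical size. */
    static void sortFromTo(double[] data, int size, int from, int to) {
        if (from < 0 || to >= size || from > to + 1) {
            throw new IndexOutOfBoundsException(
                "from: " + from + ", to: " + to + ", size=" + size);
        }
        Arrays.sort(data, from, to + 1); // Arrays.sort takes an exclusive upper bound
    }

    /** Always sorts 0..99, regardless of how many samples are stored. */
    static void buggySort(double[] data, int size) {
        sortFromTo(data, size, 0, STARTER_SIZE - 1);
    }

    /** Sorts only the samples that are present: 0..n-1, capped at the starter size. */
    static void fixedSort(double[] data, int size) {
        int n = Math.min(size, STARTER_SIZE);
        if (n > 0) {
            sortFromTo(data, size, 0, n - 1);
        }
    }

    public static void main(String[] args) {
        double[] data = new double[STARTER_SIZE];
        int size = 89; // fewer than 100 samples, as in the report
        for (int i = 0; i < size; i++) {
            data[i] = size - i; // 89, 88, ..., 1
        }

        try {
            buggySort(data, size);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("buggy: " + e.getMessage());
            // prints "buggy: from: 0, to: 99, size=89"
        }

        fixedSort(data, size);
        System.out.println("fixed: min=" + data[0] + ", max=" + data[size - 1]);
        // prints "fixed: min=1.0, max=89.0"
    }
}
```

      With 89 samples the fixed variant sorts indices 0..88 and never touches the unused tail of the buffer. The real OnlineSummarizer may differ in how it stores and counts samples.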

        Activity

        Paul Baclace created issue -
        Ted Dunning added a comment -

        Harrumph. This bug looks familiar.

        Let me look to see if I have a fix on a dev branch.

        Ted Dunning added a comment -

        This definitely needs fixing.

        Ted Dunning made changes -
        Field            Original Value    New Value
        Fix Version/s    (none)            0.6 [ 12316364 ]
        Sean Owen added a comment -

        Looks like a very simple fix, to sort the whole "starter" array rather than sort potentially off the end. While I don't know the logic 100% I understand it enough at first glance to not see an obvious reason that would be wrong.

        Sean Owen made changes -
        Status           Open [ 1 ]        Resolved [ 5 ]
        Assignee         (none)            Ted Dunning [ tdunning ]
        Resolution       (none)            Fixed [ 1 ]
        Sean Owen made changes -
        Status           Resolved [ 5 ]    Closed [ 6 ]

          People

          • Assignee: Ted Dunning
          • Reporter: Paul Baclace
          • Votes: 0
          • Watchers: 1
