Mahout
  1. Mahout
  2. MAHOUT-159

SparseVector and DenseVector hashCode does not conform to the Java standard

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.2
    • Component/s: Math
    • Labels:
      None

      Description

      The hash codes for SparseVector and DenseVector will not be equal even though equals() may return true. Also, the equals logic is inconsistent because DenseVector takes into account the name parameter but SparseVector does not.

      1. MAHOUT-159.patch
        5 kB
        Mark Desnoyer
      2. MAHOUT-159.patch
        5 kB
        Mark Desnoyer
      3. MAHOUT-159.patch
        6 kB
        Mark Desnoyer
      4. MAHOUT-159.patch
        8 kB
        Mark Desnoyer

        Activity

        Hide
        Grant Ingersoll added a comment -

        Committed revision 810184.

        Show
        Grant Ingersoll added a comment - Committed revision 810184.
        Hide
        Mark Desnoyer added a comment -

        Bah. Sorry about that. I had a fix for that test but it didn't get into the diff for some reason and I didn't check it.

        Show
        Mark Desnoyer added a comment - Bah. Sorry about that. I had a fix for that test but it didn't get into the diff for some reason and I didn't check it.
        Hide
        Grant Ingersoll added a comment -

        Hmm, seems some tests fail with this patch. TestKMeans, etc. I haven't investigated yet and it is likely something is wrong in the tests, but they are still there.

        Show
        Grant Ingersoll added a comment - Hmm, seems some tests fail with this patch. TestKMeans, etc. I haven't investigated yet and it is likely something is wrong in the tests, but they are still there.
        Hide
        Mark Desnoyer added a comment -

        As per Ted's suggestion, added vector size and element index to the hash

        Show
        Mark Desnoyer added a comment - As per Ted's suggestion, added vector size and element index to the hash
        Hide
        Ted Dunning added a comment -

        If you iterate over non zeros, then you should also include the dimensions in the hash code so that different sized vectors will have different hashes. In addition, the indexes of the non-zero elements should be included so that vectors with the same values in different positions will have different hashes. I think that we want to be fairly sure that

        hash([1,0,2,0,0,0]) != hash([1,0,2])

        and

        hash([1,0,2]) != hash([1,2,0])
        Show
        Ted Dunning added a comment - If you iterate over non zeros, then you should also include the dimensions in the hash code so that different sized vectors will have different hashes. In addition, the indexes of the non-zero elements should be included so that vectors with the same values in different positions will have different hashes. I think that we want to be fairly sure that hash([1,0,2,0,0,0]) != hash([1,0,2]) and hash([1,0,2]) != hash([1,2,0])
        Hide
        Grant Ingersoll added a comment -

        My only suggestion is that the hashCode method should use the iterateNonZeros() method instead of iterating over size().

        Show
        Grant Ingersoll added a comment - My only suggestion is that the hashCode method should use the iterateNonZeros() method instead of iterating over size().
        Hide
        Mark Desnoyer added a comment -

        I missed a broken test on the first patch

        Show
        Mark Desnoyer added a comment - I missed a broken test on the first patch
        Hide
        Mark Desnoyer added a comment -

        Added patch to fix this inconsistency. Includes tests to make sure it won't happen again.

        Show
        Mark Desnoyer added a comment - Added patch to fix this inconsistency. Includes tests to make sure it won't happen again.

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Mark Desnoyer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2h
              2h
              Remaining:
              Remaining Estimate - 2h
              2h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development