Mahout
  1. Mahout
  2. MAHOUT-159

SparseVector and DenseVector hashCode does not conform to the Java standard

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.2
    • Component/s: Math
    • Labels:
      None

      Description

      The hash codes for SparseVector and DenseVector will not be equal even though equals() may return true. Also, the equals logic is inconsistent because DenseVector takes into account the name parameter but SparseVector does not.

      1. MAHOUT-159.patch
        8 kB
        Mark Desnoyer
      2. MAHOUT-159.patch
        6 kB
        Mark Desnoyer
      3. MAHOUT-159.patch
        5 kB
        Mark Desnoyer
      4. MAHOUT-159.patch
        5 kB
        Mark Desnoyer

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        28d 2h 19m 1 Grant Ingersoll 01/Sep/09 19:22
        Resolved Resolved Closed Closed
        77d 18h 43m 1 Grant Ingersoll 18/Nov/09 14:05
        Grant Ingersoll made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Grant Ingersoll made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.2 [ 12313278 ]
        Resolution Fixed [ 1 ]
        Hide
        Grant Ingersoll added a comment -

        Committed revision 810184.

        Show
        Grant Ingersoll added a comment - Committed revision 810184.
        Mark Desnoyer made changes -
        Attachment MAHOUT-159.patch [ 12416894 ]
        Hide
        Mark Desnoyer added a comment -

        Bah. Sorry about that. I had a fix for that test but it didn't get into the diff for some reason and I didn't check it.

        Show
        Mark Desnoyer added a comment - Bah. Sorry about that. I had a fix for that test but it didn't get into the diff for some reason and I didn't check it.
        Hide
        Grant Ingersoll added a comment -

        Hmm, seems some tests fail with this patch. TestKMeans, etc. I haven't investigated yet and it is likely something is wrong in the tests, but they are still there.

        Show
        Grant Ingersoll added a comment - Hmm, seems some tests fail with this patch. TestKMeans, etc. I haven't investigated yet and it is likely something is wrong in the tests, but they are still there.
        Mark Desnoyer made changes -
        Attachment MAHOUT-159.patch [ 12416100 ]
        Hide
        Mark Desnoyer added a comment -

        As per Ted's suggestion, added vector size and element index to the hash

        Show
        Mark Desnoyer added a comment - As per Ted's suggestion, added vector size and element index to the hash
        Hide
        Ted Dunning added a comment -

        If you iterate over non zeros, then you should also include the dimensions in the hash code so that different sized vectors will have different hashes. In addition, the indexes of the non-zero elements should be included so that vectors with the same values in different positions will have different hashes. I think that we want to be fairly sure that

        hash([1,0,2,0,0,0]) != hash([1,0,2])

        and

        hash([1,0,2]) != hash([1,2,0])
        Show
        Ted Dunning added a comment - If you iterate over non zeros, then you should also include the dimensions in the hash code so that different sized vectors will have different hashes. In addition, the indexes of the non-zero elements should be included so that vectors with the same values in different positions will have different hashes. I think that we want to be fairly sure that hash([1,0,2,0,0,0]) != hash([1,0,2]) and hash([1,0,2]) != hash([1,2,0])
        Hide
        Grant Ingersoll added a comment -

        My only suggestion is that the hashCode method should use the iterateNonZeros() method instead of iterating over size().

        Show
        Grant Ingersoll added a comment - My only suggestion is that the hashCode method should use the iterateNonZeros() method instead of iterating over size().
        Grant Ingersoll made changes -
        Assignee Grant Ingersoll [ gsingers ]
        Mark Desnoyer made changes -
        Attachment MAHOUT-159.patch [ 12415502 ]
        Hide
        Mark Desnoyer added a comment -

        I missed a broken test on the first patch

        Show
        Mark Desnoyer added a comment - I missed a broken test on the first patch
        Mark Desnoyer made changes -
        Field Original Value New Value
        Attachment MAHOUT-159.patch [ 12415494 ]
        Hide
        Mark Desnoyer added a comment -

        Added patch to fix this inconsistency. Includes tests to make sure it won't happen again.

        Show
        Mark Desnoyer added a comment - Added patch to fix this inconsistency. Includes tests to make sure it won't happen again.
        Mark Desnoyer created issue -

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Mark Desnoyer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2h
              2h
              Remaining:
              Remaining Estimate - 2h
              2h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development