Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-159

SparseVector and DenseVector hashCode does not conform to the Java standard

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.2
    • Component/s: Math
    • Labels:
      None

      Description

      The hash codes for SparseVector and DenseVector will not be equal even though equals() may return true. Also, the equals logic is inconsistent because DenseVector takes into account the name parameter but SparseVector does not.

      1. MAHOUT-159.patch
        8 kB
        Mark Desnoyer
      2. MAHOUT-159.patch
        6 kB
        Mark Desnoyer
      3. MAHOUT-159.patch
        5 kB
        Mark Desnoyer
      4. MAHOUT-159.patch
        5 kB
        Mark Desnoyer

        Activity

        Hide
        mdesnoyer Mark Desnoyer added a comment -

        Added patch to fix this inconsistency. Includes tests to make sure it won't happen again.

        Show
        mdesnoyer Mark Desnoyer added a comment - Added patch to fix this inconsistency. Includes tests to make sure it won't happen again.
        Hide
        mdesnoyer Mark Desnoyer added a comment -

        I missed a broken test on the first patch

        Show
        mdesnoyer Mark Desnoyer added a comment - I missed a broken test on the first patch
        Hide
        gsingers Grant Ingersoll added a comment -

        My only suggestion is that the hashCode method should use the iterateNonZeros() method instead of iterating over size().

        Show
        gsingers Grant Ingersoll added a comment - My only suggestion is that the hashCode method should use the iterateNonZeros() method instead of iterating over size().
        Hide
        tdunning Ted Dunning added a comment -

        If you iterate over non zeros, then you should also include the dimensions in the hash code so that different sized vectors will have different hashes. In addition, the indexes of the non-zero elements should be included so that vectors with the same values in different positions will have different hashes. I think that we want to be fairly sure that

        hash([1,0,2,0,0,0]) != hash([1,0,2])

        and

        hash([1,0,2]) != hash([1,2,0])
        Show
        tdunning Ted Dunning added a comment - If you iterate over non zeros, then you should also include the dimensions in the hash code so that different sized vectors will have different hashes. In addition, the indexes of the non-zero elements should be included so that vectors with the same values in different positions will have different hashes. I think that we want to be fairly sure that hash([1,0,2,0,0,0]) != hash([1,0,2]) and hash([1,0,2]) != hash([1,2,0])
        Hide
        mdesnoyer Mark Desnoyer added a comment -

        As per Ted's suggestion, added vector size and element index to the hash

        Show
        mdesnoyer Mark Desnoyer added a comment - As per Ted's suggestion, added vector size and element index to the hash
        Hide
        gsingers Grant Ingersoll added a comment -

        Hmm, seems some tests fail with this patch. TestKMeans, etc. I haven't investigated yet and it is likely something is wrong in the tests, but they are still there.

        Show
        gsingers Grant Ingersoll added a comment - Hmm, seems some tests fail with this patch. TestKMeans, etc. I haven't investigated yet and it is likely something is wrong in the tests, but they are still there.
        Hide
        mdesnoyer Mark Desnoyer added a comment -

        Bah. Sorry about that. I had a fix for that test but it didn't get into the diff for some reason and I didn't check it.

        Show
        mdesnoyer Mark Desnoyer added a comment - Bah. Sorry about that. I had a fix for that test but it didn't get into the diff for some reason and I didn't check it.
        Hide
        gsingers Grant Ingersoll added a comment -

        Committed revision 810184.

        Show
        gsingers Grant Ingersoll added a comment - Committed revision 810184.

          People

          • Assignee:
            gsingers Grant Ingersoll
            Reporter:
            mdesnoyer Mark Desnoyer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2h
              2h
              Remaining:
              Remaining Estimate - 2h
              2h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development