Mahout / MAHOUT-1052

Add an option to MinHashDriver that specifies which dimension of the vector to hash (indexes or values)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.8
    • Component/s: Clustering
    • Labels:

    Description

    Add a parameter to MinHash clustering that specifies which dimension of each vector to hash: its indexes or its values. The current version of MinHash clustering hashes only vector values. Based on discussion on the dev-mahout mailing list, both use cases are valid and occur frequently in practice.
    Preserve backward compatibility by keeping values as the default dimension. Add new unit tests.
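    The difference between the two modes can be illustrated with a minimal, self-contained Java sketch. It does not use Mahout's API: the class `MinHashSketch`, the `HashDimension` enum, and the parallel index/value arrays standing in for a sparse vector are all hypothetical. The hash family h(x) = (a*x + b) mod (2^31 - 1) is a common textbook choice and is not necessarily what MinHashDriver uses, and hashing the raw bit pattern of each value is likewise an assumption about how values become hashable keys. The two-argument `signature` overload mirrors the backward-compatibility requirement by defaulting to values.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: MinHash over either the indexes or the values of a sparse vector. */
public class MinHashSketch {

  /** Hypothetical switch: which part of each non-zero entry is fed to the hash functions. */
  enum HashDimension { INDEX, VALUE }

  private static final long PRIME = 2147483647L; // Mersenne prime 2^31 - 1

  private final long[] a;
  private final long[] b;

  MinHashSketch(int numHashFunctions, long seed) {
    Random rnd = new Random(seed); // fixed seed so signatures are reproducible
    a = new long[numHashFunctions];
    b = new long[numHashFunctions];
    for (int i = 0; i < numHashFunctions; i++) {
      a[i] = 1 + Math.floorMod(rnd.nextLong(), PRIME - 1); // a in [1, PRIME - 1]
      b[i] = Math.floorMod(rnd.nextLong(), PRIME);          // b in [0, PRIME - 1]
    }
  }

  /** Turns one non-zero entry into the key that gets hashed, reduced mod PRIME. */
  private static long keyOf(int index, double value, HashDimension dim) {
    long raw = (dim == HashDimension.INDEX) ? index : Double.doubleToLongBits(value);
    return Math.floorMod(raw, PRIME);
  }

  /** MinHash signature of a sparse vector given as parallel index/value arrays. */
  long[] signature(int[] indices, double[] values, HashDimension dim) {
    long[] sig = new long[a.length];
    Arrays.fill(sig, Long.MAX_VALUE);
    for (int e = 0; e < indices.length; e++) {
      long x = keyOf(indices[e], values[e], dim);
      for (int i = 0; i < a.length; i++) {
        // a[i] and x are both below 2^31, so the product cannot overflow a long
        long h = (a[i] * x + b[i]) % PRIME;
        sig[i] = Math.min(sig[i], h);
      }
    }
    return sig;
  }

  /** Backward-compatible default: hash values, as the pre-patch behavior did. */
  long[] signature(int[] indices, double[] values) {
    return signature(indices, values, HashDimension.VALUE);
  }

  /** Share of positions where two signatures agree: an estimate of Jaccard similarity. */
  static double estimatedJaccard(long[] s1, long[] s2) {
    int same = 0;
    for (int i = 0; i < s1.length; i++) {
      if (s1[i] == s2[i]) {
        same++;
      }
    }
    return (double) same / s1.length;
  }

  public static void main(String[] args) {
    MinHashSketch mh = new MinHashSketch(64, 42L);
    // Same non-zero indexes, completely different values.
    int[] idx = {3, 17, 42};
    double[] v1 = {1.0, 2.0, 5.0};
    double[] v2 = {7.0, 8.0, 9.0};
    double byIndex = estimatedJaccard(
        mh.signature(idx, v1, HashDimension.INDEX), mh.signature(idx, v2, HashDimension.INDEX));
    double byValue = estimatedJaccard(mh.signature(idx, v1), mh.signature(idx, v2));
    System.out.println(byIndex + " " + byValue); // prints 1.0 0.0
  }
}
```

    Because both vectors share the same non-zero indexes, hashing indexes treats them as identical, while hashing values treats them as unrelated. Which behavior is wanted depends on whether set membership in a given data set is carried by the indexes or by the values.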

    Attachments

    1. MAHOUT-1052.patch (14 kB, Elena Smirnova)
    2. MAHOUT-1052.patch (16 kB, Suneel Marthi)

    People

    • Assignee: Suneel Marthi (smarthi)
    • Reporter: Elena Smirnova (esmirnova)
    • Votes: 0
    • Watchers: 3

    Dates

    • Created:
    • Updated:
    • Resolved: