Mahout / MAHOUT-1052

Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.8
    • classic

    Description

      Add a parameter to MinHash clustering that specifies which dimension of a vector to hash: its indexes or its values. The current version of MinHash clustering hashes only the values of vectors. Based on discussion on the dev-mahout list, both use cases are possible and occur frequently in practice.
      Preserve backward compatibility by keeping values as the default dimension. Add new unit tests.
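
To illustrate the requested behaviour, here is a minimal, self-contained sketch. It is not the code from the attached patches and does not use Mahout's classes. The class name, the `Dimension` enum, the hash constants and the `Map`-based sparse vector are all hypothetical, and truncating each value to an integer before hashing is an assumption made only for this example. It shows a MinHash signature computed over either the indexes or the values of a vector, with values as the default.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class MinHashDimensionSketch {

    /** Hypothetical switch: which part of each vector element is hashed. */
    enum Dimension { INDEX, VALUE }

    // Toy hash family h_i(x) = (A[i] * x + B[i]) mod PRIME; constants are arbitrary.
    static final long PRIME = 2147483647L; // 2^31 - 1
    static final long[] A = {31L, 131L, 1313L};
    static final long[] B = {7L, 17L, 71L};

    /** Backward-compatible overload: hashes values when no dimension is given. */
    static long[] minHashSignature(Map<Integer, Double> sparseVector) {
        return minHashSignature(sparseVector, Dimension.VALUE);
    }

    /** Returns one minimum per hash function, taken over the chosen dimension. */
    static long[] minHashSignature(Map<Integer, Double> sparseVector, Dimension dimension) {
        long[] signature = new long[A.length];
        Arrays.fill(signature, Long.MAX_VALUE);
        for (Map.Entry<Integer, Double> element : sparseVector.entrySet()) {
            // Assumption for this sketch: values are truncated to integers before hashing.
            long item = dimension == Dimension.INDEX
                    ? element.getKey()
                    : (long) element.getValue().doubleValue();
            for (int i = 0; i < A.length; i++) {
                long hash = Math.floorMod(A[i] * item + B[i], PRIME);
                signature[i] = Math.min(signature[i], hash);
            }
        }
        return signature;
    }

    /** Small helper for building the example vectors (index, value pairs). */
    static Map<Integer, Double> vector(int i1, double v1, int i2, double v2) {
        Map<Integer, Double> v = new LinkedHashMap<>();
        v.put(i1, v1);
        v.put(i2, v2);
        return v;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = vector(1, 5.0, 4, 9.0);
        Map<Integer, Double> b = vector(1, 2.0, 4, 3.0); // same indexes as a, different values

        boolean sameByIndex = Arrays.equals(
                minHashSignature(a, Dimension.INDEX), minHashSignature(b, Dimension.INDEX));
        boolean sameByValue = Arrays.equals(
                minHashSignature(a, Dimension.VALUE), minHashSignature(b, Dimension.VALUE));

        System.out.println("same signature when hashing indexes: " + sameByIndex); // true
        System.out.println("same signature when hashing values:  " + sameByValue); // false
    }
}
```

Hashing indexes treats a vector as the set of features it contains, so the two vectors above collide. Hashing values treats the stored numbers themselves as the set members, which is what the existing behaviour described in this issue does.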

      Attachments

        1. MAHOUT-1052.patch
          16 kB
          Suneel Marthi
        2. MAHOUT-1052.patch
          14 kB
          Elena Smirnova

          People

            Assignee: smarthi Suneel Marthi
            Reporter: esmirnova Elena Smirnova
            Votes: 0
            Watchers: 3
