  1. Spark
  2. SPARK-18334

What hashDistance should MinHash use?


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete

    Description

      MinHash currently uses the same `hashDistance` function as RandomProjection. This does not make sense for MinHash, because the Jaccard distance between two sets is unrelated to the absolute difference between their hash bucket indices.

      This bug could affect the accuracy of multi-probe nearest-neighbor (NN) search for MinHash.
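
To illustrate the point, here is a minimal pure-Python sketch. It is not Spark's implementation. The helper names (`make_hash_functions`, `minhash_signature`, `mismatch_fraction`, `mean_abs_difference`) and the hash family `(a*x + b) mod p` are assumptions made for this example only. It compares two candidate hash distances: the mean absolute difference of the hash values, which mirrors what a random-projection-style distance would compute, and the fraction of hash positions that disagree, which is one possible alternative because it estimates the Jaccard distance.

```python
import random

# Assumption for this sketch: elements are non-negative integers below PRIME.
PRIME = 2_147_483_647  # the Mersenne prime 2^31 - 1


def make_hash_functions(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, hash_functions):
    """One MinHash value per hash function; `items` must be non-empty."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_functions]


def jaccard_distance(s, t):
    """Exact Jaccard distance; at least one of the sets must be non-empty."""
    return 1.0 - len(s & t) / len(s | t)


def mismatch_fraction(sig_a, sig_b):
    """Fraction of positions where the two signatures disagree."""
    return sum(1 for u, v in zip(sig_a, sig_b) if u != v) / len(sig_a)


def mean_abs_difference(sig_a, sig_b):
    """Mean absolute difference of hash values (random-projection style)."""
    return sum(abs(u - v) for u, v in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    hash_functions = make_hash_functions(256, seed=42)
    query = set(range(0, 100))
    near = set(range(10, 110))    # overlaps heavily with `query`
    far = set(range(500, 600))    # disjoint from `query`

    sig_query = minhash_signature(query, hash_functions)
    for name, other in [("near", near), ("far", far)]:
        sig_other = minhash_signature(other, hash_functions)
        print(name,
              "jaccard:", jaccard_distance(query, other),
              "mismatch fraction:", mismatch_fraction(sig_query, sig_other),
              "mean abs difference:", mean_abs_difference(sig_query, sig_other))
```

Under these assumptions, the probability that two sets share a MinHash value is approximately their Jaccard similarity, so the mismatch fraction tracks the Jaccard distance. The size of the gap between two differing hash values depends only on where the hash function happens to place the elements, which is why an absolute-difference distance is a poor ranking signal for multi-probe search here.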


            People

              Assignee: Unassigned
              Reporter: Yun Ni (yunn)