  Hive / HIVE-9557

Create a UDF to measure string similarity using the Cosine Similarity algorithm


Details

    Description

      Algorithm description: http://en.wikipedia.org/wiki/Cosine_similarity

      -- one word differs: 2 words in each string, 1 word in common
      str_sim_cosine('Test String1', 'Test String2') = 1 / (sqrt(2) * sqrt(2)) = 0.5f
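The 0.5 result follows from the standard cosine formula if each string is treated as a binary term vector over the combined vocabulary (Test, String1, String2). Splitting on whitespace is an assumption about how the UDF tokenizes:

```latex
\cos(\mathbf{A},\mathbf{B}) = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert},
\qquad \mathbf{A} = (1, 1, 0),\quad \mathbf{B} = (1, 0, 1)

\mathbf{A}\cdot\mathbf{B} = 1,\qquad
\lVert\mathbf{A}\rVert = \lVert\mathbf{B}\rVert = \sqrt{2},\qquad
\cos(\mathbf{A},\mathbf{B}) = \frac{1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2} = 0.5
```

For binary vectors the dot product is the number of shared words, and each norm is the square root of that string's distinct word count.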
      

      reference implementation:
      https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
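A minimal standalone Java sketch of the set-based formula that the linked reference implementation appears to use: shared distinct tokens divided by the product of the square roots of each string's distinct token count. It is not the Hive GenericUDF wrapper. The class and method names are hypothetical. Whitespace tokenization, case-sensitive matching, NULL for NULL input, and 0.0 for a string with no tokens are all assumptions:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone class: models only the similarity computation,
// not the Hive GenericUDF plumbing (ObjectInspectors, registration, etc.).
class CosineSimilaritySketch {

    // Assumption: tokens are runs of non-whitespace characters, compared case-sensitively.
    static Set<String> tokenSet(String s) {
        Set<String> tokens = new HashSet<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {   // an empty string splits into a single empty token
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Set-based cosine similarity over binary term vectors:
    // |A intersect B| / (sqrt(|A|) * sqrt(|B|)).
    static Float similarity(String s1, String s2) {
        if (s1 == null || s2 == null) {
            return null;          // assumption: NULL in, NULL out, as is usual for Hive UDFs
        }
        Set<String> a = tokenSet(s1);
        Set<String> b = tokenSet(s2);
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0f;          // assumption: avoids division by zero for blank input
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);      // common now holds only tokens present in both strings
        double denom = Math.sqrt(a.size()) * Math.sqrt(b.size());
        return (float) (common.size() / denom);
    }
}
```

For the example in the description, `CosineSimilaritySketch.similarity("Test String1", "Test String2")` evaluates to 0.5. Using sets means a repeated word counts once. A term-frequency variant would weight repeats and give different results for strings with duplicate words.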


          People

            apivovarov Alexander Pivovarov
            Votes: 0
            Watchers: 3
