Lucene - Core / LUCENE-10146

Add VectorSimilarityFunction.COSINE

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0
    • Component/s: None
    • Lucene Fields: New

    Description

      To perform ANN search with cosine similarity, users are currently expected to normalize the document and query vectors to unit length and then use VectorSimilarityFunction.DOT_PRODUCT. I think it would be good to also support cosine similarity directly through VectorSimilarityFunction.COSINE. This would let users perform ANN search based on cosine similarity while retaining access to the original (unnormalized) vectors through VectorValues, so they can use those vectors in a reranking step or return them to the application for further processing.
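
      The workaround relies on the standard identity below, where q is the query vector and d is a document vector. It also shows why normalizing at index time loses information: the indexed unit vector d/‖d‖ no longer carries the document's original norm ‖d‖, so the original vector cannot be recovered from it.

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\mathbf{q}}{\lVert \mathbf{q} \rVert} \cdot \frac{\mathbf{d}}{\lVert \mathbf{d} \rVert}
```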

      It looks like nmslib and hnswlib support cosine similarity directly. FAISS, on the other hand, only supports dot product and suggests that users normalize their vectors to get cosine similarity (https://github.com/facebookresearch/faiss/issues/95). To me, adding this one additional similarity function is worth it for what it lets users accomplish.
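
      The normalize-then-dot-product workaround (the approach FAISS suggests, and the one Lucene users currently need with DOT_PRODUCT) can be sketched in plain Java as below. This is a minimal illustration under stated assumptions: it does not call Lucene, the class and method names (CosineSketch, dot, cosine, normalized) are hypothetical, and Lucene's own conversion of raw similarities into scores is not reproduced. It checks that direct cosine similarity matches the dot product of the unit-normalized copies.

```java
// Hypothetical helper class for illustration only; not part of Lucene.
public final class CosineSketch {

    // Plain dot product of two vectors of equal dimension.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity computed directly on the original vectors.
    static float cosine(float[] a, float[] b) {
        return dot(a, b) / (float) Math.sqrt((double) dot(a, a) * dot(b, b));
    }

    // Returns a unit-length copy; the original norm is not kept in the result.
    static float[] normalized(float[] v) {
        float norm = (float) Math.sqrt(dot(v, v));
        if (norm == 0f) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] doc = {3f, 4f, 0f};
        float[] query = {1f, 2f, 2f};
        // Both values approximate 11/15; they may differ in the last float bits.
        float direct = cosine(doc, query);
        float viaDot = dot(normalized(doc), normalized(query));
        System.out.println(direct + " " + viaDot);
    }
}
```

      Because normalized() returns a separate copy, the caller can keep the original vector alongside it; an index that stores only the normalized copy cannot give the original back, which is the gap a built-in COSINE function avoids.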

          People

            Assignee: Unassigned
            Reporter: Julie Tibshirani (julietibs)
            Votes: 1
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2.5h