Lucene - Core / LUCENE-9322

Discussing a unified vectors format API

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 9.0
    • None
    • New

    Description

      Two different approximate nearest neighbor approaches are currently being developed, one based on HNSW (LUCENE-9004) and another based on coarse quantization (LUCENE-9136). Each prototype proposes to add a new format to handle vectors. In LUCENE-9136 we discussed the possibility of a unified API that could support both approaches. The two ANN strategies give different trade-offs in terms of speed, memory, and complexity, and it’s likely that we’ll want to support both. Vector search is also an active research area, and it would be great to be able to prototype and incorporate new approaches without introducing more formats.

      To me it seems like a good time to begin discussing a unified API. The prototype for coarse quantization (https://github.com/apache/lucene-solr/pull/1314) could be ready to commit soon (this depends on everyone's feedback, of course). The approach is simple and shows solid search performance, as seen here. I think this API discussion is an important step in moving that implementation forward.

      The goals of the API would be:

      1. Support for storing and retrieving individual float vectors.
      2. Support for approximate nearest neighbor search – given a query vector, return the indexed vectors that are closest to it.
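
      To make the two goals concrete, here is a minimal sketch of what such an API could look like. It is not the API from either prototype: the names VectorReader, vectorValue, nearest, and BruteForceVectors are hypothetical, and the exact brute-force scan only stands in for a real ANN strategy (HNSW or coarse quantization) that would sit behind the same interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical interface covering the two goals; not Lucene's actual API. */
interface VectorReader {
    /** Goal 1: return the stored vector for a document, or null if it has none. */
    float[] vectorValue(int docId);

    /** Goal 2: return up to k doc IDs, ordered from closest to farthest from the query. */
    int[] nearest(float[] query, int k);
}

/** Exact brute-force implementation, used here only as a placeholder for an ANN strategy. */
final class BruteForceVectors implements VectorReader {
    private final List<float[]> vectors = new ArrayList<>();

    /** Stores a copy of the vector and returns the doc ID assigned to it. */
    int add(float[] vector) {
        vectors.add(vector.clone());
        return vectors.size() - 1;
    }

    @Override
    public float[] vectorValue(int docId) {
        if (docId < 0 || docId >= vectors.size()) {
            return null;
        }
        return vectors.get(docId).clone();
    }

    @Override
    public int[] nearest(float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ids.add(i);
        }
        // Sort every doc ID by squared Euclidean distance to the query (exact, O(n log n)).
        ids.sort(Comparator.comparingDouble(
            (Integer id) -> squaredDistance(query, vectors.get(id))));
        return ids.stream().limit(Math.max(0, k)).mapToInt(Integer::intValue).toArray();
    }

    private static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

      The point of the sketch is the shape of the interface: callers only see retrieval by doc ID and a top-k query, so the index structure behind nearest could be swapped without adding a new format for each approach.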

          People

            Unassigned Unassigned
            jtibshirani Julie Tibshirani
            Votes: 3
            Watchers: 13

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 11h