SOLR-16651

Optimize execution of KNN sub-query to apply it only on documents remaining after the main query



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 9.1.1
    • Fix Version/s: None
    • Component/s: query


      Solr 9.1 introduced pre-filtering for KNN queries, which is great and works fine when the KNN query is the main query.

      I was wondering whether it would be possible to do something similar for the case where KNN is a sub-query instead of the main query (q). Let me show an example use case with the films example.

      I want to query for films with “the” in the name, keep only films with genre “Drama”, and then calculate the similarity between these films' vectors and my target vector. The idea is to run a simple lexical query and use the KNN sub-query to calculate similarities (not necessarily to sort by similarity). Here is an example query:

      This query works fine. The problem is that the `my_similarity` sub-query runs over all 1,100 film documents instead of only the 51 that match the query. For a small collection like this it makes no difference, but I have a collection with 12 million documents where queries like this one run very slowly, even though the result set is small.
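To make the shape of such a request concrete, here is a minimal sketch in Python that builds the request parameters. It is not necessarily the exact query from this report: the field names `name`, `genre` and `film_vector` come from the films example, while the `similarity` alias in `fl`, the use of `query($my_similarity)` to expose the sub-query score, and the helper name `build_films_request` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_films_request(target_vector, top_k=10000):
    """Build Solr parameters for a lexical main query plus a KNN sub-query.

    Hypothetical helper: the parameter layout is an assumption, not the
    query from the report.
    """
    # Solr's knn parser expects the vector as a bracketed, comma-separated list.
    vector_literal = "[" + ",".join(str(v) for v in target_vector) + "]"
    return {
        # Main lexical query: films with "the" in the name.
        "q": "name:the",
        # Filter: keep only dramas.
        "fq": "genre:Drama",
        # KNN sub-query, referenced below by parameter name.
        "my_similarity": "{!knn f=film_vector topK=%d}%s" % (top_k, vector_literal),
        # Return the sub-query score as a pseudo-field (assumed alias).
        "fl": "id,name,genre,similarity:query($my_similarity)",
    }


params = build_films_request([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
query_string = urlencode(params)  # ready to append to a /select URL
```

With a request of this shape, the main query and filter narrow the result set, but the KNN sub-query is still evaluated on its own over the whole index.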

      I tried using the cache and cost parameters to "force" the KNN sub-query to run after the main query (`{!knn cache=false cost=101 f=film_vector topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]`), but it does not work (I guess PostFilter is not implemented for KNN).
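The behaviour being asked for can be illustrated outside Solr with a toy sketch: apply the cheap lexical match first, then compute vector similarity only for the documents that remain. Everything here is an assumption for illustration (the in-memory document layout, the `matches` predicate, and cosine as the similarity function); it does not reflect how Solr or Lucene evaluate KNN queries internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_matching_docs(docs, matches, target):
    """Score only the documents accepted by `matches`; skip all others."""
    return {
        doc["id"]: cosine_similarity(doc["vector"], target)
        for doc in docs
        if matches(doc)  # the similarity is never computed for rejected docs
    }


# Hypothetical miniature collection.
docs = [
    {"id": "1", "name": "The Pianist", "genre": "Drama", "vector": [1.0, 0.0]},
    {"id": "2", "name": "Alien", "genre": "Horror", "vector": [0.0, 1.0]},
    {"id": "3", "name": "The Mask", "genre": "Comedy", "vector": [1.0, 1.0]},
]


def is_drama_with_the(doc):
    return "the" in doc["name"].lower().split() and doc["genre"] == "Drama"


scores = score_matching_docs(docs, is_drama_with_the, [1.0, 0.0])
```

In this sketch the cost of the similarity step scales with the number of matching documents (51 in the films example) rather than with the size of the collection.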

      This issue might be related to the fix for the stack overflow bug with frange and KNN (https://issues.apache.org/jira/browse/SOLR-16567).




            Assignee: Unassigned
            Reporter: Gabriel Magno (gmagno)