Lucene - Core
  1. Lucene - Core
  2. LUCENE-3643

Improve FilteredQuery to shortcut on wrapped MatchAllDocsQuery

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Since the rewrite of Lucene trunk to delegate all Filter logic to FilteredQuery, by simply wrapping in IndexSearcher.wrapFilter(), we can do more short circuits and improve query execution. A common use case it to pass MatchAllDocsQuery as query to IndexSearcher and a filter. For the underlying hit collection this is stupid and slow, as MatchAllDocsQuery simply increments the docID and checks acceptDocs. If the filter is sparse, this is a big waste. This patch changes FilteredQuery.rewrite() to short circuit and return ConstantScoreQuery, if the query is MatchAllDocs.

      1. LUCENE-3643.patch
        0.0 kB
        Uwe Schindler
      2. LUCENE-3643.patch
        17 kB
        Uwe Schindler
      3. LUCENE-3643.patch
        17 kB
        Uwe Schindler
      4. LUCENE-3643.patch
        14 kB
        Uwe Schindler
      5. LUCENE-3643-allowNullQueryFilter.patch
        17 kB
        Uwe Schindler
      6. LUCENE-3643-noNullQueryFilter.patch
        11 kB
        Uwe Schindler

        Activity

        Hide
        Uwe Schindler added a comment -

        Patch for trunk.

        It also adds tests that checks for correctly rewritten queries with correct boosts (it assigns random boosts and checks the resulting query's boost).

        Show
        Uwe Schindler added a comment - Patch for trunk. It also adds tests that checks for correctly rewritten queries with correct boosts (it assigns random boosts and checks the resulting query's boost).
        Hide
        Uwe Schindler added a comment -

        With this patch applied, it's now also possible to search directly on IndexSearcher with only a filter given:

        TopDocs topDocs = searcher.search(null, filter, topN);
        
        Show
        Uwe Schindler added a comment - With this patch applied, it's now also possible to search directly on IndexSearcher with only a filter given: TopDocs topDocs = searcher.search( null , filter, topN);
        Hide
        Uwe Schindler added a comment -

        Improved patch with more documentation in FilteredQuery.rewrite() and easier if/then/else checks (+ asserts).

        Also added some more javadocs.

        This cannot be backported to 3.x, because:

        • In 3.x Filters are not required to only contain non-deleted documents (in trunk, they must respect non-null acceptDocs). As ConstantScoreQuery does not enforce deleted documents the optimization may return wrong results (MatchAllDocs is required)
        • The API of IndexSearcher cannot be modified, so wrapFilter() is not available there. IS delegates hit collection to FilteredQuery, but it does not rewrite Filter+Query to FilteredQuery.

        This is ready to commit.

        Show
        Uwe Schindler added a comment - Improved patch with more documentation in FilteredQuery.rewrite() and easier if/then/else checks (+ asserts). Also added some more javadocs. This cannot be backported to 3.x, because: In 3.x Filters are not required to only contain non-deleted documents (in trunk, they must respect non-null acceptDocs). As ConstantScoreQuery does not enforce deleted documents the optimization may return wrong results (MatchAllDocs is required) The API of IndexSearcher cannot be modified, so wrapFilter() is not available there. IS delegates hit collection to FilteredQuery, but it does not rewrite Filter+Query to FilteredQuery. This is ready to commit.
        Hide
        Robert Muir added a comment -

        Hi Uwe, I think the patch looks great.

        I just have one question, should we really advertise that you can pass a null Query to IndexSearcher?

        I'm not saying I'm against it, but maybe we should think about it... because then maybe someone
        says its strange they can't have a null query and null filter with a Sort.

        If FilteredQuery optimizes the MatchAllDocsCase behind the scenes anyway, what is the harm in keeping
        the current semantics?

        Show
        Robert Muir added a comment - Hi Uwe, I think the patch looks great. I just have one question, should we really advertise that you can pass a null Query to IndexSearcher? I'm not saying I'm against it, but maybe we should think about it... because then maybe someone says its strange they can't have a null query and null filter with a Sort. If FilteredQuery optimizes the MatchAllDocsCase behind the scenes anyway, what is the harm in keeping the current semantics?
        Hide
        Uwe Schindler added a comment -

        Here a more simplified rewrite. As rewrite is running in a loop "while (query != query.rewrite())", we don't need to do everything on first pass. The special cases are simply expanded only on the not rewritten inner query, otherwise a new FilteredQuery is created. The second pass may rewrite the inner query.

        Show
        Uwe Schindler added a comment - Here a more simplified rewrite. As rewrite is running in a loop "while (query != query.rewrite())", we don't need to do everything on first pass. The special cases are simply expanded only on the not rewritten inner query, otherwise a new FilteredQuery is created. The second pass may rewrite the inner query.
        Hide
        Uwe Schindler added a comment -

        I just have one question, should we really advertise that you can pass a null Query to IndexSearcher?

        I heard this question quite often, how to execute a solely Filter without query. As the current API allows filter==null, why not the same for the query. Passing null as filter and query of course makes no real sense, so I don't expect this question.

        I think it is harder for people to explain them that they must a) either wrap the Filter with CSQ or b) must pass a MatchAllDocsQuery. If you tell them, just call IndexSearcher.search() and pass either filter, query or both, then it's much easier.

        Show
        Uwe Schindler added a comment - I just have one question, should we really advertise that you can pass a null Query to IndexSearcher? I heard this question quite often, how to execute a solely Filter without query. As the current API allows filter==null, why not the same for the query. Passing null as filter and query of course makes no real sense, so I don't expect this question. I think it is harder for people to explain them that they must a) either wrap the Filter with CSQ or b) must pass a MatchAllDocsQuery. If you tell them, just call IndexSearcher.search() and pass either filter, query or both, then it's much easier.
        Hide
        Uwe Schindler added a comment -

        Here 2 patch variants:

        • allowNullQueryFilter.patch is the same patch as before, with some rewrite() improvements for more clarity, but it allows null Query and null Filter and so does IndexSearcher.
        • noNullQueryFilter.patch is a simplier patch, disallowing null filters and null queries, so IndexSaercher behaves as before. The only optimization is to rewrite the MatchAllDocs+Filter combination to CSQ with combined boost.

        After sleeping one night about it, I think we should start with the latter patch and maybe think about allowing a null query in IndexSearcher separately in another issue. In my opinion, the null special case should maybe done inside IS.wrapFilter. Allowing null as query or filter in FilteredQuery makes no sense and adds complexity.

        Show
        Uwe Schindler added a comment - Here 2 patch variants: allowNullQueryFilter.patch is the same patch as before, with some rewrite() improvements for more clarity, but it allows null Query and null Filter and so does IndexSearcher. noNullQueryFilter.patch is a simplier patch, disallowing null filters and null queries, so IndexSaercher behaves as before. The only optimization is to rewrite the MatchAllDocs+Filter combination to CSQ with combined boost. After sleeping one night about it, I think we should start with the latter patch and maybe think about allowing a null query in IndexSearcher separately in another issue. In my opinion, the null special case should maybe done inside IS.wrapFilter. Allowing null as query or filter in FilteredQuery makes no sense and adds complexity.
        Hide
        Uwe Schindler added a comment -

        I will commit the noBullQuerFilter.patch soon!

        Show
        Uwe Schindler added a comment - I will commit the noBullQuerFilter.patch soon!
        Hide
        Uwe Schindler added a comment -

        Committed LUCENE-3643-noNullQueryFilter.patch to trunk revision: 1213771

        Show
        Uwe Schindler added a comment - Committed LUCENE-3643 -noNullQueryFilter.patch to trunk revision: 1213771

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development