Lucene - Core
  1. Lucene - Core
  2. LUCENE-6639

LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 5.3
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The method org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext) starts with

      if (context.ord == 0) {
          policy.onUse(getQuery());
      }
      

      which can result in a missed call for queries that return a null scorer for the first segment.

        Activity

        Hide
        Terry Smith added a comment -

        Attached unit test will fail if the extra IndexWriter.commit() gets triggered or the BooleanQuery clauses are shuffled to make the first clauses' scorer null for the first segment.

        Show
        Terry Smith added a comment - Attached unit test will fail if the extra IndexWriter.commit() gets triggered or the BooleanQuery clauses are shuffled to make the first clauses' scorer null for the first segment.
        Hide
        Adrien Grand added a comment -

        I think there are two natural places where we can call policy.onUse(): Query.createWeight and Weight.scorer. None of them is perfect: if we put it in Query.createWeight, we will over-count how many times the query has been used if the weight is created yet never used, or under-count if the weight is used several times. And in Weight.scorer we under-count if no scorer is pulled on the first leaf as you noticed. Out of these two options, Weight.scorer is my favourite because it can never over-count, which I think it better given that caching too much is probably more harmful than not caching enough?

        We could try to do something more sophisticated but I'm not sure it would be worth the complexity?

        Show
        Adrien Grand added a comment - I think there are two natural places where we can call policy.onUse(): Query.createWeight and Weight.scorer. None of them is perfect: if we put it in Query.createWeight, we will over-count how many times the query has been used if the weight is created yet never used, or under-count if the weight is used several times. And in Weight.scorer we under-count if no scorer is pulled on the first leaf as you noticed. Out of these two options, Weight.scorer is my favourite because it can never over-count, which I think it better given that caching too much is probably more harmful than not caching enough? We could try to do something more sophisticated but I'm not sure it would be worth the complexity?
        Hide
        Terry Smith added a comment -

        This doesn't seem pressing but irked me enough to submit a ticket. It feels that we should be able to be more correct but the current API isn't very supportive of that work flow.

        I slightly prefer calling onUse() from createWeight() as it does make this edge case of the first segment go away which I feel is harder to reason about than someone creating a weight and not using it. The improved multi-threaded search code in IndexSearcher is a great example of this misbehaving where there is no guarantee that the first segment's Weight.scorer() will be called before the other segments. However I'm not familiar with use cases that use Query.createWeight() without executing some kind of search or explain to know if they are more of an issue.

        Is adding bookend methods to more correctly detect the begin/end of the search phase seen as too messy and special casey?

        At the end of the day I also wonder if it's worth the complexity but wanted to open this ticket to bootstrap the discussion as this could be a hard problem to diagnose in the future (someone wants to know why their query isn't getting cached and it's due to some obscure detail like this).

        Show
        Terry Smith added a comment - This doesn't seem pressing but irked me enough to submit a ticket. It feels that we should be able to be more correct but the current API isn't very supportive of that work flow. I slightly prefer calling onUse() from createWeight() as it does make this edge case of the first segment go away which I feel is harder to reason about than someone creating a weight and not using it. The improved multi-threaded search code in IndexSearcher is a great example of this misbehaving where there is no guarantee that the first segment's Weight.scorer() will be called before the other segments. However I'm not familiar with use cases that use Query.createWeight() without executing some kind of search or explain to know if they are more of an issue. Is adding bookend methods to more correctly detect the begin/end of the search phase seen as too messy and special casey? At the end of the day I also wonder if it's worth the complexity but wanted to open this ticket to bootstrap the discussion as this could be a hard problem to diagnose in the future (someone wants to know why their query isn't getting cached and it's due to some obscure detail like this).
        Hide
        Adrien Grand added a comment -

        One issue I have with putting the call in createWeight is that you might sometimes only pull a Weight in order to extract terms (eg. for highlighting or computing distributed term frequencies), so incrementing the counter here would not work.

        That said, you made good arguments against the current logic. In particular it's true that reusing weights for multiple collections should not be common so maybe we can just call policy.onUse on the first time that Weight.scorer is called?

        Show
        Adrien Grand added a comment - One issue I have with putting the call in createWeight is that you might sometimes only pull a Weight in order to extract terms (eg. for highlighting or computing distributed term frequencies), so incrementing the counter here would not work. That said, you made good arguments against the current logic. In particular it's true that reusing weights for multiple collections should not be common so maybe we can just call policy.onUse on the first time that Weight.scorer is called?
        Hide
        Terry Smith added a comment -

        Ah, I didn't realize the highlighters were creating the weights to extract the terms, that makes sense.

        I like the idea of just calling onUse() the first time scorer() is called, that ought to be more robust and is very easy to understand.

        Show
        Terry Smith added a comment - Ah, I didn't realize the highlighters were creating the weights to extract the terms, that makes sense. I like the idea of just calling onUse() the first time scorer() is called, that ought to be more robust and is very easy to understand.
        Hide
        Adrien Grand added a comment -

        Thanks for the feedback Terry, I'll commit shortly then!

        Show
        Adrien Grand added a comment - Thanks for the feedback Terry, I'll commit shortly then!
        Hide
        ASF subversion and git services added a comment -

        Commit 1689464 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1689464 ]

        LUCENE-6639: Make LRUQueryCache consider a query as used on the first time a Scorer is pulled.

        Show
        ASF subversion and git services added a comment - Commit 1689464 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1689464 ] LUCENE-6639 : Make LRUQueryCache consider a query as used on the first time a Scorer is pulled.
        Hide
        ASF subversion and git services added a comment -

        Commit 1689470 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1689470 ]

        LUCENE-6639: Make LRUQueryCache consider a query as used on the first time a Scorer is pulled.

        Show
        ASF subversion and git services added a comment - Commit 1689470 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1689470 ] LUCENE-6639 : Make LRUQueryCache consider a query as used on the first time a Scorer is pulled.
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Unassigned
            Reporter:
            Terry Smith
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development