Solr
  1. Solr
  2. SOLR-7689

ReRankQuery rewrite method can change the QueryResultKey causing cache misses.

    Details

      Description

      In SolrIndexSearcher class, the key used to lookup results in queryResultCache uses the original query.

      However later in createNormalizedWeight the query gets re-written, and then saved in the queryResultCache after it's re-written.

      This causes cache misses for the same query, and un-necessary inserts in the queryResultCache.

      I can reproduce this using a re-ranking query that is using a main query as a dismax query, the dismax Query could be re-written into a TermQuery, which makes sense, but will cause cache misses.

      I tested a quick solution by just using q.clone() when it comes to build QueryResultKey, and it works fine, but not sure if that is the best way of doing it.

      1. SOLR-7689.patch
        5 kB
        Joel Bernstein
      2. SOLR-7689.patch
        5 kB
        Joel Bernstein
      3. SOLR-7689.patch
        6 kB
        Joel Bernstein
      4. SOLR-7689.patch
        5 kB
        Joel Bernstein
      5. SOLR-7689.patch
        0.6 kB
        Joel Bernstein

        Activity

        Hide
        Joel Bernstein added a comment -

        Wondering if this is a bug in the re-ranking? I'll investigate.

        Show
        Joel Bernstein added a comment - Wondering if this is a bug in the re-ranking? I'll investigate.
        Hide
        Joel Bernstein added a comment -

        It looks like the only way this could be a problem is if the Query was not immutable. Can you post the exact query that is causing the cache miss?

        Show
        Joel Bernstein added a comment - It looks like the only way this could be a problem is if the Query was not immutable. Can you post the exact query that is causing the cache miss?
        Hide
        Joel Bernstein added a comment -

        Quick search turns this up: https://issues.apache.org/jira/browse/LUCENE-6570.

        So cloning would seem like the safe thing to do.

        Would be good to get some more eyes on this.

        Show
        Joel Bernstein added a comment - Quick search turns this up: https://issues.apache.org/jira/browse/LUCENE-6570 . So cloning would seem like the safe thing to do. Would be good to get some more eyes on this.
        Hide
        Joel Bernstein added a comment -

        Yonik Seeley curious to see what you see with this ticket.

        Show
        Joel Bernstein added a comment - Yonik Seeley curious to see what you see with this ticket.
        Hide
        Yonik Seeley added a comment - - edited

        This would definitely be a bug if we actually do use a rewritten query for any caches. SolrIndexSearcher certainly didn't do that in the past, but it's possible that a bug was introduced somewhere along the way.

        Is this because some other code passes SolrIndexSearcher a rewritten query, or do you think there's a bug internal to SolrIndexSearcher?

        edit: I reviewed SolrIndexSearcher, and nothing jumps out at me as incorrect.
        We should not clone the queries in general though (in SolrIndexSearcher) - that's a lot of overhead and the wrong fix. If there's a bad-egg query somewhere that is not effectively immutable (i.e. using it in a search can change it) then that's what should be fixed.

        Show
        Yonik Seeley added a comment - - edited This would definitely be a bug if we actually do use a rewritten query for any caches. SolrIndexSearcher certainly didn't do that in the past, but it's possible that a bug was introduced somewhere along the way. Is this because some other code passes SolrIndexSearcher a rewritten query, or do you think there's a bug internal to SolrIndexSearcher? edit: I reviewed SolrIndexSearcher, and nothing jumps out at me as incorrect. We should not clone the queries in general though (in SolrIndexSearcher) - that's a lot of overhead and the wrong fix. If there's a bad-egg query somewhere that is not effectively immutable (i.e. using it in a search can change it) then that's what should be fixed.
        Hide
        Yonik Seeley added a comment -

        Took a brief look at ReRankQuery, and I do see one problem:

                  this.boostedPriority = (Map<BytesRef, Integer>)context.get(QueryElevationComponent.BOOSTED_PRIORITY);
        

        boostedPriority is a field on ReRankQuery, so this query won't be immutable or thread safe.

        Show
        Yonik Seeley added a comment - Took a brief look at ReRankQuery, and I do see one problem: this .boostedPriority = (Map<BytesRef, Integer >)context.get(QueryElevationComponent.BOOSTED_PRIORITY); boostedPriority is a field on ReRankQuery, so this query won't be immutable or thread safe.
        Hide
        Joel Bernstein added a comment -

        I'll review this.

        Show
        Joel Bernstein added a comment - I'll review this.
        Hide
        Joel Bernstein added a comment -

        I think the problem would be in the query itself. If they are not immutable now, under what conditions can they change? The Lucene IndexSearcher calls rewrite on the query, do any queries change internally because of a call like this?

        Show
        Joel Bernstein added a comment - I think the problem would be in the query itself. If they are not immutable now, under what conditions can they change? The Lucene IndexSearcher calls rewrite on the query, do any queries change internally because of a call like this?
        Hide
        Yonik Seeley added a comment -

        I think the problem would be in the query itself. If they are not immutable now, under what conditions can they change?

        The work in LUCENE-6570 is only tangentially related.
        All the core Lucene queries are what I guess I would call "weakly immutable". If the user (in this case Solr) does not change the query, then the query won't be changed by other read / search operations and is safe to use concurrently across different threads, etc.

        So stuff like LUCENE-6570 is to make stuff like BooleanQuery truly immutable. It would only matter to Solr in the event of a bug.

        The Lucene IndexSearcher calls rewrite on the query, do any queries change internally because of a call like this?

        No, they should not.

        Show
        Yonik Seeley added a comment - I think the problem would be in the query itself. If they are not immutable now, under what conditions can they change? The work in LUCENE-6570 is only tangentially related. All the core Lucene queries are what I guess I would call "weakly immutable". If the user (in this case Solr) does not change the query, then the query won't be changed by other read / search operations and is safe to use concurrently across different threads, etc. So stuff like LUCENE-6570 is to make stuff like BooleanQuery truly immutable. It would only matter to Solr in the event of a bug. The Lucene IndexSearcher calls rewrite on the query, do any queries change internally because of a call like this? No, they should not.
        Hide
        Joel Bernstein added a comment -

        Not seeing a problem with the boostedPriority Map. This is map is created by the main execution thread in the QueryElevationComponent and is only used for a specific query. So each query should have a unique HashMap. Once it's created I believe it's read only. The code is pretty confusing though because of the callback to QueryElelvationComponent. This should probably be looked at.

        Show
        Joel Bernstein added a comment - Not seeing a problem with the boostedPriority Map. This is map is created by the main execution thread in the QueryElevationComponent and is only used for a specific query. So each query should have a unique HashMap. Once it's created I believe it's read only. The code is pretty confusing though because of the callback to QueryElelvationComponent. This should probably be looked at.
        Hide
        Joel Bernstein added a comment -

        Ok, I see the issue. It's a bug specific to the ReRankQParserPlugin. Rewritten queries will foil the query result cache. I'll work on a fix.

        Show
        Joel Bernstein added a comment - Ok, I see the issue. It's a bug specific to the ReRankQParserPlugin. Rewritten queries will foil the query result cache. I'll work on a fix.
        Hide
        Joel Bernstein added a comment -

        Patch that resolves the issue. More details to follow.

        Show
        Joel Bernstein added a comment - Patch that resolves the issue. More details to follow.
        Hide
        Yonik Seeley added a comment -

        Hmmm, the patch makes rewrite on ReRankQuery do nothing.
        Are the internal queries rewritten at some point (some queries require that).

        The normal pattern of rewrite is to rewrite your child queries and if any of them changed, create a new parent query with those child queries.

        Show
        Yonik Seeley added a comment - Hmmm, the patch makes rewrite on ReRankQuery do nothing. Are the internal queries rewritten at some point (some queries require that). The normal pattern of rewrite is to rewrite your child queries and if any of them changed, create a new parent query with those child queries.
        Hide
        Joel Bernstein added a comment -

        Added a set of tests that ensures that the queryResultCache misses are resolved.

        Show
        Joel Bernstein added a comment - Added a set of tests that ensures that the queryResultCache misses are resolved.
        Hide
        Joel Bernstein added a comment -

        Yeah this is special case. The ReRankQuery is just a wrapper that peels off on the rewrite() method. The wrapper stays on in the SolrIndexSearcher to set the topdocs collecter and act as the cache key. In the Lucene IndexSearcher search() method where rewrite() is called, all it needs to do is return the mainQuery. The rewrite method in the Lucene IndexSearcher does a nifty little loop that ensures the main query then gets rewritten.

        Show
        Joel Bernstein added a comment - Yeah this is special case. The ReRankQuery is just a wrapper that peels off on the rewrite() method. The wrapper stays on in the SolrIndexSearcher to set the topdocs collecter and act as the cache key. In the Lucene IndexSearcher search() method where rewrite() is called, all it needs to do is return the mainQuery. The rewrite method in the Lucene IndexSearcher does a nifty little loop that ensures the main query then gets rewritten.
        Hide
        Yonik Seeley added a comment -

        Ah, I had missed that you were returning this.mainQuery and not just "this"

        Show
        Yonik Seeley added a comment - Ah, I had missed that you were returning this.mainQuery and not just "this"
        Hide
        Emad Nashed added a comment - - edited

        Thanks a lot for giving this issue your attention. The query I am trying looks basically like this (after omitting some unimportant stuff)

        q={!dismax qf=$sqf pf=$spf v=$sq}
        rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=2}
        rqq=_val_:"recip(ms(NOW/DAY+1DAY,DateUpdated),3.16e-11,1.4,1)"
        sq="test"
        

        I tried the patch, although it fixes the caching issue, but now the Re-Ranking doesn't happen anymore. This is basically because when rewrite is called, ReRanking query will just return the inner dismax query and just hide itself from the searcher. I'll propose another patch shortly. I'd propose that rewrite method on ReRankQuery would clone itself (if the mainQuery has changed) and return the new instance. Sorry for the delay, I just live on the other side of the earth, and there is a big time difference.
        Cheers

        Show
        Emad Nashed added a comment - - edited Thanks a lot for giving this issue your attention. The query I am trying looks basically like this (after omitting some unimportant stuff) q={!dismax qf=$sqf pf=$spf v=$sq} rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=2} rqq=_val_:"recip(ms(NOW/DAY+1DAY,DateUpdated),3.16e-11,1.4,1)" sq="test" I tried the patch, although it fixes the caching issue, but now the Re-Ranking doesn't happen anymore. This is basically because when rewrite is called, ReRanking query will just return the inner dismax query and just hide itself from the searcher. I'll propose another patch shortly. I'd propose that rewrite method on ReRankQuery would clone itself (if the mainQuery has changed) and return the new instance. Sorry for the delay, I just live on the other side of the earth, and there is a big time difference. Cheers
        Hide
        Emad Nashed added a comment -

        This patch changes the rewrite function in ReRankQuery. It will rewrite the mainQuery, and if there is a change it will create a new clone instead of modifying itself.

        Show
        Emad Nashed added a comment - This patch changes the rewrite function in ReRankQuery. It will rewrite the mainQuery, and if there is a change it will create a new clone instead of modifying itself.
        Hide
        Joel Bernstein added a comment - - edited

        The test cases are showing that the rerank is occurring with this patch.

        The SolrIndexSearcher never calls rewrite on the query. So it sees the query as the RankQuery wrapper, and will use the ReRankCollector. The search method inside of the parent Lucene IndexSearcher calls rewrite on the RankQuery which returns the mainQuery. IndexSearcher.rewrite() will then call rewrite on the mainQuery. There is a loop in the IndexSearcher.rewrite() method that loops over the query until all the rewrites have been completed.

        If you can provide a failing test case for your example I'll take a look at it.

        Show
        Joel Bernstein added a comment - - edited The test cases are showing that the rerank is occurring with this patch. The SolrIndexSearcher never calls rewrite on the query. So it sees the query as the RankQuery wrapper, and will use the ReRankCollector. The search method inside of the parent Lucene IndexSearcher calls rewrite on the RankQuery which returns the mainQuery. IndexSearcher.rewrite() will then call rewrite on the mainQuery. There is a loop in the IndexSearcher.rewrite() method that loops over the query until all the rewrites have been completed. If you can provide a failing test case for your example I'll take a look at it.
        Hide
        Emad Nashed added a comment -

        I was actually relying on the output from debugQuery to see if it is re-ranking. I didn't see any output for reranking with your patch, so it looks like I made the wrong assumption that if there is no output in debugQuery then it is not re-ranking.

        Show
        Emad Nashed added a comment - I was actually relying on the output from debugQuery to see if it is re-ranking. I didn't see any output for reranking with your patch, so it looks like I made the wrong assumption that if there is no output in debugQuery then it is not re-ranking.
        Hide
        Joel Bernstein added a comment -

        I'll take a look at that. It sounds like we lost the explain part of the re-ranking with this patch. Thanks.

        Show
        Joel Bernstein added a comment - I'll take a look at that. It sounds like we lost the explain part of the re-ranking with this patch. Thanks.
        Hide
        Joel Bernstein added a comment -

        Emad Nashed you're right we lost the ReRank portion of the Explain with this patch. You're approach of cloning the ReRankQuery in the rewrite method should solve the issue. I'll put another patch up to review.

        Show
        Joel Bernstein added a comment - Emad Nashed you're right we lost the ReRank portion of the Explain with this patch. You're approach of cloning the ReRankQuery in the rewrite method should solve the issue. I'll put another patch up to review.
        Hide
        Joel Bernstein added a comment -

        New patch using a different approach then cloning the ReRankQuery. See comments for explanation. I believe this should resolve the cache miss issue and preserve the ReRankQuery's Explain

        Show
        Joel Bernstein added a comment - New patch using a different approach then cloning the ReRankQuery. See comments for explanation. I believe this should resolve the cache miss issue and preserve the ReRankQuery's Explain
        Hide
        Yonik Seeley added a comment -

        ReRankQuery looks like a normal query type to me (i.e. it can be produced by a QParser from a user request, and can be used as a cache key, etc). This also means that a single ReRankQuery object can be used concurrently on different indexes (autowarming is one way this can happen).

        So caching both "rewrittenQuery" and "boostedPriority" which depend on specific requests / indexes is problematic. As a general rule, queries shouldn't change themselves during execution.

        Is there a reason we can't use the normal pattern of returning a new main query in rewrite if any of the child queries were changed by a rewrite?

        Show
        Yonik Seeley added a comment - ReRankQuery looks like a normal query type to me (i.e. it can be produced by a QParser from a user request, and can be used as a cache key, etc). This also means that a single ReRankQuery object can be used concurrently on different indexes (autowarming is one way this can happen). So caching both "rewrittenQuery" and "boostedPriority" which depend on specific requests / indexes is problematic. As a general rule, queries shouldn't change themselves during execution. Is there a reason we can't use the normal pattern of returning a new main query in rewrite if any of the child queries were changed by a rewrite?
        Hide
        Joel Bernstein added a comment -

        Ok I see the issue you're taking about with the rewrittenQuery. I believe it doesn't apply to the boostedPriority because that won't get regenerated during autowarming. I'll verify this.

        The normal pattern is a little different here because we need to return the re-written query wrapped in a ReRankQuery in order to preserver the re-rank Explain. Otherwise we could just return the mainQuery and let that get rewritten. I'll work on a patch that returns a clone of the ReRankedQuery wrapped around the rewritten query. The only trick to this is that the cloned ReRankQuery will also have rewrite() called on it because of the loop in IndexSearcher.rewrite(). So the cloned ReRankQuery will need to return itself from the rewrite method, rather then cloning again in an endless loop.

        Show
        Joel Bernstein added a comment - Ok I see the issue you're taking about with the rewrittenQuery. I believe it doesn't apply to the boostedPriority because that won't get regenerated during autowarming. I'll verify this. The normal pattern is a little different here because we need to return the re-written query wrapped in a ReRankQuery in order to preserver the re-rank Explain. Otherwise we could just return the mainQuery and let that get rewritten. I'll work on a patch that returns a clone of the ReRankedQuery wrapped around the rewritten query. The only trick to this is that the cloned ReRankQuery will also have rewrite() called on it because of the loop in IndexSearcher.rewrite(). So the cloned ReRankQuery will need to return itself from the rewrite method, rather then cloning again in an endless loop.
        Hide
        Yonik Seeley added a comment - - edited

        The normal pattern is a little different here because we need to return the re-written query wrapped in a ReRankQuery in order to preserver the re-rank Explain.

        But that is the normal pattern of how rewrite() is implemented.

        The only trick to this is that the cloned ReRankQuery will also have rewrite() called on it because of the loop in IndexSearcher.rewrite()

        This isn't a problem if you follow the normal pattern of returning a clone only if one of your sub-queries was changed by the rewrite. The query will change the first time through the loop and won't change the second time.

        Check something like BoostedQuery.rewrite() for the standard way of doing it.

        Show
        Yonik Seeley added a comment - - edited The normal pattern is a little different here because we need to return the re-written query wrapped in a ReRankQuery in order to preserver the re-rank Explain. But that is the normal pattern of how rewrite() is implemented. The only trick to this is that the cloned ReRankQuery will also have rewrite() called on it because of the loop in IndexSearcher.rewrite() This isn't a problem if you follow the normal pattern of returning a clone only if one of your sub-queries was changed by the rewrite. The query will change the first time through the loop and won't change the second time. Check something like BoostedQuery.rewrite() for the standard way of doing it.
        Hide
        Yonik Seeley added a comment -

        As for ReRankQuery.boostedPriority, which is retrieved from the request context, it seems like it can be safely removed?
        It's only used to pass to the collector during creation, and that's also where it's set.

        Show
        Yonik Seeley added a comment - As for ReRankQuery.boostedPriority, which is retrieved from the request context, it seems like it can be safely removed? It's only used to pass to the collector during creation, and that's also where it's set.
        Hide
        Joel Bernstein added a comment -

        I believe it was kept around to handle the cache warming scenario when the QueryElevationComponent is not called. I'll need to do a full review again though because the code looks odd.

        Show
        Joel Bernstein added a comment - I believe it was kept around to handle the cache warming scenario when the QueryElevationComponent is not called. I'll need to do a full review again though because the code looks odd.
        Hide
        Joel Bernstein added a comment -

        Supporting the QueryElevationComponent with ReRanking was tricky

        Show
        Joel Bernstein added a comment - Supporting the QueryElevationComponent with ReRanking was tricky
        Hide
        Joel Bernstein added a comment -

        Ok, new patch using the normal pattern for query rewriting.

        Show
        Joel Bernstein added a comment - Ok, new patch using the normal pattern for query rewriting.
        Hide
        Yonik Seeley added a comment -

        Don't forget to replicate the boost in clone()

        Show
        Yonik Seeley added a comment - Don't forget to replicate the boost in clone()
        Hide
        Joel Bernstein added a comment -

        New patch cloning the boost as well. The boost on the ReRankQuery is not actually applied though. There is a reRankWeight param that serves as a boost. The boost is included in the hashCode and equals of the ReRankQuery because the test framework enforces that.

        Show
        Joel Bernstein added a comment - New patch cloning the boost as well. The boost on the ReRankQuery is not actually applied though. There is a reRankWeight param that serves as a boost. The boost is included in the hashCode and equals of the ReRankQuery because the test framework enforces that.
        Hide
        Emad Nashed added a comment -

        Thanks a lot for that, it works for me! (Y)

        Show
        Emad Nashed added a comment - Thanks a lot for that, it works for me! (Y)
        Hide
        Joel Bernstein added a comment -

        Ok, this is looking pretty good. New test scenario ensures that cache hit occurs when the main query is re-written. The reRank Explain is showing up properly in the debugQuery output.

        Running the full test suite now so hope to commit and backport to the 5x branch shortly.

        Thanks Emad Nashed for reporting the bug and collaborating on the ticket. Thanks Yonik Seeley for guidance on the rewrite.

        Show
        Joel Bernstein added a comment - Ok, this is looking pretty good. New test scenario ensures that cache hit occurs when the main query is re-written. The reRank Explain is showing up properly in the debugQuery output. Running the full test suite now so hope to commit and backport to the 5x branch shortly. Thanks Emad Nashed for reporting the bug and collaborating on the ticket. Thanks Yonik Seeley for guidance on the rewrite.
        Hide
        ASF subversion and git services added a comment -

        Commit 1686213 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1686213 ]

        SOLR-7689: ReRankQuery rewrite method can change the QueryResultKey causing cache misses

        Show
        ASF subversion and git services added a comment - Commit 1686213 from Joel Bernstein in branch 'dev/trunk' [ https://svn.apache.org/r1686213 ] SOLR-7689 : ReRankQuery rewrite method can change the QueryResultKey causing cache misses
        Hide
        ASF subversion and git services added a comment -

        Commit 1686236 from Joel Bernstein in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1686236 ]

        SOLR-7689: ReRankQuery rewrite method can change the QueryResultKey causing cache misses

        Show
        ASF subversion and git services added a comment - Commit 1686236 from Joel Bernstein in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1686236 ] SOLR-7689 : ReRankQuery rewrite method can change the QueryResultKey causing cache misses
        Hide
        Joel Bernstein added a comment -

        I have this labeled as 5.2.2 fix, but not sure if there will be a 5.2.2. I'll keep an eye out for it and backport to lucene_solr_5_2 if it materializes. I'll also hold off updating the CHANGES.txt until we know what release it will be in.

        Show
        Joel Bernstein added a comment - I have this labeled as 5.2.2 fix, but not sure if there will be a 5.2.2. I'll keep an eye out for it and backport to lucene_solr_5_2 if it materializes. I'll also hold off updating the CHANGES.txt until we know what release it will be in.
        Hide
        Shalin Shekhar Mangar added a comment -

        I'll also hold off updating the CHANGES.txt until we know what release it will be in.

        Creating the list of changes for 5.2.2 in the CHANGES.txt is part of the release process. So, please add this issue to the CHANGES.txt under 5.3 so that we can make a decision to backport this to 5.2.2. At least, this way this fix will not be forgotten even if you're on vacation or not paying attention.

        Show
        Shalin Shekhar Mangar added a comment - I'll also hold off updating the CHANGES.txt until we know what release it will be in. Creating the list of changes for 5.2.2 in the CHANGES.txt is part of the release process. So, please add this issue to the CHANGES.txt under 5.3 so that we can make a decision to backport this to 5.2.2. At least, this way this fix will not be forgotten even if you're on vacation or not paying attention.
        Hide
        Joel Bernstein added a comment -

        ok

        Show
        Joel Bernstein added a comment - ok
        Hide
        ASF subversion and git services added a comment -

        Commit 1686243 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1686243 ]

        SOLR-7689: Updating CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1686243 from Joel Bernstein in branch 'dev/trunk' [ https://svn.apache.org/r1686243 ] SOLR-7689 : Updating CHANGES.txt
        Hide
        ASF subversion and git services added a comment -

        Commit 1686245 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1686245 ]

        SOLR-7689: Updating CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1686245 from Joel Bernstein in branch 'dev/trunk' [ https://svn.apache.org/r1686245 ] SOLR-7689 : Updating CHANGES.txt
        Hide
        ASF subversion and git services added a comment -

        Commit 1686246 from Joel Bernstein in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1686246 ]

        SOLR-7689: Updating CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1686246 from Joel Bernstein in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1686246 ] SOLR-7689 : Updating CHANGES.txt

          People

          • Assignee:
            Joel Bernstein
            Reporter:
            Emad Nashed
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development