Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10159

DBQ, where query is based on updated value, reordered with the update doesn't work

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      Background/History

      If a recently updated (in-place) value is used for DBQ, the DBQ doesn't work at Lucene level, unless there's an explicit commit between the update and the DBQ, due to LUCENE-7344. To work around this, Yonik suggested that we use ulog.openRealtimeSearcher() just before the DBQ is performed. This worked fine.

      Example:

      ADD: [id=0, dv=200, title="mytitle", _version_=100]
      UPD: [id=0, dv=300, _version_=200]
      DBQ: q="dv:300", _version_=300
      

      Problem discovered now

      Suppose, in the above example, the last two commands are reordered at the replica. What would happen is: (i) the full document (_version_ 100) is received and indexed, (ii) the DBQ is received (out of ordered) and applied, and no document is deleted [so far so good] and this DBQ is buffered in ulog.deleteByQueries map, (iii) the in-place update arrives (_version 200), it is applied to the document that was added in step i. After that, the buffered DBQ is applied (at DUH2.addAndDelete()). This buffered DBQ, based on a value updated immediately before (step ii), fails to delete the document.

      What happens exactly?

      The initial DBQ query is "dv:300", but when it is applied, it is expanded to "+dv:[300 TO 300] -ConstantScore(frange(long(_version_)):[300 TO *])". In spite of doing a ulog.openRealtimeSearcher() just before the DBQ, it doesn't work.

      A different version of the query, i.e. "+dv:[300 TO 300] +_version_:[200 TO 200]" also doesn't work. As I found out, this happened due to the presence of two clauses! "+dv:[300 TO 300]" works, and so does "+_version_:[200 TO 200]", but both clauses don't work together. Also, surprisingly, even "+dv:[300 TO 300] +dv:[300 TO 300]" doesn't work (same clause repeated).

      Investigation at Lucene level

      Upon some tedious investigation into the internals of Lucene, I discovered that if I change the internal search (at BufferedUpdates) to use Sort.RELEVANCE instead of Sort.INDEXORDER (which, I think is the default when using weight/scorer), the DBQ is applied correctly.

      1. SOLR-10159.patch
        5 kB
        Ishan Chattopadhyaya
      2. SOLR-10159.patch
        6 kB
        Ishan Chattopadhyaya

        Issue Links

          Activity

          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

          Here's a patch that adds the test for reordered DBQ, which is based on updated value.

          Also, added the hack/workaround to BufferedUpdatesStream that solves the problem. I haven't been able to reproduce a standalone Lucene test for this. My hunch is that this is a Lucene bug related to index sorting.

          To reproduce the issue, remove the changes from BufferedUpdates.java and the added test fails.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited Here's a patch that adds the test for reordered DBQ, which is based on updated value. Also, added the hack/workaround to BufferedUpdatesStream that solves the problem. I haven't been able to reproduce a standalone Lucene test for this. My hunch is that this is a Lucene bug related to index sorting. To reproduce the issue, remove the changes from BufferedUpdates.java and the added test fails.
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Similarly, instead of all the changes in BufferedUpdatesStream, just changing
          final Weight weight = searcher.createNormalizedWeight(query, false); to
          final Weight weight = searcher.createNormalizedWeight(query, true);
          also passes the test reliably.

          My hunch is that this is a Lucene bug related to index sorting.

          So, maybe this has nothing to do with Index sorting, but perhaps something to do with how BooleanQueries are processed?

          Michael McCandless, Shai Erera, Hoss Man, does anything jump out to you? I shall continue to dig, but so far my attempts at writing a lucene level test to reproduce this have produced no results. (So far, this: https://paste.fedoraproject.org/paste/rAtcbOpqKFGJG57-lWQ58F5M1UNdIGYhyRLivL9gydE=/, but the test passes)

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Similarly, instead of all the changes in BufferedUpdatesStream, just changing final Weight weight = searcher.createNormalizedWeight(query, false); to final Weight weight = searcher.createNormalizedWeight(query, true); also passes the test reliably. My hunch is that this is a Lucene bug related to index sorting. So, maybe this has nothing to do with Index sorting, but perhaps something to do with how BooleanQueries are processed? Michael McCandless , Shai Erera , Hoss Man , does anything jump out to you? I shall continue to dig, but so far my attempts at writing a lucene level test to reproduce this have produced no results. (So far, this: https://paste.fedoraproject.org/paste/rAtcbOpqKFGJG57-lWQ58F5M1UNdIGYhyRLivL9gydE=/ , but the test passes)
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited

          Finally, figured out the culprit!

          BufferedUpdatesStream executes the DBQs using a searcher whose queryCache is set to null. When the query contains a DeleteByQueryWrapper (in Solr) clause, its createWeight() method obtains its own searcher (privateContext). This searcher's cache is not set to null, and hence it caches the queries.

          During the case of reordered DBQs, the DBQ is executed twice: first it cannot delete anything, since the queries return 0 results, and second when it should return results. Unfortunately, caching at this first step resulted in 0 results in the latter step (even though there is an updated value now).

          The fix is to set the queryCache for the DBQW's privateContext to whatever the initial searcher's queryCache was set to.

          Planning to commit the attached patch soon. Would be great if someone could review.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - - edited Finally, figured out the culprit! BufferedUpdatesStream executes the DBQs using a searcher whose queryCache is set to null. When the query contains a DeleteByQueryWrapper (in Solr) clause, its createWeight() method obtains its own searcher (privateContext). This searcher's cache is not set to null, and hence it caches the queries. During the case of reordered DBQs, the DBQ is executed twice: first it cannot delete anything, since the queries return 0 results, and second when it should return results. Unfortunately, caching at this first step resulted in 0 results in the latter step (even though there is an updated value now). The fix is to set the queryCache for the DBQW's privateContext to whatever the initial searcher's queryCache was set to. Planning to commit the attached patch soon. Would be great if someone could review.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 82f598d895a811a799edbb7760da589cd2d3f51d in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=82f598d ]

          SOLR-10159: When DBQ is reordered with an in-place update, upon whose updated value the DBQ is based on, the DBQ fails due to excessive caching in DeleteByQueryWrapper

          Show
          jira-bot ASF subversion and git services added a comment - Commit 82f598d895a811a799edbb7760da589cd2d3f51d in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=82f598d ] SOLR-10159 : When DBQ is reordered with an in-place update, upon whose updated value the DBQ is based on, the DBQ fails due to excessive caching in DeleteByQueryWrapper
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2cd35288ec7e19af93df6275e8d028de0777bd1e in lucene-solr's branch refs/heads/branch_6x from Ishan Chattopadhyaya
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2cd3528 ]

          SOLR-10159: When DBQ is reordered with an in-place update, upon whose updated value the DBQ is based on, the DBQ fails due to excessive caching in DeleteByQueryWrapper

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2cd35288ec7e19af93df6275e8d028de0777bd1e in lucene-solr's branch refs/heads/branch_6x from Ishan Chattopadhyaya [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2cd3528 ] SOLR-10159 : When DBQ is reordered with an in-place update, upon whose updated value the DBQ is based on, the DBQ fails due to excessive caching in DeleteByQueryWrapper
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Nice catch!

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Nice catch!

            People

            • Assignee:
              ichattopadhyaya Ishan Chattopadhyaya
              Reporter:
              ichattopadhyaya Ishan Chattopadhyaya
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development