SOLR-9308

SolrCloud RTG doesn't forward any params to shards, causes fqs & non-default fl params to be ignored

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      While working on a robust randomized test for SOLR-9285, I can't seem to get filter queries on RTG to work at all – even when the docs are fully committed.

      steps to reproduce to follow in comment...

      1. SOLR-9308.patch
        23 kB
        Hoss Man
      2. SOLR-9308.patch
        22 kB
        Hoss Man
      3. SOLR-9308.patch
        15 kB
        Hoss Man

        Issue Links

          Activity

          hossman Hoss Man added a comment -

          Steps to reproduce with a clean checkout of master...

          • startup the cloud example:
            bin/solr -e cloud -noprompt
            ...
            
          • explicitly disable auto commit so we can test RTG with filters against the update log:
            curl -H 'Content-Type: application/json' http://localhost:8983/solr/gettingstarted/config --data-binary '{"set-property":{"updateHandler.autoSoftCommit.maxTime":"-1"}}'
            ...
            
          • add 2 docs, which we do not explicitly commit:
            curl -H 'Content-Type: application/json' http://localhost:8983/solr/gettingstarted/update --data-binary '[{"id":"xxx","aaa_i":1532757419},{"id":"yyy","aaa_i":-459637688}]'
            ...
            
          • simple RTG (against ulog) should return both docs:
            curl 'http://localhost:8983/solr/gettingstarted/get?ids=xxx,yyy'
            {
              "response":{"numFound":2,"start":0,"docs":[
                  {
                    "id":"yyy",
                    "aaa_i":-459637688,
                    "_version_":1539875027865829376},
                  {
                    "id":"xxx",
                    "aaa_i":1532757419,
                    "_version_":1539875027875266560}]
              }}
            
          • RTG w/fq (against ulog/realtimeSearcher) should only return doc yyy, not doc xxx:
            curl 'http://localhost:8983/solr/gettingstarted/get?ids=xxx,yyy&fq=aaa_i:%5B*+TO+-459637688%5D'
            {
              "response":{"numFound":2,"start":0,"docs":[
                  {
                    "id":"yyy",
                    "aaa_i":-459637688,
                    "_version_":1539875027865829376},
                  {
                    "id":"xxx",
                    "aaa_i":1532757419,
                    "_version_":1539875027875266560}]
              }}
            

            ...UNEXPECTED RTG RESULT!

          • even a single id=xxx RTG w/fq (against ulog/realtimeSearcher) seems to be broken:
            curl 'http://localhost:8983/solr/gettingstarted/get?id=xxx&fq=aaa_i:%5B*+TO+-459637688%5D'
            {
              "response":{"numFound":1,"start":0,"docs":[
                  {
                    "id":"xxx",
                    "aaa_i":1532757419,
                    "_version_":1539876151162306560}]
              }}
            

            ...UNEXPECTED RTG RESULT!

          • sanity check that it's not just some sort of uninverted field / numerics problem:
            curl 'http://localhost:8983/solr/gettingstarted/get?id=xxx&fq=bogus_s:ddd'
            {
              "doc":
              {
                "id":"xxx",
                "aaa_i":1532757419,
                "_version_":1539876677341937664}}
            

            ...UNEXPECTED RTG RESULT!

          • Commit both docs:
            curl 'http://localhost:8983/solr/gettingstarted/update?commit=true'
            ...
            
          • do a basic search for all docs with same numeric fq to confirm doc xxx doesn't match:
            curl 'http://localhost:8983/solr/gettingstarted/query?q=*:*&fq=aaa_i:%5B*+TO+-459637688%5D'
            {
              "responseHeader":{
                "zkConnected":true,
                "status":0,
                "QTime":65,
                "params":{
                  "q":"*:*",
                  "fq":"aaa_i:[* TO -459637688]"}},
              "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
                  {
                    "id":"yyy",
                    "aaa_i":-459637688,
                    "_version_":1539875027865829376}]
              }}
            
          • Also check the same RTG w/fq again now that docs are committed:
            curl 'http://localhost:8983/solr/gettingstarted/get?ids=xxx,yyy&fq=aaa_i:%5B*+TO+-459637688%5D'
            {
              "response":{"numFound":2,"start":0,"docs":[
                  {
                    "id":"yyy",
                    "aaa_i":-459637688,
                    "_version_":1539875027865829376},
                  {
                    "id":"xxx",
                    "aaa_i":1532757419,
                    "_version_":1539875027875266560}]
              }}
            

            ... RTG STILL UNEXPECTEDLY RETURNING DOC NOT MATCHING FQ!
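
The mismatch in the steps above can also be checked mechanically. The following is only a minimal sketch: it parses the RTG response body copied from this comment and lists the returned docs that fall outside the range filter. The helper name `docs_violating_upper_bound` is hypothetical and not part of Solr.

```python
import json

# Response body copied from the RTG w/fq request above
# (fq=aaa_i:[* TO -459637688]).
RTG_RESPONSE = """
{
  "response":{"numFound":2,"start":0,"docs":[
      {"id":"yyy","aaa_i":-459637688,"_version_":1539875027865829376},
      {"id":"xxx","aaa_i":1532757419,"_version_":1539875027875266560}]
  }}
"""

def docs_violating_upper_bound(response_text, field, upper):
    """Return ids of returned docs whose `field` exceeds the inclusive upper bound."""
    docs = json.loads(response_text)["response"]["docs"]
    return [d["id"] for d in docs if d[field] > upper]

violations = docs_violating_upper_bound(RTG_RESPONSE, "aaa_i", -459637688)
print(violations)  # ['xxx']
```

A correct RTG response for this fq would produce an empty list here, since only doc yyy satisfies the range.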


          I've seen the code in RealTimeGetComponent that re-opens the realtime searcher and uses it when fq params are included. I've read it a couple of times and it looks fine to me, so I really have no idea why these trivial little examples fail so badly.

          Yonik Seeley: any idea WTF is going on here?

          yseeley@gmail.com Yonik Seeley added a comment -

          Hmmm, there are definitely tests for "fq" in TestRealTimeGet.java
          So given how badly broken your test run looks, I'd say that the "fq" params aren't being received/processed.
          Given that you're starting with a solrcloud setup, I'd guess that request forwarding is being done and that "fq" params aren't being passed along.

          hossman Hoss Man added a comment -

          I'd guess that request forwarding is being done and that "fq" params aren't being passed along.

          Hmmm... that seems like a good candidate, but -e cloud -noprompt spins up only 2 nodes, with 2 shards and 2 replicas each. And even if I try sending the RTG requests to both nodes – or even directly to the individual cores – the filter is ignored in all cases...

          $ curl 'http://localhost:8983/solr/gettingstarted_shard1_replica2/get?id=xxx&fq=bogus_s:ddd'
          {
            "doc":
            {
              "id":"xxx",
              "aaa_i":1532757419,
              "_version_":1539876677341937664}}
          $ curl 'http://localhost:8983/solr/gettingstarted_shard2_replica2/get?id=xxx&fq=bogus_s:ddd'
          {
            "doc":
            {
              "id":"xxx",
              "aaa_i":1532757419,
              "_version_":1539876677341937664}}
          $ curl 'http://localhost:7574/solr/gettingstarted_shard1_replica1/get?id=xxx&fq=bogus_s:ddd'
          {
            "doc":
            {
              "id":"xxx",
              "aaa_i":1532757419,
              "_version_":1539876677341937664}}
          $ curl 'http://localhost:7574/solr/gettingstarted_shard2_replica1/get?id=xxx&fq=bogus_s:ddd'
          {
            "doc":
            {
              "id":"xxx",
              "aaa_i":1532757419,
              "_version_":1539876677341937664}}
          

          (at the time i ran those queries, the UI said the 2 cores on 8983 were my 2 leaders)
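
The outputs in this thread come in two shapes: `id=` requests mostly return a top-level "doc", while `ids=` requests return a "response" with a "docs" list. When comparing results across cores, a small normalizing helper can make that easier. This is a sketch only; `extract_docs` is a hypothetical name, and the `{"doc": null}` shape for a filtered-out document is an assumption that is not shown anywhere in this thread.

```python
import json

def extract_docs(body):
    """Return the list of docs from either RTG response shape seen in this thread."""
    data = json.loads(body)
    if "response" in data:
        return data["response"]["docs"]
    doc = data.get("doc")
    return [doc] if doc else []

# Bodies copied from the outputs in this thread.
single = '{"doc": {"id":"xxx","aaa_i":1532757419,"_version_":1539876677341937664}}'
multi = ('{"response":{"numFound":1,"start":0,"docs":['
         '{"id":"xxx","aaa_i":1532757419,"_version_":1539876151162306560}]}}')

print([d["id"] for d in extract_docs(single)])  # ['xxx']
print([d["id"] for d in extract_docs(multi)])   # ['xxx']
# Assumed shape for "no matching doc"; not taken from the thread.
print(extract_docs('{"doc": null}'))            # []
```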

          hossman Hoss Man added a comment -

          Definitely doesn't reproduce with -e techproducts, so the root cause must be something cloud-specific.

          yseeley@gmail.com Yonik Seeley added a comment -

          And even if i try sending the RTG requests to both nodes – or even directly to the individual cores – the filter is ignored in all cases...

          Perhaps sub-requests are being unnecessarily used even when the current core could handle it? That would be consistent with everything we've seen so far.

          hossman Hoss Man added a comment -

          I suspect RealTimeGetComponent.createSubRequests is broken?

          It's used by RealTimeGetComponent.distributedProcess to create ShardRequests, and it copies/parses params, but at a skim I don't see anything about fq.
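
If that suspicion is right, the observed behavior follows directly. The toy model below is not Solr code: plain dicts stand in for request params, `fq_max` is a hypothetical stand-in for a parsed fq, and none of the function names exist in Solr. It only shows how a filter gets silently ignored when a sub-request is built from scratch with nothing but "ids".

```python
# Toy model only: plain dicts stand in for Solr request params.
DOCS = {
    "xxx": {"id": "xxx", "aaa_i": 1532757419},
    "yyy": {"id": "yyy", "aaa_i": -459637688},
}

def shard_get(params):
    """A 'shard' that applies an upper-bound filter only if it receives one."""
    ids = params["ids"].split(",")
    upper = params.get("fq_max")  # hypothetical stand-in for a parsed fq
    found = [DOCS[i] for i in ids if i in DOCS]
    if upper is not None:
        found = [d for d in found if d["aaa_i"] <= upper]
    return [d["id"] for d in found]

def coordinator_get_dropping_params(params):
    """Builds the sub-request from scratch, so only 'ids' reaches the shard."""
    sub_request = {"ids": params["ids"]}
    return shard_get(sub_request)

request = {"ids": "xxx,yyy", "fq_max": -459637688}
direct = shard_get(request)
forwarded = coordinator_get_dropping_params(request)
print(direct)     # ['yyy']
print(forwarded)  # ['xxx', 'yyy']
```

The forwarded result matches what the curl examples earlier in the issue show: both docs come back even though only yyy satisfies the filter.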

          hossman Hoss Man added a comment -

          Suggested patch:

          • refactors some redundant ShardRequest init logic into a new helper method
          • updates the ShardRequest init logic to copy the params from the original request, explicitly removing a few we definitely don't want before adding the new (shard specific) "ids" param.
          • updates tests to use filter queries in some RTG requests, against both committed & uncommitted docs
          • ensures that the filter Query objects are rewritten before trying to call createWeight on them (unrelated bug that seems like it would also affect some single node RTG requests depending on the type of fq used ... discovered during testing)

          This fix also seems to resolve all of the issues noted in SOLR-9286 (which makes sense: since no params were being copied for the shard requests, any non-default fl transformers would never be generated by the shards). So this patch also enables all of the test logic that was blocked on SOLR-9286.
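
The "copy, remove, then add ids" approach described in the bullets above could look roughly like the sketch below. It is an illustration in Python with plain dicts, not the patch itself: `build_shard_params` is a hypothetical name, the sample `fl` value is made up, and the `drop` default is an assumption, since the comment does not list which params the patch removes.

```python
def build_shard_params(original, shard_ids, drop=("id", "ids")):
    """Copy the original request params, drop unwanted keys, set shard-specific ids.

    `drop` is an assumption for illustration; the thread does not say
    which params the patch removes.
    """
    params = {k: v for k, v in original.items() if k not in drop}
    params["ids"] = ",".join(shard_ids)
    return params

# Example request; the fl value is illustrative only.
original = {"ids": "xxx,yyy", "fq": "aaa_i:[* TO -459637688]", "fl": "id,aaa_i"}
shard_params = build_shard_params(original, ["xxx"])
print(shard_params)
# {'fq': 'aaa_i:[* TO -459637688]', 'fl': 'id,aaa_i', 'ids': 'xxx'}
```

Because the copy starts from the full original request, any param not explicitly dropped (fq, fl, and so on) reaches the shard, and the original dict is left unmodified.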

          Yonik Seeley - what do you think?

          hossman Hoss Man added a comment -

          Updated patch:

          • Updated to apply cleanly on master
          • I realized this issue has the same root cause as SOLR-9289 (as well as SOLR-9286), so I've enabled those tests as well.
          hossman Hoss Man added a comment -

          Ugh...

          • TestStressCloudBlindAtomicUpdates has been using RTG + filter queries to assert that atomic updates work – but because of this issue the filter queries were getting silently ignored and the test wasn't as strong as I thought when I wrote it.
          • TestStressCloudBlindAtomicUpdates evidently had a bug in how it formatted the fq params when trying to filter on negative numbers – but again: because of SOLR-9308 those filter queries were never getting parsed, and the test bug went unnoticed until now.

          Latest patch updated to also fix the bug in TestStressCloudBlindAtomicUpdates now that the filter queries are getting parsed & used correctly.

          jira-bot ASF subversion and git services added a comment -

          Commit 72750167a20558789d07a1ff5eca35ea8eec3c6e in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7275016 ]

          SOLR-9308: Fix distributed RTG to forward request params, fixes fq and non-default fl params

          (cherry picked from commit b3505298a5bef76ff83b269bf87a179d027da849)

          Conflicts:
          solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java

          jira-bot ASF subversion and git services added a comment -

          Commit b3505298a5bef76ff83b269bf87a179d027da849 in lucene-solr's branch refs/heads/master from Chris Hostetter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b350529 ]

          SOLR-9308: Fix distributed RTG to forward request params, fixes fq and non-default fl params

          mikemccand Michael McCandless added a comment -

          Bulk close resolved issues after 6.2.0 release.


            People

            • Assignee:
              hossman Hoss Man
              Reporter:
              hossman Hoss Man