Solr
  1. Solr
  2. SOLR-7746

Ping requests stopped working with distrib=true in Solr 5.2.1

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.4, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      "steps to reproduce"
      # start 1 node SolrCloud cluster
      sh> ./bin/solr -c -p 8888
      # create a test collection (we won’t use it, but I just want to it to load solr configs to Zk)
      ./bin/solr create_collection -c test -d sample_techproducts_configs -p 8888
      # create another test collection with 2 shards
      curl 'http://localhost:8888/solr/admin/collections?action=CREATE&name=test2&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=test'
      # try distrib ping request
      curl 'http://localhost:8888/solr/test2/admin/ping?wt=json&distrib=true&indent=true'
      ...
        "error":{
          "msg":"Ping query caused exception: Error from server at http://192.168.59.3:8888/solr/test2_shard2_replica1: Cannot execute the PingRequestHandler recursively"
      ...
      
      "Exception"
      2116962 [qtp599601600-13] ERROR org.apache.solr.core.SolrCore  [test2 shard2 core_node1 test2_shard2_replica1] – org.apache.solr.common.SolrException: Cannot execute the PingRequestHandler recursively
      	at org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:246)
      	at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:211)
      	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
      	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
      	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
      	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
      	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
      	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
      
      1. SOLR-7746.patch
        8 kB
        Michael Sun
      2. SOLR-7746.patch
        8 kB
        Michael Sun
      3. SOLR-7746.patch
        9 kB
        Michael Sun
      4. SOLR-7746.patch
        12 kB
        Michael Sun
      5. SOLR-7746.patch
        3 kB
        Michael Sun
      6. SOLR-7746.patch
        3 kB
        Michael Sun
      7. SOLR-7746.patch
        1 kB
        Michael Sun

        Activity

        Hide
        Michael Sun added a comment -

        This issue can be reproduced in trunk build too.

        Show
        Michael Sun added a comment - This issue can be reproduced in trunk build too.
        Hide
        Michael Sun added a comment -

        Attach a fix. For the ping request particular, there is a requirement that qt parameter either be a meaningful handler or be null. The fix address this requirement when requests are sent to shards.

        Show
        Michael Sun added a comment - Attach a fix. For the ping request particular, there is a requirement that qt parameter either be a meaningful handler or be null. The fix address this requirement when requests are sent to shards.
        Hide
        Gregory Chanan added a comment -

        Can we put the complexity in the PingRequestHandler instead of the SearchHandler? Localizing the complexity seems like the best long term approach here.

        Show
        Gregory Chanan added a comment - Can we put the complexity in the PingRequestHandler instead of the SearchHandler? Localizing the complexity seems like the best long term approach here.
        Hide
        Michael Sun added a comment -

        Yeah, that makes sense. Thanks Gregory Chanan for suggestion.

        Show
        Michael Sun added a comment - Yeah, that makes sense. Thanks Gregory Chanan for suggestion.
        Hide
        Michael Sun added a comment -

        Upload the patch to handle sharded ping requests inside PingRequestHandler. There are two changes.

        1. If the qt=/admin/ping for ping requests to shard, it delegates the request to the default handler, "/select".
        2. For ping requests to shard, PingRequestHandler returns what the delegated handler returns. Therefore the result can be aggregated later. (Originally PingRequestHandler removes most part from results that delegated handler returns. It works fine with non-distributed requests but can break the aggregation logic for distributed requests.)

        Show
        Michael Sun added a comment - Upload the patch to handle sharded ping requests inside PingRequestHandler. There are two changes. 1. If the qt=/admin/ping for ping requests to shard, it delegates the request to the default handler, "/select". 2. For ping requests to shard, PingRequestHandler returns what the delegated handler returns. Therefore the result can be aggregated later. (Originally PingRequestHandler removes most part from results that delegated handler returns. It works fine with non-distributed requests but can break the aggregation logic for distributed requests.)
        Hide
        Gregory Chanan added a comment -

        Looks good Michael, a few questions/comments:

        (params.getBool(ShardParams.IS_SHARD,false))
        

        Convention is to put a space after ",". Also, are you using tabs? please remove them.

            handler = core.getRequestHandler( null );
                ModifiableSolrParams wparams = new ModifiableSolrParams(params);
                wparams.remove(CommonParams.QT);
                req.setParams(wparams);
        

        Is it correct ot remove the QT or replace the QT with the default handler you are calling?

         // In case it's a query for shard, return the result from delegated handler for distributed query to merge result
              if (params.getBool(ShardParams.IS_SHARD,false)) {
                core.execute(handler, req, rsp );
                ex = rsp.getException(); 
              } else {
               core.execute(handler, req, pingrsp );
                ex = pingrsp.getException(); 
              }
        ...
         if (!params.getBool(ShardParams.IS_SHARD,false)) {
            rsp.add( "status", "OK" );
         }
        

        Is all the if-elsing necessary? What happens if you use pingrsp for whether IS_SHARD is true or not and then remove the if around the status check? What you have now doesn't look correct to me, the non-IS_SHARD case won't have OK status, right?

        Show
        Gregory Chanan added a comment - Looks good Michael, a few questions/comments: (params.getBool(ShardParams.IS_SHARD, false )) Convention is to put a space after ",". Also, are you using tabs? please remove them. handler = core.getRequestHandler( null ); ModifiableSolrParams wparams = new ModifiableSolrParams(params); wparams.remove(CommonParams.QT); req.setParams(wparams); Is it correct ot remove the QT or replace the QT with the default handler you are calling? // In case it's a query for shard, return the result from delegated handler for distributed query to merge result if (params.getBool(ShardParams.IS_SHARD, false )) { core.execute(handler, req, rsp ); ex = rsp.getException(); } else { core.execute(handler, req, pingrsp ); ex = pingrsp.getException(); } ... if (!params.getBool(ShardParams.IS_SHARD, false )) { rsp.add( "status" , "OK" ); } Is all the if-elsing necessary? What happens if you use pingrsp for whether IS_SHARD is true or not and then remove the if around the status check? What you have now doesn't look correct to me, the non-IS_SHARD case won't have OK status, right?
        Hide
        Michael Sun added a comment -

        Thanks Gregory Chanan for reviewing. Here is my answers.

        1. Will add space and other formatting.
        2. qt parameter is no longer used at that point since delegation is completed. It doesn't matter if qt is removed or set to default handler.
        3. The logic is correct for non-IS_SHARD use case and it adds OK status to rsp with the ! in last if condition.

        Show
        Michael Sun added a comment - Thanks Gregory Chanan for reviewing. Here is my answers. 1. Will add space and other formatting. 2. qt parameter is no longer used at that point since delegation is completed. It doesn't matter if qt is removed or set to default handler. 3. The logic is correct for non-IS_SHARD use case and it adds OK status to rsp with the ! in last if condition.
        Hide
        Gregory Chanan added a comment -

        on #3, sorry I didn't write that correctly. Let me try to put it another way – why not just simplify the code I listed above and get rid of the if/else handling. Does anything break if you do that?

        Also, it would be great to add a test to demonstrate the problem you are fixing so it doesn't show up again. Ideally the test should demonstrate IS_SHARD and non-IS_SHARD cases and the IS_SHARD test case should fail without your patch applied.

        Show
        Gregory Chanan added a comment - on #3, sorry I didn't write that correctly. Let me try to put it another way – why not just simplify the code I listed above and get rid of the if/else handling. Does anything break if you do that? Also, it would be great to add a test to demonstrate the problem you are fixing so it doesn't show up again. Ideally the test should demonstrate IS_SHARD and non-IS_SHARD cases and the IS_SHARD test case should fail without your patch applied.
        Hide
        Michael Sun added a comment -

        Attach patch with test. I tested that the test is gonna fail without change in PingRequestHandler. Also the code path in PingRequestHandler() is modified to separate the handling in case isShard is true or not to reduce if-else block.

        There is behavior change for PingRequestHandler in case isShard=true. In this case, it simply returns what the delegated handler returns. Javadoc is updated. Here is how new PingRequestHandler process ping request with distrib=true. This behavior change is required to aggregate results from shards (Step 7).

        1. PingRequestHandler receives ping request with distrib=true.
        2. Find the delegated handler, for example SearchHandler by default.
        3. Call SearchHandler to handle request.
        4. SearchHandler send requests to shards with distrib=false and isShard=true.
        5. [Shard] PingHandler of each shard receives the request and delegate to SearchHandler.
        6. [Shard] SearchHandler of shards process and return the result.
        7. SearchHandler in Step3 aggregate the result of all shards.
        8. Return the result to PingRequestHandler.
        9. PingRequestHandler returns either success or HTTP Error code.

        Show
        Michael Sun added a comment - Attach patch with test. I tested that the test is gonna fail without change in PingRequestHandler. Also the code path in PingRequestHandler() is modified to separate the handling in case isShard is true or not to reduce if-else block. There is behavior change for PingRequestHandler in case isShard=true. In this case, it simply returns what the delegated handler returns. Javadoc is updated. Here is how new PingRequestHandler process ping request with distrib=true. This behavior change is required to aggregate results from shards (Step 7). 1. PingRequestHandler receives ping request with distrib=true. 2. Find the delegated handler, for example SearchHandler by default. 3. Call SearchHandler to handle request. 4. SearchHandler send requests to shards with distrib=false and isShard=true. 5. [Shard] PingHandler of each shard receives the request and delegate to SearchHandler. 6. [Shard] SearchHandler of shards process and return the result. 7. SearchHandler in Step3 aggregate the result of all shards. 8. Return the result to PingRequestHandler. 9. PingRequestHandler returns either success or HTTP Error code.
        Hide
        Gregory Chanan added a comment -

        1. You should not modify TestMiniSolrCloudCluster. That's for testing whether the MiniSolrCloudCluster itself works. Write a test that uses the MiniSolrCloudCluster, there should be a number of examples. Or maybe you don't even need it – can SolrPingTest satisfy what you need?

        2.

        +      // Send distributed and non-distributed ping query
        +      cloudSolrClient.setDefaultCollection(collectionName);
        +      SolrPing req = new SolrPing();
        +      req.setDistrib(true);
        +      SolrPingResponse rsp = req.process(cloudSolrClient, collectionName);
        +      assertEquals(0, rsp.getStatus()); 
        +      
        +      cloudSolrClient.setDefaultCollection(collectionName);
        +      req = new SolrPing();
        +      req.setDistrib(false);
        +      rsp = req.process(cloudSolrClient, collectionName);
        +      assertEquals(0, rsp.getStatus());   
        

        Most of this code is unnecessary, you set the default collection multiple times, you pass the collectionName even though it's set, you create a new request when it would just suffice to set distrib.

        3.

        +  public SolrPing setDistrib(boolean distrib) {   
        +    params.add("distrib", distrib ? "true" : "false");
        +    return this;    
        +  }
        

        You shouldn't modify SolrPing just to test it. Just extend getParams on SolrPing for your distrib example.

        4. Can you just have a single one of these instead of putting it in each clause?

          // Send an error or return
        +      if( ex != null ) {
        +        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, 
        +            "Ping query caused exception: "+ex.getMessage(), ex );
        +      }
        +    } else {
        
        Show
        Gregory Chanan added a comment - 1. You should not modify TestMiniSolrCloudCluster. That's for testing whether the MiniSolrCloudCluster itself works. Write a test that uses the MiniSolrCloudCluster, there should be a number of examples. Or maybe you don't even need it – can SolrPingTest satisfy what you need? 2. + // Send distributed and non-distributed ping query + cloudSolrClient.setDefaultCollection(collectionName); + SolrPing req = new SolrPing(); + req.setDistrib( true ); + SolrPingResponse rsp = req.process(cloudSolrClient, collectionName); + assertEquals(0, rsp.getStatus()); + + cloudSolrClient.setDefaultCollection(collectionName); + req = new SolrPing(); + req.setDistrib( false ); + rsp = req.process(cloudSolrClient, collectionName); + assertEquals(0, rsp.getStatus()); Most of this code is unnecessary, you set the default collection multiple times, you pass the collectionName even though it's set, you create a new request when it would just suffice to set distrib. 3. + public SolrPing setDistrib( boolean distrib) { + params.add( "distrib" , distrib ? " true " : " false " ); + return this ; + } You shouldn't modify SolrPing just to test it. Just extend getParams on SolrPing for your distrib example. 4. Can you just have a single one of these instead of putting it in each clause? // Send an error or return + if ( ex != null ) { + throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, + "Ping query caused exception: " +ex.getMessage(), ex ); + } + } else {
        Hide
        Gregory Chanan added a comment -

        I compared what happens in 5.1 vs your version of trunk for the commands listed in the description:
        5.1:

        curl 'http://localhost:8889/solr/test2/admin/ping?wt=json&distrib=true&indent=true'
        {
          "responseHeader":{
            "status":0,
            "QTime":26,
            "params":{
              "df":"text",
              "echoParams":"all",
              "indent":"true",
              "q":"solrpingquery",
              "distrib":"true",
              "wt":"json",
              "preferLocalShards":"false",
              "rows":"10"}},
          "status":"OK"}
        

        Trunk:

        curl 'http://localhost:8885/solr/test2/admin/ping?wt=json&distrib=true&indent=true'
        {
          "responseHeader":{
            "status":0,
            "QTime":28,
            "params":{
              "q":"{!lucene}*:*",
              "distrib":"true",
              "df":"text",
              "preferLocalShards":"false",
              "indent":"true",
              "echoParams":"all",
              "rows":"10",
              "wt":"json"}},
          "status":"OK"}
        

        Looks good, the change in the q parameter looks like it's that way because the solrconfig.xml being used is different.

        Show
        Gregory Chanan added a comment - I compared what happens in 5.1 vs your version of trunk for the commands listed in the description: 5.1: curl 'http://localhost:8889/solr/test2/admin/ping?wt=json&distrib=true&indent=true' { "responseHeader":{ "status":0, "QTime":26, "params":{ "df":"text", "echoParams":"all", "indent":"true", "q":"solrpingquery", "distrib":"true", "wt":"json", "preferLocalShards":"false", "rows":"10"}}, "status":"OK"} Trunk: curl 'http://localhost:8885/solr/test2/admin/ping?wt=json&distrib=true&indent=true' { "responseHeader":{ "status":0, "QTime":28, "params":{ "q":"{!lucene}*:*", "distrib":"true", "df":"text", "preferLocalShards":"false", "indent":"true", "echoParams":"all", "rows":"10", "wt":"json"}}, "status":"OK"} Looks good, the change in the q parameter looks like it's that way because the solrconfig.xml being used is different.
        Hide
        Michael Sun added a comment -

        Thanks Gregory Chanan for your comments. The updated patch is uploaded.

        Show
        Michael Sun added a comment - Thanks Gregory Chanan for your comments. The updated patch is uploaded.
        Hide
        Gregory Chanan added a comment -

        Looks good, two minor comments:

        +      req = new SolrPingWithDistrib();
        +      req.setDistrib(false);
        

        Just use a SolrPing here, if you can.

        +      Map<String, String> collectionProperties = new HashMap<>();
        +      collectionProperties.put(CoreDescriptor.CORE_CONFIG, "solrconfig-tlog.xml");
        +      collectionProperties.put("solr.tests.maxBufferedDocs", "100000");
        +      collectionProperties.put("solr.tests.ramBufferSizeMB", "100");
        +      // use non-test classes so RandomizedRunner isn't necessary
        +      collectionProperties.put("solr.tests.mergePolicy", "org.apache.lucene.index.TieredMergePolicy");
        +      collectionProperties.put("solr.tests.mergeScheduler", "org.apache.lucene.index.ConcurrentMergeScheduler");
        +      collectionProperties.put("solr.directoryFactory", "solr.RAMDirectoryFactory");
        +      miniCluster.createCollection(collectionName, NUM_SHARDS, REPLICATION_FACTOR, configName, collectionProperties);
        

        I think all this complexity is necessary because of the configuration you are using. Look at ConcurrentDeleteAndCreateCollectionTest – that uses the MiniSolrCloudCluster and creates a collection without this complexity.

        Show
        Gregory Chanan added a comment - Looks good, two minor comments: + req = new SolrPingWithDistrib(); + req.setDistrib( false ); Just use a SolrPing here, if you can. + Map< String , String > collectionProperties = new HashMap<>(); + collectionProperties.put(CoreDescriptor.CORE_CONFIG, "solrconfig-tlog.xml" ); + collectionProperties.put( "solr.tests.maxBufferedDocs" , "100000" ); + collectionProperties.put( "solr.tests.ramBufferSizeMB" , "100" ); + // use non-test classes so RandomizedRunner isn't necessary + collectionProperties.put( "solr.tests.mergePolicy" , "org.apache.lucene.index.TieredMergePolicy" ); + collectionProperties.put( "solr.tests.mergeScheduler" , "org.apache.lucene.index.ConcurrentMergeScheduler" ); + collectionProperties.put( "solr.directoryFactory" , "solr.RAMDirectoryFactory" ); + miniCluster.createCollection(collectionName, NUM_SHARDS, REPLICATION_FACTOR, configName, collectionProperties); I think all this complexity is necessary because of the configuration you are using. Look at ConcurrentDeleteAndCreateCollectionTest – that uses the MiniSolrCloudCluster and creates a collection without this complexity.
        Hide
        Michael Sun added a comment -

        That makes sense. The updated patch is uploaded. Thanks Gregory Chanan

        Show
        Michael Sun added a comment - That makes sense. The updated patch is uploaded. Thanks Gregory Chanan
        Hide
        Gregory Chanan added a comment -

        +1 lgtm.

        Show
        Gregory Chanan added a comment - +1 lgtm.
        Hide
        Gregory Chanan added a comment -

        Actually I found a problem with the test. It looks like

        Builder jettyConfig = JettyConfig.builder();
        

        doesn't set up the JettyConfig properly for SSL tests. You need to use buildJettyConfig instead. Not sure if we can enforce that in tests automatically.

        Show
        Gregory Chanan added a comment - Actually I found a problem with the test. It looks like Builder jettyConfig = JettyConfig.builder(); doesn't set up the JettyConfig properly for SSL tests. You need to use buildJettyConfig instead. Not sure if we can enforce that in tests automatically.
        Hide
        Michael Sun added a comment -

        Thanks Gregory Chanan for pointing it out. I updated the patch and uploaded it.

        Show
        Michael Sun added a comment - Thanks Gregory Chanan for pointing it out. I updated the patch and uploaded it.
        Hide
        ASF subversion and git services added a comment -

        Commit 1702581 from gchanan@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1702581 ]

        SOLR-7746: Ping requests stopped working with distrib=true in Solr 5.2.1

        Show
        ASF subversion and git services added a comment - Commit 1702581 from gchanan@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1702581 ] SOLR-7746 : Ping requests stopped working with distrib=true in Solr 5.2.1
        Hide
        ASF subversion and git services added a comment -

        Commit 1702582 from gchanan@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1702582 ]

        SOLR-7746: Ping requests stopped working with distrib=true in Solr 5.2.1

        Show
        ASF subversion and git services added a comment - Commit 1702582 from gchanan@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1702582 ] SOLR-7746 : Ping requests stopped working with distrib=true in Solr 5.2.1
        Hide
        Gregory Chanan added a comment -

        Committed to trunk and 5.4, thanks for the patch Michael Sun.

        Show
        Gregory Chanan added a comment - Committed to trunk and 5.4, thanks for the patch Michael Sun .
        Hide
        Mark Miller added a comment -

        FY on the CHANGES.txt entry for this one:

        SOLR-7746: Ping requests stopped working with distrib=true in Solr 5.2.1.  (Michael Sun)
        

        It should read:

        SOLR-7746: Ping requests stopped working with distrib=true in Solr 5.2.1.  (Alexey Serba, Michael Sun via Greg Chanan)
        

        Reporter always gets credit and if committer didn't really contribute we use the via credit.

        Show
        Mark Miller added a comment - FY on the CHANGES.txt entry for this one: SOLR-7746: Ping requests stopped working with distrib=true in Solr 5.2.1. (Michael Sun) It should read: SOLR-7746: Ping requests stopped working with distrib=true in Solr 5.2.1. (Alexey Serba, Michael Sun via Greg Chanan) Reporter always gets credit and if committer didn't really contribute we use the via credit.
        Hide
        Gregory Chanan added a comment -

        Thanks Mark, both the "via" and reporter points make sense. Should I go back and update the CHANGES.txt?

        Show
        Gregory Chanan added a comment - Thanks Mark, both the "via" and reporter points make sense. Should I go back and update the CHANGES.txt?
        Hide
        Mark Miller added a comment -

        Yeah, personally I think it's worth it for the reporter credit.

        Show
        Mark Miller added a comment - Yeah, personally I think it's worth it for the reporter credit.
        Hide
        ASF subversion and git services added a comment -

        Commit 1706499 from gchanan@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1706499 ]

        SOLR-7746: Update credit in CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1706499 from gchanan@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1706499 ] SOLR-7746 : Update credit in CHANGES.txt
        Hide
        ASF subversion and git services added a comment -

        Commit 1706502 from gchanan@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1706502 ]

        SOLR-7746: Update credit in CHANGES.txt

        Show
        ASF subversion and git services added a comment - Commit 1706502 from gchanan@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1706502 ] SOLR-7746 : Update credit in CHANGES.txt

          People

          • Assignee:
            Gregory Chanan
            Reporter:
            Alexey Serba
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development