Solr
  1. Solr
  2. SOLR-7173

Fix ReplicationFactorTest on Windows

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1
    • Component/s: None
    • Labels:
      None

      Description

      The ReplicationFactorTest fails on the Windows build with NoHttpResponseException, as seen here: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4502/testReport/junit/org.apache.solr.cloud/ReplicationFactorTest/test/

      Adding a retry logic similar to HttpPartitionTest's doSend() method makes the test pass on Windows.

      1. SOLR-7173.patch
        6 kB
        Ishan Chattopadhyaya
      2. SOLR-7173.patch
        4 kB
        Ishan Chattopadhyaya
      3. SOLR-7173.patch
        4 kB
        Ishan Chattopadhyaya

        Issue Links

          Activity

          Hide
          Ishan Chattopadhyaya added a comment -

          Here's the patch to fix the test, as tested on my Windows box.

          Show
          Ishan Chattopadhyaya added a comment - Here's the patch to fix the test, as tested on my Windows box.
          Hide
          Timothy Potter added a comment -

          Ishan Chattopadhyaya thanks for taking this up ... a couple of minor suggestions about your patch:

          1) please add some logging around the retries

          2) I've found it better to use recursion with a decrementing counter vs. nested re-try blocks so something like the following, see where I use maxRetries as a param:

          protected int sendBatch(List<SolrInputDocument> batch, int waitBeforeRetry, int maxRetries) throws Exception {
              int sent = 0;
              final Timer.Context sendTimerCtxt = sendBatchToSolrTimer.time();
              try {
                UpdateRequest updateRequest = new UpdateRequest();
                ModifiableSolrParams params = updateRequest.getParams();
                if (params == null) {
                  params = new ModifiableSolrParams();
                  updateRequest.setParams(params);
                }
                updateRequest.add(batch);
                cloudSolrServer.request(updateRequest);
                sent = batch.size();
              } catch (Exception exc) {
                Throwable rootCause = SolrException.getRootCause(exc);
                boolean wasCommError = ...
                if (wasCommError) {
                  if (--maxRetries > 0) {
                    log.warn("ERROR: " + rootCause + " ... Sleeping for "
                        + waitBeforeRetry + " seconds before re-try ...");
                    Thread.sleep(waitBeforeRetry * 1000L);
                    sent = sendBatch(batch, waitBeforeRetry, maxRetries);
                  } else {
                    log.error("No more retries available! Add batch failed due to: " + rootCause);
                    throw exc;
                  }
                }
              } finally {
                sendTimerCtxt.stop();
              }
          
              batch.clear();
              return sent;
            }
          
          Show
          Timothy Potter added a comment - Ishan Chattopadhyaya thanks for taking this up ... a couple of minor suggestions about your patch: 1) please add some logging around the retries 2) I've found it better to use recursion with a decrementing counter vs. nested re-try blocks so something like the following, see where I use maxRetries as a param: protected int sendBatch(List<SolrInputDocument> batch, int waitBeforeRetry, int maxRetries) throws Exception { int sent = 0; final Timer.Context sendTimerCtxt = sendBatchToSolrTimer.time(); try { UpdateRequest updateRequest = new UpdateRequest(); ModifiableSolrParams params = updateRequest.getParams(); if (params == null ) { params = new ModifiableSolrParams(); updateRequest.setParams(params); } updateRequest.add(batch); cloudSolrServer.request(updateRequest); sent = batch.size(); } catch (Exception exc) { Throwable rootCause = SolrException.getRootCause(exc); boolean wasCommError = ... if (wasCommError) { if (--maxRetries > 0) { log.warn( "ERROR: " + rootCause + " ... Sleeping for " + waitBeforeRetry + " seconds before re- try ..." ); Thread .sleep(waitBeforeRetry * 1000L); sent = sendBatch(batch, waitBeforeRetry, maxRetries); } else { log.error( "No more retries available! Add batch failed due to: " + rootCause); throw exc; } } } finally { sendTimerCtxt.stop(); } batch.clear(); return sent; }
          Hide
          Ishan Chattopadhyaya added a comment -

          Thanks for your suggestion. I've updated the patch; I've added the warning before retrying. However, instead of the recursive approach (through my initial few attempts, couldn't get the tests to pass), I went for an iterative one; do you suggest I go with the recursive approach?

          Also, since I had copied the initial retry logic from HttpPartitionTest's sendDoc(), should I update that code as well with this iterative/recursive retry?

          Show
          Ishan Chattopadhyaya added a comment - Thanks for your suggestion. I've updated the patch; I've added the warning before retrying. However, instead of the recursive approach (through my initial few attempts, couldn't get the tests to pass), I went for an iterative one; do you suggest I go with the recursive approach? Also, since I had copied the initial retry logic from HttpPartitionTest's sendDoc(), should I update that code as well with this iterative/recursive retry?
          Hide
          Mark Miller added a comment -

          Awesome, thank you. I figured we where just missing some retries - this issue should also solve SOLR-6944 (this can also still happen on linux too, just doesn't happen consistently like on windows).

          Show
          Mark Miller added a comment - Awesome, thank you. I figured we where just missing some retries - this issue should also solve SOLR-6944 (this can also still happen on linux too, just doesn't happen consistently like on windows).
          Hide
          Mark Miller added a comment -

          should I update that code as well with this iterative/recursive retry?

          Yes. All the better if we can just consolidate the code in some test util method rather than duping it.

          Show
          Mark Miller added a comment - should I update that code as well with this iterative/recursive retry? Yes. All the better if we can just consolidate the code in some test util method rather than duping it.
          Hide
          Ishan Chattopadhyaya added a comment -

          Updating the patch, now covers the HttpPartitionTest's retry logic. The retry logic is now in the AbstractFullDistribZkTest.sendDocsWithRetry().

          Show
          Ishan Chattopadhyaya added a comment - Updating the patch, now covers the HttpPartitionTest's retry logic. The retry logic is now in the AbstractFullDistribZkTest.sendDocsWithRetry().
          Hide
          ASF subversion and git services added a comment -

          Commit 1666266 from Timothy Potter in branch 'dev/trunk'
          [ https://svn.apache.org/r1666266 ]

          SOLR-7173: Fix ReplicationFactorTest on Windows

          Show
          ASF subversion and git services added a comment - Commit 1666266 from Timothy Potter in branch 'dev/trunk' [ https://svn.apache.org/r1666266 ] SOLR-7173 : Fix ReplicationFactorTest on Windows
          Hide
          Timothy Potter added a comment -
          Show
          Timothy Potter added a comment - thanks Ishan Chattopadhyaya !
          Hide
          ASF subversion and git services added a comment -

          Commit 1666289 from Timothy Potter in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1666289 ]

          SOLR-7173: Fix ReplicationFactorTest on Windows

          Show
          ASF subversion and git services added a comment - Commit 1666289 from Timothy Potter in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1666289 ] SOLR-7173 : Fix ReplicationFactorTest on Windows
          Hide
          Timothy Potter added a comment -

          Bulk close after 5.1 release

          Show
          Timothy Potter added a comment - Bulk close after 5.1 release

            People

            • Assignee:
              Timothy Potter
              Reporter:
              Ishan Chattopadhyaya
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development