Solr / SOLR-6241

HttpPartitionTest.testRf3WithLeaderFailover fails sometimes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.2, 5.0
    • Component/s: SolrCloud, Tests
    • Labels:
      None

      Description

      This test sometimes fails locally as well as on Jenkins.

      Expected 2 of 3 replicas to be active but only found 1....
              at org.junit.Assert.fail(Assert.java:93)
              at org.junit.Assert.assertTrue(Assert.java:43)
              at org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:367)
              at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:148)
              at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
      

        Activity

        Shalin Shekhar Mangar added a comment -

        I looked at this test. It fails because it sleeps for 10s to let a recovery complete, and sometimes the recovery has not finished by then. We should try harder.
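
A fixed sleep like the one described here can be replaced by polling against a deadline, so the test proceeds as soon as the condition holds and only fails once a generous timeout has passed. The following is a minimal sketch of that idea, not the change committed for this issue: `PollingWait`, `waitFor`, and its parameters are hypothetical names, and the supplied condition stands in for whatever check the test uses to decide that recovery has finished.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Solr): polls a condition until it holds
// or a deadline passes, instead of sleeping for one fixed interval.
final class PollingWait {
    private PollingWait() {}

    /**
     * Returns true as soon as the condition reports true, or false if
     * timeoutMs elapses first (or the waiting thread is interrupted).
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline, and sleep at least 1 ms.
                Thread.sleep(Math.max(1L, Math.min(pollIntervalMs, remainingMs)));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could then call something like `PollingWait.waitFor(() -> recoveryFinished(), 60_000, 500)` and assert on the result, where `recoveryFinished()` is a placeholder for the real check.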

        Shalin Shekhar Mangar added a comment -

        Okay, this has started failing more frequently. Looks like there might be a genuine problem here.

        ASF subversion and git services added a comment -

        Commit 1610364 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1610364 ]

        SOLR-6241: Harden the HttpPartitionTest

        ASF subversion and git services added a comment -

        Commit 1610365 from shalin@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1610365 ]

        SOLR-6241: Harden the HttpPartitionTest

        Shalin Shekhar Mangar added a comment -

        There wasn't a bug in the test. Some of the recent failures were due to SOLR-6235, which is now fixed. I committed further changes to the test that increase the timeout values for recovery. That should take care of the spurious failures.

        Shalin Shekhar Mangar added a comment -

        I still see some exceptions such as:

        No registered leader was found after waiting for 60000ms , collection: c8n_1x3_lf slice: shard1
        Stacktrace
        
        org.apache.solr.common.SolrException: No registered leader was found after waiting for 60000ms , collection: c8n_1x3_lf slice: shard1
        	at __randomizedtesting.SeedInfo.seed([CBCC4F6420498B0C:4A2AC17C5716EB30]:0)
        	at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:567)
        	at org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:370)
        	at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:150)
        	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863)
        
        Timothy Potter added a comment -

        I went ahead and disabled this test on trunk and branch_4x using the AwaitsFix annotation. I'm digging into the failure as well, Shalin. Thanks for the help!

        Timothy Potter added a comment -

        I'm doing some refactoring as part of SOLR-6511, and it looks to have fixed this issue.

        Timothy Potter added a comment -

        Recent refactorings for SOLR-6511 have resolved the test failures; there have been several days without a failure on Jenkins.


          People

          • Assignee: Timothy Potter
          • Reporter: Shalin Shekhar Mangar
          • Votes: 0
          • Watchers: 3
