Solr
  1. Solr
  2. SOLR-7109

Indexing threads stuck during network partition can put leader into down state

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.3, 5.0
    • Fix Version/s: 5.1, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      I found this recently while running some Jepsen tests. I found that some threads get stuck on zk operations for a long time in ZkController.updateLeaderInitiatedRecoveryState method and when they wake up they go ahead with setting the LIR state to down. But in the mean time, new leader has been elected and sometimes you'd get into a state where the leader itself is put into recovery causing the shard to reject all writes.

      1. SOLR-7109.patch
        17 kB
        Shalin Shekhar Mangar
      2. SOLR-7109.patch
        14 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          I ran into a similar issue in SOLR-7065 with the new test I have there.

          It's still exploratory, so I just took that DOWN publish out for the time being.

          Show
          Mark Miller added a comment - I ran into a similar issue in SOLR-7065 with the new test I have there. It's still exploratory, so I just took that DOWN publish out for the time being.
          Hide
          Shalin Shekhar Mangar added a comment - - edited

          Here's a patch which uses ZooKeeper 'multi' transactions to make sure that the LIR state can be set only when the requesting leader node is still alive. This ensures that regardless of how long the network partition lasts (long GC, whatever), the node setting the LIR state must be the leader or else the LIR state cannot be set.

          Initially I attempted to use the shard leader path as the 'exists' check in the 'multi' command but this doesn't work because the leader path is always created fresh which means that it's version is always 0 and the check always succeeds regardless of who the current leader is. This is why we must use the election's leader sequence path.

          This is just a first cut of this approach. I intend to refactor some of these LIR methods – they have become too big. I will also write a test which exercises these new transactional semantics and reproduces the failure.

          Edit - I also remove the replicaUrl parameter from ZkController.ensureReplicaInLeaderInitiatedRecovery because replicaProps were already being passed as a parameter and the replica url can be derived from it.

          Show
          Shalin Shekhar Mangar added a comment - - edited Here's a patch which uses ZooKeeper 'multi' transactions to make sure that the LIR state can be set only when the requesting leader node is still alive. This ensures that regardless of how long the network partition lasts (long GC, whatever), the node setting the LIR state must be the leader or else the LIR state cannot be set. Initially I attempted to use the shard leader path as the 'exists' check in the 'multi' command but this doesn't work because the leader path is always created fresh which means that it's version is always 0 and the check always succeeds regardless of who the current leader is. This is why we must use the election's leader sequence path. This is just a first cut of this approach. I intend to refactor some of these LIR methods – they have become too big. I will also write a test which exercises these new transactional semantics and reproduces the failure. Edit - I also remove the replicaUrl parameter from ZkController.ensureReplicaInLeaderInitiatedRecovery because replicaProps were already being passed as a parameter and the replica url can be derived from it.
          Hide
          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk.

          I've been testing with Jepsen on trunk with this patch and it has worked very well. Unfortunately, writing a junit test to simulate this failure is proving to be very difficult. I'm inclined to commit this as-is for now so a code review would be appreciated.

          Also note that this patch doesn't solve the problem of the threads being stuck but it only ensures that the LIR state is written only if the ephemeral sequential election node of the current leader exists.

          Show
          Shalin Shekhar Mangar added a comment - Patch updated to trunk. I've been testing with Jepsen on trunk with this patch and it has worked very well. Unfortunately, writing a junit test to simulate this failure is proving to be very difficult. I'm inclined to commit this as-is for now so a code review would be appreciated. Also note that this patch doesn't solve the problem of the threads being stuck but it only ensures that the LIR state is written only if the ephemeral sequential election node of the current leader exists.
          Hide
          Anshum Gupta added a comment -

          Looks good Shalin. There's one thing that I'd like to point:
          You've changed the signature of Zkcontroller.ensureReplicaInLeaderInitiatedRecovery(), which is a public method. Though it's advanced and internal, it's a public method and might break back-compat for developers.

          Show
          Anshum Gupta added a comment - Looks good Shalin. There's one thing that I'd like to point: You've changed the signature of Zkcontroller.ensureReplicaInLeaderInitiatedRecovery(), which is a public method. Though it's advanced and internal, it's a public method and might break back-compat for developers.
          Hide
          Ramkumar Aiyengar added a comment -

          +1

          Show
          Ramkumar Aiyengar added a comment - +1
          Hide
          Yonik Seeley added a comment -

          You've changed the signature of Zkcontroller.ensureReplicaInLeaderInitiatedRecovery(), which is a public method. Though it's advanced and internal,

          Thats' fine, we should be able to freely change stuff like that.

          Ram, was your +1 echoing Anshum's concern?

          Show
          Yonik Seeley added a comment - You've changed the signature of Zkcontroller.ensureReplicaInLeaderInitiatedRecovery(), which is a public method. Though it's advanced and internal, Thats' fine, we should be able to freely change stuff like that. Ram, was your +1 echoing Anshum's concern?
          Hide
          Mark Miller added a comment -

          I think we need to open an issue to start using annotations for what API's you can count on in Java. We can start labeling most of them internal and open them based on demand, maturity, sensibility, but a plugin writer should have an idea of what API's they can count on and still get support for things like rolling upgrades. Perhaps that just most of the basic SolrCore methods and ZKStateReader methods, but it should be something over time. Eventually it would be nice if basic plugins could survive rolling upgrades if they use some common simple API's.

          Given where things are currently though, these particular types of internal methods - especially those on ZkController, are still under considerable flux.

          Show
          Mark Miller added a comment - I think we need to open an issue to start using annotations for what API's you can count on in Java. We can start labeling most of them internal and open them based on demand, maturity, sensibility, but a plugin writer should have an idea of what API's they can count on and still get support for things like rolling upgrades. Perhaps that just most of the basic SolrCore methods and ZKStateReader methods, but it should be something over time. Eventually it would be nice if basic plugins could survive rolling upgrades if they use some common simple API's. Given where things are currently though, these particular types of internal methods - especially those on ZkController, are still under considerable flux.
          Hide
          Ramkumar Aiyengar added a comment - - edited

          Ram, was your +1 echoing Anshum's concern?

          Sorry, no, it was a +1 overall for the patch.

          I agree with you that we should be able to freely change this stuff. As Mark mentions we need to decide on what is considered as API. Personally I feel we should guarantee only the methods and classes used for anything pluggable and in SolrJ (even in SolrJ we should perhaps mark a few things internal). I know people depend on other things, but we shouldn't be burdened with back compact on those.

          Show
          Ramkumar Aiyengar added a comment - - edited Ram, was your +1 echoing Anshum's concern? Sorry, no, it was a +1 overall for the patch. I agree with you that we should be able to freely change this stuff. As Mark mentions we need to decide on what is considered as API. Personally I feel we should guarantee only the methods and classes used for anything pluggable and in SolrJ (even in SolrJ we should perhaps mark a few things internal). I know people depend on other things, but we shouldn't be burdened with back compact on those.
          Hide
          Anshum Gupta added a comment -

          Sure, I'm +1 about Mark's idea and +1 on the patch too.

          Show
          Anshum Gupta added a comment - Sure, I'm +1 about Mark's idea and +1 on the patch too.
          Hide
          ASF subversion and git services added a comment -

          Commit 1666825 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1666825 ]

          SOLR-7109: Indexing threads stuck during network partition can put leader into down state

          Show
          ASF subversion and git services added a comment - Commit 1666825 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1666825 ] SOLR-7109 : Indexing threads stuck during network partition can put leader into down state
          Hide
          ASF subversion and git services added a comment -

          Commit 1666826 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1666826 ]

          SOLR-7109: Indexing threads stuck during network partition can put leader into down state

          Show
          ASF subversion and git services added a comment - Commit 1666826 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1666826 ] SOLR-7109 : Indexing threads stuck during network partition can put leader into down state
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks everyone!

          Show
          Shalin Shekhar Mangar added a comment - Thanks everyone!
          Hide
          ASF subversion and git services added a comment -

          Commit 1666863 from Yonik Seeley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1666863 ]

          SOLR-7109: fix java7 compile error

          Show
          ASF subversion and git services added a comment - Commit 1666863 from Yonik Seeley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1666863 ] SOLR-7109 : fix java7 compile error
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks for fixing the Java7 error, Yonik!

          Show
          Shalin Shekhar Mangar added a comment - Thanks for fixing the Java7 error, Yonik!
          Hide
          Timothy Potter added a comment -

          Bulk close after 5.1 release

          Show
          Timothy Potter added a comment - Bulk close after 5.1 release
          Hide
          Mark Miller added a comment -

          FYI, we found an issue with this improvement because of how it tries to ensure only a legal replica can put anoither replica into LIR. Working on a fix here in SOLR-8069.

          Show
          Mark Miller added a comment - FYI, we found an issue with this improvement because of how it tries to ensure only a legal replica can put anoither replica into LIR. Working on a fix here in SOLR-8069 .

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Shalin Shekhar Mangar
            • Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development