Kafka / KAFKA-705

Controlled shutdown doesn't seem to work on more than one broker in a cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: core
    • Labels:

      Description

      I wrote a script (attached) that round-robins through the brokers in a cluster, performing the following two operations on each of them:

      1. Send the controlled shutdown admin command.
      2. If it succeeds, restart the broker.

      What I've observed is that only one broker completes these steps successfully on the first pass. On every subsequent iteration, no broker is able to shut down using the admin command; every attempt fails with an error message stating the same number of leaders on every broker.
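      A minimal sketch of the loop described above, for illustration only: it is not the attached script, whose contents are not reproduced here. The hooks `controlled_shutdown` and `restart` are hypothetical stand-ins for the admin command and the broker restart. Following the two steps exactly as listed, the loop does not wait for a restarted broker to rejoin the cluster before moving on to the next one.

```python
from typing import Callable, Iterable, List


def rolling_bounce_naive(
    broker_ids: Iterable[int],
    controlled_shutdown: Callable[[int], bool],  # hypothetical hook: admin command, True on success
    restart: Callable[[int], None],  # hypothetical hook: restart the broker
    rounds: int = 1,
) -> List[int]:
    """Round-robin over the brokers, returning the ids that were shut down and restarted."""
    ids = list(broker_ids)
    bounced: List[int] = []
    for _ in range(rounds):
        for broker_id in ids:
            # Step 1: send the controlled shutdown admin command.
            if controlled_shutdown(broker_id):
                # Step 2: only if it succeeded, restart the broker.
                # Nothing here waits for the broker to re-register before the next iteration.
                restart(broker_id)
                bounced.append(broker_id)
    return bounced


# Example with fake hooks in which only the very first shutdown succeeds,
# mirroring the behaviour reported in this issue.
restarted: List[int] = []
result = rolling_bounce_naive(
    [1, 2, 3],
    controlled_shutdown=lambda b: b == 1 and not restarted,
    restart=restarted.append,
    rounds=2,
)
print(result)  # [1]
```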

      Attachments

      1. kafka-705-incremental-v2.patch (3 kB, Joel Koshy)
      2. kafka-705-v1.patch (2 kB, Joel Koshy)
      3. shutdown-command (0.2 kB, Neha Narkhede)
      4. shutdown_brokers_eat.py (3 kB, Neha Narkhede)

        Activity

        Joel Koshy made changes -
        Status: Resolved → Closed
        Joel Koshy made changes -
        Status: Open → Resolved
        Resolution: Fixed
        Joel Koshy added a comment -

        Yes we can close this.

        Jay Kreps added a comment -

        Joel, this is done, no?

        Joel Koshy added a comment -

        Thanks for reviewing. I checked in the incremental patch as well. I'll leave this JIRA open for now until the fix can be verified.

        Neha Narkhede added a comment -

        +1

        Joel Koshy made changes -
        Attachment kafka-705-incremental-v2.patch [ 12565853 ]
        Joel Koshy added a comment -

        Here is what I meant in my last comment.

        Joel Koshy added a comment -

        I committed the fix to 0.8 with a small edit: used the liveOrShuttingDownBrokers field.

        Another small issue is that we send a stop-replica request (stopping the replica fetchers) to the shutting-down broker
        even if controlled shutdown did not complete. This "prematurely" forces the broker out of the ISR of those
        partitions. I think it should be safe to skip sending the stop-replica request if controlled shutdown
        has not completely moved leadership of partitions off the shutting-down broker.
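        The guard proposed above can be expressed as a small predicate. This is a hypothetical sketch, not the controller code: `partition_leaders` is an assumed mapping from partition name to current leader broker id, and the function only decides whether the stop-replica request should be sent.

```python
from typing import Dict


def should_send_stop_replica(shutting_down_broker: int, partition_leaders: Dict[str, int]) -> bool:
    """Return True only when the shutting-down broker no longer leads any partition.

    partition_leaders is a hypothetical map from partition name to leader broker id.
    """
    still_leading = [p for p, leader in partition_leaders.items() if leader == shutting_down_broker]
    # If controlled shutdown left leadership on this broker, skip the request so the
    # broker is not forced out of the ISR of the partitions it follows prematurely.
    return not still_leading


print(should_send_stop_replica(1, {"t-0": 2, "t-1": 3}))  # True: leadership fully moved
print(should_send_stop_replica(1, {"t-0": 1, "t-1": 2}))  # False: broker 1 still leads t-0
```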

        Neha Narkhede added a comment -

        +1 on the fix. There is, however, a problem with the script I wrote. The fix is correct, but the script will fail because it uses the shutdown command in a way that is not recommended or intended. It shuts down one broker and restarts it, but then, without waiting for the restart to complete and for that broker to re-register itself in ZooKeeper, proceeds to shut down the next broker. Since the replication factor is 2, any partition whose two replicas live on these two brokers goes into the under-replicated state, and after that the script is never able to shut down any other broker.

        I think we should include this fix.
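        A safer rolling bounce waits for each restarted broker to come back before touching the next one. The sketch below is hypothetical: `is_registered` stands in for checking the broker's registration in ZooKeeper, and the extra `under_replicated_count() == 0` condition is an assumption added here (waiting for the restarted replicas to rejoin the ISR), not something the attached script is known to do.

```python
import time
from typing import Callable, Iterable, List


def rolling_bounce_with_wait(
    broker_ids: Iterable[int],
    controlled_shutdown: Callable[[int], bool],  # hypothetical: admin command, True on success
    restart: Callable[[int], None],  # hypothetical: restart the broker process
    is_registered: Callable[[int], bool],  # hypothetical: broker re-registered in ZooKeeper?
    under_replicated_count: Callable[[], int],  # hypothetical: number of under-replicated partitions
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
) -> List[int]:
    """Bounce brokers one at a time, waiting for each to recover before the next."""
    bounced: List[int] = []
    for broker_id in broker_ids:
        if not controlled_shutdown(broker_id):
            raise RuntimeError(f"controlled shutdown of broker {broker_id} failed")
        restart(broker_id)
        deadline = time.monotonic() + timeout_s
        # Do not move on until the broker is back and no partition is under-replicated.
        while not (is_registered(broker_id) and under_replicated_count() == 0):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"broker {broker_id} did not recover within {timeout_s}s")
            time.sleep(poll_s)
        bounced.append(broker_id)
    return bounced


# Example with fake hooks: shutdown deregisters a broker, restart registers it again.
registered = {1, 2, 3}


def fake_shutdown(b: int) -> bool:
    registered.discard(b)
    return True


print(rolling_bounce_with_wait(
    [1, 2, 3], fake_shutdown, registered.add,
    is_registered=lambda b: b in registered,
    under_replicated_count=lambda: 0,
    poll_s=0.0,
))  # [1, 2, 3]
```

        Failing fast on an unsuccessful shutdown or a timeout, instead of carrying on to the next broker, keeps the loop from taking down a second replica of a partition that has not yet recovered.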

        Joel Koshy made changes -
        Attachment kafka-705-v1.patch [ 12565560 ]
        Joel Koshy added a comment -

        Here's a simple fix.

        I don't see a good reason to disallow starting a fetcher to a broker that is shutting down
        (but not yet completely shut down) if a leader still exists on that broker.
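        The difference between the old check and the proposed one can be modelled as follows. This is a hypothetical, simplified model rather than the controller code: `registered_brokers` stands for all brokers currently registered (including ones in the middle of a controlled shutdown), and the `include_shutting_down=True` branch corresponds to using a live-or-shutting-down set such as the liveOrShuttingDownBrokers field mentioned elsewhere in this thread.

```python
from typing import AbstractSet


def can_start_follower(
    leader: int,
    registered_brokers: AbstractSet[int],
    shutting_down_brokers: AbstractSet[int],
    include_shutting_down: bool,
) -> bool:
    """Decide whether a follower may start fetching from `leader` (simplified model)."""
    if include_shutting_down:
        # Proposed fix: a leader on a broker that is shutting down, but still up, is usable.
        eligible = set(registered_brokers)
    else:
        # Old behaviour: brokers in the shutting-down set are treated as unavailable.
        eligible = set(registered_brokers) - set(shutting_down_brokers)
    return leader in eligible


# Broker 2 still leads the partition but has started a controlled shutdown.
print(can_start_follower(2, {1, 2, 3}, {2}, include_shutting_down=False))  # False
print(can_start_follower(2, {1, 2, 3}, {2}, include_shutting_down=True))   # True
```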

        Joel Koshy added a comment -

        I think this is why it happens:

        https://github.com/apache/kafka/blob/03eb903ce223ab55c5acbcf4243ce805aaaf4fad/core/src/main/scala/kafka/controller/ReplicaStateMachine.scala#L150

        It could occur as follows. Suppose there's a partition 'P' assigned to brokers x and y, with leaderAndIsr = y, {x, y} (leader y, ISR {x, y}):

        1. Controlled shutdown of broker x; leaderAndIsr -> y, {y}
        2. After the above completes, kill -15 and then restart broker x.
        3. Immediately do a controlled shutdown of broker y, so y is now in the list of shutting-down brokers.

        Due to the above, x will not start its follower to 'P' on broker y.

        Adding sufficient wait time between (2) and (3) seems to address the issue (in your script there's no sleep), but we should handle it properly in the shutdown code.
        Will think about a fix for that.
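        The three steps above can be replayed in a toy model to show why broker y then gets stuck. Everything here is a simplifying assumption rather than Kafka's implementation: the follower on x rejoins the ISR instantly once it is allowed to start, and a controlled shutdown can only hand leadership to another in-sync replica that is not itself shutting down.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Partition:
    leader: str
    isr: Set[str]


def replay(include_shutting_down: bool) -> bool:
    """Replay steps 1-3 for partition P; return True if y's shutdown can move leadership off y."""
    p = Partition(leader="y", isr={"x", "y"})  # leaderAndIsr = y, {x, y}
    registered = {"x", "y"}
    shutting_down: Set[str] = set()

    # 1. Controlled shutdown of x: x only follows P, so it simply leaves the ISR.
    p.isr.discard("x")  # leaderAndIsr -> y, {y}
    # 2. x is killed and restarted, then re-registers.
    registered.discard("x")
    registered.add("x")
    # 3. Immediately start a controlled shutdown of y.
    shutting_down.add("y")

    # x now tries to start its follower for P against the current leader y.
    eligible = registered if include_shutting_down else registered - shutting_down
    if p.leader in eligible:
        p.isr.add("x")  # simplification: x catches up and rejoins the ISR at once

    # y's controlled shutdown needs another in-sync replica to take over leadership.
    candidates = p.isr - shutting_down
    if not candidates:
        return False  # y keeps leading P, so its controlled shutdown cannot complete
    p.leader = min(candidates)
    return True


print(replay(include_shutting_down=False))  # False: x never follows y, ISR stays {y}
print(replay(include_shutting_down=True))   # True: x rejoins the ISR and takes over P
```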

        Neha Narkhede added a comment -

        >> Would you by any chance be able to provide a scenario to reproduce this locally?

        I would suggest trying it out on a distributed environment set up with a large number of partitions and heavy traffic. Since it is internal, I can pass the connection URL on to you.

        Joel Koshy added a comment -

        I set up a local cluster of three brokers and created a bunch of topics with replication factor 2. I was able to do multiple
        iterations of rolling bounces without issue. Since this was local, I did not use your py script, as it kills the pids returned by ps.

        Would you by any chance be able to provide a scenario to reproduce this locally? That said, I believe John Fung also tried to reproduce this in a
        distributed environment but was unable to, so I'll probably need to take a look at the logs in your environment.

        Neha Narkhede made changes -
        Attachment shutdown_brokers_eat.py [ 12565028 ]
        Attachment shutdown-command [ 12565029 ]
        Neha Narkhede added a comment -

        Run it with the --help option to see a description of the command-line options.

        Neha Narkhede created issue -

          People

          • Assignee: Joel Koshy
          • Reporter: Neha Narkhede
          • Votes: 0
          • Watchers: 3
