Kafka: KAFKA-4207

Partitions stopped after a rapid restart of a broker


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0.1, 0.10.0.1
    • Fix Version/s: None
    • Component/s: controller
    • Labels: None

    Description

      Environment:
      4 Kafka brokers
      10,000 topics with one partition each, replication factor 3
      Partitions with 4KB data each
      No data being produced or consumed

      Scenario:
      Initiate a controlled shutdown on one broker
      Interrupt the controlled shutdown with a SIGKILL before it completes
      Immediately start a new broker with the same broker ID as the broker that was just killed

      Symptoms:
      After the new broker starts, the other three brokers in the cluster report under-replicated partitions indefinitely for some of the partitions hosted on the broker that was killed and restarted

      Cause:
      Today, the controller sends a StopReplica command for each replica hosted on a broker that has initiated a controlled shutdown. With a large number of replicas, this can take a while. If the broker performing the controlled shutdown is killed partway through, the remaining StopReplica commands stay queued even though the request queue to that broker is cleared. When the broker comes back online, those queued StopReplica commands are sent to the newly started broker, which then stops the affected replicas, so those partitions remain under-replicated.
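The cause above can be illustrated with a small, self-contained model. This is a hypothetical sketch, assuming that the StopReplica commands produced by a controlled shutdown sit in a backlog separate from the per-broker request queue. The class, fields, and methods (`StaleStopReplicaModel`, `pendingStopReplicas`, `controllerTick`, `sigkill`, `restartWithSameId`) are invented for illustration and are not Kafka's controller code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of the race in KAFKA-4207; the names below are
// illustrative only and do not correspond to Kafka's controller classes.
public class StaleStopReplicaModel {
    // StopReplica commands produced by the controlled shutdown but not yet
    // handed to the broker's request queue (assumed separate backlog).
    private final Deque<String> pendingStopReplicas = new ArrayDeque<>();
    // The controller's request queue to the broker; cleared when the broker dies.
    private final Deque<String> requestQueue = new ArrayDeque<>();
    private boolean brokerAlive = true;
    // Partitions for which the currently running broker process received StopReplica.
    private List<String> stoppedOnCurrentProcess = new ArrayList<>();

    // Controlled shutdown generates one StopReplica command per hosted replica.
    void startControlledShutdown(List<String> partitions) {
        pendingStopReplicas.addAll(partitions);
    }

    // The controller moves up to n commands into the request queue; they are
    // delivered only while a broker process is alive to receive them.
    void controllerTick(int n) {
        for (int i = 0; i < n && !pendingStopReplicas.isEmpty(); i++) {
            requestQueue.add(pendingStopReplicas.poll());
        }
        if (brokerAlive) {
            while (!requestQueue.isEmpty()) {
                stoppedOnCurrentProcess.add(requestQueue.poll());
            }
        }
    }

    // SIGKILL: the request queue is cleared, but the backlog is left untouched.
    void sigkill() {
        brokerAlive = false;
        requestQueue.clear();
    }

    // A new process registers with the same broker ID and starts with no stopped replicas.
    void restartWithSameId() {
        brokerAlive = true;
        stoppedOnCurrentProcess = new ArrayList<>();
    }

    List<String> stoppedOnCurrentIncarnation() {
        return stoppedOnCurrentProcess;
    }

    public static void main(String[] args) {
        StaleStopReplicaModel model = new StaleStopReplicaModel();
        model.startControlledShutdown(List.of("t0-0", "t1-0", "t2-0", "t3-0", "t4-0"));
        model.controllerTick(2);                 // old process stops t0-0 and t1-0
        model.sigkill();                         // killed before the shutdown completes
        model.restartWithSameId();               // same broker ID, fresh process
        model.controllerTick(Integer.MAX_VALUE); // the stale backlog drains to the new process
        System.out.println(model.stoppedOnCurrentIncarnation()); // prints [t2-0, t3-0, t4-0]
    }
}
```

In this model, also clearing `pendingStopReplicas` inside `sigkill()` leaves the restarted process with no stopped replicas. That only shows where the stale state lives in the sketch; it is not the change that resolved the issue in Kafka.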

      CC: junrao, since he is familiar with the scenario seen here

    People
      Assignee: Unassigned
      Dustin Cote (cotedm)
      Jun Rao
      Votes: 0
      Watchers: 3