Kafka / KAFKA-928

new topics may not be processed after ZK session expiration in controller

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: controller
    • Labels:
      None

      Description

      When the controller loses its ZK session, it calls partitionStateMachine.shutdown in SessionExpirationListener, which marks the partitionStateMachine as down. However, when the controller regains controllership, it doesn't mark the partitionStateMachine as up again. TopicChangeListener only processes new topics if the partitionStateMachine is marked up, so new topics are ignored from then on.
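The failure mode described above can be reproduced with a small model. This is only a sketch under stated assumptions: `PartitionStateMachineModel`, its `markedDown` flag, and its method names are hypothetical stand-ins chosen for illustration, not Kafka's actual controller code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the behaviour in the description.
// It is not Kafka's controller code.
class PartitionStateMachineModel {
    // Set on shutdown; the modelled bug is that nothing ever clears it.
    private final AtomicBoolean markedDown = new AtomicBoolean(false);

    // Topics the model actually acted on.
    final List<String> processedTopics = new ArrayList<>();

    // Stand-in for the session expiration path: marks the state machine as down.
    void shutdown() {
        markedDown.set(true);
    }

    // Stand-in for regaining controllership. Internal state would be rebuilt
    // here, but the down marker is left untouched, which is the bug.
    void startup() {
        // intentionally does not reset markedDown
    }

    // Stand-in for the topic change listener: new topics are only processed
    // while the state machine is not marked down.
    void onTopicChange(String topic) {
        if (!markedDown.get()) {
            processedTopics.add(topic);
        }
    }
}
```

In this model, a topic created before the first `shutdown()` is processed, while a topic created after a `shutdown()` followed by `startup()` is silently dropped, which matches the symptom in the issue title.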

      1. kafka-928.patch
        1 kB
        Neha Narkhede
      2. kafka-928-v2.patch
        3 kB
        Neha Narkhede

        Activity

        Swapnil Ghike added a comment -

        Was just about to comment, perhaps it would be good to rename hasStarted to isRunning like in KafkaController. +1 otherwise.

        Neha Narkhede added a comment -

        Thanks for the review. Committed the patch to 0.8.

        Jun Rao added a comment -

        Thanks for patch v2. +1.

        Neha Narkhede added a comment -

        I think you are right, we don't need both anymore. See the updated patch.

        Jun Rao added a comment -

        Thanks for the patch. It seems to me that hasStarted should be set to false on shutdown too. If that's the case, I don't see why we need both hasStarted and hasShutdown.
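The single-flag idea raised in this comment could look roughly like the sketch below. It assumes one `hasStarted` flag that is set once startup has finished and cleared on shutdown. The class `SingleFlagStateMachine` and its methods are hypothetical and are not taken from either attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a state machine guarded by a single flag.
// It is not the committed patch.
class SingleFlagStateMachine {
    private final AtomicBoolean hasStarted = new AtomicBoolean(false);
    private final List<String> handledTopics = new ArrayList<>();

    void startup() {
        handledTopics.clear();  // stand-in for rebuilding internal data structures
        hasStarted.set(true);   // accept events only once the state is ready
    }

    void shutdown() {
        hasStarted.set(false);  // stop accepting events; a later startup() re-enables them
    }

    // Returns true if the event was handled, false if it was ignored.
    boolean onTopicChange(String topic) {
        if (!hasStarted.get()) {
            return false;
        }
        handledTopics.add(topic);
        return true;
    }

    // Returns a copy so callers cannot mutate internal state.
    List<String> handledTopics() {
        return new ArrayList<>(handledTopics);
    }
}
```

Because `shutdown()` clears the same flag that `startup()` sets, every restart after a session expiration re-enables event handling, and a separate shutdown flag has nothing left to guard in this simplified model.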

        Swapnil Ghike added a comment -

        +1, thanks for fixing this.

        Neha Narkhede added a comment -

        The bug is more serious. If the controller goes through a session expiration and gets re-elected, which is rare, it will stop responding to all new topic state changes. Not only that, it will also stop responding to broker failures or startups.

        The root cause of the bug is in the startup() API of the state machines. Both hasStarted and hasShutdown() are required: the former prevents the state machines from acting on state changes before their internal data structures are ready, and the latter prevents them from acting on state changes while they are being shut down.


          People

          • Assignee:
            Neha Narkhede
            Reporter:
            Jun Rao
          • Votes:
            0
            Watchers:
            3
