Kafka / KAFKA-928

New topics may not be processed after ZK session expiration in controller

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: controller
    • Labels: None

      Description

      When the controller loses its ZK session, SessionExpirationListener calls partitionStateMachine.shutdown, which marks the partitionStateMachine as down. However, when the controller regains controllership, it does not mark the partitionStateMachine as up again. Because TopicChangeListener only processes new topics while the partitionStateMachine is marked up, new topics are ignored after re-election.
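To make the failure sequence concrete, here is a minimal, single-threaded Python model of it. The class PartitionStateMachine below is a hypothetical stand-in, not Kafka's Scala code. The flag names has_started and has_shutdown mirror the hasStarted and hasShutdown flags discussed in the comments, and on_topic_change plays the role of TopicChangeListener. The model assumes that "marked up" means started and not shut down.

```python
class PartitionStateMachine:
    """Hypothetical, simplified model of the pre-fix flag handling (not Kafka's code)."""

    def __init__(self):
        self.has_started = False   # set once startup() has run
        self.has_shutdown = False  # set by shutdown(); never cleared (the bug)
        self.created_topics = []

    def startup(self):
        # Buggy: re-arms has_started but leaves has_shutdown set from a prior shutdown.
        self.has_started = True

    def shutdown(self):
        self.has_shutdown = True

    def is_up(self):
        return self.has_started and not self.has_shutdown

    def on_topic_change(self, new_topics):
        """Stand-in for TopicChangeListener: ignores events unless the machine is up."""
        if self.is_up():
            self.created_topics.extend(new_topics)


def simulate():
    psm = PartitionStateMachine()
    psm.startup()                # controller elected for the first time
    psm.on_topic_change(["t1"])  # processed
    psm.shutdown()               # ZK session expires (SessionExpirationListener)
    psm.startup()                # controller re-elected
    psm.on_topic_change(["t2"])  # silently ignored: has_shutdown is still True
    return psm.created_topics


print(simulate())
```

Running it prints `['t1']`: the topic created after re-election is dropped without any error, which matches the reported symptom.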

      Attachments
      1. kafka-928.patch (1 kB, Neha Narkhede)
      2. kafka-928-v2.patch (3 kB, Neha Narkhede)

        Activity

        Jun Rao created issue -
        Neha Narkhede added a comment -

        The bug is more serious than described. If the controller goes through a session expiration and gets re-elected (which is rare), it will stop responding to all new topic state changes. Worse, it will also stop responding to broker failures and startups.

        The root cause of the bug is in the startup() API of the state machines. Both hasStarted and hasShutdown are required: the former prevents the state machines from acting on state changes before their internal data structures are ready, and the latter prevents them from acting on state changes while they are being shut down.
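The two guards described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the contents of kafka-928.patch. TwoFlagStateMachine, partition_state and on_topic_change are hypothetical names. The fix shown, clearing has_shutdown inside startup(), is only one plausible way to address a root cause located in startup().

```python
class TwoFlagStateMachine:
    """Hypothetical sketch: keeps both flags but re-arms them in startup()."""

    def __init__(self):
        self.has_started = False
        self.has_shutdown = False
        self.partition_state = None  # internal data structures, built in startup()
        self.created_topics = []

    def startup(self):
        self.partition_state = {}    # (re)build internal state first
        self.has_shutdown = False    # clear the flag left over from a previous shutdown
        self.has_started = True      # only now may listeners act on events

    def shutdown(self):
        self.has_shutdown = True     # stop acting on events before tearing down state
        self.partition_state = None

    def on_topic_change(self, new_topics):
        # has_started guards against events arriving before state is ready;
        # has_shutdown guards against events arriving during or after shutdown.
        if self.has_started and not self.has_shutdown:
            for topic in new_topics:
                self.partition_state[topic] = "new"
            self.created_topics.extend(new_topics)


sm = TwoFlagStateMachine()
sm.on_topic_change(["early"])   # ignored: state not built yet
sm.startup()
sm.on_topic_change(["t1"])      # processed
sm.shutdown()
sm.on_topic_change(["during"])  # ignored: shutting down
sm.startup()                    # re-election re-arms both flags
sm.on_topic_change(["t2"])      # processed again
print(sm.created_topics)
```

Real controller code shares these flags between threads (for example, ZooKeeper listener callbacks), so it would need atomic or lock-protected flags; this model ignores concurrency.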

        Neha Narkhede made changes -
        Field Original Value New Value
        Attachment kafka-928.patch [ 12585737 ]
        Neha Narkhede made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Neha Narkhede made changes -
        Assignee Neha Narkhede [ nehanarkhede ]
        Neha Narkhede made changes -
        Component/s controller [ 12320321 ]
        Component/s core [ 12315217 ]
        Swapnil Ghike added a comment -

        +1, thanks for fixing this.

        Jun Rao added a comment -

        Thanks for the patch. It seems to me that hasStarted should be set to false on shutdown too. If that's the case, I don't see why we need both hasStarted and hasShutdown.
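The suggestion above can be illustrated with a hypothetical, single-threaded Python model (the names are illustrative, not Kafka's API). If shutdown() clears has_started, that one flag answers both "is the internal state ready?" and "are we shutting down?", and startup() re-arms it after re-election.

```python
class SingleFlagStateMachine:
    """Hypothetical sketch: a single flag, cleared on shutdown and set on startup."""

    def __init__(self):
        self.has_started = False
        self.partition_state = None  # internal data structures, built in startup()
        self.created_topics = []

    def startup(self):
        self.partition_state = {}    # build internal state before accepting events
        self.has_started = True

    def shutdown(self):
        # Clearing has_started here makes a separate has_shutdown flag redundant.
        self.has_started = False
        self.partition_state = None

    def on_topic_change(self, new_topics):
        if self.has_started:
            for topic in new_topics:
                self.partition_state[topic] = "new"
            self.created_topics.extend(new_topics)


sm = SingleFlagStateMachine()
sm.startup()
sm.on_topic_change(["t1"])      # processed
sm.shutdown()                   # ZK session expires
sm.on_topic_change(["during"])  # ignored
sm.startup()                    # controller re-elected
sm.on_topic_change(["t2"])      # processed again
print(sm.created_topics)
```

With only one flag, startup() has nothing left to forget to reset, which is exactly the failure mode in this issue.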

        Neha Narkhede added a comment -

        I think you are right, we don't need both anymore. See the updated patch.

        Neha Narkhede made changes -
        Attachment kafka-928-v2.patch [ 12585919 ]
        Jun Rao added a comment -

        Thanks for patch v2. +1.

        Neha Narkhede added a comment -

        Thanks for the review, committed the patch to 0.8.

        Neha Narkhede made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Neha Narkhede made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Swapnil Ghike added a comment -

        Was just about to comment: perhaps it would be good to rename hasStarted to isRunning, as in KafkaController. +1 otherwise.


          People

          • Assignee: Neha Narkhede
          • Reporter: Jun Rao
          • Votes: 0
          • Watchers: 3
