KAFKA-9267

ZkSecurityMigrator should not create /controller node


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: admin
    • Labels: None
    • Flags: Patch, Important

    Description

      As the source code shows (ZkSecurityMigrator.scala#L226), ZkSecurityMigrator checks and sets the ACL recursively for each path in SecureRootPaths, and /controller is one of those paths.

      As a result, zkClient.makeSurePersistentPathExists() creates the /controller node if it does not already exist.

      /controller is supposed to be an EPHEMERAL node used for controller election, but makeSurePersistentPathExists() creates a PERSISTENT node with null data.
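
      The failure mode described above can be sketched with a toy in-memory model. This is not Kafka's or ZooKeeper's API: the Znode class, the map-based store, and the shortened SECURE_ROOT_PATHS list are assumptions made purely for illustration, and the ACL update itself is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for ZooKeeper state; all names here are hypothetical.
class MigratorBugSketch {
    static final class Znode {
        final byte[] data;       // null models a node created without data
        final boolean ephemeral; // false models a PERSISTENT node

        Znode(byte[] data, boolean ephemeral) {
            this.data = data;
            this.ephemeral = ephemeral;
        }
    }

    // Illustrative subset only; the real SecureRootPaths list is longer.
    static final List<String> SECURE_ROOT_PATHS = List.of("/brokers", "/controller", "/config");

    // Models makeSurePersistentPathExists(): a missing path is created
    // as a PERSISTENT node with null data.
    static void makeSurePersistentPathExists(Map<String, Znode> zk, String path) {
        zk.putIfAbsent(path, new Znode(null, false));
    }

    // Models the migrator loop: every secure root path is created if absent
    // before its ACL would be set (ACL handling omitted in this sketch).
    static void migrate(Map<String, Znode> zk) {
        for (String path : SECURE_ROOT_PATHS) {
            makeSurePersistentPathExists(zk, path);
        }
    }

    public static void main(String[] args) {
        Map<String, Znode> zk = new HashMap<>(); // no broker has registered /controller yet
        migrate(zk);
        Znode controller = zk.get("/controller");
        System.out.println("ephemeral=" + controller.ephemeral
                + ", data=" + (controller.data == null ? "null" : "present"));
    }
}
```

      Running the migration against an empty store leaves /controller present, persistent, and without data, which is the state shown in the ZooKeeper output further down.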

      If that happens, the null data causes an NPE, no controller can be elected, and the Kafka cluster becomes unavailable.
      In addition, a PERSISTENT node does not disappear automatically; it has to be deleted manually to fix the problem.

       

      PERSISTENT /controller node with null data in ZooKeeper (note ephemeralOwner = 0x0 and dataLength = 0):

      [zk: localhost:2181(CONNECTED) 16] get /kafka/controller
      null
      cZxid = 0x1100002284
      ctime = Tue Dec 03 18:37:26 CST 2019
      mZxid = 0x1100002284
      mtime = Tue Dec 03 18:37:26 CST 2019
      pZxid = 0x1100002284
      cversion = 0
      dataVersion = 0
      aclVersion = 1
      ephemeralOwner = 0x0
      dataLength = 0
      numChildren = 0

      Normal /controller node in ZooKeeper (ephemeralOwner holds the ID of the owning session):

      [zk: localhost:2181(CONNECTED) 21] get /kafka/controller
      {"version":1,"brokerid":1001,"timestamp":"1575370170528"}
      cZxid = 0x11000023e1
      ctime = Tue Dec 03 18:49:30 CST 2019
      mZxid = 0x11000023e1
      mtime = Tue Dec 03 18:49:30 CST 2019
      pZxid = 0x11000023e1
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0x16ecb572df50021
      dataLength = 57
      numChildren = 0
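
      The two outputs above differ in the ephemeralOwner field: ZooKeeper reports 0 for a persistent node and the owning session ID for an ephemeral one. A minimal sketch of a check on that stat line follows; the class and method names are hypothetical, and it assumes the "name = 0x..." format that the shell prints above.

```java
// Hypothetical helper for reading the ephemeralOwner line of the zk shell output.
class EphemeralOwnerSketch {
    // Returns true when a line such as "ephemeralOwner = 0x0" describes a
    // PERSISTENT node, i.e. when the owner session ID is zero.
    static boolean isPersistent(String statLine) {
        String value = statLine.substring(statLine.indexOf('=') + 1).trim();
        if (value.startsWith("0x")) {
            value = value.substring(2);
        }
        // Parse as unsigned so session IDs with the high bit set do not fail.
        return Long.parseUnsignedLong(value, 16) == 0L;
    }

    public static void main(String[] args) {
        System.out.println(isPersistent("ephemeralOwner = 0x0"));
        System.out.println(isPersistent("ephemeralOwner = 0x16ecb572df50021"));
    }
}
```

      A /controller node for which this check returns true was not created by a broker winning the election and is a candidate for the manual cleanup mentioned earlier.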

      NPE in controller.log:

      [2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
      [2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
      java.lang.NullPointerException
       at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
       at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
       at kafka.utils.Json$.parseBytes(Json.scala:62)
       at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
       at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
       at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
       at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
       at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
       at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
       at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
       at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
       at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
       at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
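
      The trace shows the exception surfacing inside the JSON parser when the node data is handed to it. The sketch below reproduces the same kind of failure with a plain String conversion standing in for the parser, next to a null-tolerant variant. Both methods are hypothetical illustrations; neither is Kafka's code, and the null check is not necessarily what the submitted PR does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical decode helpers; a String conversion stands in for JSON parsing.
class NullDataDecodeSketch {
    // Dereferences the byte array directly, so null data raises a
    // NullPointerException, much like the parser call in the trace.
    static String decodeUnsafe(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Treats missing or empty data as "no controller registered".
    static Optional<String> decodeSafe(byte[] data) {
        if (data == null || data.length == 0) {
            return Optional.empty();
        }
        return Optional.of(new String(data, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(decodeSafe(null).isPresent());
        try {
            decodeUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE on null data");
        }
    }
}
```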

       

      I have therefore submitted a PR so that ZkSecurityMigrator skips the /controller node when it does not exist.
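
      The intended behavior can be sketched as follows. The PR itself is not reproduced here, so this is only an illustration under assumptions: a set of strings models which znodes exist, SECURE_ROOT_PATHS is a shortened illustrative list, and all names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the guarded migration; not the actual patch.
class MigratorFixSketch {
    static final String CONTROLLER_PATH = "/controller";

    // Illustrative subset only; the real SecureRootPaths list is longer.
    static final List<String> SECURE_ROOT_PATHS = List.of("/brokers", CONTROLLER_PATH, "/config");

    // existingPaths models the znodes currently present in ZooKeeper.
    // Returns the paths whose ACL would have been updated.
    static Set<String> migrate(Set<String> existingPaths) {
        Set<String> secured = new TreeSet<>();
        for (String path : SECURE_ROOT_PATHS) {
            if (path.equals(CONTROLLER_PATH) && !existingPaths.contains(path)) {
                continue; // do not create /controller; leave it to controller election
            }
            existingPaths.add(path); // models makeSurePersistentPathExists()
            secured.add(path);       // models setting the ACL on the path
        }
        return secured;
    }

    public static void main(String[] args) {
        Set<String> zk = new TreeSet<>(); // no controller elected yet
        System.out.println(migrate(zk));
        System.out.println(zk.contains(CONTROLLER_PATH));
    }
}
```

      With this guard, an existing /controller node still gets its ACL updated, while a missing one stays absent so that a broker can later create it as an EPHEMERAL node.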

      This bug seems to affect all versions; please review and merge the PR as soon as possible.

      Thanks!


          People

            Assignee: Unassigned
            Reporter: NanerLee (nanerlee)
            Reviewer: Manikumar