Kafka / KAFKA-1398

Topic config changes can be lost and cause fatal exceptions on broker restarts

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.1.1
    • Component/s: None
    • Labels:
      None

      Description

      Our topic config cleanup policy seems to be broken. When a broker is
      bounced, it does the following on startup:
      1 - Reads all the children of the config change path.
      2 - For each child, if the change id is greater than the last executed
      change, extracts the topic information.
      3 - If there is a log for that topic on this broker, applies the change.
      If there is no log, deletes the config change.
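
The three startup steps above can be sketched as a small model. This is only an illustration, not Kafka's code: the dictionary standing in for the ZooKeeper config change path, the function name, and the choice to advance the last executed change only when a change is applied are all assumptions made for the sketch.

```python
def process_changes_on_startup(changes, local_topics, last_executed=-1):
    """Model steps 1-3; `changes` maps change id -> topic (hypothetical layout)."""
    applied = []
    for change_id in sorted(changes):      # step 1: read all children
        if change_id <= last_executed:     # step 2: skip already executed changes
            continue
        topic = changes[change_id]
        if topic in local_topics:          # step 3: a local log exists, apply it
            applied.append((change_id, topic))
            last_executed = change_id
        else:                              # step 3: no local log, delete the change
            del changes[change_id]
    return applied, last_executed


changes = {0: "A", 1: "B", 2: "C"}
applied, last = process_changes_on_startup(changes, local_topics={"B"})
print(applied, last, changes)  # [(1, 'B')] 1 {1: 'B'}
```

In this model a broker that hosts only topic B removes the change nodes for every other topic, which is the deletion the rest of the report is concerned with.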

      In step 3, a delete fires the child change watch on all the other
      brokers. Those brokers currently read all the children of the config
      path but ignore config changes whose ids are lower than the last
      executed change. At least one issue here is that if a broker does not have
      partitions for a topic, then lastExecutedChange is not updated for
      changes to that topic.

      Consider this scenario:

      • Three brokers 0, 1, 2
      • Topic A has partitions only assigned to broker 0
      • Topic B has partitions only assigned to broker 1
      • Topic C has partitions only assigned to broker 2
      • Change 0: topic A
      • Change 1: topic B
      • Change 2: topic C
      • lastExecutedChange on broker 0 is 0
      • lastExecutedChange on broker 1 is 1
      • lastExecutedChange on broker 2 is 2
      • Bounce broker 1
      • The bounce causes Change 0 and Change 2 to be deleted, since broker 1 has
        no logs for topics A and C.
      • Watch fires on brokers 0 and 2 (the other brokers)
      • Broker 0 will try to read the topic corresponding to change 1 (since its
        lastExecutedChange is 0) and then change 2. That read will fail because
        the node no longer exists:

      2014/04/15 19:35:34.236 INFO [TopicConfigManager] [main] [kafka-server] [] Processed topic config change 25 for topic xyz, setting new config to {retention.ms=3600000, segment.ms=3600000}.
      2014/04/15 19:35:34.238 FATAL [KafkaServerStartable] [main] [kafka-server] [] Fatal error during KafkaServerStable startup. Prepare to shutdown
      org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_0000000026
      at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
      at kafka.utils.ZkUtils$.readData(ZkUtils.scala:467)
      at kafka.server.TopicConfigManager$$anonfun$kafka$server$TopicConfigManager$$processConfigChanges$2.apply(TopicConfigManager.scala:97)
      at kafka.server.TopicConfigManager$$anonfun$kafka$server$TopicConfigManager$$processConfigChanges$2.apply(TopicConfigManager.scala:93)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
      at kafka.server.TopicConfigManager.kafka$server$TopicConfigManager$$processConfigChanges(TopicConfigManager.scala:93)
      at kafka.server.TopicConfigManager.processAllConfigChanges(TopicConfigManager.scala:81)
      at kafka.server.TopicConfigManager.startup(TopicConfigManager.scala:72)
      at kafka.server.KafkaServer.startup(KafkaServer.scala:104)
      at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
      ...
      Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_0000000026
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
      at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
      at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
      at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
      ... 39 more
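
The failure can be reproduced in a toy model of the three-broker scenario. Everything here is hypothetical scaffolding: `FakeZk`, `NoNodeError`, and `run_scenario` are invented for the sketch, and it assumes broker 0 works from a child listing that still names change 2 after broker 1 has deleted it.

```python
class NoNodeError(Exception):
    """Hypothetical stand-in for ZooKeeper's NoNodeException."""


class FakeZk:
    """In-memory stand-in for the config change path (change id -> topic)."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def get_children(self):
        return sorted(self.nodes)

    def read_data(self, change_id):
        if change_id not in self.nodes:
            raise NoNodeError(f"NoNode for change {change_id}")
        return self.nodes[change_id]

    def delete(self, change_id):
        self.nodes.pop(change_id, None)


def run_scenario():
    zk = FakeZk({0: "A", 1: "B", 2: "C"})
    # Broker 1 restarts. It only has logs for topic B, so it deletes
    # the change nodes for topics A and C.
    for change_id in zk.get_children():
        if zk.read_data(change_id) != "B":
            zk.delete(change_id)
    # Broker 0 handles the watch with lastExecutedChange = 0 and a listing
    # that still contains change 2 (assumed race for this sketch).
    stale_children = [0, 1, 2]
    last_executed = 0
    read = []
    try:
        for change_id in stale_children:
            if change_id > last_executed:
                read.append((change_id, zk.read_data(change_id)))
    except NoNodeError as err:
        return read, str(err)
    return read, None


print(run_scenario())  # ([(1, 'B')], 'NoNode for change 2')
```

The unguarded read of change 2 is the step that corresponds to the NoNodeException in the stack trace above.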

      Another issue is that two logging statements have incorrect format
      qualifiers, so the placeholders are printed literally, which makes things
      a little harder to debug. E.g.,

      2014/04/15 19:35:34.223 ERROR [TopicConfigManager] [kafka-server] [] Ignoring topic config change %d for topic %s since the change has expired
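
The symptom in that log line, literal `%d` and `%s` in the output, is what one would see if a format template were logged without its arguments being applied. The report does not show the offending statements, so the sketch below only illustrates that class of bug in Python; both function names are hypothetical.

```python
TEMPLATE = "Ignoring topic config change %d for topic %s since the change has expired"


def log_line_unformatted(change_id, topic):
    # The arguments are never applied, so %d and %s appear literally.
    return TEMPLATE


def log_line_formatted(change_id, topic):
    # Applying the arguments fills in the placeholders.
    return TEMPLATE % (change_id, topic)


print(log_line_unformatted(25, "xyz"))
print(log_line_formatted(25, "xyz"))
```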

      1. KAFKA-1398_2014-04-18_13:03:03.patch (3 kB, Jay Kreps)
      2. KAFKA-1398.patch (7 kB, Jay Kreps)
      3. KAFKA-1398.patch (7 kB, Jay Kreps)
      4. KAFKA-1398.patch (6 kB, Jay Kreps)

          Activity

          Jay Kreps added a comment -

          Created reviewboard https://reviews.apache.org/r/20471/
          against branch trunk

          Jay Kreps added a comment -

          Looks like this got broken with the delete topic work.

          I refactored a bit:

          • Added a unit test for the simple case of changing config
          • Moved purge logic into another method
          • Changed the zk ops to handle the case where the znode is purged in between the notification and the read.
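
A minimal sketch of the last point, tolerating a node that is purged between the notification and the read. This is not the patch itself: the dictionary-backed store, `NoNodeError`, and the helper names are assumptions made for illustration.

```python
class NoNodeError(Exception):
    """Hypothetical stand-in for a ZooKeeper 'no node' error."""


def read_node(nodes, path):
    # Strict read: raises if the node has been deleted.
    if path not in nodes:
        raise NoNodeError(path)
    return nodes[path]


def read_node_or_none(nodes, path):
    # Tolerant read: a missing node yields None instead of an exception.
    try:
        return read_node(nodes, path)
    except NoNodeError:
        return None


def process_notified_changes(nodes, notified_paths):
    """Process each notified path, skipping nodes that vanished meanwhile."""
    processed, skipped = [], []
    for path in notified_paths:
        data = read_node_or_none(nodes, path)
        if data is None:
            skipped.append(path)
        else:
            processed.append((path, data))
    return processed, skipped


nodes = {"config_change_1": "B"}
print(process_notified_changes(nodes, ["config_change_1", "config_change_2"]))
# ([('config_change_1', 'B')], ['config_change_2'])
```

With a tolerant read, a purged change node is skipped rather than aborting the whole pass.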
          Jay Kreps added a comment -

          Created reviewboard https://reviews.apache.org/r/20492/
          against branch trunk

          Jay Kreps added a comment -

          Created reviewboard https://reviews.apache.org/r/20498/
          against branch trunk

          Jay Kreps added a comment -

          Updated reviewboard https://reviews.apache.org/r/20471/
          against branch trunk

          Joel Koshy added a comment -

          Reopening to keep track of the follow-up. Also, I need to commit to 0.8.1.

          Joel Koshy added a comment -

          Committed to 0.8.1 as well


            People

            • Assignee: Jay Kreps
            • Reporter: Joel Koshy
            • Votes: 0
            • Watchers: 3
