KAFKA-1398

Topic config changes can be lost and cause fatal exceptions on broker restarts

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.1.1
    • Component/s: None
    • Labels:
      None

      Description

      The cleanup of topic config changes appears to be broken. When a broker is
      bounced, it does the following on startup:
      1 - Read all the children of the config change path.
      2 - For each child, if the change id is greater than the last executed
      change, extract the topic information.
      3 - If there is a log for that topic on this broker, apply the change.
      Otherwise, delete the config change.

      The delete in step 3 fires the child change watch on all the other
      brokers. Those brokers currently list all the children of the config
      change path and ignore any change whose id is not greater than their last
      executed change. At least one issue here is that if a broker does not have
      partitions for a topic, then its lastExecutedChange is not updated for
      changes to that topic.
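The startup steps above can be sketched in a few lines. This is a minimal Python illustration, not Kafka's implementation: a plain dict stands in for the children of the config change path, `process_changes_on_startup` is a hypothetical name, and it assumes a restarted broker begins with no remembered change id (-1).

```python
def process_changes_on_startup(changes, local_topics, last_executed=-1):
    """Apply or delete pending config changes, following steps 1-3.

    changes: dict of change id -> topic name (stand-in for the ZooKeeper children).
    local_topics: topics that have a log on this broker.
    Returns (applied ids, deleted ids, updated last executed id).
    """
    applied, deleted = [], []
    for change_id in sorted(changes):            # step 1: read all children
        if change_id <= last_executed:           # step 2: skip already-executed changes
            continue
        if changes[change_id] in local_topics:   # step 3: a log exists, apply the change
            applied.append(change_id)
            last_executed = change_id
        else:                                    # step 3: no log, delete the change
            deleted.append(change_id)
    for change_id in deleted:
        del changes[change_id]
    return applied, deleted, last_executed


# A broker that only hosts topic B restarts while three changes are pending.
changes = {0: "A", 1: "B", 2: "C"}
applied, deleted, last = process_changes_on_startup(changes, local_topics={"B"})
print(applied, deleted, last, changes)
```

In this sketch the restarting broker removes every change for a topic it does not host, even though other brokers may not have processed those changes yet.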

      Consider this scenario:

      • Three brokers 0, 1, 2
      • Topic A has partitions only assigned to broker 0
      • Topic B has partitions only assigned to broker 1
      • Topic C has partitions only assigned to broker 2
      • Change 0: topic A
      • Change 1: topic B
      • Change 2: topic C
      • lastExecutedChange on broker 0 is 0
      • lastExecutedChange on broker 1 is 1
      • lastExecutedChange on broker 2 is 2
      • Bounce broker 1
      • The above bounce causes Change 0 and Change 2 to be deleted.
      • The watch fires on brokers 0 and 2.
      • Broker 0 tries to read the topics for change 1 and then change 2 (since
        its lastExecutedChange is 0). The read of change 2 fails because that
        znode has been deleted:

      2014/04/15 19:35:34.236 INFO [TopicConfigManager] [main] [kafka-server] [] Processed topic config change 25 for topic xyz, setting new config to {retention.ms=3600000, segment.ms=3600000}.
      2014/04/15 19:35:34.238 FATAL [KafkaServerStartable] [main] [kafka-server] [] Fatal error during KafkaServerStable startup. Prepare to shutdown
      org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_0000000026
      at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
      at kafka.utils.ZkUtils$.readData(ZkUtils.scala:467)
      at kafka.server.TopicConfigManager$$anonfun$kafka$server$TopicConfigManager$$processConfigChanges$2.apply(TopicConfigManager.scala:97)
      at kafka.server.TopicConfigManager$$anonfun$kafka$server$TopicConfigManager$$processConfigChanges$2.apply(TopicConfigManager.scala:93)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
      at kafka.server.TopicConfigManager.kafka$server$TopicConfigManager$$processConfigChanges(TopicConfigManager.scala:93)
      at kafka.server.TopicConfigManager.processAllConfigChanges(TopicConfigManager.scala:81)
      at kafka.server.TopicConfigManager.startup(TopicConfigManager.scala:72)
      at kafka.server.KafkaServer.startup(KafkaServer.scala:104)
      at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
      ...
      Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /config/changes/config_change_0000000026
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
      at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
      at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
      at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
      at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
      ... 39 more
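
One interleaving that is consistent with the scenario and the NoNode error above can be reproduced with a small simulation. This is a hedged Python sketch: the dict `store` stands in for the znodes under /config/changes, and `NoNodeError`, `read_data`, and `on_child_change` are hypothetical stand-ins rather than Kafka or ZkClient APIs.

```python
class NoNodeError(Exception):
    """Stand-in for ZooKeeper's NoNodeException."""


def read_data(store, change_id):
    # Fails the same way a ZooKeeper read of a deleted znode would.
    if change_id not in store:
        raise NoNodeError(f"NoNode for config_change_{change_id}")
    return store[change_id]


def on_child_change(store, children, last_executed):
    """Read every listed change newer than last_executed, with no handling of missing nodes."""
    return [read_data(store, cid) for cid in sorted(children) if cid > last_executed]


store = {0: "A", 1: "B", 2: "C"}
del store[0]                               # broker 1 deletes change 0; the watch fires
children_seen_by_broker_0 = list(store)    # broker 0 lists the children: [1, 2]
del store[2]                               # broker 1 deletes change 2 before broker 0 reads it

try:
    on_child_change(store, children_seen_by_broker_0, last_executed=0)
    outcome = "ok"
except NoNodeError as err:
    outcome = str(err)
print(outcome)
```

The read of change 1 succeeds, and the read of change 2 raises because the node was removed after the child list was taken.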

      Another issue is that two logging statements have incorrect format
      qualifiers, so the placeholders are logged unsubstituted, which makes
      things a little harder to debug. E.g.,

      2014/04/15 19:35:34.223 ERROR [TopicConfigManager] [kafka-server] [] Ignoring topic config change %d for topic %s since the change has expired
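
The offending logging code is not shown in this report. One common way to get output like the line above is to build the message template without ever applying its arguments. The Python sketch below illustrates that assumption only; both function names are hypothetical.

```python
TEMPLATE = "Ignoring topic config change %d for topic %s since the change has expired"


def expired_message_unformatted(change_id, topic):
    # The arguments are accepted but never applied, so %d and %s leak into the log.
    return TEMPLATE


def expired_message(change_id, topic):
    # Applying the arguments substitutes the placeholders.
    return TEMPLATE % (change_id, topic)


print(expired_message_unformatted(25, "xyz"))
print(expired_message(25, "xyz"))
```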

      1. KAFKA-1398.patch
        6 kB
        Jay Kreps
      2. KAFKA-1398.patch
        7 kB
        Jay Kreps
      3. KAFKA-1398.patch
        7 kB
        Jay Kreps
      4. KAFKA-1398_2014-04-18_13:03:03.patch
        3 kB
        Jay Kreps

        Issue Links

          Activity

          Transition            Time In Source Status  Execution Times  Last Executer  Last Execution Date
          Open -> Resolved      2d 14h 55m             1                Jay Kreps      18/Apr/14 17:30
          Resolved -> Reopened  4h 11m                 1                Joel Koshy     18/Apr/14 21:41
          Reopened -> Resolved  2h 41m                 1                Joel Koshy     19/Apr/14 00:23
          Resolved -> Closed    38d 19h 19m            1                Joel Koshy     27/May/14 19:42
          Joel Koshy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Joel Koshy made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Joel Koshy added a comment -

          Committed to 0.8.1 as well

          Joel Koshy made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Joel Koshy added a comment -

          Reopening to keep track of the follow-up. Also, I need to commit to 0.8.1.

          Jay Kreps added a comment -

          Updated reviewboard https://reviews.apache.org/r/20471/
          against branch trunk

          Jay Kreps made changes -
          Attachment KAFKA-1398_2014-04-18_13:03:03.patch [ 12640879 ]
          Jay Kreps added a comment -

          Created reviewboard https://reviews.apache.org/r/20498/
          against branch trunk

          Jay Kreps made changes -
          Attachment KAFKA-1398.patch [ 12640877 ]
          Jay Kreps added a comment -

          Created reviewboard https://reviews.apache.org/r/20492/
          against branch trunk

          Jay Kreps made changes -
          Attachment KAFKA-1398.patch [ 12640860 ]
          Jay Kreps made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Jay Kreps added a comment -

          Looks like this got broken with the delete topic work.

          I refactored a bit

          • Added a unit test for the simple case of changing config
          • Moved purge logic into another method
          • Changed the zk ops to handle the case where the znode is purged in between the notification and the read.
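
The last change in the list above, tolerating a znode that is purged between the notification and the read, might look roughly like the following. This is a Python sketch under stated assumptions and not the attached patch: a dict stands in for ZooKeeper, and `read_data_or_none` and `on_child_change_tolerant` are hypothetical names.

```python
def read_data_or_none(store, change_id):
    # Returns None instead of raising when the node no longer exists.
    return store.get(change_id)


def on_child_change_tolerant(store, children, last_executed):
    """Process listed changes newer than last_executed, skipping purged nodes."""
    processed = []
    for cid in sorted(children):
        if cid <= last_executed:
            continue
        topic = read_data_or_none(store, cid)
        if topic is None:
            # The node was purged between the notification and the read: skip it.
            continue
        processed.append((cid, topic))
    return processed


# The child list still names change 2, but only change 1 remains in the store.
store = {1: "B"}
result = on_child_change_tolerant(store, children=[1, 2], last_executed=0)
print(result)
```

Treating a missing node as "nothing to do" keeps a stale child list from becoming a fatal error.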
          Jay Kreps added a comment -

          Created reviewboard https://reviews.apache.org/r/20471/
          against branch trunk

          Jay Kreps made changes -
          Attachment KAFKA-1398.patch [ 12640749 ]
          Joel Koshy made changes -
          Assignee Jay Kreps [ jkreps ]
          Joel Koshy made changes -
          Field Original Value New Value
          Link This issue is depended upon by KAFKA-1380 [ KAFKA-1380 ]
          Joel Koshy created issue -

            People

            • Assignee: Jay Kreps
            • Reporter: Joel Koshy
            • Votes: 0
            • Watchers: 3
