Kafka / KAFKA-4298

LogCleaner writes inconsistent compressed message set if topic message format != message format

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10.0.1
    • Fix Version/s: 0.10.1.0
    • Component/s: None
    • Labels: None

      Description

      When cleaning the log, we don't want to convert messages to the format configured for the topic due to KAFKA-3915. However, the cleaner logic for writing compressed messages (in case some messages in the message set were not retained) writes the topic message format version in the magic field of the outer message instead of the actual message format. The choice of the absolute/relative offset for the inner messages will also be based on the topic message format version.

      For example, if there is an old compressed message set with magic=0 in the log and the topic is configured for magic=1, then after cleaning, the new message set will have a wrapper with magic=1, the nested messages will still have magic=0, but the message offsets will be relative. If this happens, there does not seem to be an easy way to recover without manually fixing up the log.
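The state described in this example can be modelled with a small sketch. This is only an illustration under stated assumptions: `Inner`, `CompressedSet`, `CleanerSketch`, and `rewriteAsDescribed` are hypothetical names, not Kafka classes, and the code reproduces the reported behaviour rather than the change made in the pull request.

```scala
// Simplified, hypothetical model of a compressed message set (not Kafka's real classes).
final case class Inner(magic: Byte, key: String)
final case class CompressedSet(wrapperMagic: Byte, relativeOffsets: Boolean, inner: Vector[Inner])

object CleanerSketch {
  // Mirrors the behaviour described in the report: the wrapper magic and the
  // offset mode follow the topic configuration, while the retained inner
  // messages keep whatever magic they were originally written with.
  def rewriteAsDescribed(original: CompressedSet,
                         retainedKeys: Set[String],
                         topicMagic: Byte): CompressedSet =
    CompressedSet(
      wrapperMagic = topicMagic,
      relativeOffsets = topicMagic > 0,
      inner = original.inner.filter(m => retainedKeys.contains(m.key)))

  // A set is consistent when every inner message shares the wrapper's magic.
  def isConsistent(set: CompressedSet): Boolean =
    set.inner.forall(_.magic == set.wrapperMagic)
}
```

Feeding a magic=0 set through `rewriteAsDescribed` with `topicMagic = 1` yields a wrapper with magic=1 and relative offsets around inner messages that still carry magic=0, so `isConsistent` returns false for the cleaned set.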

      The offsets still work correctly as both the clients and broker use the outer message format version to decide if the relative offset needs to be converted to an absolute offset. So the main problem turns out to be that `ByteBufferMessageSet.deepIterator` throws an exception if there is a mismatch between outer and inner message format version.

      if (newMessage.magic != wrapperMessage.magic)
        throw new IllegalStateException(s"Compressed message has magic value ${wrapperMessage.magic} " +
          s"but inner message has magic value ${newMessage.magic}")
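To make the two observations above concrete, here is a rough sketch of a reader that picks the offset handling from the wrapper magic and then applies the same kind of magic check. The types `InnerMsg` and `Wrapper` and the object `DeepIterationSketch` are hypothetical. The offset layout is an assumption for illustration: with wrapper magic of 1 or higher, the wrapper is taken to carry the absolute offset of the last inner message and inner offsets are taken to be relative.

```scala
// Hypothetical, simplified types; not the classes used by ByteBufferMessageSet.
final case class InnerMsg(magic: Byte, offset: Long)
final case class Wrapper(magic: Byte, offset: Long, inner: Vector[InnerMsg])

object DeepIterationSketch {
  // The wrapper magic alone decides whether inner offsets need converting.
  // Assumed layout for magic >= 1: wrapper offset = absolute offset of the
  // last inner message, inner offsets are relative.
  def absoluteOffsets(w: Wrapper): Vector[Long] =
    if (w.magic > 0 && w.inner.nonEmpty) {
      val base = w.offset - w.inner.last.offset
      w.inner.map(m => base + m.offset)
    } else w.inner.map(_.offset)

  // Same condition as the check quoted above, returned as a value instead of thrown.
  def checkedInner(w: Wrapper): Either[String, Vector[InnerMsg]] =
    w.inner.find(_.magic != w.magic) match {
      case Some(bad) =>
        Left(s"Compressed message has magic value ${w.magic} " +
          s"but inner message has magic value ${bad.magic}")
      case None => Right(w.inner)
    }
}
```

In this model, a wrapper with magic=1 around magic=0 inner messages still resolves to the right absolute offsets, yet `checkedInner` rejects it, which matches the report: the offsets work, the magic check does not.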
      

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/kafka/pull/2019

          hachikuji Jason Gustafson added a comment -

          Issue resolved by pull request 2019
          https://github.com/apache/kafka/pull/2019

          ijuma Ismael Juma added a comment - edited

          Great catch. I'm a bit unsure why this hasn't been reported so far. I think the correct description is:

          "When cleaning the log, we don't want to convert messages to the format configured for the topic due to KAFKA-3915. However, the cleaner logic for writing compressed messages (in case some messages in the message set were not retained) writes the topic message format version in the magic field of the outer message instead of the actual message format. The choice of the absolute/relative offset for the inner messages will also be based on the topic message format version.

          For example, if there is an old compressed message set with magic=0 in the log and the topic is configured for magic=1, then after cleaning, the new message set will have a wrapper with magic=1, the nested messages will still have magic=0, but the message offsets will be relative. If this happens, there does not seem to be an easy way to recover without manually fixing up the log."

          Edit: updated the description.

          githubbot ASF GitHub Bot added a comment -

          GitHub user hachikuji opened a pull request:

          https://github.com/apache/kafka/pull/2019

          KAFKA-4298: Ensure compressed message sets are converted when cleaning the log

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/hachikuji/kafka KAFKA-4298

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/2019.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2019


          commit af3b31b4d94ece1603ac470bfd8d781558987501
          Author: Jason Gustafson <jason@confluent.io>
          Date: 2016-10-13T04:56:52Z

          KAFKA-4298: Ensure compressed message sets are converted when cleaning the log



            People

            • Assignee: hachikuji Jason Gustafson
            • Reporter: hachikuji Jason Gustafson
            • Votes: 0
            • Watchers: 4
