Kafka / KAFKA-308

Corrupted message stored in log segment on disk

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.7
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      One of our consumers got stuck on a particular topic partition and threw the following exception -

      2012/03/16 05:20:51.285 ERROR [FetcherRunnable] [FetchRunnable-0] [kafka] error in FetcherRunnable for service-call:33-0: fetched offset = 387722824645: consumed offset = 387722824645
      kafka.common.InvalidMessageSizeException: invalid message size: 393216 only received bytes: 143266 at 387722824645( possible causes (1) a single message larger than the fetch size; (2) log corruption )
      at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:114)
      at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:161)
      at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:94)
      at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
      at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
      at kafka.message.ByteBufferMessageSet.shallowValidBytes(ByteBufferMessageSet.scala:65)
      at kafka.message.ByteBufferMessageSet.validBytes(ByteBufferMessageSet.scala:60)
      at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:57)
      at kafka.consumer.FetcherRunnable$$anonfun$run$5.apply(FetcherRunnable.scala:79)
      at kafka.consumer.FetcherRunnable$$anonfun$run$5.apply(FetcherRunnable.scala:65)
      at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
      at scala.collection.immutable.List.foreach(List.scala:45)
      at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:65)
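
The size check behind this exception can be sketched as follows. This is an illustrative Python sketch, not Kafka's actual implementation: `InvalidMessageSize` and `iter_messages` are hypothetical names, and the sketch assumes only that each message is prefixed by a 4-byte big-endian size field, as described later in this report.

```python
import struct


class InvalidMessageSize(Exception):
    """Hypothetical stand-in for kafka.common.InvalidMessageSizeException."""


def iter_messages(buf: bytes):
    """Yield each size-prefixed message body found in a fetched chunk.

    Raises InvalidMessageSize when a size field claims more bytes than
    the chunk still holds. Fewer than 4 trailing bytes are ignored.
    """
    pos = 0
    while pos + 4 <= len(buf):
        (size,) = struct.unpack_from(">I", buf, pos)
        remaining = len(buf) - pos - 4
        if size > remaining:
            # Either the message is larger than the fetch size,
            # or the size field itself is corrupted.
            raise InvalidMessageSize(
                f"invalid message size: {size} only received bytes: {remaining}"
            )
        yield buf[pos + 4 : pos + 4 + size]
        pos += 4 + size
```

The check alone cannot tell the two causes apart: a size of 393216 against 143266 received bytes fits either an oversized message or a corrupted size field, which is why the segment itself was inspected next.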

      We ran the DumpLogSegments tool on the log segment for that partition, and it showed that the log segment is corrupted -

      [2012-03-17 17:44:45,269] INFO offset: 387722824645 isvalid: false payloadsize: 393211 magic: 0 compresscodec: NoCompressionCodec (kafka.tools.DumpLogSegments$)
      [2012-03-17 17:44:45,269] INFO

      Reading file message set from location 394088 (kafka.message.FileMessageSet)
      [2012-03-17 17:44:45,269] INFO Creating message byte buffer of size 1634499840 (kafka.message.FileMessageSet)
      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
      at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
      at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
      at kafka.message.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:126)
      at kafka.message.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:108)
      at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
      at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
      at scala.collection.Iterator$class.foreach(Iterator.scala:631)
      at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
      at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
      at kafka.message.MessageSet.foreach(MessageSet.scala:87)
      at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:92)
      at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)
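
The OutOfMemoryError above comes from allocating a buffer whose length is taken directly from a size field in the file. A dump tool can avoid this by checking the size before allocating. The following Python sketch shows one way to do that; it is not the DumpLogSegments implementation, and `scan_segment` and the `max_message_size` default are hypothetical.

```python
import io
import struct


def scan_segment(f, file_size, max_message_size=1_000_000):
    """Walk a segment of 4-byte-size-prefixed messages.

    Returns a list of (position, size, plausible) tuples and stops at the
    first size field that exceeds max_message_size or the bytes left in
    the file, instead of allocating a buffer of that size.
    """
    entries = []
    pos = 0
    while pos + 4 <= file_size:
        f.seek(pos)
        (size,) = struct.unpack(">I", f.read(4))
        remaining = file_size - pos - 4
        if size > max_message_size or size > remaining:
            entries.append((pos, size, False))  # implausible: stop here
            break
        entries.append((pos, size, True))
        pos += 4 + size
    return entries


# Usage: one valid 3-byte message followed by a bogus size field.
data = struct.pack(">I", 3) + b"abc" + struct.pack(">I", 1634499840) + b"zz"
print(scan_segment(io.BytesIO(data), len(data)))
```

Stopping at the first implausible size reports the position of the damage without trying to read past it.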

      Inspecting the log segment with hexdump shows that the corrupted message has a suspicious size (larger than the other messages for that topic), followed by a magic byte value of 0 and an attributes value of 3.

      nnarkhed-ld:kafka-trunk nnarkhed$ hexdump /tmp/387722823777.kafka -s 868 -n 6 -x
      0000364 0600 0000 0300

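      The first 4 bytes are the size of the message (393216), and the last 2 bytes are the magic byte followed by the attributes byte. (hexdump -x prints 16-bit words, byte-swapped on a little-endian host, so the bytes on disk are 00 06 00 00 00 03.)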
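
The byte arithmetic can be checked with a short Python sketch. It assumes hexdump ran on a little-endian host, so each printed 16-bit word has its two bytes swapped relative to their order on disk. The variable names are illustrative only.

```python
import struct

# The three 16-bit words printed by `hexdump -x` at offset 868.
words = ["0600", "0000", "0300"]

# Assumption: little-endian host, so undo the per-word byte swap
# to recover the bytes in on-disk order.
raw = b"".join(int(w, 16).to_bytes(2, "little") for w in words)

# Decode as big-endian: 4-byte size, 1-byte magic, 1-byte attributes.
size, magic, attributes = struct.unpack(">IBB", raw)

# Where the next message would start if the size field were trusted.
start = 868
next_location = start + 4 + size

print(raw.hex(), size, magic, attributes, next_location)
```

This gives a size of 393216, a magic byte of 0 and an attributes byte of 3. The computed next location, 868 + 4 + 393216 = 394088, matches the "Reading file message set from location 394088" line in the DumpLogSegments output, so the tool trusted the suspicious size and continued reading from there.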

          Activity

          Neha Narkhede created issue -
          Neha Narkhede added a comment - The corrupted log segment is uploaded here - http://people.apache.org/~nehanarkhede/kafka-misc/kafka-308/corrupted-log.tar.gz
          Neha Narkhede made changes -
          Field Original Value New Value
          Link This issue is broken by KAFKA-309 [ KAFKA-309 ]
          Neha Narkhede made changes -
          Link This issue is broken by KAFKA-310 [ KAFKA-310 ]
          Jay Kreps added a comment -

          Neha Narkhede If no updates perhaps we should close this--perhaps it was just random disk corruption or something...

          Neha Narkhede added a comment -

          Yes, agree.

          Neha Narkhede made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Cannot Reproduce [ 5 ]

            People

            • Assignee: Unassigned
            • Reporter: Neha Narkhede
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated:
                Resolved:
