KAFKA-308

Corrupted message stored in log segment on disk

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.7
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      One of our consumers got stuck on a particular topic partition and threw the following exception:

      2012/03/16 05:20:51.285 ERROR [FetcherRunnable] [FetchRunnable-0] [kafka] error in FetcherRunnable for service-call:33-0: fetched offset = 387722824645: consumed offset = 387722824645
      kafka.common.InvalidMessageSizeException: invalid message size: 393216 only received bytes: 143266 at 387722824645( possible causes (1) a single message larger than the fetch size; (2) log corruption )
      at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:114)
      at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:161)
      at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:94)
      at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
      at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
      at kafka.message.ByteBufferMessageSet.shallowValidBytes(ByteBufferMessageSet.scala:65)
      at kafka.message.ByteBufferMessageSet.validBytes(ByteBufferMessageSet.scala:60)
      at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:57)
      at kafka.consumer.FetcherRunnable$$anonfun$run$5.apply(FetcherRunnable.scala:79)
      at kafka.consumer.FetcherRunnable$$anonfun$run$5.apply(FetcherRunnable.scala:65)
      at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
      at scala.collection.immutable.List.foreach(List.scala:45)
      at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:65)
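
The exception message describes a length-prefixed framing check: each message starts with a 4-byte size, and the fetched buffer held fewer bytes than that size claimed. A minimal Python sketch of such a check follows. It is an illustration only, not Kafka's Scala implementation; the names `valid_bytes` and `InvalidMessageSize` are hypothetical, and raising only when the very first message is unreadable is an assumption based on the two causes listed in the error text.

```python
import struct

SIZE_FIELD_BYTES = 4  # 4-byte size prefix, as stated in the description


class InvalidMessageSize(Exception):
    """Hypothetical stand-in for the exception in the trace above."""


def valid_bytes(buf: bytes) -> int:
    """Count the bytes of buf covered by complete size-prefixed messages."""
    pos = 0
    while len(buf) - pos >= SIZE_FIELD_BYTES:
        # Big-endian unsigned size prefix (assumed byte order).
        (size,) = struct.unpack_from(">I", buf, pos)
        remaining = len(buf) - pos - SIZE_FIELD_BYTES
        if size > remaining:
            if pos == 0:
                # Not even one whole message fits: either the message is
                # larger than the fetch size, or the size field is corrupt.
                raise InvalidMessageSize(
                    f"invalid message size: {size} only received bytes: {remaining}"
                )
            break  # trailing partial message: stop at the last complete one
        pos += SIZE_FIELD_BYTES + size
    return pos
```

A consumer that keeps fetching from the same offset with the same fetch size hits the same condition on every attempt, which would explain why it got stuck rather than skipping ahead.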

      We ran the DumpLogSegments tool on the log segment for that partition, and its output shows that the segment is corrupted:

      [2012-03-17 17:44:45,269] INFO offset: 387722824645 isvalid: false payloadsize: 393211 magic: 0 compresscodec: NoCompressionCodec (kafka.tools.DumpLogSegments$)
      [2012-03-17 17:44:45,269] INFO

      Reading file message set from location 394088 (kafka.message.FileMessageSet)
      [2012-03-17 17:44:45,269] INFO Creating message byte buffer of size 1634499840 (kafka.message.FileMessageSet)
      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
      at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
      at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
      at kafka.message.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:126)
      at kafka.message.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:108)
      at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
      at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
      at scala.collection.Iterator$class.foreach(Iterator.scala:631)
      at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
      at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
      at kafka.message.MessageSet.foreach(MessageSet.scala:87)
      at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:92)
      at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)
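
The numbers in the exception and in the dump output fit together arithmetically. The sketch below (Python, purely illustrative) works backwards from the dump tool's next read location to the byte position of the corrupt message, and shows the bytes behind the 1634499840 size that triggered the OutOfMemoryError. The 4-byte size prefix is taken from the description; the interpretation in the comments is an assumption, not a confirmed diagnosis.

```python
SIZE_FIELD_BYTES = 4  # size prefix length, per the description

declared_size = 393216            # from the InvalidMessageSizeException
reported_payload_size = 393211    # "payloadsize" printed by DumpLogSegments
next_read_location = 394088       # "Reading file message set from location"
next_declared_size = 1634499840   # "Creating message byte buffer of size"

# If the tool skipped the size prefix plus the declared size to reach the
# next read location, the corrupt message starts this far into the file.
corrupt_position = next_read_location - SIZE_FIELD_BYTES - declared_size

# Bytes of the declared size that the tool did not count as payload.
header_bytes = declared_size - reported_payload_size

# The four bytes that were interpreted as the next message's size.
next_size_bytes = next_declared_size.to_bytes(SIZE_FIELD_BYTES, "big")

print(corrupt_position)  # 868
print(header_bytes)      # 5
print(next_size_bytes)   # b'al}\x00'
```

The position 868 matches the skip value passed to hexdump further down in the description. Three of the four bytes behind the second size are printable ASCII, which would be consistent with the tool reading message payload as a size field after skipping ahead by the bogus 393216 bytes, though the output alone cannot confirm that.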

      Inspecting the log segment with hexdump shows that the corrupted message has a suspicious size (larger than the rest of the messages for that topic), followed by a magic byte value of 0 and an attributes value of 3:

      nnarkhed-ld:kafka-trunk nnarkhed$ hexdump /tmp/387722823777.kafka -s 868 -n 6 -x
      0000364 0600 0000 0300

      The first 4 bytes are the size of the message (393216), and the last 2 bytes are the magic byte followed by the attributes byte.
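
This decoding can be reproduced in a few lines. The Python sketch below assumes `hexdump -x` printed 16-bit words in little-endian host order (so each printed word must be byte-swapped) and that the size field is big-endian; both are assumptions that happen to yield the 393216 quoted above. It also derives the `-s 868` skip value by assuming the segment file name is the offset of the segment's first byte.

```python
import struct


def hexdump_words_to_bytes(words: str) -> bytes:
    """Undo the 16-bit word grouping of `hexdump -x` (little-endian host assumed)."""
    return b"".join(struct.pack("<H", int(word, 16)) for word in words.split())


# The six bytes shown in the hexdump output.
raw = hexdump_words_to_bytes("0600 0000 0300")

# 4-byte big-endian size, then 1-byte magic, then 1-byte attributes.
size, magic, attributes = struct.unpack(">IBB", raw)

# Position of the corrupt message inside the segment file.
segment_base_offset = 387722823777  # taken from the file name 387722823777.kafka
corrupt_offset = 387722824645       # offset reported in the exception
position = corrupt_offset - segment_base_offset

print(raw.hex(), size, magic, attributes, position)
# 000600000003 393216 0 3 868
```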

        Issue Links

          Activity

          Transition: Open → Resolved
          Time In Source Status: 732d 20h 50m
          Execution Times: 1
          Last Executer: Neha Narkhede
          Last Execution Date: 20/Mar/14 21:48
          Neha Narkhede made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Cannot Reproduce [ 5 ]
          Neha Narkhede added a comment -

          Yes, agree.

          Jay Kreps added a comment -

          Neha Narkhede If no updates perhaps we should close this--perhaps it was just random disk corruption or something...

          Neha Narkhede made changes -
          Link This issue is broken by KAFKA-310 [ KAFKA-310 ]
          Neha Narkhede made changes -
          Field Original Value New Value
          Link This issue is broken by KAFKA-309 [ KAFKA-309 ]
          Neha Narkhede added a comment -

          The corrupted log segment is uploaded here - http://people.apache.org/~nehanarkhede/kafka-misc/kafka-308/corrupted-log.tar.gz

          Neha Narkhede created issue -

            People

            • Assignee: Unassigned
            • Reporter: Neha Narkhede
            • Votes: 0
            • Watchers: 2
