Kafka / KAFKA-669

Irrecoverable error on leader while rolling to a new segment

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Implemented
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      I set up a test cluster of 5 brokers and set up the migration tool to send data for ~1000 partitions to this cluster. I see a bunch of FailedToSendMessageExceptions on the producer. After tracing some of these failed requests, I found that the broker sends a ProducerResponseStatus(-1, -1) to the producer. The impact is that the producer marks the messages for those partitions as failed, retries, and eventually gives up. This seems like a bug in the truncation logic. I traced the state changes for topic SpamReportEvent, and it seems like it was first a follower, so it truncated its log. Then it became the leader and got into an irrecoverable error state due to log rolling.

      [2012-12-12 01:47:01,951] INFO [Log Manager on Broker 3] Loading log 'SpamReportEvent-0' (kafka.log.LogManager)
      [2012-12-12 01:47:01,951] INFO Creating log segment /mnt/u001/kafka_08_long_running_test/kafka-logs/SpamReportEvent-0/00000000000000000000.log (kafka.log.FileMessageSet)
      [2012-12-12 01:47:01,951] INFO Created index file /mnt/u001/kafka_08_long_running_test/kafka-logs/SpamReportEvent-0/00000000000000000000.index with maxEntries = 1310720, maxIndexSize = 10485760, entries = 1310720, lastOffset = 0 (kafka.log.OffsetIndex)
      [2012-12-12 01:47:02,949] INFO Replica Manager on Broker 3: Starting the follower state transition to follow leader 4 for topic SpamReportEvent partition 0 (kafka.server.ReplicaManager)
      [2012-12-12 01:47:02,950] INFO [Kafka Log on Broker 3], Truncated log segment /mnt/u001/kafka_08_long_running_test/kafka-logs/SpamReportEvent-0/00000000000000000000.log to target offset 0 (kafka.log.Log)
      [2012-12-12 01:47:34,745] INFO Replica Manager on Broker 3: Becoming Leader for topic [SpamReportEvent] partition [0] (kafka.server.ReplicaManager)
      [2012-12-12 01:47:34,745] INFO Replica Manager on Broker 3: Completed the leader state transition for topic SpamReportEvent partition 0 (kafka.server.ReplicaManager)
      [2012-12-12 01:47:36,071] INFO Replica Manager on Broker 3: Becoming Leader for topic [SpamReportEvent] partition [0] (kafka.server.ReplicaManager)
      [2012-12-12 01:47:36,071] INFO Replica Manager on Broker 3: Completed the leader state transition for topic SpamReportEvent partition 0 (kafka.server.ReplicaManager)
      [2012-12-12 01:48:10,545] INFO Replica Manager on Broker 3: Becoming Leader for topic [SpamReportEvent] partition [0] (kafka.server.ReplicaManager)
      [2012-12-12 01:48:10,545] INFO Replica Manager on Broker 3: Completed the leader state transition for topic SpamReportEvent partition 0 (kafka.server.ReplicaManager)
      [2012-12-12 02:02:08,148] ERROR [KafkaApi-3] Error processing ProducerRequest on SpamReportEvent:0 (kafka.server.KafkaApis)
      [2012-12-12 02:06:06,081] ERROR [KafkaApi-3] Error processing ProducerRequest on SpamReportEvent:0 (kafka.server.KafkaApis)
      kafka.common.KafkaException: Trying to roll a new log segment for topic partition SpamReportEvent-0 with start offset 0 while it already exists
      at kafka.log.Log.rollToOffset(Log.scala:465)
      at kafka.log.Log.roll(Log.scala:443)
      at kafka.log.Log.maybeRoll(Log.scala:431)
      at kafka.log.Log.append(Log.scala:258)
      at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:246)
      at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:239)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
      at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
      at scala.collection.immutable.Map$Map1.map(Map.scala:93)
      at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:239)
      at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:179)
      at kafka.server.KafkaApis.handle(KafkaApis.scala:56)
      at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
      at java.lang.Thread.run(Thread.java:619)
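
      The roll failure above lines up with the root-cause discussion in the comments below: a leftover, full 00000000000000000000.index file makes the broker think the (still empty) active segment needs to be rolled on the next produce request, but the new segment's start offset is the log end offset, 0, and a segment file for offset 0 already exists on disk. A minimal sketch of that check, using made-up names rather than the actual kafka.log.Log code:

      import java.io.File

      // Illustrative only -- not the real kafka.log.Log.roll implementation.
      // roll() picks the log end offset as the new segment's base offset and
      // refuses to reuse a base offset whose segment file is already on disk.
      def rollSketch(dir: File, logEndOffset: Long, topicPartition: String): File = {
        val newOffset = logEndOffset                        // 0 here, since the log is empty
        val logFile = new File(dir, "%020d.log".format(newOffset))
        if (logFile.exists)                                 // 00000000000000000000.log was created on load
          throw new RuntimeException(
            "Trying to roll a new log segment for topic partition %s with start offset %d while it already exists"
              .format(topicPartition, newOffset))
        logFile
      }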

        Activity

        Jun Rao made changes -
        Status: Resolved → Closed
        Neha Narkhede made changes -
        Status: Open → Resolved
        Assignee: Neha Narkhede
        Resolution: Implemented
        Jay Kreps added a comment -

        To summarize, I think this bug is fixed once those other patches go in.

        Jay Kreps added a comment -

        To close the loop on this, there are multiple issues here:
        1. We are not cleaning up .index files already present in the directory. This is caused by the nature of Neha's test script.
        2. We are not fixing this problem in recovery because, although the recovery logic is reasonably well tested, the variable passed into the log by the log manager was reversed. As a result we were running recovery whenever there was a clean shutdown and not running recovery when there was an unclean shutdown. D'oh. Patch on KAFKA-673.
        3. The final question was why the truncateTo logic doesn't reset the size of the corrupt index to 0. I looked into this, and here is the sequence of events in truncateTo(0) (sketched below). First we need to translate 0 to a file position; because the log is empty, this translates to null, which indicates that 0 is earlier than the latest offset in the log (this is true). As a result there is no truncation to do on the log. It is assumed that the index can't have entries that are not in the log; this assumption is violated by our having an arbitrarily corrupted index. I think this is actually a valid assumption--if there is nothing in the log then there should be nothing in the index and we can skip it. The problem was that we had a corrupt index, which KAFKA-670 and KAFKA-673 fix.
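
        A rough sketch of that truncateTo(0) path, using made-up names to make the sequence concrete (illustrative only, not the real kafka.log.Log / kafka.log.OffsetIndex code):

        case class IndexEntry(offset: Long, position: Int)

        // (offset, payload) pairs stand in for the segment's message set.
        class SegmentSketch(var messages: List[(Long, Array[Byte])],
                            var index: List[IndexEntry]) {

          // Translate a target offset to a position in the log; None means the log
          // holds nothing at or after that offset, which is always the case here
          // because the log is empty.
          def translateOffset(target: Long): Option[Long] =
            messages.find(_._1 >= target).map(_._1)

          def truncateTo(target: Long): Unit =
            translateOffset(target) match {
              case None =>
                // Nothing in the log to truncate. The index is assumed to contain no
                // entries that are not in the log, so it is left untouched -- the
                // assumption that a leftover corrupt .index file violates.
                ()
              case Some(_) =>
                messages = messages.filter(_._1 < target)
                index = index.filter(_.offset < target)
            }
        }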

        Jay Kreps added a comment -

        I don't see the source of the logging you added to print out the third line "Creating index file..." What is being logged as entries? If it is OffsetIndex.entries, then I think this is definitely caused by KAFKA-670, as we are creating a log index whose last offset is 0 and which contains 1310720 entries, which is definitely impossible and indicates that the file was already there. That doesn't answer the core question, though, which is how we are ending up with these .index files on disk...
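
        The numbers in that log line are at least internally consistent if offset-index entries are 8 bytes each (assumed here: a 4-byte relative offset plus a 4-byte file position): a 10485760-byte index holds exactly 1310720 entries, so entries == maxEntries means the pre-existing, pre-allocated .index file appeared completely full.

        // Assuming 8-byte offset-index entries (4-byte relative offset + 4-byte position):
        val maxIndexSize = 10485760                   // bytes, from the log line
        val entrySize    = 8
        val maxEntries   = maxIndexSize / entrySize   // 1310720, matching both maxEntries and entries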

        Neha Narkhede made changes -
        Description: updated
        Neha Narkhede made changes -
        Comment [ This issue was caused by KAFKA-654 on the leader broker 3. The root cause was leftover on-disk state from previous runs of KAFKA-654. Basically, the broker was leader for a few partitions that didn't have a 0.log file but had a non-empty, full 0.index file. Due to this, when it receives a produce request, it tries to roll the log and fails. This uncovered a potential recovery bug: ideally, since the brokers were hard killed, the recovery process should detect this issue and fix itself. ]
        Neha Narkhede made changes -
        Description: updated
        Neha Narkhede made changes -
        Labels: replication-performance, bugs
        Neha Narkhede made changes -
        Summary: "Kafka server sends producer response with an error code for successfully completed produce requests leading to spurious FailedToSendMessageException on the producer" → "Irrecoverable error on leader while rolling to a new segment"
        Neha Narkhede created issue -

          People

          • Assignee: Neha Narkhede
          • Reporter: Neha Narkhede
          • Votes: 0
          • Watchers: 2
