Kafka / KAFKA-4228

Sender thread death leaves KafkaProducer in a bad state

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0.1
    • Fix Version/s: None
    • Component/s: clients
    • Labels: None

    Description

      A KafkaProducer's Sender thread may die:

      2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in kafka-producer-network-thread | mm_ei-lca1_uni
      java.lang.OutOfMemoryError: Java heap space
             at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[?:1.8.0_40]
             at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_40]
             at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35) ~[kafka-clients-0.9.0.666.jar:?]
             at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29) ~[kafka-clients-0.9.0.666.jar:?]
             at org.apache.kafka.clients.producer.internals.Sender.produceRequest(Sender.java:355) ~[kafka-clients-0.9.0.666.jar:?]
             at org.apache.kafka.clients.producer.internals.Sender.createProduceRequests(Sender.java:337) ~[kafka-clients-0.9.0.666.jar:?]
             at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) ~[kafka-clients-0.9.0.666.jar:?]
             at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) ~[kafka-clients-0.9.0.666.jar:?]
             at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
      

      which leaves the producer in a bad state. In this state a call to flush(), for example, will hang indefinitely: the sender thread is no longer around to send the batches, but they have not been aborted either.
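Until the producer itself handles this case, a caller can at least avoid blocking forever by bounding the wait. The following is a minimal sketch in plain Java, not code from the Kafka client: `BoundedFlush` and `flushWithin` are hypothetical names, and the `Runnable` stands in for something like `producer::flush`. It only detects the hang; it does not recover the stuck batches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of the Kafka client API.
final class BoundedFlush {

    /**
     * Runs flushAction on a daemon thread and waits at most timeoutMs for it.
     * Returns true if it finished in time, false if it timed out, threw,
     * or the calling thread was interrupted while waiting.
     */
    static boolean flushWithin(Runnable flushAction, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-flush");
            t.setDaemon(true); // a stuck flush must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(flushAction);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the helper thread; the action may ignore it
            return false;
        } catch (ExecutionException e) {
            return false; // the flush action itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a flush() that never returns because nothing drains the batches.
        CountDownLatch neverReleased = new CountDownLatch(1);
        Runnable hungFlush = () -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("hung flush completed: " + flushWithin(hungFlush, 200));
        System.out.println("healthy flush completed: " + flushWithin(() -> { }, 200));
    }
}
```

A caller that gets `false` back would presumably treat the producer as unusable and shut the process down, which is the behavior this issue asks for.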

      Even worse, this can happen in MirrorMaker just before a rebalance, at which point MirrorMaker will block indefinitely during the rebalance (in beforeReleasingPartitions()).

      A rebalance participant hung in this way causes the rebalance to fail for the rest of the participants, at which point ZKRebalancerListener.watcherExecutorThread() dies with an exception (cannot rebalance after X attempts), but the consumer that ran the thread remains live. The end result is a bunch of zombie MirrorMakers and orphaned topic partitions.

      A dead sender thread should result in closing the producer.
      A consumer failing to rebalance should shut down.
      Any issue with the producer or consumer should cause MirrorMaker to die.
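The first of these proposals can be illustrated with a small fail-fast pattern. The sketch below is plain Java and does not use or reproduce Kafka's internals: `FailFastClient`, `send`, `flush`, and `abort` are hypothetical names for a stand-in client with one background I/O thread. It assumes that an uncaught-exception handler on that thread is an acceptable place to mark the client closed and fail all pending work, so that waiters are released instead of hanging.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a producer; none of these names come from the Kafka client.
final class FailFastClient {
    private final List<CompletableFuture<Void>> pending = new CopyOnWriteArrayList<>();
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Thread ioThread;

    FailFastClient(Runnable ioLoop) {
        ioThread = new Thread(ioLoop, "io-thread");
        ioThread.setDaemon(true);
        // If the I/O thread dies from any Throwable, close the client and fail pending work.
        ioThread.setUncaughtExceptionHandler((thread, error) -> abort(error));
        ioThread.start();
    }

    /** Registers a unit of work that only the I/O thread could complete. */
    CompletableFuture<Void> send() {
        CompletableFuture<Void> future = new CompletableFuture<>();
        pending.add(future);
        if (closed.get()) {
            // Added after (or while) the client was closed: fail it immediately.
            future.completeExceptionally(new IllegalStateException("client is closed"));
        }
        return future;
    }

    /** Blocks until every pending send has completed, successfully or not. */
    void flush() {
        for (CompletableFuture<Void> future : pending) {
            try {
                future.join();
            } catch (RuntimeException ignored) {
                // Failures are reported through the individual futures.
            }
        }
    }

    boolean isClosed() {
        return closed.get();
    }

    private void abort(Throwable cause) {
        if (closed.compareAndSet(false, true)) {
            for (CompletableFuture<Void> future : pending) {
                future.completeExceptionally(cause);
            }
        }
    }

    public static void main(String[] args) {
        CountDownLatch die = new CountDownLatch(1);
        FailFastClient client = new FailFastClient(() -> {
            try {
                die.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            throw new OutOfMemoryError("simulated heap exhaustion");
        });
        CompletableFuture<Void> inFlight = client.send();
        die.countDown();  // let the I/O thread die
        client.flush();   // returns because abort() failed the pending send
        System.out.println("closed=" + client.isClosed()
                + ", send failed=" + inFlight.isCompletedExceptionally());
    }
}
```

In an application like MirrorMaker, the same handler could also trigger process shutdown, so that a broken producer or consumer does not leave a zombie instance behind.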

            People

              Assignee: Unassigned
              Reporter: Radai Rosenblatt (radai)
              Votes: 0
              Watchers: 2