Kafka / KAFKA-3656

Avoid stressing system more when already under stress

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.10.0.0
    • None
    • None

    Description

      I am working with Kafka Connect and am getting error messages like these:

      [2016-05-04 03:11:28,226] ERROR Failed to flush WorkerSourceTask{id=geo-connector-0}, timed out while waiting for producer to flush outstanding messages, 151860 left ([FAILED toString()]) (org.apache.kafka.connect.runtime.WorkerSourceTask:237)
      [2016-05-04 03:11:28,227] ERROR Failed to commit offsets for WorkerSourceTask{id=geo-connector-0} (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:112)
      

      I haven't figured out why Connect would pull so many records into memory when it clearly can't produce that fast, and I don't yet know why producing messages is slow.

      But the {{151860 left ([FAILED toString()])}} part is interesting, so I looked at the code and found this:

      if (timeoutMs <= 0) {
          log.error(
                  "Failed to flush {}, timed out while waiting for producer to flush outstanding "
                          + "messages, {} left ({})", this, outstandingMessages.size(), outstandingMessages);
          finishFailedFlush();
          return false;
      }
      

      So when the connector is under stress and, with 151860 messages outstanding, under heavy memory pressure, the code chooses to take roughly 4 * 151860 byte arrays and convert them to a Java string.
      This not only eats more memory and adds GC load, but is also useless for logging, because the actual string, if the conversion didn't fail, would look like:

      (topic=lamington--geo-connector, partition=null, key=null, value=[B@62c66f62=ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, value=[B@62c66f62, ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, .....
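      The `[B@62c66f62` fragments in the output above are what Java prints for a byte array: the type tag `[B` followed by an identity hash, not the payload. A minimal sketch of that behaviour, using a hypothetical `ByteArrayToStringDemo` class that is not part of Kafka:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class (not Kafka code): shows why byte[] values are
// unreadable when they end up in a log message via toString().
final class ByteArrayToStringDemo {

    // String concatenation calls Object.toString() on the array, which
    // yields "[B@<identity hash in hex>" rather than the array contents.
    static String describe(byte[] value) {
        return "value=" + value;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

        // Prints something like value=[B@1b6d3586 (the hash varies per run).
        System.out.println(describe(payload));

        // The contents only appear if they are rendered explicitly.
        System.out.println("value=" + Arrays.toString(payload));
    }
}
```

      So even when the conversion succeeds, each record contributes an opaque hash to the log line and nothing about the message itself.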
      

      I think this is a bug: the string representation of the outstanding messages should be removed from the log message, leaving only the count.
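      One way the proposed change could look is sketched below. It is an illustration only, not the patch that was actually committed: `FlushTimeoutLogging` and `timeoutMessage` are hypothetical names, and plain `String.format` stands in for the logger so the example is self-contained. The point is that only `size()` is read, so no record is ever converted to a string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Kafka code): builds the flush-timeout message
// from the number of outstanding messages only.
final class FlushTimeoutLogging {

    // Only size() is called on the map; its keys and values are never
    // stringified, so the cost is constant regardless of the backlog.
    static String timeoutMessage(String taskName, Map<?, ?> outstandingMessages) {
        return String.format(
                "Failed to flush %s, timed out while waiting for producer to flush outstanding messages, %d left",
                taskName, outstandingMessages.size());
    }

    public static void main(String[] args) {
        // Stand-in for the outstanding records; arrays hash by identity,
        // so the two entries below are distinct keys.
        Map<byte[], byte[]> outstanding = new HashMap<>();
        outstanding.put(new byte[] {1}, new byte[] {2});
        outstanding.put(new byte[] {3}, new byte[] {4});

        System.out.println(timeoutMessage("WorkerSourceTask{id=geo-connector-0}", outstanding));
    }
}
```

      With this shape the message still reports how far behind the producer is, which is the part of the original log line that was useful.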

          People

            liquanpei Liquan Pei
            alexeyraga Alexey Raga
            Ewen Cheslack-Postava Ewen Cheslack-Postava
            Votes: 0
            Watchers: 4
