KAFKA-3656

Avoid stressing system more when already under stress

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0.0
    • Component/s: None
    • Labels:
      None

      Description

      I am working with Kafka Connect and am seeing error messages like these:

      [2016-05-04 03:11:28,226] ERROR Failed to flush WorkerSourceTask{id=geo-connector-0}, timed out while waiting for producer to flush outstanding messages, 151860 left ([FAILED toString()]) (org.apache.kafka.connect.runtime.WorkerSourceTask:237)
      [2016-05-04 03:11:28,227] ERROR Failed to commit offsets for WorkerSourceTask{id=geo-connector-0} (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:112)
      

      I haven't figured out why Connect pulls so many records into memory when it clearly can't produce that fast, and I don't yet know why producing messages is slow.

      But the {{151860 left ([FAILED toString()])}} part is interesting, so I looked at the code and found this:

      if (timeoutMs <= 0) {
          log.error(
                  "Failed to flush {}, timed out while waiting for producer to flush outstanding "
                          + "messages, {} left ({})", this, outstandingMessages.size(), outstandingMessages);
          finishFailedFlush();
          return false;
      }
      

      So when the connector is under stress and, assuming 151860 outstanding messages, under heavy memory pressure, the code chooses to take roughly 4 * 151860 byte arrays and convert them to a Java string.
      This not only eats more memory and adds to GC pressure, but is also useless for logging, because the resulting string, if the conversion didn't fail, would look like:

      (topic=lamington--geo-connector, partition=null, key=null, value=[B@62c66f62=ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, value=[B@62c66f62, ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, .....
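The shape of that output can be reproduced in isolation. The sketch below is only an illustration: `Record` is a hypothetical stand-in for a producer record with a byte-array value, and the use of an `IdentityHashMap` that maps each record to itself is an assumption inferred from the `key=value` pattern in the output above. It shows that the rendered string contains array identity tags such as `[B@...` rather than payload contents, and that every record is rendered twice.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class ByteArrayToStringDemo {

    // Hypothetical stand-in for a producer record; not a Kafka class.
    static final class Record {
        final String topic;
        final byte[] value;

        Record(String topic, byte[] value) {
            this.topic = topic;
            this.value = value;
        }

        @Override
        public String toString() {
            // Concatenating a byte[] uses Object.toString(): a type tag plus
            // an identity hash code, never the bytes themselves.
            return "Record(topic=" + topic + ", value=" + value + ")";
        }
    }

    // Builds a map of `count` outstanding records (each mapped to itself,
    // an assumption) and renders it the way a "{}" log placeholder would.
    static String render(int count) {
        Map<Record, Record> outstanding = new IdentityHashMap<>();
        for (int i = 0; i < count; i++) {
            Record r = new Record("demo-topic", new byte[] {1, 2, 3});
            outstanding.put(r, r);
        }
        return outstanding.toString();
    }

    public static void main(String[] args) {
        String rendered = render(3);
        System.out.println(rendered);
        System.out.println("length for 3 records: " + rendered.length());
    }
}
```

The string grows linearly with the number of outstanding records, so building it for a backlog of this size costs memory exactly when the task is already short of it.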
      

      I think this is a bug: the string representation of the outstanding messages should be removed from the log.
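One way the proposed change could look is sketched below. This is not the patch that resolved the issue, only a minimal illustration of the idea: the class and method names are hypothetical, and the message is built with `String.format` instead of a logger so the example stays self-contained. The point is that only `size()` is read, so no record is ever converted to a string.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class FlushTimeoutMessage {

    // Hypothetical helper: builds the timeout message from the task
    // description and the number of outstanding messages only.
    static String flushTimeoutMessage(String task, Map<?, ?> outstandingMessages) {
        return String.format(
                "Failed to flush %s, timed out while waiting for producer to flush outstanding "
                        + "messages, %d left",
                task, outstandingMessages.size());
    }

    public static void main(String[] args) {
        Map<Object, Object> outstanding = new IdentityHashMap<>();
        byte[] first = new byte[] {1, 2, 3};
        byte[] second = new byte[] {4, 5, 6};
        outstanding.put(first, first);
        outstanding.put(second, second);

        System.out.println(flushTimeoutMessage("WorkerSourceTask{id=demo-0}", outstanding));
    }
}
```

The cost of building this message is constant regardless of how many records are outstanding, and the count is the only part of the original message that was useful for diagnosis.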

            People

            • Assignee:
              liquanpei Liquan Pei
              Reporter:
              alexeyraga Alexey Raga
              Reviewer:
              Ewen Cheslack-Postava