Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
I am working with Kafka Connect and am getting error messages like this:
[2016-05-04 03:11:28,226] ERROR Failed to flush WorkerSourceTask{id=geo-connector-0}, timed out while waiting for producer to flush outstanding messages, 151860 left ([FAILED toString()]) (org.apache.kafka.connect.runtime.WorkerSourceTask:237)
[2016-05-04 03:11:28,227] ERROR Failed to commit offsets for WorkerSourceTask{id=geo-connector-0} (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:112)
I haven't figured out why Connect would pull so many records into memory when it clearly can't produce that fast, and I don't yet know why producing messages is slow.
But the {{151860 left ([FAILED toString()])}} part is interesting, so I looked at the code and found this:
if (timeoutMs <= 0) {
    log.error(
        "Failed to flush {}, timed out while waiting for producer to flush outstanding "
            + "messages, {} left ({})",
        this, outstandingMessages.size(), outstandingMessages);
    finishFailedFlush();
    return false;
}
So when the connector is under stress and, assuming 151860 outstanding messages, under heavy memory pressure, the code chooses to take roughly 4 * 151860 byte arrays and convert them to a Java string.
This not only eats more memory and adds to GC load, but is also useless for logging, because the resulting string, if the conversion did not fail, would look like:
(topic=lamington--geo-connector, partition=null, key=null, value=[B@62c66f62=ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, value=[B@62c66f62, ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, .....
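The {{[B@62c66f62}} fragments are what Java's default {{Object.toString()}} produces for a {{byte[]}}: a type tag plus an identity hash, not the payload. A minimal illustration of that behavior follows; the class and method names are hypothetical and not part of Kafka.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not Kafka code.
public class ByteArrayToStringDemo {
    // Default rendering: "[B@" followed by a hex identity hash, no payload.
    static String defaultRendering(byte[] value) {
        return String.valueOf((Object) value);
    }

    // Explicit rendering of the contents, shown only for comparison.
    static String contentRendering(byte[] value) {
        return Arrays.toString(value);
    }

    public static void main(String[] args) {
        byte[] value = "geo".getBytes(StandardCharsets.UTF_8);
        System.out.println(defaultRendering(value)); // starts with "[B@"
        System.out.println(contentRendering(value)); // [103, 101, 111]
    }
}
```

So even when the conversion succeeds, the logged keys and values tell the reader nothing about the records themselves.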
I think this is a bug: the string representation of the outstanding messages should be removed from the log message.
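A rough sketch of what the proposed message could look like, reporting only the number of outstanding records. This is a standalone stand-in that uses {{String.format}} instead of the real logger; the class name {{FlushTimeoutMessage}} and the {{build}} helper are hypothetical. In the snippet quoted above, the equivalent change would presumably be dropping the trailing {{({})}} placeholder and the {{outstandingMessages}} argument.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper, not Kafka code: formats the timeout message
// without ever stringifying the outstanding records.
public class FlushTimeoutMessage {
    // Uses only the task description and the size of the map.
    static String build(String task, Map<?, ?> outstandingMessages) {
        return String.format(
            "Failed to flush %s, timed out while waiting for producer to flush "
                + "outstanding messages, %d left",
            task, outstandingMessages.size());
    }

    public static void main(String[] args) {
        // Stand-in for the outstanding records: three byte arrays keyed by identity.
        Map<byte[], byte[]> outstanding = new IdentityHashMap<>();
        for (int i = 0; i < 3; i++) {
            byte[] record = new byte[] {(byte) i};
            outstanding.put(record, record);
        }
        System.out.println(build("WorkerSourceTask{id=geo-connector-0}", outstanding));
    }
}
```

The cost of building this message is independent of how many records are outstanding, which matters most in exactly the situation where the flush times out.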