We have observed the following issue:
We had a Java consumer with the same version as the reported Kafka (1.0.1). This consumer was calling commitSync() every couple of milliseconds, even when poll returned no messages; in fact, it was called after every poll timeout. The consumer sees occasional low message peaks but is inactive most of the time.
After changing the consumer behavior, the problem no longer appears.
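To make the kind of change concrete, here is a minimal sketch (not our actual code) of a consumer loop that only commits when the last poll actually returned records. The class and method names (CommitOnlyWhenNeeded, commitIfRecords, simulate) are hypothetical and exist only for illustration; the commit itself is passed in as a Runnable so the sketch runs without a broker. In a real loop, the assumption is that it would be wired up roughly as shown in the comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, for illustration only: skip the commit when poll returned nothing.
public final class CommitOnlyWhenNeeded {

    /**
     * Runs commitAction only if the last poll returned at least one record.
     * Returns true if a commit was issued.
     *
     * Assumed real-world wiring (not part of this sketch):
     *   ConsumerRecords<String, String> records = consumer.poll(timeoutMs);
     *   // ... process records ...
     *   commitIfRecords(records.count(), consumer::commitSync);
     */
    public static boolean commitIfRecords(int recordCount, Runnable commitAction) {
        if (recordCount <= 0) {
            return false; // idle poll: nothing new to commit
        }
        commitAction.run();
        return true;
    }

    /** Simulates a sequence of poll results and returns how many commits were issued. */
    public static int simulate(int[] recordCountsPerPoll) {
        AtomicInteger commits = new AtomicInteger();
        for (int count : recordCountsPerPoll) {
            commitIfRecords(count, commits::incrementAndGet);
        }
        return commits.get();
    }
}
```

With this guard, a mostly idle consumer issues a commit only for the polls that delivered records, instead of one commit per poll timeout.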
The Kafka setup has 3 brokers with a replication factor of 2. The disk in use is a Ceph block storage device exposed as an OpenStack Cinder volume.
We noticed that at some point, when the log-cleaner thread was trying to compact __consumer_offsets for this topic, it failed with:
CorruptRecordException: Record size is less than the minimum record overhead (14)
This caused the log-cleaner to stop, so the uncompacted log kept growing until it filled the available disk space. Kafka then stopped working, which brought down the whole system.
Are there any known issues similar to this case?
Is it possible that this type of consumer behavior can cause such an issue?
It appears that the consumer sends data whenever we call commitSync(), even if it didn't receive any messages. What is the expected behavior in this case?
Is it possible for a consumer to send a corrupted message to Kafka, or for Kafka to corrupt a message on disk or during replication?
Please provide some guidelines on what actions are needed to troubleshoot this.
Thanks in advance for your effort.