Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
I recently discovered an odd behaviour in one of our Kafka clusters (KRaft-based, v3.7.0):
We have a topic for distributed log collection with 48 partitions. Retention is set to 84 hours, and the default cleanup.policy=delete is in place. For all but two partitions this works as expected. In the two remaining partition directories there are segment files going back to January, and consuming those specific partitions yields data from January (showing that it is not just stale files lying around; the data is actually served).
Topic settings as per kafka-topics.sh --describe:
Topic: syslog TopicId: AeJLnYPnQFOtMc0ZjpH7sw PartitionCount: 48 ReplicationFactor: 2 Configs: compression.type=snappy,cleanup.policy=delete,segment.bytes=1073741824,retention.ms=302400000,max.message.bytes=2097152
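For reference, retention.ms=302400000 matches the stated 84 hours (302400000 ms / 1000 / 3600 = 84 h). A quick way to confirm which partitions still hold old data, assuming a reachable broker at localhost:9092 (placeholder address, and partition 17 is only an example), is to list the earliest retained offsets and then read the first retained record of an affected partition with its timestamp printed:

  bin/kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic syslog --time -2
  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic syslog \
    --partition 17 --offset earliest --max-messages 1 --property print.timestamp=true

On the healthy partitions the printed CreateTime stays within the 84-hour window; on the two affected partitions it goes back to January.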
Searching the cluster logs, there is no indication of what the reason could be (at least I have not spotted anything suspicious so far). Up to the time when deletion stopped, there are log entries showing the deletion of old log segments, but then they simply stop. As far as I can see, there was no change to the cluster at that point.
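For anyone wanting to reproduce the check: retention-driven segment deletions are logged in the broker's server.log, so a grep along these lines shows where the deletions stop (log path and partition number are placeholders, and the exact message wording can differ between broker versions):

  grep -i "deleting segment" /opt/kafka/logs/server.log* | grep "syslog-17"
  grep -i "retention time" /opt/kafka/logs/server.log* | grep "syslog-17"

For the healthy partitions these greps show regular deletions due to retention time breaches; for the two affected partitions the entries simply end at the point described above.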