What's the issue?
- Some log segments are never deleted, even after the configured retention hours have passed.
What are the impacts?
- Log space keeps increasing and eventually causes a disk space shortage.
- Many log segments are rolled at a smaller size than expected, e.g. a log segment may be only 50 MB instead of the expected 1 GB.
- Kafka Streams applications or other clients may experience missing data.
- It can be exploited as a way to attack a Kafka server.
What workaround can be adopted to resolve this issue?
- If it has already happened on your Kafka system, you will need to go through some very tricky steps to resolve it.
- If it has not happened on your Kafka system yet, you may need to evaluate whether you can switch log.message.timestamp.type to LogAppendTime.
What are the steps to reproduce?
- Make sure the Kafka client and server are not hosted on the same machine.
- Configure log.message.timestamp.type as CreateTime, not LogAppendTime.
- Set the Kafka client's system clock to a future time, e.g. 03/04/2025, 3:25:52 PM GMT-08:00.
- Send messages from the Kafka client to the server.
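As a possible alternative to changing the client's system clock, a record can carry an explicit future timestamp. The sketch below only computes the epoch-millisecond value for the example time above, using the JDK alone; it assumes the date is written month/day/year (March 4, 2025). The class name, topic, key and value are hypothetical placeholders, and the producer call is shown only as a comment because it needs the kafka-clients library.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class FutureTimestampSketch {
    // Epoch milliseconds for 03/04/2025, 3:25:52 PM GMT-08:00,
    // read here as March 4, 2025 (an assumption about the date format).
    static long futureTimestampMs() {
        return OffsetDateTime.of(2025, 3, 4, 15, 25, 52, 0, ZoneOffset.ofHours(-8))
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long ts = futureTimestampMs();
        System.out.println("future CreateTime (ms) = " + ts);
        // With kafka-clients on the classpath, the value could be attached to a record:
        //   new ProducerRecord<>("test-topic", null, ts, "key", "value");
        // "test-topic", "key" and "value" are hypothetical placeholders.
    }
}
```

With CreateTime configured, the broker is expected to keep this client-supplied timestamp, which is what lets a future value reach the time index.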
What should you check after the messages have been handled by the Kafka server?
- Check the timestamp values in the log segment's *.timeindex file. The timestamp will be a future time after `03/04/2025, 3:25:52 PM GMT-08:00`. (Let's say 00000000035957300794.log is the log segment that first received the test client's message. It is referenced in #3.)
- After testing for a couple of hours, many log segments will have been rolled at a smaller size (e.g. 50 MB) than the configured segment size (e.g. 1 GB).
- None of the log segments, including 00000000035957300794.* and the newer ones, will be deleted after the retention hours have passed.
What particular logic causes this issue?
- No deletable log segments are returned by the following method:
`private def deletableSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean)` (https://github.com/apache/kafka/blob/1.1/core/src/main/scala/kafka/log/Log.scala#L1227).
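
To illustrate how a single future timestamp could block deletion, here is a simplified model in Java. It is not the Kafka implementation. It assumes that time-based retention treats a segment as deletable when the broker clock minus the segment's largest timestamp exceeds the retention period, and that the scan runs from the oldest segment and stops at the first segment that fails the check. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RetentionSketch {
    // Simplified model of a time-based retention scan (assumption, not Kafka code).
    // largestTimestampsMs holds each segment's largest record timestamp, oldest segment first.
    // Returns the timestamps of the segments that would be deleted.
    static List<Long> deletable(long[] largestTimestampsMs, long nowMs, long retentionMs) {
        List<Long> out = new ArrayList<>();
        for (long largest : largestTimestampsMs) {
            boolean breached = nowMs - largest > retentionMs;
            if (!breached) {
                break; // the scan stops at the first segment that is not yet expired
            }
            out.add(largest);
        }
        return out;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 1_600_000_000_000L; // hypothetical broker clock
        long retention = 168 * hour;   // 7 days

        // Healthy log: the two oldest segments are past retention.
        long[] healthy = {now - 400 * hour, now - 300 * hour, now - 10 * hour};
        System.out.println("healthy: " + deletable(healthy, now, retention).size());

        // Oldest segment carries a future timestamp: nothing behind it can be deleted.
        long[] poisoned = {now + 1000 * hour, now - 300 * hour, now - 200 * hour};
        System.out.println("poisoned: " + deletable(poisoned, now, retention).size());
    }
}
```

Under these assumptions, the segment holding the future timestamp never satisfies the check until the broker clock passes that time plus the retention period, and every newer segment behind it is skipped as well.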