KAFKA-8270

Kafka timestamp-based retention policy does not work when a Kafka client's system clock is out of sync.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1
    • Fix Version/s: None
    • Component/s: log, log cleaner, logging

    Description

      What's the issue?

      1. Some log segments are not deleted after the configured retention hours have passed.

      What are the impacts?

      1. Log disk usage keeps increasing and eventually causes a space shortage.
      2. Many log segments are rolled at a smaller size than expected, e.g. a log segment may be only 50 MB instead of the expected 1 GB.
      3. Kafka Streams applications or other clients may experience missing data.
      4. It can be used as a way to attack the Kafka server.

      What workarounds can be adopted to resolve this issue?

      1. If it has already happened on your Kafka system, you will need to run some very tricky steps to resolve it.
      2. If it has not happened on your Kafka system yet, you may want to evaluate whether you can switch log.message.timestamp.type to LogAppendTime.

      What are the steps to reproduce?

      1. Make sure the Kafka client and server are not hosted on the same machine.
      2. Configure log.message.timestamp.type as CreateTime, not LogAppendTime.
      3. Set the Kafka client's system clock to a future time, e.g. 03/04/2025, 3:25:52 PM GMT-08:00.
      4. Send messages from the Kafka client to the server.

      What should you check after the messages have been handled by the Kafka server?

      1. Check the timestamp values in the log segment's *.timeindex file. They will be at or after the future time 03/04/2025, 3:25:52 PM GMT-08:00. (Say 00000000035957300794.log is the log segment that first received the test client's message; it is referenced in step 3.)
      2. After testing for a couple of hours, many log segments will have been rolled at a smaller size (e.g. 50 MB) than the configured segment size (e.g. 1 GB).
      3. None of the log segments, including 00000000035957300794.* and the newer ones, will be deleted once the retention hours have passed.
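
      Kafka record timestamps are milliseconds since the Unix epoch, so comparing time index entries against the injected clock time is easier once that time is converted to the same unit. The sketch below does the conversion with java.time. It assumes the date in the reproduce steps is written month/day/year (March 4, 2025); the class and method names are hypothetical and only for illustration.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Kafka: converts a wall-clock time with a
// fixed UTC offset into epoch milliseconds, the unit used by record timestamps.
final class TimeIndexTimestampSketch {

    static long toEpochMillis(int year, int month, int day,
                              int hour, int minute, int second,
                              int utcOffsetHours) {
        return OffsetDateTime
                .of(year, month, day, hour, minute, second, 0,
                    ZoneOffset.ofHours(utcOffsetHours))
                .toInstant()
                .toEpochMilli();
    }

    // True when a timestamp read from a segment lies ahead of the given clock reading.
    static boolean isInFuture(long timestampMs, long nowMs) {
        return timestampMs > nowMs;
    }

    public static void main(String[] args) {
        // Assumption: 03/04/2025 means March 4, 2025 (month/day/year).
        long injected = toEpochMillis(2025, 3, 4, 15, 25, 52, -8);
        System.out.println("injected client time (epoch ms): " + injected);
        System.out.println("ahead of broker clock now: "
                + isInFuture(injected, System.currentTimeMillis()));
    }
}
```

      Any time index entry at or above the printed value on a broker whose own clock is well behind it points to a record produced by the misconfigured client.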

      What particular logic causes this issue?

      1. The following method returns no deletable log segments:
        private def deletableSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean) (https://github.com/apache/kafka/blob/1.1/core/src/main/scala/kafka/log/Log.scala#L1227).
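
      The effect can be illustrated with a simplified model. This is not Kafka's code, and it rests on two assumptions about the 1.1 behaviour: the time-based predicate treats a segment as expired when the broker's current time minus the segment's largest timestamp exceeds the retention period, and the scan runs from the oldest segment and stops at the first one that does not match. High-watermark and empty-active-segment checks are left out, and every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of time-based segment retention.
final class RetentionSketch {

    // Stand-in for a log segment: base offset plus its largest record timestamp.
    static final class Segment {
        final long baseOffset;
        final long largestTimestampMs;

        Segment(long baseOffset, long largestTimestampMs) {
            this.baseOffset = baseOffset;
            this.largestTimestampMs = largestTimestampMs;
        }
    }

    // Assumed predicate: expired when (now - largest timestamp) > retention.
    // A future timestamp makes the difference negative, so this stays false.
    static boolean expired(Segment s, long nowMs, long retentionMs) {
        return nowMs - s.largestTimestampMs > retentionMs;
    }

    // Assumed scan: oldest first, stop at the first segment that is not expired.
    static List<Segment> deletable(List<Segment> oldestFirst, long nowMs, long retentionMs) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : oldestFirst) {
            if (!expired(s, nowMs, retentionMs)) {
                break;
            }
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long now = 1_000L * hour;      // arbitrary broker clock reading
        long retention = 168L * hour;  // 7 days

        List<Segment> healthy = Arrays.asList(
                new Segment(0L, now - 400L * hour),
                new Segment(100L, now - 300L * hour),
                new Segment(200L, now - hour));

        // Oldest segment holds a record stamped far in the future.
        List<Segment> poisoned = Arrays.asList(
                new Segment(0L, now + 5_000L * hour),
                new Segment(100L, now - 300L * hour),
                new Segment(200L, now - 200L * hour));

        System.out.println("healthy deletable:  " + deletable(healthy, now, retention).size());
        System.out.println("poisoned deletable: " + deletable(poisoned, now, retention).size());
    }
}
```

      Under these assumptions the second and third segments in the poisoned log are already past retention, yet none is returned, because the scan never gets past the oldest segment.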

      Attachments

        1. space issue.png
          99 kB
          Jiangtao Liu

          People

            Assignee: Unassigned
            Reporter: Jiangtao Liu (tony2011)