KAFKA-13575

Unwanted cleanup since log.retention.hours=-1 does not work


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: log cleaner
    • Labels: None
    • Environment: AWS, Kubernetes cluster, Kafka run as Strimzi pods.
    • Flags: Important

    Description

      Use case: We have a use case in which we never want topic data to be deleted, so we set `log.retention.hours=-1` to make the topic retention infinite.
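      For context, the static broker configuration we deploy is equivalent to the following server.properties snippet (a minimal sketch of only the retention-related keys; in our Strimzi setup the operator renders the actual file):

      # Retention-related static broker settings (sketch)
      log.retention.hours=-1            # intended meaning: never delete segments by age
      offsets.retention.minutes=525600  # keep committed consumer offsets for one year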

      But Kafka's cleanup still got triggered, and it deleted our topic's data, moving the oldest offset from 0 to a higher value. Below are logs from Kafka applying the default retention of 7 days (604800000 ms) and moving the offsets due to deleted segments. The logs are filled with "due to retention time 604800000ms breach" messages when the broker should have applied infinite retention based on the configured value.
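      The moved earliest offset can be confirmed with GetOffsetShell, where --time -2 requests the earliest available offset per partition (a sketch; on this SSL-secured 2.6 cluster the tool may need additional client configuration):

      $ ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
          --broker-list XXX:9094 --topic ts.praccomm.patientsms --time -2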

      Logs from the Kafka pod showing the cleanup:

      2021-12-21 15:28:20,891 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [0,2214146,4890534,7645992,10394781,13149705,15798698,18084408,20638290,23202704,25760900,28207227,30796134,33437632,35763142,38247977,40861013,43473617,46086956,48469054,50776433,53050704,55299660,57529943,59743683,61984715,64224317,66458910,68670877,70886848,73080554,74872031,76656553,78573653,80412933,82457759,84854900,89950485,94263626,98668311,103762764,107797683,111809208,113716544,115406491,117052453,119304884,121765245,123536361,125364537,127191439,129028759,130902041,132719344,134787234,139027470,143932296,147888638,150492879,152410799,154338596,156416190,160672948,165054157,169273069,171824978,173847615,175602822,177765267,179701522,181622192,183511328,185382862,187200966,188998652,193646877,198500489,203354616,208211378,210992397,212806790,214599592,216456221,218231836,220014080,221812337,223574882,225341294,227151134,228894227,231000663,232855550,234554788,236304441,238012477,239523793,241164599,242938200,244945345,246534386,248345745,249972929,251761034,253511669,255180052,256960087,258715796,260382401,262123766,263900146,265585895,267349067,269106101,270936370,272601164,274416555,276154378,277854987,279656300,281487544,283272481,285126550,286886989,288671958,290389171,292132610,293756088,295527031,297230008,298992929,300641108,302415673,304138766,305900851,307542789,309368805,311075952,312729042,314516564,316294429,318006961,319760277,321426916,323177673,324845934,326606764,328385019,330117277,331930761,333729477,335525038,337253962,339046341,340756170,342402991,344019861,345789681,347575987,349276417,350912980,352675272,354444307,356157225,357960082,359616551,361418905,363132123,364932894,366638881,368368450] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
      2021-12-23 02:28:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [370099688] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-4]
      2021-12-24 15:03:18,125 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [371913024] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-1]
      2021-12-26 13:33:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [373561655] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
      2021-12-28 06:18:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [375346988] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-9]
      2021-12-29 20:18:18,125 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [377066152] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
      2021-12-31 12:18:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [378821599] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2] 
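      (These entries were collected from the broker pod logs; a sketch of the kind of command used, with an illustrative pod name and namespace:)

      $ kubectl logs my-cluster-kafka-2 -n kafka | grep 'due to retention time'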

      Listing the configurations to show that the correct values are set.

      Broker Configurations:

      $ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX:9094 --entity-type brokers --entity-name 2 --describe --all | grep retention
        log.cleaner.delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
        log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168}
        log.retention.minutes=null sensitive=false synonyms={}
        offsets.retention.check.interval.ms=600000 sensitive=false synonyms={DEFAULT_CONFIG:offsets.retention.check.interval.ms=600000}
        log.retention.check.interval.ms=300000 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.check.interval.ms=300000}
        log.retention.ms=null sensitive=false synonyms={}
        log.retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
        offsets.retention.minutes=525600 sensitive=false synonyms={STATIC_BROKER_CONFIG:offsets.retention.minutes=525600, DEFAULT_CONFIG:offsets.retention.minutes=10080} 

      I also checked that the other brokers have the same configuration:

      ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX --entity-type brokers --entity-name 0 --describe --all | grep log.retention 
       log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168} 
      ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX --entity-type brokers --entity-name 0 --describe --all | grep log.retention 
      log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168}  
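      For completeness, the same check can be looped over every broker ID (a sketch assuming broker IDs 0, 1 and 2 and the same client properties file as above):

      $ for b in 0 1 2; do
          ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties \
            --bootstrap-server XXX:9094 --entity-type brokers --entity-name "$b" \
            --describe --all | grep log.retention
        done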

      Topic Configuration: 

      $ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXXX:9094 --entity-type topics --entity-name ts.praccomm.patientsms --describe --all
      
      All configs for topic ts.praccomm.patientsms are:
        compression.type=producer sensitive=false synonyms={DEFAULT_CONFIG:compression.type=producer}
        leader.replication.throttled.replicas= sensitive=false synonyms={}
        min.insync.replicas=1 sensitive=false synonyms={DEFAULT_CONFIG:min.insync.replicas=1}
        message.downconversion.enable=true sensitive=false synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
        segment.jitter.ms=0 sensitive=false synonyms={}
        cleanup.policy=delete sensitive=false synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
        flush.ms=9223372036854775807 sensitive=false synonyms={}
        follower.replication.throttled.replicas= sensitive=false synonyms={}
        segment.bytes=1073741824 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.bytes=1073741824}
        retention.ms=-1 sensitive=false synonyms={}
        flush.messages=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
        message.format.version=2.6-IV0 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.message.format.version=2.6, DEFAULT_CONFIG:log.message.format.version=2.6-IV0}
        max.compaction.lag.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.max.compaction.lag.ms=9223372036854775807}
        file.delete.delay.ms=60000 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
        max.message.bytes=1572864 sensitive=false synonyms={DYNAMIC_BROKER_CONFIG:message.max.bytes=1572864, STATIC_BROKER_CONFIG:message.max.bytes=1572864, DEFAULT_CONFIG:message.max.bytes=1048588}
        min.compaction.lag.ms=0 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
        message.timestamp.type=CreateTime sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
        preallocate=false sensitive=false synonyms={DEFAULT_CONFIG:log.preallocate=false}
        index.interval.bytes=4096 sensitive=false synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
        min.cleanable.dirty.ratio=0.5 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
        unclean.leader.election.enable=false sensitive=false synonyms={DEFAULT_CONFIG:unclean.leader.election.enable=false}
        retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
        delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
        segment.ms=604800000 sensitive=false synonyms={}
        message.timestamp.difference.max.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
        segment.index.bytes=10485760 sensitive=false synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}

      With log.retention.hours set to -1, we expected log cleanup not to be triggered, but Kafka still applied the default 7-day retention and deleted the segments. The cleanups are still happening, since we have left the configuration unchanged while debugging the cause. Let me know if you need more details.
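      A possible workaround while this is investigated: per the broker documentation, log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours, and retention.ms=-1 is the documented way to express infinite retention. Even though the describe output above already reports retention.ms=-1, pinning it explicitly as a dynamic topic config would rule out a problem in how the hours-based value is resolved (a sketch using the same bootstrap server and client properties file as above):

      $ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties \
          --bootstrap-server XXX:9094 --entity-type topics \
          --entity-name ts.praccomm.patientsms --alter --add-config retention.ms=-1

      The equivalent broker-wide setting would be log.retention.ms=-1.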

      Please look into this.

      Attachments

        1. Screenshot 2022-01-02 at 4.22.50 PM.png
          127 kB
          Alok Kumar Singh
        2. Screenshot 2022-01-02 at 4.30.38 PM.png
          59 kB
          Alok Kumar Singh


      People

        Assignee: Unassigned
        Reporter: Alok Kumar Singh (alok87)
        Votes: 0
        Watchers: 1
