Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7074

Race condition between Kafka log deletion and compaction

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.10.2.1
    • None
    • core
    • None

    Description

      We have one of standard options of deploying Kafka-based EventSourcing microservice - deployment with reset. It includes running kafka-reset-tool before deploying and running microservice which should reset offsets for input topics and delete intermediate topics to allow microservice rebuild its internal state.
      We have faced a problem several times (its not 100% reproducible) that LogCleaner crashed with NoSuchFileException during compaction routine after such deployment. See detailed log:

      Disable possible compaction:                 [2018-06-14 18:25:01,464] INFO The cleaning for partition reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted 
      Deletion happended:                          [2018-06-14 18:25:56,761] INFO Deleting index /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex (kafka.log.TimeIndex)
      Compaction failed 'cause it can't find file: [2018-06-14 18:25:57,402] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
                                                          kafka.common.KafkaStorageException: Failed to change the timeindex file suffix from  to .deleted for log segment 0
                                                                  at kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
                                                                  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
                                                                  at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
                                                                  at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
                                                                  at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
                                                                  at scala.collection.immutable.List.foreach(List.scala:381)
                                                                  at kafka.log.Log.replaceSegments(Log.scala:1022)
                                                                  at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
                                                                  at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
                                                                  at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
                                                                  at scala.collection.immutable.List.foreach(List.scala:381)
                                                                  at kafka.log.Cleaner.clean(LogCleaner.scala:362)
                                                                  at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
                                                                  at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
                                                                  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
                                                          Caused by: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
                                                                  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                                  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                                  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
                                                                  at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
                                                                  at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                                  at java.nio.file.Files.move(Files.java:1395)
                                                                  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
                                                                  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
                                                                  ... 14 more
                                                                  Suppressed: java.nio.file.NoSuchFileException: /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex -> 
                                                          /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
                                                                          at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                                                                          at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                                                                          at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                                                                          at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                                                                          at java.nio.file.Files.move(Files.java:1395)
                                                                          at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
                                                                          ... 15 more
      
      

      It looks like disabling compaction while deleting log doesn't work.
      After that compaction stops working and space consumption grows and there is only one solution to this problem is to restart broker.

      Attachments

        1. LogCleanerError.log
          556 kB
          Phil Mikhailov

        Issue Links

          Activity

            People

              Unassigned Unassigned
              phil.mikhailov Phil Mikhailov
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: