Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-8145

Broker fails with FATAL Shutdown on Windows, log or index renaming fail

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.1.0, 1.1.1, 2.0.1, 2.1.1
    • None
    • core, log

    Description

      When consumer offset log cleaning or Kafka log rolling triggered on Windows(The default rolling time is 168 hours, eg 1 week). Kafka broker will shut down due to file renaming fail, on Windows is't invalid to rename a file when it's opened, so this issue is Windows specific.

      A topical error log during log rolling:

      at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1170)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1170)
      at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
      at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1161)
      at kafka.log.Log.maybeHandleIOException(Log.scala:1678)
      at kafka.log.Log.deleteSegments(Log.scala:1161)
      at kafka.log.Log.deleteOldSegments(Log.scala:1156)
      at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1228)
      at kafka.log.Log.deleteOldSegments(Log.scala:1222)
      at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:854)
      at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:852)
      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
      at kafka.log.LogManager.cleanupLogs(LogManager.scala:852)
      at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:385)
      at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
      at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
      at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Suppressed: java.nio.file.FileSystemException: Z:\tmp\kafka-logs\test-0\00000000000000000000.log -> Z:\tmp\kafka-logs\test-0\00000000000000000000.log.deleted: The process cannot access the file because it is being used by another process.
      at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
      at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
      at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
      at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
      at java.nio.file.Files.move(Unknown Source)
      at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
      ... 32 more

       
      And the log from consumer offset log cleaner:

      [2019-02-26 02:16:05,151] ERROR Failed to clean up log for __consumer_offsets-29 in dir C:\Program Files (x86)\...\tmp\kafka-logs due to IOException (kafka.server.LogDirFailureChannel)
      java.nio.file.FileSystemException: C:\Program Files (x86)\...\tmp\kafka-logs_consumer_offsets-29\00000000000000000000.log.cleaned -> C:\Program Files (x86)...\tmp\kafka-logs_consumer_offsets-29\00000000000000000000.log.swap: The process cannot access the file because it is being used by another process.
      at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
      at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
      at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
      at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
      at java.nio.file.Files.move(Files.java:1395)
      at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:697)
      at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:212)
      at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:415)
      at kafka.log.Log.replaceSegments(Log.scala:1644)
      at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:535)
      at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:462)
      at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:461)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at kafka.log.Cleaner.doClean(LogCleaner.scala:461)
      at kafka.log.Cleaner.clean(LogCleaner.scala:438)
      at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:305)
      at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:291)
      at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
      Suppressed: java.nio.file.FileSystemException: C:\Program Files (x86)\...\tmp\kafka-logs_consumer_offsets-29\00000000000000000000.log.cleaned -> C:\Program Files (x86)\...\tmp\kafka-logs_consumer_offsets-29\00000000000000000000.log.swap: The process cannot access the file because it is being used by another process.
      at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
      at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
      at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
      at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
      at java.nio.file.Files.move(Files.java:1395)
      at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
      ... 12 more

      Some users have already added the comments in KAFKA-6188 , but there is no feedback from the community.
       

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              prehistoricpenguin prehistoricpenguin
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - 168h
                  168h
                  Remaining:
                  Remaining Estimate - 168h
                  168h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified