Kafka / KAFKA-6075

Kafka cannot recover after an unclean shutdown on Windows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0.1
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

    Description

      Kafka cannot recover from an unclean shutdown of a broker on Windows. Steps to reproduce from a fresh build:

      1. Start zookeeper
      2. Start a broker
      3. Create a topic test
      4. Do an unclean shutdown of the broker: find the process id with wmic process where "caption = 'java.exe' and commandline like '%server.properties%'" get processid, then force-kill it with taskkill /pid #### /f
      5. Start the broker again
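The wmic/taskkill pair in step 4 is Windows-specific; on Java 9+ the same find-and-force-kill can be sketched portably with `ProcessHandle`. This is an illustrative helper, not part of Kafka — the class name `FindBrokerPids` is hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

public class FindBrokerPids {

    // Portable equivalent of the wmic query in step 4: list the PIDs of
    // processes whose command line mentions server.properties (i.e. the broker).
    static List<Long> brokerPids() {
        return ProcessHandle.allProcesses()
                .filter(ph -> ph.info().commandLine()
                        .map(cl -> cl.contains("server.properties"))
                        .orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> pids = brokerPids();
        System.out.println("found " + pids.size() + " broker process(es): " + pids);
        // Forced termination is the unclean shutdown, like `taskkill /pid <pid> /f`.
        pids.forEach(pid -> ProcessHandle.of(pid).ifPresent(ProcessHandle::destroyForcibly));
    }
}
```

Note that `ProcessHandle.Info.commandLine()` may be empty for processes the JVM cannot inspect, in which case those processes are simply skipped.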

      This leads to the following errors:

      [2017-10-17 17:13:24,819] ERROR Error while loading log dir C:\tmp\kafka-logs (kafka.log.LogManager)
      java.nio.file.FileSystemException: C:\tmp\kafka-logs\test-0\00000000000000000000.timeindex: The process cannot access the file because it is being used by another process.
      
              at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
              at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
              at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
              at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
              at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
              at java.nio.file.Files.deleteIfExists(Files.java:1165)
              at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:333)
              at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:295)
              at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
              at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
              at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
              at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
              at kafka.log.Log.loadSegmentFiles(Log.scala:295)
              at kafka.log.Log.loadSegments(Log.scala:404)
              at kafka.log.Log.<init>(Log.scala:201)
              at kafka.log.Log$.apply(Log.scala:1729)
              at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:221)
              at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$8$$anonfun$apply$16$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:292)
              at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      [2017-10-17 17:13:24,819] ERROR Error while deleting the clean shutdown file in dir C:\tmp\kafka-logs (kafka.server.LogDirFailureChannel)
      java.nio.file.FileSystemException: C:\tmp\kafka-logs\test-0\00000000000000000000.timeindex: The process cannot access the file because it is being used by another process.
      
              at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
              at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
              at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
              at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
              at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
              at java.nio.file.Files.deleteIfExists(Files.java:1165)
              at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:333)
              at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:295)
              at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
              at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
              at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
              at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
              at kafka.log.Log.loadSegmentFiles(Log.scala:295)
              at kafka.log.Log.loadSegments(Log.scala:404)
              at kafka.log.Log.<init>(Log.scala:201)
              at kafka.log.Log$.apply(Log.scala:1729)
              at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:221)
              at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$8$$anonfun$apply$16$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:292)
              at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      [2017-10-17 17:13:24,819] INFO Logs loading complete in 47 ms. (kafka.log.LogManager)
      [2017-10-17 17:13:24,865] WARN Error processing kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory=C:\tmp\kafka-logs (com.yammer.metrics.reporting.JmxReporter)
      javax.management.MalformedObjectNameException: Invalid character ':' in value part of property
              at javax.management.ObjectName.construct(ObjectName.java:618)
              at javax.management.ObjectName.<init>(ObjectName.java:1382)
              at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
              at com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
              at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
              at com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
              at kafka.metrics.KafkaMetricsGroup$class.newGauge(KafkaMetricsGroup.scala:80)
              at kafka.log.LogManager.newGauge(LogManager.scala:50)
              at kafka.log.LogManager$$anonfun$6.apply(LogManager.scala:117)
              at kafka.log.LogManager$$anonfun$6.apply(LogManager.scala:116)
              at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
              at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
              at kafka.log.LogManager.<init>(LogManager.scala:116)
              at kafka.log.LogManager$.apply(LogManager.scala:799)
              at kafka.server.KafkaServer.startup(KafkaServer.scala:222)
              at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
              at kafka.Kafka$.main(Kafka.scala:92)
              at kafka.Kafka.main(Kafka.scala)
      [2017-10-17 17:13:24,865] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
      [2017-10-17 17:13:24,881] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
      [2017-10-17 17:13:25,131] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
      [2017-10-17 17:13:25,147] INFO [SocketServer brokerId=0] Started 1 acceptor threads (kafka.network.SocketServer)
      [2017-10-17 17:13:25,162] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
      [2017-10-17 17:13:25,162] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
      [2017-10-17 17:13:25,162] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
      [2017-10-17 17:13:25,162] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
      [2017-10-17 17:13:25,162] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\tmp\kafka-logs (kafka.server.ReplicaManager)
      [2017-10-17 17:13:25,162] INFO [ReplicaManager broker=0] Partitions  are offline due to failure on log directory C:\tmp\kafka-logs (kafka.server.ReplicaManager)
      [2017-10-17 17:13:25,162] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
      [2017-10-17 17:13:25,178] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions  because they are in the failed log dir C:\tmp\kafka-logs (kafka.server.ReplicaManager)
      [2017-10-17 17:13:25,178] INFO Stopping serving logs in dir C:\tmp\kafka-logs (kafka.log.LogManager)
      [2017-10-17 17:13:25,178] FATAL Shutdown broker because all log dirs in C:\tmp\kafka-logs have failed (kafka.log.LogManager)
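The FileSystemException above matches a known platform difference: Windows refuses to delete a file that is still memory-mapped, while POSIX systems allow it, and Kafka keeps its .index/.timeindex segment files mapped. A minimal, self-contained sketch of that difference — the class and method names are illustrative, not from Kafka:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDeleteDemo {

    // Maps a temp file (the way Kafka maps its index files), then tries to
    // delete it while the mapping is still live. Returns true if the delete
    // succeeded (POSIX), false if the OS refused it (Windows).
    static boolean deleteWhileMapped() throws IOException {
        Path p = Files.createTempFile("00000000000000000000", ".timeindex");
        Files.write(p, new byte[1024]);
        try {
            try (FileChannel ch = FileChannel.open(p,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
                buf.putLong(0, 42L); // touch the mapping so it is genuinely in use
                try {
                    // On Windows this throws FileSystemException:
                    // "The process cannot access the file because it is being
                    // used by another process." (the mapping held by this JVM).
                    return Files.deleteIfExists(p);
                } catch (FileSystemException e) {
                    return false;
                }
            }
        } finally {
            try { Files.deleteIfExists(p); } catch (IOException ignored) {} // cleanup
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("delete while mapped: "
                + (deleteWhileMapped() ? "succeeded" : "refused"));
    }
}
```

This explains why the broker, which is the only process touching C:\tmp\kafka-logs, nevertheless sees the file as "being used by another process" during recovery: it is blocked by its own still-live mapping.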
      
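The JmxReporter warning in the log is a separate symptom of the same Windows path: the drive-letter colon in C:\tmp\kafka-logs is an illegal character in an unquoted JMX ObjectName property value, so registering the LogDirectoryOffline gauge fails. A small sketch (the class name is hypothetical) showing the rejection, and that ObjectName.quote makes such a value legal:

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxNameDemo {

    // True if the string is accepted as a property value in an ObjectName.
    static boolean accepted(String value) {
        try {
            new ObjectName(
                "kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory=" + value);
            return true;
        } catch (MalformedObjectNameException e) {
            // e.g. "Invalid character ':' in value part of property"
            return false;
        }
    }

    public static void main(String[] args) {
        String dir = "C:\\tmp\\kafka-logs";
        System.out.println("raw value accepted:    " + accepted(dir));
        System.out.println("quoted value accepted: " + accepted(ObjectName.quote(dir)));
    }
}
```

ObjectName.quote escapes the backslashes and wraps the value in quotes, which is the standard JMX way to carry otherwise-illegal characters such as ':' in a property value.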

      Attachments

        1. 6075.v4
          4 kB
          Ted Yu

    Issue Links

    Activity

    People

      Assignee: Ismael Juma (ijuma)
      Reporter: Vahid Hashemian (vahid)
      Votes: 3
      Watchers: 7

    Dates

      Created:
      Updated:
      Resolved: