Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7814

Broker shut down while cleaning up log file

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 1.1.0, 2.1.0
    • Fix Version/s: None
    • Component/s: log, offset manager
    • Labels:
      None
    • Environment:
      os: aliYun, centos7
      docker image:wurstmeister/kafka:2.12-2.1.0

      Description

      Kafka cluster with 3 brokers(version:1.1.0) and is well running for over 6 months.

      Then we modified partitions from 3 to 48 for every topic after 2018/12/12, then the brokers shutdown every 5-10 days.

      Then we upgraded the broker from 1.1.0 to 2.1.0, but the brokers still keep shutting down every 5-10 days.

      Each time, one broker shut down after the following error log, then several minutes later, the other 2 brokers shut down too, with the same error but other partition log files.

      [2019-01-11 17:16:36,572] INFO [ProducerStateManager partition=__transaction_state-11] Writing producer snapshot at offset 807760 (kafka.log.ProducerStateManager)
      [2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. (kafka.log.Log)
      [2019-01-11 17:16:46,150] WARN Resetting first dirty offset of __transaction_state-35 to log start offset 194404 since the checkpointed offset 194345 is invalid. (kafka.log.LogCleanerManager$)
      [2019-01-11 17:16:46,239] ERROR Failed to clean up log for __transaction_state-11 in dir /kafka/logs due to IOException (kafka.server.LogDirFailureChannel)
      java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log
              at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
              at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
              at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
              at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
              at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
              at java.nio.file.Files.move(Files.java:1395)
              at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
              at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
              at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
              at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
              at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
              at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
              at scala.collection.immutable.List.foreach(List.scala:388)
              at kafka.log.Log.replaceSegments(Log.scala:1896)
              at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
              at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
              at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
              at scala.collection.immutable.List.foreach(List.scala:388)
              at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
              at kafka.log.Cleaner.clean(LogCleaner.scala:492)
              at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
              at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
              at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
              Suppressed: java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log -> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
                      at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
                      at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
                      at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
                      at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
                      at java.nio.file.Files.move(Files.java:1395)
                      at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
                      ... 17 more
      [2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving replicas in dir /kafka/logs (kafka.server.ReplicaManager)
      [2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs (kafka.log.LogManager)
      [2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in /kafka/logs have failed (kafka.log.LogManager)
      

        Attachments

        1. broker2.server.log.2019-01-11-17
          788 kB
          EverZhang
        2. broker3.server.log.2019-01-11-17
          725 kB
          EverZhang
        3. broker1.server.log.2019-01-11-17
          824 kB
          EverZhang
        4. broker1.log-cleaner.log.2019-01-11-17
          5 kB
          EverZhang
        5. broker2.log-cleaner.log.2019-01-11-17
          8 kB
          EverZhang
        6. broker3.log-cleaner.log.2019-01-11-17
          5 kB
          EverZhang

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              Ever600686 EverZhang
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: