  1. Kafka
  2. KAFKA-7156

Deleting topics with long names can bring all brokers to unrecoverable state


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      Kafka limits topic names to 249 characters, so creating a topic with a 248-character name is possible. However, when the topic is deleted, Kafka renames the topic's data directory by appending a hash and `-delete`, and the resulting file name exceeds the 255-byte file name limit of most Unix file systems. This causes a java.nio.file.FileSystemException, which in turn immediately shuts down all the brokers. Further attempts to restart the brokers fail with the same exception. The only way to recover the cluster is to manually delete the affected topic from ZooKeeper and from the disk on all the broker machines.
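
The length arithmetic behind the failure can be sketched as follows. This is a rough illustration, not Kafka's actual code: the directory layout (`<topic>-<partition>`) and the rename suffix (a dot, 32 hex characters, then `-delete`) are assumptions read off the log output in this report, and the function name `renamed_dir_length` is hypothetical.

```python
# Assumptions inferred from the log output in this report, not from Kafka source:
#  - a partition directory is named "<topic>-<partition>"
#  - on deletion it is renamed by appending ".<32 hex chars>-delete"
NAME_MAX = 255  # common per-component file name limit, in bytes


def renamed_dir_length(topic: str, partition: int = 0) -> int:
    """Return the byte length of the directory name after the delete rename."""
    partition_dir = f"{topic}-{partition}"
    suffix = "." + "0" * 32 + "-delete"  # placeholder hash of the same length
    return len((partition_dir + suffix).encode("utf-8"))


if __name__ == "__main__":
    topic = "a" * 248
    length = renamed_dir_length(topic)
    print(length, length > NAME_MAX)  # prints "290 True"
```

Under these assumptions the suffix adds 40 bytes and the partition marker adds at least 2, so a topic name longer than 213 characters on partition 0 would already exceed the limit when renamed.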

      Steps to reproduce:

      (Note: delete.topic.enable=true must be set in the config)

      > kafka-topics.sh --zookeeper localhost:2181 --create --topic aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa --partitions 1 --replication-factor 1
      > kafka-topics.sh --zookeeper localhost:2181 --delete --topic aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
       

      After these two commands are executed, all the brokers where this topic is replicated shut down immediately with the following logs:

      ERROR Error while renaming dir for aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0 in log dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel)
      
      java.nio.file.FileSystemException: /tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0 -> /tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0.093fd1e1728f438ea990cbad8a514b9f-delete: File name too long
      
      at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
      at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:457)
      at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
      at java.nio.file.Files.move(Files.java:1395)
      ...
      
      Suppressed: java.nio.file.FileSystemException: /tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0 -> /tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0.093fd1e1728f438ea990cbad8a514b9f-delete: File name too long
      
      at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
      at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
      at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
      at java.nio.file.Files.move(Files.java:1395)
      at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)
      ... 23 more
      
      [2018-07-12 13:34:45,847] INFO [ReplicaManager broker=0] Stopping serving replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
      [2018-07-12 13:34:45,848] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
      [2018-07-12 13:34:45,849] INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaAlterLogDirsManager)
      [2018-07-12 13:34:45,851] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions  and stopped moving logs for partitions  because they are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)
      [2018-07-12 13:34:45,851] INFO Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager)
      [2018-07-12 13:34:45,854] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)
      [2018-07-12 13:34:46,264] WARN Exception causing close of session 0x1648e0b3ec80004 due to java.io.IOException: Connection reset by peer (org.apache.zookeeper.server.NIOServerCnxn)
      [2018-07-12 13:34:46,264] INFO Closed socket connection for client /0:0:0:0:0:0:0:1:63972 which had sessionid 0x1648e0b3ec80004 (org.apache.zookeeper.server.NIOServerCnxn)
      

      Note that

      [2018-07-12 13:34:45,854] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)

      occurs regardless of whether the topic with the long name is the only one on the broker.

      Further attempts to restart the brokers fail with the same error until all mentions of the deleted topic are removed from ZooKeeper and its files are removed from the data directories on all the brokers.
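
Before deleting topics on an affected version, it may help to check which partition directories in a log dir would overflow the file name limit once renamed. The sketch below is only an illustration: the helper `at_risk_partition_dirs` is hypothetical, the 40-byte suffix length is an assumption taken from the log output in this report, and the 255-byte limit is hard-coded rather than queried from the file system.

```python
import os
from typing import List

# Assumption based on the log output in this report: the delete rename
# appends "." + 32 hex characters + "-delete" to the partition directory name.
RENAME_SUFFIX_LEN = len(".") + 32 + len("-delete")
NAME_MAX = 255  # common per-component limit in bytes; not queried from the OS


def at_risk_partition_dirs(log_dir: str) -> List[str]:
    """List directories in log_dir whose renamed form would exceed NAME_MAX."""
    risky = []
    for entry in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, entry)
        # Skip plain files and directories that already carry the delete suffix.
        if not os.path.isdir(path) or entry.endswith("-delete"):
            continue
        if len(os.fsencode(entry)) + RENAME_SUFFIX_LEN > NAME_MAX:
            risky.append(entry)
    return risky
```

For example, `at_risk_partition_dirs("/tmp/kafka-logs")` would list the directories that could trigger the failure described above if their topics were deleted.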


              People

              • Assignee:
                Unassigned
                Reporter:
                Pchelolo Petr Pchelko
              • Votes:
                0
                Watchers:
                5
