Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-12560

Accidental delete of some log files kafka-authorizer.log and kafka-request.log can break topics in cluster

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: log
    • Labels:
      None
    • Environment:
      AWS EC2 i3.xlarge
    • Flags:
      Patch

      Description

      These two log files have 0 byte size and last created modified or updated  along time ago and these files seem to be created on startup.

      When disks fill up, sometimes admins go through old log files and delete them. In this case, these two .log were also picked up. Note that I acknowledge this is a mistake as it should be filtered by extension but nevertheless anything in /var/log should not cause process to have issues and if the file doesn't exist, the process should simply recreate it.

      The observed effect of this on a HA kafka cluster with 3 nodes, the replica members were out of sync with just one member being leader and not in sync with rest of the nodes while some topics had two in sync instead of 3 and the leader had a -1 status. Restarting kafka did not resolve so in order to resolve, kafka was flushed in one of the node where it had lots of -1 leader status and then rest of the nodes restarted.  However, this should not happen even if someone deleted kafka-authorizer.log and kafka-request.log and should be more resistant of breaking down due to file system changes.

      Issue occurs several hours later.

       

      root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime +30 -print | xargs ls -ll |grep '.log$'
      -rw-r--r-- 1 root root        0 Nov 26 05:38 /var/log/kafka/kafka-authorizer.log
      -rw-r--r-- 1 root root        0 Nov 26 05:38 /var/log/kafka/kafka-request.log
      
      root@ip-172-17-1-63:/var/log/kafka# stat kafka-request.log                                                                                                                                                         
        File: kafka-request.log                                                                                                                                                                                          
        Size: 0               Blocks: 0          IO Block: 4096   regular empty file                                                                                                                                     
      Device: 34h/52d Inode: 768131      Links: 1                                                                                                                                                                        
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)                                                                                                                                           
      Access: 2020-11-26 05:38:04.724561118 +0000                                                                                                                                                                        
      Modify: 2020-11-26 05:38:04.724561118 +0000                                                                                                                                                                        
      Change: 2020-11-26 05:38:04.724561118 +0000                                                                                                                                                                        
       Birth: -                                                                                                                                                                                                          
      root@ip-172-17-1-63:/var/log/kafka# stat kafka-authorizer.log                                                                                                                                                      
        File: kafka-authorizer.log                                                                                                                                                                                       
        Size: 0               Blocks: 0          IO Block: 4096   regular empty file                                                                                                                                     
      Device: 34h/52d Inode: 768132      Links: 1                                                                                                                                                                        
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)                                                                                                                                           
      Access: 2020-11-26 05:38:04.724561118 +0000                                                              
      Modify: 2020-11-26 05:38:04.724561118 +0000                                                              
      Change: 2020-11-26 05:38:04.724561118 +0000                                                              
       Birth: -                                                                                                
      

      Checking file handles in staging, I can see these files while not updated in 30 days are part of java process and the pid was current:

      root@ip-172-17-1-63:/var/log/kafka# lsof kafka-authorizer.log
      COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
      java     28 root   86w   REG   0,52        0 768132 kafka-authorizer.log
      root@ip-172-17-1-63:/var/log/kafka# lsof kafka-request.log
      COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
      java     28 root   85w   REG   0,52        0 768131 kafka-request.log
      root@ip-172-17-1-63:/var/log/kafka# ps -ef | grep java
      root        28    11 83 20:12 ?        02:32:37 /usr/local/openjdk-11/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9999 -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/opt/kafka_2.12-2.2.1/bin/../config/log4j.properties -cp /opt/kafka_2.12-2.2.1/bin/../libs/activation-1.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-api-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-file-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-json-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-client-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-common-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-server-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.12-2.2.1/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-library-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zkclient-0.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zstd-jni-1.3.8-1.jar kafka.Kafka /opt/kafka_2.12-2.2.1/config/server.properties
      root        29    10  0 20:12 ?        00:00:38 /usr/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=true org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
      

      Below confirms that all extensions that do are not current log (identified by just ".log") and match these ".log.x", ".log.x.gz", ".backup" or ".gz" do not have open file handles and can be safely cleaned. To verify, I checked lsof for all files matching logrotate extensions modified in last 2 days:

      root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$'
         768472    136 -rw-r--r--   1 root     root       137614 Mar 25 06:25 /var/log/kafka/server.log.1.gz
         768517     16 -rw-r--r--   1 root     root        13823 Mar 24 04:58 /var/log/kafka/log-cleaner.log.2.gz
         768431  10248 -rw-r--r--   1 root     root     10485918 Mar 24 01:27 /var/log/kafka/server.log.1-2021032406.backup
         768507    688 -rw-r--r--   1 root     root       701207 Mar 24 06:24 /var/log/kafka/server.log.2.gz
         768401  51508 -rw-r--r--   1 root     root     52742072 Mar 24 23:59 /var/log/kafka/controller.log.1-2021032506.backup
         768459  10248 -rw-r--r--   1 root     root     10485913 Mar 25 05:51 /var/log/kafka/server.log.1-2021032506.backup
         768432   1664 -rw-r--r--   1 root     root      1701962 Mar 24 05:54 /var/log/kafka/state-change.log.2.gz
         768071  51208 -rw-r--r--   1 root     root     52429175 Mar 24 20:16 /var/log/kafka/state-change.log.1-2021032506.backup
         768522     20 -rw-r--r--   1 root     root        16598 Mar 25 06:23 /var/log/kafka/log-cleaner.log.1.gz
         768465   1052 -rw-r--r--   1 root     root      1075433 Mar 25 06:08 /var/log/kafka/state-change.log.1.gz
         768562   3544 -rw-r--r--   1 root     root      3625106 Mar 25 06:24 /var/log/kafka/controller.log.1.gz
      root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' | awk '{print $11}' | xargs lsof
      

      However, I still think accidental deletion of any log files should not affect operational state of the Kafka cluster.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              rufus Rufus Skyfii
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: