Hadoop HDFS / HDFS-16408

Ensure LeaseRecheckIntervalMs is greater than zero


Details

    Description

      There is a problem with the try-catch statement in the LeaseMonitor daemon (LeaseManager.java): when an unexpected exception is caught, the daemon simply logs a warning and continues with the next loop iteration.

      An extreme case occurs when the configuration item 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative number. Because the value is read without a range check, 'fsnamesystem.getLeaseRecheckIntervalMs()' returns it unchanged, and it is passed as the argument to Thread.sleep(). A negative argument causes Thread.sleep() to throw an IllegalArgumentException, which is caught by 'catch(Throwable e)', and a warning message is printed.

      Because the thread never actually sleeps, this repeats on every subsequent iteration. As a result, a huge number of identical messages are written to the log file in a short time, quickly consuming disk space and affecting the operation of the system.
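      The following Java sketch reproduces the catch-and-continue pattern in isolation. It is a hypothetical, simplified stand-in, not the actual LeaseManager$Monitor code: the class LeaseMonitorSpinDemo and its runLoop method are illustrative names, the loop is bounded so that it terminates, and System.err replaces the real logger. With a negative interval, Thread.sleep() throws IllegalArgumentException on every pass, so every iteration ends in the warning branch without ever sleeping.

```java
/**
 * Hypothetical, simplified stand-in for the catch-and-continue monitor loop.
 * Not the real LeaseManager$Monitor code; names are illustrative only.
 */
public class LeaseMonitorSpinDemo {

    /**
     * Runs up to maxIterations of a loop that sleeps for intervalMs and swallows
     * every other Throwable. Returns how many warnings would have been logged.
     */
    public static int runLoop(long intervalMs, int maxIterations) {
        int warnings = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                // ... lease checks would happen here ...
                // Thread.sleep throws IllegalArgumentException for a negative value.
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return warnings;
            } catch (Throwable e) {
                // Catch-and-continue: control goes straight to the next iteration
                // without sleeping, so with a negative interval this fires every time.
                warnings++;
                System.err.println("WARN Unexpected throwable: " + e);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Negative interval: one warning per iteration.
        System.out.println(runLoop(-1L, 5));
        // Positive interval: the loop sleeps normally and logs nothing.
        System.out.println(runLoop(1L, 5));
    }
}
```

      Running main prints 5 and then 0 on standard output; in the real daemon the loop is unbounded, which is what produces the log flood shown below.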

      As the listing below shows, the NameNode log grew to about 178 MB (178,545,466 bytes) within roughly one minute.

       

      ll logs/
      total 174456
      drwxrwxr-x  2 hadoop hadoop      4096 Jan  3 15:13 ./
      drwxr-xr-x 11 hadoop hadoop      4096 Jan  3 15:13 ../
      -rw-rw-r--  1 hadoop hadoop     36342 Jan  3 15:14 hadoop-hadoop-datanode-ljq1.log
      -rw-rw-r--  1 hadoop hadoop      1243 Jan  3 15:13 hadoop-hadoop-datanode-ljq1.out
      -rw-rw-r--  1 hadoop hadoop 178545466 Jan  3 15:14 hadoop-hadoop-namenode-ljq1.log
      -rw-rw-r--  1 hadoop hadoop       692 Jan  3 15:13 hadoop-hadoop-namenode-ljq1.out
      -rw-rw-r--  1 hadoop hadoop     33201 Jan  3 15:14 hadoop-hadoop-secondarynamenode-ljq1.log
      -rw-rw-r--  1 hadoop hadoop      3764 Jan  3 15:14 hadoop-hadoop-secondarynamenode-ljq1.out
      -rw-rw-r--  1 hadoop hadoop         0 Jan  3 15:13 SecurityAuth-hadoop.audit
       
      tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
      2022-01-03 15:14:46,032 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
      java.lang.IllegalArgumentException: timeout value is negative
              at java.base/java.lang.Thread.sleep(Native Method)
              at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
              at java.base/java.lang.Thread.run(Thread.java:829)
      2022-01-03 15:14:46,033 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
      java.lang.IllegalArgumentException: timeout value is negative
              at java.base/java.lang.Thread.sleep(Native Method)
              at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
              at java.base/java.lang.Thread.run(Thread.java:829)
      2022-01-03 15:14:46,033 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
      java.lang.IllegalArgumentException: timeout value is negative
              at java.base/java.lang.Thread.sleep(Native Method)
              at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
              at java.base/java.lang.Thread.run(Thread.java:829)
      

       

      I think there are two potential solutions. 

      The first is to move 'catch(Throwable e)' in the LeaseMonitor daemon outside the loop body, as the NameNodeResourceMonitor daemon does, so that the thread terminates when an unexpected exception is caught.
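      A minimal sketch of the first option, under the same assumptions as a simplified stand-in (the class TerminatingMonitorDemo is hypothetical, the loop is bounded, and System.err replaces the logger). With the catch moved outside the loop, the first unexpected throwable is logged once and the loop exits, which ends the thread's run() method.

```java
/**
 * Hypothetical sketch: catch (Throwable) placed outside the loop, so an
 * unexpected exception ends the monitor instead of being retried forever.
 */
public class TerminatingMonitorDemo {

    /** Returns the number of iterations that completed before the loop exited. */
    public static int runLoop(long intervalMs, int maxIterations) {
        int completed = 0;
        try {
            for (int i = 0; i < maxIterations; i++) {
                // ... lease checks would happen here ...
                Thread.sleep(intervalMs); // IllegalArgumentException if intervalMs < 0
                completed++;
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } catch (Throwable e) {
            // Logged exactly once; the loop, and therefore run(), then ends.
            System.err.println("ERROR Monitor exiting on unexpected throwable: " + e);
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        // Even with a huge iteration budget, the thread stops after one error.
        Thread monitor = new Thread(() -> runLoop(-1L, 1_000_000), "LeaseMonitorSketch");
        monitor.start();
        monitor.join();
        System.out.println(runLoop(-1L, 5));
        System.out.println(runLoop(1L, 5));
    }
}
```

      Running main prints 0 and then 5. The trade-off is that, after the single error, lease expiry checks stop running altogether, which is one reason to also validate the configuration value itself.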

      The second is to use Preconditions.checkArgument() to validate the range of 'dfs.namenode.lease-recheck-interval-ms' when it is read, so that an invalid value cannot affect the subsequent operation of the program.
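      A minimal sketch of the second option. It assumes a plain Map<String, String> as a stand-in for Hadoop's Configuration and a local checkArgument helper that mirrors Guava-style Preconditions.checkArgument(boolean, Object); the class name, helper, and the 2000 ms fallback default are assumptions for illustration, not the project's actual code. A non-positive value is rejected when it is read, so it never reaches Thread.sleep().

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: validate dfs.namenode.lease-recheck-interval-ms at read time.
 * A Map stands in for Hadoop's Configuration.
 */
public class LeaseRecheckConfigDemo {

    static final String KEY = "dfs.namenode.lease-recheck-interval-ms";
    // Assumed fallback for illustration only; the real default lives in DFSConfigKeys.
    static final long ASSUMED_DEFAULT_MS = 2000L;

    /** Local stand-in with the same semantics as Preconditions.checkArgument. */
    static void checkArgument(boolean expression, Object errorMessage) {
        if (!expression) {
            throw new IllegalArgumentException(String.valueOf(errorMessage));
        }
    }

    /** Reads the interval and rejects zero or negative values up front. */
    static long readLeaseRecheckIntervalMs(Map<String, String> conf) {
        String raw = conf.get(KEY);
        long value = (raw == null) ? ASSUMED_DEFAULT_MS : Long.parseLong(raw.trim());
        checkArgument(value > 0, KEY + " should be greater than 0, but got " + value);
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(readLeaseRecheckIntervalMs(conf)); // falls back to the assumed default
        conf.put(KEY, "5000");
        System.out.println(readLeaseRecheckIntervalMs(conf));
        conf.put(KEY, "-1");
        try {
            readLeaseRecheckIntervalMs(conf);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

      Failing fast here turns a silent log flood into a single, explicit configuration error when the NameNode starts.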

       


            People

              fujx ECFuzz
              Votes: 0
              Watchers: 2


                Time Tracking

                  Original Estimate: 1h
                  Remaining Estimate: 0h
                  Time Spent: 3h 20m