Hadoop HDFS / HDFS-13727

Log full stack trace if DiskBalancer exits with an unhandled exception


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.3
    • Fix Version/s: 3.2.0, 3.0.4, 3.1.2
    • Component/s: diskbalancer
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      In HDFS-13175 it was discovered that when a DataNode (DN) reports a volume's usage as greater than the volume's capacity, the disk balancer fails with an unhelpful error:

      $ hdfs diskbalancer -report -top 5
      
      18/06/11 10:19:43 INFO command.Command: Processing report command
      18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
      18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
      18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
      18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
      

      In HDFS-13175, a change was made to include more details in the exception message, so after the change the code is:

        public void setUsed(long dfsUsedSpace) {
          Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
              "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
              dfsUsedSpace, getCapacity());
          this.used = dfsUsedSpace;
        }
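
To illustrate when this check fires, here is a minimal, self-contained sketch. The class `VolumeSketch` and its local `checkArgument` helper are hypothetical stand-ins for `DiskBalancerVolume` and the `Preconditions.checkArgument` call shown above; they are not the Hadoop code. The sketch assumes the message template is filled in with the two values in order. Because the condition uses a strict `<`, a usage equal to the capacity is rejected as well.

```java
public class VolumeSketch {
    private final long capacity;
    private long used;

    public VolumeSketch(long capacity) {
        this.capacity = capacity;
    }

    // Hypothetical stand-in for Preconditions.checkArgument: throws
    // IllegalArgumentException with a formatted message when the check fails.
    static void checkArgument(boolean ok, String template, Object... args) {
        if (!ok) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    // Mirrors the shape of the setUsed method quoted in this issue.
    public void setUsed(long dfsUsedSpace) {
        checkArgument(dfsUsedSpace < capacity,
            "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
            dfsUsedSpace, capacity);
        this.used = dfsUsedSpace;
    }

    public long getUsed() {
        return used;
    }

    public static void main(String[] args) {
        VolumeSketch volume = new VolumeSketch(100L);
        volume.setUsed(50L); // accepted: 50 < 100
        try {
            volume.setUsed(150L); // rejected: usage exceeds capacity
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message attached, `ex.toString()` at least names the failing check, but it still does not show which call path led to it.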
      

      There may, however, be other scenarios that cause the balancer to exit with an unhandled exception, and it would be helpful if the tool logged the full stack trace on error rather than just the output of the exception's toString() (the class name plus any message).

      In DiskBalancerCLI.java, the relevant code is:

        public static void main(String[] argv) throws Exception {
          DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
          int res = 0;
          try {
            res = ToolRunner.run(shell, argv);
          } catch (Exception ex) {
            LOG.error(ex.toString());
            res = 1;
          }
          System.exit(res);
        }
      

      We should change the logging call in the catch block to pass the exception itself, so that the full stack trace is logged for every unhandled error, e.g.:

      LOG.error(ex.toString(), ex);
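
The difference between the two logging calls can be demonstrated with a small sketch. It uses `java.util.logging` purely to stay dependency-free; that is an assumption for illustration, since this issue does not show which logging API `DiskBalancerCLI` uses. The class `StackTraceLoggingSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class StackTraceLoggingSketch {

    // Throws from a nested call so the stack trace has a frame worth seeing.
    static void failDeepInside() {
        throw new IllegalArgumentException();
    }

    // Logs the caught exception either as a plain string or with the
    // throwable attached, and returns everything the logger wrote.
    public static String logFailure(boolean withStackTrace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Logger log = Logger.getAnonymousLogger();
        log.setUseParentHandlers(false); // keep output out of the console
        StreamHandler handler = new StreamHandler(out, new SimpleFormatter());
        log.addHandler(handler);
        try {
            failDeepInside();
        } catch (Exception ex) {
            if (withStackTrace) {
                // Analogous to LOG.error(ex.toString(), ex)
                log.log(Level.SEVERE, ex.toString(), ex);
            } else {
                // Analogous to LOG.error(ex.toString())
                log.log(Level.SEVERE, ex.toString());
            }
        }
        handler.flush();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("--- message only ---");
        System.out.println(logFailure(false));
        System.out.println("--- with throwable ---");
        System.out.println(logFailure(true));
    }
}
```

Only the second variant records the frame for `failDeepInside`, which is the kind of detail needed to locate an unexpected failure such as the bare `java.lang.IllegalArgumentException` in the log above.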
      

        Attachments

        1. HDFS-13727.002.patch
          1 kB
          Gabor Bota
        2. HDFS-13727.001.patch
          1 kB
          Gabor Bota

            People

            • Assignee: Gabor Bota (gabor.bota)
            • Reporter: Stephen O'Donnell (sodonnell)
