Hadoop HDFS / HDFS-15438

Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.1, 3.4.0, 3.2.3
    • Component/s: balancer & mover
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In the HDFS disk balancer, the config parameter "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors that can be ignored for a specific move between two disks before the move is abandoned.

      The parameter accepts any value >= 0, and setting it to 0 should mean no error tolerance. However, when the value is 0 the block copy is never performed, even if no disk error occurs, because the while loop condition item.getErrorCount() < getMaxError(item) is never satisfied (0 < 0 is false).

      // Gets the next block that we can copy
      private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                           DiskBalancerWorkItem item) {
        while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
          try {
            ... // get the block
          } catch (IOException e) {
            item.incErrorCount();
          }
        }
        if (item.getErrorCount() >= getMaxError(item)) {
          item.setErrMsg("Error count exceeded.");
          LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
              item.getErrorCount(), item.getMaxDiskErrors());
        }
        ...
      }
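
      The effect of the loop condition can be reproduced in isolation. The following is a minimal standalone sketch, not Hadoop code: the class MaxDiskErrorsDemo and the method nextBlockStrict are hypothetical names, the block iterator is modelled as a plain Iterator<String>, and no read errors are simulated. It only shows that a strict "errorCount < maxErrors" check skips the loop body entirely when maxErrors is 0.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MaxDiskErrorsDemo {
    // Hypothetical helper that mirrors the quoted condition:
    // the loop body runs only while errorCount < maxErrors.
    // Returns the next block ID, or null if the loop body never runs.
    static String nextBlockStrict(Iterator<String> iter, int errorCount, int maxErrors) {
        while (iter.hasNext() && errorCount < maxErrors) {
            return iter.next(); // no disk errors are simulated here
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("blk_1", "blk_2");
        // maxErrors = 0: 0 < 0 is false, so nothing is returned.
        System.out.println(nextBlockStrict(blocks.iterator(), 0, 0)); // prints "null"
        // maxErrors = 1: 0 < 1 is true, so the first block is returned.
        System.out.println(nextBlockStrict(blocks.iterator(), 0, 1)); // prints "blk_1"
    }
}
```

      With an error count of 0 and a limit of 0, the method returns null even though the iterator has blocks and nothing has failed, which matches the behaviour described in this issue.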
      

      How to fix

      Change the while loop condition so that it also works when the value is 0.
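
      One way such a change could look is sketched below. This is an assumption for illustration, not the content of the attached patches: the loop uses "<=" instead of "<", so the copy is abandoned only once the error count exceeds the limit. The class MaxDiskErrorsFixSketch, the methods read and nextBlock, and the "bad" prefix used to simulate a failing block are all hypothetical, and the error count is kept in a local variable rather than on a work item.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

public class MaxDiskErrorsFixSketch {
    // Hypothetical stand-in for reading a block: IDs starting with "bad" fail.
    static String read(String id) throws IOException {
        if (id.startsWith("bad")) {
            throw new IOException("simulated disk error on " + id);
        }
        return id;
    }

    // Returns the first readable block, or null once the number of errors
    // exceeds maxErrors (or the iterator is exhausted).
    static String nextBlock(Iterator<String> iter, int maxErrors) {
        int errorCount = 0;
        // "<=" keeps the loop usable when maxErrors is 0 and no error has occurred.
        while (iter.hasNext() && errorCount <= maxErrors) {
            try {
                return read(iter.next());
            } catch (IOException e) {
                errorCount++;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // No errors, limit 0: the block is still copied.
        System.out.println(nextBlock(Arrays.asList("blk_1").iterator(), 0));          // prints "blk_1"
        // One error, limit 0: the move is abandoned.
        System.out.println(nextBlock(Arrays.asList("bad_1", "blk_2").iterator(), 0)); // prints "null"
        // One error, limit 1: the error is tolerated and the next block is returned.
        System.out.println(nextBlock(Arrays.asList("bad_1", "blk_2").iterator(), 1)); // prints "blk_2"
    }
}
```

      Under this reading, the limit is the number of errors that may be ignored, so 0 means the first error abandons the move. Any "error count exceeded" check after the loop would need the matching comparison (">" rather than ">=") to stay consistent.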
       

      Attachments

        1. HDFS-15438.000.patch
          1.0 kB
          AMC-team
        2. HDFS-15438.001.patch
          2 kB
          AMC-team
        3. Screen Shot 2020-09-03 at 4.33.53 PM.png
          316 kB
          AMC-team

          People

            Assignee: AMC-team
            Reporter: AMC-team