  1. Hadoop HDFS
  2. HDFS-8825 Enhancements to Balancer
  3. HDFS-8278

HDFS Balancer should consider remaining storage % when checking for under-utilized machines

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: balancer & mover
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      DFS balancer mistakenly identifies a node with very little storage space remaining as an "underutilized" node and tries to move large amounts of data to that particular node.

      All of these block moves fail, because the node's % utilization matters less than the DFS storage remaining on it.

      15/04/24 04:25:55 INFO balancer.Balancer: 0 over-utilized: []
      15/04/24 04:25:55 INFO balancer.Balancer: 1 underutilized: [172.19.1.46:50010:DISK]
      15/04/24 04:25:55 INFO balancer.Balancer: Need to move 47.68 GB to make the cluster balanced.
      15/04/24 04:25:55 INFO balancer.Balancer: Decided to move 413.08 MB bytes from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK
      15/04/24 04:25:55 INFO balancer.Balancer: Will move 413.08 MB in this iteration
      15/04/24 04:25:55 WARN balancer.Dispatcher: Failed to move blk_1078689321_1099517353638 with size=131146 from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK through 172.19.1.53:50010: Got error, status message opReplaceBlock BP-942051088-172.18.1.41-1370508013893:blk_1078689321_1099517353638 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=225042432 B) is less than the block size (=268435456 B)., block move is failed
      

      The machine in question is under-utilized in terms of block pool (BP) utilization, but has very little free space available for blocks.

      Decommission Status : Normal
      Configured Capacity: 3826907185152 (3.48 TB)
      DFS Used: 2817262833664 (2.56 TB)
      Non DFS Used: 1000621305856 (931.90 GB)
      DFS Remaining: 9023045632 (8.40 GB)
      DFS Used%: 73.62%
      DFS Remaining%: 0.24%
      Configured Cache Capacity: 8589934592 (8 GB)
      Cache Used: 0 (0 B)
      Cache Remaining: 8589934592 (8 GB)
      Cache Used%: 0.00%
      Cache Remaining%: 100.00%
      Xceivers: 3
      Last contact: Fri Apr 24 04:28:36 PDT 2015
      

      The machine has only 8.40 GB of non-RAM (disk) storage available, so it is futile to attempt to move any blocks to it.

      A similar concern arises when a machine loses disks, since utilization is always compared as a per-node percentage. That scenario also needs data movement to the node capped at its "DFS Remaining %" value.

      Trying to move any more data than that to a given node will always fail.
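The figures in the datanode report above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not Balancer code: the constants are copied from the report, while `percent` and `cap_to_remaining` are hypothetical helpers introduced only to illustrate the cap the description asks for.

```python
# Raw byte counts copied from the datanode report in the description.
CONFIGURED_CAPACITY = 3826907185152
DFS_USED = 2817262833664
NON_DFS_USED = 1000621305856
DFS_REMAINING = 9023045632


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def cap_to_remaining(wanted_bytes: int, remaining_bytes: int) -> int:
    """Hypothetical helper: never schedule more bytes than the target can hold."""
    return max(0, min(wanted_bytes, remaining_bytes))


dfs_used_pct = percent(DFS_USED, CONFIGURED_CAPACITY)
dfs_remaining_pct = percent(DFS_REMAINING, CONFIGURED_CAPACITY)

# "Need to move 47.68 GB" from the balancer log, expressed in bytes.
wanted = int(47.68 * 1024**3)
capped = cap_to_remaining(wanted, DFS_REMAINING)

print(f"DFS Used%: {dfs_used_pct}")
print(f"DFS Remaining%: {dfs_remaining_pct}")
print(f"wanted={wanted} capped={capped}")
```

The three usage figures add up exactly to the configured capacity, so the node is about 74% DFS-used yet has well under 1% of its capacity left for new blocks. Any plan that moves more than `DFS_REMAINING` bytes to it cannot succeed.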

      1. h8278_20150817.patch
        3 kB
        Tsz Wo Nicholas Sze

          Activity

          szetszwo Tsz Wo Nicholas Sze added a comment -

          Balancer does consider remaining storage, which is used to compute max-size-to-move. The problem here is that datanode will throw DiskOutOfSpaceException if there is no space for a full block. In the description, the required size is only 131146 (~= 128k) but default block size is 268435456 (=256M).

          15/04/24 04:25:55 WARN balancer.Dispatcher: Failed to move blk_1078689321_1099517353638 with size=131146 from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK through 172.19.1.53:50010: Got error, status message opReplaceBlock BP-942051088-172.18.1.41-1370508013893:blk_1078689321_1099517353638 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=225042432 B) is less than the block size (=268435456 B)., block move is failed
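The mismatch in the WARN line can be restated as a small comparison. This is a sketch modelled only on the wording of the error message, not on the datanode's actual code; `accepts_block` is a hypothetical name, and the three constants are copied from the log.

```python
BLOCK_LENGTH = 131146           # "size=" in the WARN line (about 128 KiB)
MOST_AVAILABLE = 225042432      # "volume with the most available space"
DEFAULT_BLOCK_SIZE = 268435456  # "the block size" in the message (256 MiB)


def accepts_block(most_available: int, required: int) -> bool:
    """Hypothetical check: the largest volume must hold `required` bytes."""
    return most_available >= required


# Judged by the block's real length, the move would fit easily.
fits_by_length = accepts_block(MOST_AVAILABLE, BLOCK_LENGTH)

# Judged by a full default-size block, as the error message reports, it does not.
fits_by_default = accepts_block(MOST_AVAILABLE, DEFAULT_BLOCK_SIZE)

print(f"fits by actual length: {fits_by_length}")
print(f"fits by default block size: {fits_by_default}")
print(f"shortfall: {DEFAULT_BLOCK_SIZE - MOST_AVAILABLE} bytes")
```

The largest volume is roughly 41 MiB short of one default-size block, which is why even a 128 KiB block is rejected.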
          
          szetszwo Tsz Wo Nicholas Sze added a comment -

          h8278_20150817.patch: counts only the storage with remaining storage >= default block size.

          It also removes the use of threshold in computeMaxSize2Move(..).
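The counting rule described in this comment can be sketched as follows. This is an illustration in Python under stated assumptions, not the contents of the patch: `usable_remaining` is a hypothetical name, the per-volume figures in the usage lines are made up for the example except for 225042432 (taken from the log), and the real change lives in Balancer.java.

```python
from typing import Iterable

DEFAULT_BLOCK_SIZE = 268435456  # 256 MiB, as reported in the error message


def usable_remaining(storage_remaining: Iterable[int],
                     block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Sum remaining bytes, counting only storages that can hold a full block.

    A storage with less than `block_size` bytes free contributes nothing,
    because a move to it would be rejected anyway.
    """
    return sum(r for r in storage_remaining if r >= block_size)


# Every volume is below one block: nothing should be scheduled for this node.
nearly_full = usable_remaining([225042432, 100000000])

# One volume has room for a block: only that volume counts.
one_usable = usable_remaining([225042432, 300000000])

print(nearly_full, one_usable)
```

Under this rule a node like the one in the description no longer appears to have room to receive data, so the Balancer has no reason to select it as a target.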

          jingzhao Jing Zhao added a comment -

          +1 pending Jenkins.

          hadoopqa Hadoop QA added a comment -



          -1 overall

          Vote  Subsystem        Runtime   Comment
          0     pre-patch        17m 48s   Pre-patch trunk compilation is healthy.
          +1    @author          0m 0s     The patch does not contain any @author tags.
          -1    tests included   0m 0s     The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1    javac            7m 58s    There were no new javac warning messages.
          +1    javadoc          9m 56s    There were no new javadoc warning messages.
          +1    release audit    0m 23s    The applied patch does not increase the total number of release audit warnings.
          +1    checkstyle       1m 26s    There were no new checkstyle issues.
          +1    whitespace       0m 0s     The patch has no lines that end in whitespace.
          +1    install          1m 23s    mvn install still works.
          +1    eclipse:eclipse  0m 33s    The patch built with eclipse:eclipse.
          +1    findbugs         2m 32s    The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1    native           3m 10s    Pre-build of native portion
          -1    hdfs tests       174m 22s  Tests failed in hadoop-hdfs.
          Total runtime: 219m 35s

          Reason             Tests
          Failed unit tests  hadoop.fs.viewfs.TestViewFsWithXAttrs
          Timed out tests    org.apache.hadoop.cli.TestHDFSCLI

          Subsystem             Report/Notes
          Patch URL             http://issues.apache.org/jira/secure/attachment/12750868/h8278_20150817.patch
          Optional Tests        javadoc javac unit findbugs checkstyle
          git revision          trunk / c77bd6a
          hadoop-hdfs test log  https://builds.apache.org/job/PreCommit-HDFS-Build/12012/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results          https://builds.apache.org/job/PreCommit-HDFS-Build/12012/testReport/
          Java                  1.7.0_55
          uname                 Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output        https://builds.apache.org/job/PreCommit-HDFS-Build/12012/console

          This message was automatically generated.

          szetszwo Tsz Wo Nicholas Sze added a comment -

          Thanks Jing for reviewing the patch.

          I have committed this.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8315 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8315/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1021 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1021/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #291 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/291/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #288 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/288/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2237 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2237/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2218 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2218/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #280 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/280/)
          HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

            People

            • Assignee:
              szetszwo Tsz Wo Nicholas Sze
              Reporter:
              gopalv Gopal V
            • Votes:
              0
              Watchers:
              7
