[HDFS-13728] Disk Balancer should not fail if volume usage is greater than capacity - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.0.3
Fix Version/s: 3.2.0, 3.0.4, 3.1.2
Component/s: diskbalancer
Labels: None
Hadoop Flags: Reviewed

Description

We have seen a couple of scenarios where the disk balancer fails because a datanode reports more space used on a disk than its capacity, which should not be possible.

This is due to the check below in DiskBalancerVolume.java:

  public void setUsed(long dfsUsedSpace) {
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        dfsUsedSpace, getCapacity());
    this.used = dfsUsedSpace;
  }

While I agree that it should not be possible for a DN to report more usage on a volume than its capacity, there seems to be some issue that causes this to occur sometimes.

In general, the full disk is what prompts someone to run the Disk Balancer in the first place, only to find that it fails with this error.

There appears to be nothing you can do to force the Disk Balancer to run at this point. In the scenarios I saw, some data was removed from the disk and usage dropped below the capacity, which resolved the issue.

Can we consider relaxing the above check, and if the usage is greater than the capacity, just set the usage to the capacity so the calculations all work OK?

E.g. something like this:

   public void setUsed(long dfsUsedSpace) {
-    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
-    this.used = dfsUsedSpace;
+    if (dfsUsedSpace > this.getCapacity()) {
+      this.used = this.getCapacity();
+    } else {
+      this.used = dfsUsedSpace;
+    }
   }
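
As a rough illustration of the proposed behaviour, here is a minimal standalone sketch. `VolumeUsageSketch` is a hypothetical stand-in, not the real `DiskBalancerVolume`; it keeps only the two fields involved and uses `Math.min`, which should behave the same as the if/else in the diff above. The attached patch may differ in detail.

```java
/**
 * Hypothetical stand-in for DiskBalancerVolume, reduced to capacity and used.
 * This is an illustration of the clamping idea only, not the Hadoop class.
 */
class VolumeUsageSketch {
    private final long capacity;
    private long used;

    VolumeUsageSketch(long capacity) {
        this.capacity = capacity;
    }

    long getCapacity() {
        return capacity;
    }

    long getUsed() {
        return used;
    }

    /**
     * Records the reported usage, clamped to the volume capacity.
     * A report above capacity no longer throws; it is stored as capacity.
     */
    void setUsed(long dfsUsedSpace) {
        this.used = Math.min(dfsUsedSpace, getCapacity());
    }
}
```

With a capacity of 100, a reported usage of 150 would be stored as 100, while 40 would be stored unchanged. Note that the original check uses a strict `<`, so a volume reporting usage exactly equal to its capacity would also have been rejected; the clamped version accepts that case too.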

Attachments

HDFS-13728.001.patch
12/Jul/18 19:59
3 kB
Stephen O'Donnell

People

Assignee: Stephen O'Donnell
Reporter: Stephen O'Donnell
Votes: 0
Watchers: 9

Dates

Created: 11/Jul/18 16:33
Updated: 08/Aug/18 05:28
Resolved: 08/Aug/18 05:06
