[HDFS-3570] Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.0.0-alpha
Fix Version/s: None
Component/s: balancer & mover
Labels:
- pull-request-available

Target Version/s: 3.5.0

Description

Report from a user here: https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ, post archived at http://pastebin.com/eVFkk0A0

This user had a specific DN with a large amount of non-DFS usage across its dfs.data.dirs and very little DFS usage (which is computed against the total possible capacity).

The Balancer apparently looks only at DFS usage and does not consider that non-DFS usage may also be high on a DN or across the cluster. Hence, if a DN reports only 8% DFS usage, the Balancer thinks it has a lot of free space for more blocks, which isn't true, as this user's case shows. It went on scheduling writes to the DN to balance it out, but the DN simply couldn't accept any more blocks because of the state of its disks.

I think it would be better if we computed the actual utilization as (capacity - actual remaining space)/(capacity), as opposed to the current (dfs used)/(capacity). Thoughts?
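To make the difference between the two metrics concrete, here is a minimal sketch in Python. It is not the Balancer's code: the `DatanodeReport` class, its field names, and all the numbers except the 8% DFS usage mentioned above are assumptions made up for illustration.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class DatanodeReport:
    """Hypothetical per-DN storage report (not a Hadoop class); values in bytes."""
    capacity: int   # total capacity across the DN's data directories
    dfs_used: int   # bytes occupied by HDFS blocks
    remaining: int  # bytes actually free for new blocks


def dfs_used_percent(report: DatanodeReport) -> float:
    """Current metric described in this issue: (dfs used) / (capacity)."""
    return 100.0 * report.dfs_used / report.capacity


def actual_used_percent(report: DatanodeReport) -> float:
    """Proposed metric: (capacity - remaining) / (capacity)."""
    return 100.0 * (report.capacity - report.remaining) / report.capacity


def non_dfs_used(report: DatanodeReport) -> int:
    """Space consumed by anything other than HDFS blocks."""
    return report.capacity - report.dfs_used - report.remaining


# Illustrative DN: 8% DFS usage, but most of the disk is taken by non-DFS data.
dn = DatanodeReport(capacity=1000 * GB, dfs_used=80 * GB, remaining=20 * GB)

print(f"DFS used %:    {dfs_used_percent(dn):.1f}")
print(f"Actual used %: {actual_used_percent(dn):.1f}")
print(f"Non-DFS used:  {non_dfs_used(dn) // GB} GB")
```

With these made-up numbers the DN looks almost empty under the first metric and almost full under the second, which is the mismatch reported by the user.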

This isn't very critical, however, because it is very rare to see DN space being used for non-DN data, but it does expose a valid bug.

Attachments

- HDFS-3570.003.patch (22/Jul/15 06:09, 5 kB, Akira Ajisaka)
- HDFS-3570.2.patch (06/Feb/14 07:46, 15 kB, Akira Ajisaka)
- HDFS-3570.aash.1.patch (06/Feb/14 05:01, 3 kB, Andrew Ash)

Issue Links

is related to

HDFS-8278 HDFS Balancer should consider remaining storage % when checking for under-utilized machines (Resolved)

links to

GitHub Pull Request #5044

People

Assignee: Ashutosh Gupta

Reporter: Harsh J

Votes: 1

Watchers: 19

Dates

Created: 26/Jun/12 21:19

Updated: 04/Jan/24 07:55