Details
Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
2.8.2
None
None
None
Description
In a heterogeneous HDFS cluster, datanode capacities and usages can differ greatly.
Currently, HDFS-8131 provides a usage-aware block placement policy to deal with this problem. However, the policy could be more flexible:
1. The probability of choosing a node with high usage is fixed once the parameter is set. That is, the probability is the same whether the node's usage is 90% or 70%. When a node is close to full, its probability of being chosen should be lower.
2. When the difference in usage is below 5% (hard-coded), the two nodes are considered to have the same usage. I think that is OK when usages are 30% and 35%, but nodes at 93% and 98% should not be treated equally. The probability correction could be smoother.
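To make items 1 and 2 concrete, here is a simplified model of the selection rule as described above. It is not the HDFS-8131 source: the class and method names are hypothetical, usages are in percent, and 0.6 is only an example value for the tunable preference. Within the 5-point band the nodes tie (either may be picked, modeled here as 50/50); outside it, the less-used node wins with the same probability no matter how full the nodes are.

```java
/**
 * Hypothetical, simplified model of the behavior described in items 1 and 2
 * (not the actual HDFS-8131 code). Usages are percentages in [0, 100].
 */
public class FixedPreferenceModel {
    // Hard-coded tie band from item 2, in percentage points.
    static final double TIE_BAND = 5.0;

    /** Probability that node A is chosen over node B. */
    static double probabilityOfA(double usageA, double usageB, double preference) {
        if (Math.abs(usageA - usageB) < TIE_BAND) {
            return 0.5; // treated as equally used: either node may be picked
        }
        // The less-used node wins with the same fixed probability,
        // regardless of how full either node actually is.
        return usageA < usageB ? preference : 1.0 - preference;
    }

    public static void main(String[] args) {
        double preference = 0.6; // example value of the tunable preference
        System.out.println(probabilityOfA(30, 34, preference)); // tie
        System.out.println(probabilityOfA(94, 98, preference)); // also a tie
        System.out.println(probabilityOfA(70, 90, preference)); // less-used node favored
        System.out.println(probabilityOfA(10, 90, preference)); // same value despite the larger gap
    }
}
```

The examples use 4-point gaps because a gap of exactly 5 points is not "below 5%". The last two calls return the same probability even though the gaps differ widely, which is the rigidity item 1 points out.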
In my opinion, when we choose one node from two candidates (A: usage 30%, B: usage 60%), we can calculate the probability from the available storage: p(A) = 70%/(70% + 40%), p(B) = 40%/(70% + 40%). That is, each candidate is chosen with probability proportional to its free space, so when a node is close to full, its probability becomes very small.
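The proposed rule can be sketched as follows, assuming usages are given as fractions in [0, 1] and that two full nodes are treated as a tie. The class and method names are hypothetical and are not part of the HDFS-8131 patch.

```java
import java.util.Random;

/**
 * Hypothetical sketch of the proposed rule: pick between two candidates with
 * probability proportional to each node's free fraction (1 - usage).
 */
public class AvailableSpaceChooser {
    /** p(A) = freeA / (freeA + freeB), where free = 1 - usage. */
    static double probabilityOfA(double usageA, double usageB) {
        double freeA = 1.0 - usageA;
        double freeB = 1.0 - usageB;
        double total = freeA + freeB;
        if (total <= 0.0) {
            return 0.5; // both nodes full: no preference (assumed edge-case handling)
        }
        return freeA / total;
    }

    /** Returns 0 to pick node A, 1 to pick node B. */
    static int choose(double usageA, double usageB, Random random) {
        return random.nextDouble() < probabilityOfA(usageA, usageB) ? 0 : 1;
    }

    public static void main(String[] args) {
        // Example from the description: A at 30% usage, B at 60% usage.
        System.out.printf("p(A) = %.4f%n", probabilityOfA(0.30, 0.60));     // 70/110 ≈ 0.6364
        System.out.printf("p(B) = %.4f%n", 1 - probabilityOfA(0.30, 0.60)); // 40/110 ≈ 0.3636
        // Near-full nodes are now distinguished: 98% vs 93% gives 2/9 ≈ 0.2222.
        System.out.printf("p(98%% node) = %.4f%n", probabilityOfA(0.98, 0.93));

        // Empirical check of the sampling step with a seeded generator.
        Random random = new Random(42);
        int trials = 100_000;
        int pickedA = 0;
        for (int i = 0; i < trials; i++) {
            if (choose(0.30, 0.60, random) == 0) {
                pickedA++;
            }
        }
        System.out.printf("empirical p(A) = %.3f%n", (double) pickedA / trials);
    }
}
```

Unlike the fixed-preference rule, the probability changes continuously with free space, so there is no hard-coded tie band to tune.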
We could also introduce another factor to dampen this correction and make the adjustment less aggressive.
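The issue does not specify the form of this damping factor. One possible choice, shown here purely as an assumption, is an exponent k applied to each node's free fraction before normalizing: k = 1 reproduces the proportional rule, smaller k moves the choice toward 50/50, and any k > 0 still drives a nearly full node's probability toward zero. The names below are hypothetical.

```java
/**
 * Hypothetical damped variant of the proportional rule:
 * p(A) = freeA^k / (freeA^k + freeB^k), with free = 1 - usage.
 */
public class DampedAvailableSpaceChooser {
    static double probabilityOfA(double usageA, double usageB, double k) {
        // Clamp free space at zero so slightly over-reported usage cannot go negative.
        double weightA = Math.pow(Math.max(0.0, 1.0 - usageA), k);
        double weightB = Math.pow(Math.max(0.0, 1.0 - usageB), k);
        double total = weightA + weightB;
        return total > 0.0 ? weightA / total : 0.5; // both full: treat as a tie
    }

    public static void main(String[] args) {
        // k = 0: uniform choice; k = 1: proportional to free space.
        for (double k : new double[] {0.0, 0.5, 1.0}) {
            System.out.printf("k=%.1f  p(A: 30%%, B: 60%%) = %.4f%n",
                    k, probabilityOfA(0.30, 0.60, k));
        }
    }
}
```

An exponent keeps the "close to full means rarely chosen" property from the proportional rule, which a simple blend with a 50/50 choice would lose.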
Any thoughts? liushaohui
Attachments
Issue Links
- relates to: HDFS-8131 Implement a space balanced block placement policy (Resolved)