Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
I took a look at AvailableSpaceVolumeChoosingPolicy. It seems like overkill, and it includes configuration items that appear somewhat arbitrary, with no clear guidance on how to use them effectively:
dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction
dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
I have created an alternative implementation that needs no external configuration, is thread-safe, and requires no synchronization.
"Weighted Randomized Ordering"
http://stackoverflow.com/questions/23971365/weighted-randomized-ordering
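Judging by its title, the linked question concerns producing a random ordering in which heavier items tend to come first. One standard way to do that, which may or may not be what the linked answer uses, is the Efraimidis–Spirakis method: give each item a random key u^(1/w), with u uniform in (0, 1], and sort by key in descending order. Item i then comes first with probability w_i / Σw. The sketch below computes the key in log form, log(u) / w, which preserves the order and avoids underflow. The class name `WeightedOrdering` is hypothetical, and using a volume's available bytes as its weight is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch: weighted randomized ordering via Efraimidis–Spirakis keys. */
public final class WeightedOrdering {

    /**
     * Returns the indices 0..weights.length-1 in a random order where heavier items tend to
     * come first; index i is first with probability weights[i] / sum(weights).
     * Items with weight <= 0 are always placed last.
     */
    public static List<Integer> order(long[] weights, Random rnd) {
        int n = weights.length;
        double[] keys = new double[n];
        List<Integer> indices = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            indices.add(i);
            if (weights[i] <= 0) {
                keys[i] = Double.NEGATIVE_INFINITY; // no weight: sorts after everything else
            } else {
                double u = 1.0 - rnd.nextDouble();  // uniform in (0, 1], so log(u) is finite
                keys[i] = Math.log(u) / weights[i]; // log of u^(1/w): same order, no underflow
            }
        }
        // Largest key first.
        indices.sort(Comparator.comparingDouble((Integer i) -> keys[i]).reversed());
        return indices;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long[] freeBytes = {100 * gib, 400 * gib, 0L}; // hypothetical; the last volume is full
        System.out.println(order(freeBytes, ThreadLocalRandom.current()));
    }
}
```

A full ordering, rather than a single pick, gives the caller a fallback sequence: if the first volume turns out to be unusable, it can try the next one in the list.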
Conceptually, a dartboard is divided into wedges, one per disk volume. The more available space a volume has relative to the other volumes, the larger its wedge. A dart is then thrown at the board, and whichever wedge (volume) it lands on is assigned the incoming block.
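The dartboard can be sketched in a few lines of Java. This is a minimal illustration, not the implementation described in this issue and not Hadoop's `VolumeChoosingPolicy` interface: the class `DartboardChooser`, the method `choose`, and the rule that only volumes with room for the block get a wedge are all assumptions. The method keeps no shared mutable state, so when each caller passes `ThreadLocalRandom.current()`, concurrent threads need no locking.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of dartboard (weighted random) volume selection. */
public final class DartboardChooser {

    /**
     * Throws one dart: returns index i with probability available[i] / (sum of eligible
     * available space). Only volumes with at least blockSize bytes free get a wedge.
     * Returns -1 if no volume can hold the block.
     */
    public static int choose(long[] available, long blockSize, Random rnd) {
        long total = 0;
        for (long a : available) {
            if (a >= blockSize) total += a;
        }
        if (total == 0) return -1;
        long dart = uniformBelow(rnd, total); // where the dart lands, in [0, total)
        for (int i = 0; i < available.length; i++) {
            if (available[i] < blockSize) continue; // no wedge for this volume
            if (dart < available[i]) return i;      // dart landed in wedge i
            dart -= available[i];                   // move past wedge i
        }
        throw new IllegalStateException("dart fell off the board");
    }

    /** Uniform long in [0, bound) by rejection sampling; bound must be positive. */
    static long uniformBelow(Random rnd, long bound) {
        long bits;
        long val;
        do {
            bits = rnd.nextLong() >>> 1; // uniform in [0, 2^63)
            val = bits % bound;
        } while (bits - val + (bound - 1) < 0); // reject the incomplete last bucket
        return val;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long[] available = {100 * gib, 400 * gib, 700 * gib}; // hypothetical free space
        long blockSize = 128L << 20;                          // hypothetical 128 MiB block
        int volume = choose(available, blockSize, ThreadLocalRandom.current());
        System.out.println("chosen volume index: " + volume);
    }
}
```

Walking the cumulative wedge sizes is linear in the number of volumes, which is small on a typical DataNode, so no prefix-sum index is needed.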
Over time, volumes with larger wedges receive more blocks, so the gaps in available space between volumes narrow as the volumes fill up.
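The following sketch simulates the dartboard to show how available space evolves as blocks arrive. The class `BalanceSimulation` and its numbers are hypothetical: three volumes with 100, 400 and 700 units of free space, and each block taking one unit. Because every volume loses space in proportion to what it has, the absolute gaps in free space shrink as the volumes fill, while in expectation each volume's share of the remaining free space (its wedge) stays about the same.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical simulation of repeated dartboard placement with one-unit blocks. */
public final class BalanceSimulation {

    /** Places `blocks` one-unit blocks, each on a volume picked with probability free[i] / sum(free). */
    public static int[] simulate(int[] initialFree, int blocks, long seed) {
        int[] free = initialFree.clone(); // leave the caller's array untouched
        Random rnd = new Random(seed);
        int total = Arrays.stream(free).sum();
        for (int b = 0; b < blocks && total > 0; b++) {
            int dart = rnd.nextInt(total); // uniform in [0, total)
            int i = 0;
            while (dart >= free[i]) {      // walk the wedges until the dart lands
                dart -= free[i];
                i++;
            }
            free[i]--;                     // volume i receives the block
            total--;
        }
        return free;
    }

    public static void main(String[] args) {
        int[] start = {100, 400, 700}; // hypothetical free space, in block-sized units
        for (int placed = 0; placed <= 900; placed += 300) {
            System.out.println(placed + " blocks placed -> free = "
                    + Arrays.toString(simulate(start, placed, 7L)));
        }
    }
}
```

Each call replays the same seeded sequence, so the printed rows are successive snapshots of one run.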