The current logic for choosing a storage for a block is not ideal. It always uses the first valid storage of a given StorageType (see DataNodeDescriptor#chooseStorage4Block). This means blocks will always be written to the same volume (the first valid one), and the other valid volumes never get chosen. This problem was raised in this comment ( https://issues.apache.org/jira/browse/HDFS-9807?focusedCommentId=15878382&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15878382 )
I propose the following solution:
- First, collect all the valid storages on the node into a collection.
- Then, shuffle these valid storages to get a randomly ordered collection.
- Finally, take the first storage from the shuffled collection.
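The three steps above could be sketched roughly as follows. This is only an illustration and not the actual HDFS code: the `Storage` class, the `StorageType` enum, and the `chooseStorage` method are simplified stand-ins I made up for this example, and the validity check (matching type and enough remaining space) is an assumption about what "valid" means here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified stand-in types for illustration; these are NOT the real HDFS classes.
class StorageChooserSketch {
  enum StorageType { DISK, SSD }

  static final class Storage {
    final String id;
    final StorageType type;
    final long remaining; // free bytes on this storage

    Storage(String id, StorageType type, long remaining) {
      this.id = id;
      this.type = type;
      this.remaining = remaining;
    }
  }

  /**
   * Picks a random valid storage of the requested type.
   * Returns null when no storage qualifies.
   */
  static Storage chooseStorage(List<Storage> storages, StorageType type,
                               long blockSize, Random random) {
    // Step 1: extract all valid storages into a new collection.
    // "Valid" is assumed to mean: matching type and enough remaining space.
    List<Storage> valid = new ArrayList<>();
    for (Storage s : storages) {
      if (s.type == type && s.remaining >= blockSize) {
        valid.add(s);
      }
    }
    if (valid.isEmpty()) {
      return null;
    }
    // Step 2: shuffle the valid storages.
    Collections.shuffle(valid, random);
    // Step 3: take the first storage from the shuffled collection.
    return valid.get(0);
  }
}
```

Shuffling and taking the first element is equivalent to picking one random index from the valid list, which might be a cheaper variant if only a single storage is needed per call.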
These steps would be executed in DataNodeDescriptor#chooseStorage4Block and replace the current logic. I think this improvement can be done as a subtask under HDFS-11419. Any further comments are welcome.