Details
Bug
Status: Resolved
Major
Resolution: Fixed
Reviewed
Description
In the HDFS disk balancer, the config parameter "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors that can be ignored for a specific move between two disks before the move is abandoned.
The parameter accepts any value >= 0, and setting it to 0 should mean no error tolerance. However, with a value of 0 the block copy is never performed, even when no disk error occurs, because the while loop condition item.getErrorCount() < getMaxError(item) is never satisfied: with no errors recorded, 0 < 0 is false on the very first check.
    // Gets the next block that we can copy
    private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                         DiskBalancerWorkItem item) {
      while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
        try {
          ... //get the block
        } catch (IOException e) {
          item.incErrorCount();
        }
        if (item.getErrorCount() >= getMaxError(item)) {
          item.setErrMsg("Error count exceeded.");
          LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
              item.getErrorCount(), item.getMaxDiskErrors());
        }
How to fix
Change the while loop condition so that a value of 0 is supported.
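
The effect of the loop condition can be illustrated with a small standalone sketch. This is not the Hadoop code and not necessarily the committed patch: the class MaxErrorLoopSketch, the method copyBlocks, and the use of a boolean list to simulate block reads (false standing in for an IOException) are all hypothetical. It compares the reported condition (errorCount < maxErrors) with one possible inclusive variant (errorCount <= maxErrors).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical standalone model of the copy loop; not the actual DiskBalancer code.
public class MaxErrorLoopSketch {

    /**
     * Simulates copying blocks. Each list element is one block read:
     * true = read succeeded, false = simulated IOException.
     * When inclusive is false, the loop uses the reported condition
     * (errorCount < maxErrors); when true, it uses errorCount <= maxErrors.
     * Returns the number of blocks copied before the loop stops.
     */
    static int copyBlocks(List<Boolean> reads, int maxErrors, boolean inclusive) {
        int errorCount = 0;
        int copied = 0;
        Iterator<Boolean> iter = reads.iterator();
        while (iter.hasNext()
                && (inclusive ? errorCount <= maxErrors : errorCount < maxErrors)) {
            if (iter.next()) {
                copied++;        // block read and copied
            } else {
                errorCount++;    // stands in for item.incErrorCount()
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        List<Boolean> healthy = Arrays.asList(true, true, true);
        List<Boolean> oneError = Arrays.asList(true, false, true);

        // Reported behavior: maxErrors = 0 copies nothing even without errors.
        System.out.println(copyBlocks(healthy, 0, false));  // prints 0
        // Inclusive condition: maxErrors = 0 copies every healthy block.
        System.out.println(copyBlocks(healthy, 0, true));   // prints 3
        // Inclusive condition: maxErrors = 0 stops at the first error.
        System.out.println(copyBlocks(oneError, 0, true));  // prints 1
        // Inclusive condition: maxErrors = 1 tolerates a single error.
        System.out.println(copyBlocks(oneError, 1, true));  // prints 2
    }
}
```

In this model the inclusive condition makes the configured value equal to the number of errors actually tolerated, so 0 means the move is abandoned on the first error rather than never started. If the loop condition changes this way, the follow-up "error count exceeded" check would presumably need to become a strict comparison as well; that is an assumption, not something stated in this report.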