Manoj Govindassamy, thanks for updating the patch as well as providing a fix for this issue. A couple of minor comments:
The changes in queryWorkStatus can be pulled into a helper function and invoked twice, once for the source and once for the destination.
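A minimal sketch of that refactoring, assuming the duplicated logic maps a volume UUID to its path. `WorkStatusSketch`, `getVolumePath`, `describeStep` and `VolumeLookupException` are hypothetical names, and the `Map` stands in for however the datanode actually resolves volumes; this is not the real `queryWorkStatus` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one helper resolves a volume UUID to its path and is
// reused for both the source and the destination volume of a step.
final class WorkStatusSketch {

  /** Stand-in for DiskBalancerException; not the real Hadoop class. */
  static final class VolumeLookupException extends Exception {
    VolumeLookupException(String message) {
      super(message);
    }
  }

  // Stand-in for however the datanode maps volume UUIDs to paths.
  private final Map<String, String> uuidToPath;

  WorkStatusSketch(Map<String, String> uuidToPath) {
    this.uuidToPath = uuidToPath;
  }

  /** Returns the path of the volume, or throws an error naming its role. */
  String getVolumePath(String volumeUuid, String role) throws VolumeLookupException {
    String path = uuidToPath.get(volumeUuid);
    if (path == null) {
      // Only the UUID is known when the volume cannot be found.
      throw new VolumeLookupException(
          "Disk Balancer - Unable to find " + role + " volume: " + volumeUuid);
    }
    return path;
  }

  /** The same helper is invoked once for the source and once for the destination. */
  String describeStep(String sourceUuid, String destUuid) throws VolumeLookupException {
    String sourcePath = getVolumePath(sourceUuid, "source");
    String destPath = getVolumePath(destUuid, "destination");
    return sourcePath + " -> " + destPath;
  }

  public static void main(String[] args) throws VolumeLookupException {
    Map<String, String> volumes = new HashMap<>();
    volumes.put("uuid-1", "/data/disk1");
    volumes.put("uuid-2", "/data/disk2");
    // Prints "/data/disk1 -> /data/disk2"
    System.out.println(new WorkStatusSketch(volumes).describeStep("uuid-1", "uuid-2"));
  }
}
```

Passing the role ("source" or "destination") into the helper keeps the two call sites identical while still producing an error that says which side failed.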
createWorkPlan – We kept the "Disk Balancer" string in the datanode-side messages so that it is easy to grep for error messages in the datanode logs. If you don't mind, could you please put it back?
createWorkPlan – In these error messages, it is easier for people to figure out the volume from the path than from the UUID, so I think we should keep the path in the error message. Please scan the file for places where "Disk Balancer" was removed from logging, and also add that string to any newly added LOG messages.
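For illustration only, a tiny message helper that keeps the grep-friendly "Disk Balancer" string and names the volume by its path. The class name, the `"Disk Balancer - "` prefix format and the sample message text are assumptions, not the existing datanode messages.

```java
// Hypothetical helper for datanode-side Disk Balancer error messages.
final class DiskBalancerMessages {

  // Fixed string so every message is easy to grep for in the datanode logs.
  static final String PREFIX = "Disk Balancer - ";

  private DiskBalancerMessages() {
  }

  /** Builds a message that identifies the volume by path rather than by UUID. */
  static String volumeError(String problem, String volumePath) {
    return PREFIX + problem + " Volume path: " + volumePath;
  }

  public static void main(String[] args) {
    // In the datanode this string would be passed to LOG.error(...); printed here instead.
    // Prints "Disk Balancer - Invalid source volume. Volume path: /data/disk1"
    System.out.println(volumeError("Invalid source volume.", "/data/disk1"));
  }
}
```

Every message built this way would then show up with a plain `grep "Disk Balancer"` over the datanode log.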
nit – Line 655: the function comment says we return the volume, but in reality we return the volume UUID.
copyBlocks#1013, 1021 – this.setExitFlag is spurious, since we return from the function on the very next line, so the flag never gets evaluated. Also, could you put the volume path in the error string instead of the volume UUID?
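A hedged sketch of the early-return pattern, assuming each step carries both the UUID (used for lookup) and the base path (used in messages) of its volumes. `VolumePair`, `liveVolumeUuids` and `errMsg` here are hypothetical stand-ins; the point is only that the error names the path and the method returns without touching an exit flag.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of copyBlocks bailing out early with a path-based error.
final class CopyBlocksSketch {

  /** Hypothetical pairing of each volume's UUID (for lookup) with its base path (for messages). */
  static final class VolumePair {
    final String sourceUuid;
    final String sourcePath;
    final String destUuid;
    final String destPath;

    VolumePair(String sourceUuid, String sourcePath, String destUuid, String destPath) {
      this.sourceUuid = sourceUuid;
      this.sourcePath = sourcePath;
      this.destUuid = destUuid;
      this.destPath = destPath;
    }
  }

  private final Set<String> liveVolumeUuids; // stand-in for the datanode's current volumes
  private String errMsg;                     // stand-in for the work item's error message

  CopyBlocksSketch(Set<String> liveVolumeUuids) {
    this.liveVolumeUuids = liveVolumeUuids;
  }

  void copyBlocks(VolumePair pair) {
    if (!liveVolumeUuids.contains(pair.sourceUuid)) {
      errMsg = "Disk Balancer - Unable to find source volume: " + pair.sourcePath;
      return; // no exit flag needed: we leave right away and nothing reads it afterwards
    }
    if (!liveVolumeUuids.contains(pair.destUuid)) {
      errMsg = "Disk Balancer - Unable to find destination volume: " + pair.destPath;
      return;
    }
    // ... the block moves between the two volumes would happen here ...
  }

  String getErrMsg() {
    return errMsg;
  }

  public static void main(String[] args) {
    CopyBlocksSketch mover = new CopyBlocksSketch(new HashSet<>(Arrays.asList("uuid-1")));
    mover.copyBlocks(new VolumePair("uuid-1", "/data/disk1", "uuid-2", "/data/disk2"));
    // Prints "Disk Balancer - Unable to find destination volume: /data/disk2"
    System.out.println(mover.getErrMsg());
  }
}
```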
FsDatasetImpl.java#getFsVolumeReference – Do we really need to add a function to the FsDatasetSpi interface? Can't we do this via getVolumeReferences and a helper function in DiskBalancer.java itself? I don't think we should add this to FsDatasetSpi.
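One way the lookup could live in DiskBalancer.java instead: iterate the dataset's volume references and match on the storage UUID. `Volume`, `SimpleVolume` and `VolumeReferences` below are minimal stand-ins assumed for this sketch, not the real Hadoop interfaces, so the actual method names on those types may differ.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: resolve a volume by UUID inside DiskBalancer by walking
// the dataset's volume references, without extending the dataset interface.
final class VolumeLookupSketch {

  /** Minimal stand-in for a datanode volume. */
  interface Volume {
    String getStorageId();

    String getBasePath();
  }

  /** Simple immutable Volume used by this sketch. */
  static final class SimpleVolume implements Volume {
    private final String storageId;
    private final String basePath;

    SimpleVolume(String storageId, String basePath) {
      this.storageId = storageId;
      this.basePath = basePath;
    }

    @Override
    public String getStorageId() {
      return storageId;
    }

    @Override
    public String getBasePath() {
      return basePath;
    }
  }

  /** Stand-in for an iterable, closeable set of volume references. */
  static final class VolumeReferences implements Iterable<Volume>, Closeable {
    private final List<Volume> volumes;

    VolumeReferences(List<Volume> volumes) {
      this.volumes = volumes;
    }

    @Override
    public Iterator<Volume> iterator() {
      return volumes.iterator();
    }

    @Override
    public void close() {
      // The real references would be released here.
    }
  }

  private VolumeLookupSketch() {
  }

  /** Returns the volume whose storage UUID matches, or null if there is none. */
  static Volume getVolume(VolumeReferences refs, String volumeUuid) {
    for (Volume volume : refs) {
      if (volume.getStorageId().equals(volumeUuid)) {
        return volume;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Volume> volumes = Arrays.asList(
        new SimpleVolume("uuid-1", "/data/disk1"),
        new SimpleVolume("uuid-2", "/data/disk2"));
    try (VolumeReferences refs = new VolumeReferences(volumes)) {
      // Prints "/data/disk2"
      System.out.println(getVolume(refs, "uuid-2").getBasePath());
    }
  }
}
```

Because the references are `Closeable`, wrapping them in try-with-resources makes sure they are released even when no matching volume is found.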
TestDiskBalancer.java#577 – You might want to assert that the test failed, along with logging an info message for the reconfig error.
TestDiskBalancer.java#597 – We verify that we get a DiskBalancerException, but not the payload inside the exception. Would it make sense to verify that the error is indeed a volume error, and to check the message string to confirm whether it refers to the source or the destination?
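A plain-Java sketch of the stricter checks suggested for #577 and #597: assert that the call actually failed, log the expected failure at info level, then inspect the payload (result code and message). `BalancerException`, its `Result` enum and `submitPlan` are hypothetical stand-ins for `DiskBalancerException` and the code under test; in the real test the same checks would use the JUnit assertions the test already relies on.

```java
import java.util.logging.Logger;

// Hypothetical sketch of asserting both that a call failed and what it failed with.
final class ExceptionPayloadCheck {

  private static final Logger LOG = Logger.getLogger(ExceptionPayloadCheck.class.getName());

  /** Stand-in for DiskBalancerException carrying a result code. */
  static final class BalancerException extends Exception {
    enum Result { INVALID_VOLUME, INTERNAL_ERROR }

    private final Result result;

    BalancerException(String message, Result result) {
      super(message);
      this.result = result;
    }

    Result getResult() {
      return result;
    }
  }

  /** Hypothetical operation under test: fails when the source volume is missing. */
  static void submitPlan(boolean sourcePresent) throws BalancerException {
    if (!sourcePresent) {
      throw new BalancerException(
          "Disk Balancer - Unable to find source volume: /data/disk1",
          BalancerException.Result.INVALID_VOLUME);
    }
  }

  /** Fails unless submitPlan throws a volume error that names the source. */
  static void expectSourceVolumeError() {
    try {
      submitPlan(false);
    } catch (BalancerException e) {
      LOG.info("Got expected failure: " + e.getMessage());
      if (e.getResult() != BalancerException.Result.INVALID_VOLUME) {
        throw new AssertionError("Unexpected result: " + e.getResult());
      }
      if (!e.getMessage().contains("source volume")) {
        throw new AssertionError("Message does not name the source: " + e.getMessage());
      }
      return;
    }
    // Reaching this line means no exception was thrown at all.
    throw new AssertionError("Expected submitPlan to fail");
  }

  public static void main(String[] args) {
    expectSourceVolumeError();
    System.out.println("payload checks passed");
  }
}
```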