Description
Scenario:
1. A 3-node cluster with "dfs.client.block.write.replace-datanode-on-failure.policy" set to DEFAULT. A block is written with x bytes of data.
2. One of the DataNodes, NOT the first DN in the pipeline, goes down.
3. The client tries to append data to the block and fails, since one DN is down.
4. The client calls recoverLease() on the file.
5. Recovery succeeds (a reproduction sketch follows these steps).
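The scenario above can be reproduced in a MiniDFSCluster-based test. The sketch below is illustrative, not a committed test: the file path, data sizes, and the stopped DN index are arbitrary, and stopDataNode(1) is assumed to take down a non-first pipeline DN (a real test would locate the block's pipeline and pick a mirror explicitly).

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class AppendPipelineFailureRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Step 1: DEFAULT replace-datanode-on-failure policy, as in the scenario.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path file = new Path("/testReservedSpaceLeak");

      // Step 1: write some initial data into the block.
      try (FSDataOutputStream out = fs.create(file, (short) 3)) {
        out.write(new byte[1024]);
      }

      // Step 2: stop a DataNode (assumed here to be a non-first pipeline DN).
      cluster.stopDataNode(1);

      // Step 3: the append fails once the pipeline hits the dead mirror.
      try (FSDataOutputStream out = fs.append(file)) {
        out.write(new byte[1024]);
        out.hflush();
      } catch (Exception e) {
        System.out.println("Append failed as expected: " + e);
      }

      // Steps 4-5: recover the lease; recovery succeeds, but the surviving
      // DNs never release the space they reserved for the failed pipeline.
      while (!fs.recoverLease(file)) {
        Thread.sleep(1000);
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}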
Issue:
1. DNs that were still connected to the client when the mirror went down have reservedSpaceForReplicas incremented, BUT it is never decremented.
2. So in the long run, all of a DN's space ends up counted in reservedSpaceForReplicas, resulting in OutOfSpace errors.
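For illustration, here is a simplified model of the reservation accounting, NOT the actual FsVolumeImpl code; the class and field names are hypothetical stand-ins that mirror the DN's reserve/release pattern. When the release step on the failure path is skipped, each failed append pipeline permanently shrinks the space visible to writers:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Simplified, hypothetical model of a DN volume's replica-space reservation.
class VolumeModel {
  private final long capacity;
  private final AtomicLong reservedForReplicas = new AtomicLong();

  VolumeModel(long capacity) { this.capacity = capacity; }

  // Called when a pipeline opens a replica: reserve the remaining block space.
  void reserveSpaceForReplica(long bytes) {
    reservedForReplicas.addAndGet(bytes);
  }

  // Called when the replica is finalized or cleanly invalidated.
  void releaseReservedSpace(long bytes) {
    reservedForReplicas.addAndGet(-bytes);
  }

  // Writers see capacity minus reservations; leaked reservations shrink this.
  long getAvailable() {
    return capacity - reservedForReplicas.get();
  }
}

public class ReservedSpaceLeakDemo {
  public static void main(String[] args) {
    VolumeModel volume = new VolumeModel(10L * 1024 * 1024);
    long blockRemaining = 4L * 1024 * 1024;

    // Append pipeline opens the replica: space is reserved.
    volume.reserveSpaceForReplica(blockRemaining);

    // Pipeline fails (mirror DN down) and lease recovery finalizes the block;
    // on this DN nothing calls releaseReservedSpace(blockRemaining), so the
    // reservation leaks. Repeated failures drain the available space:
    System.out.println("available after leak: " + volume.getAvailable());
  }
}
{code}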