The idea of this JIRA is to discuss and implement efficient handling of block movement failures at the datanode coordinator. Code reference.
The following errors are possible during block movement:
- Block pinned: no retries; reported to the NN as success/no-retry, because this block cannot be relocated to another datanode.
- Network errors (IOException): no retries; reported to the NN as failure/retry.
- No disk space (IOException): no retries; reported to the NN as failure/retry.
- Gen_Stamp mismatch: no retries; reported to the NN as failure/retry. This can happen if the file has been reopened.
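The list above amounts to a small mapping from failure type to the status the coordinator reports to the NN. The following is a minimal Java sketch of that mapping only. The names `FailureKind`, `MoveStatus` and `statusFor` are hypothetical and are not existing HDFS classes. The sketch also assumes the coordinator has already worked out which kind of failure occurred.

```java
/** Hypothetical sketch: map a block-move failure to the status reported to the NN. */
final class BlockMoveFailureSketch {

    /** Status sent back to the NN (hypothetical names). */
    enum MoveStatus { SUCCESS_NO_RETRY, FAILURE_RETRY }

    /** Hypothetical failure categories mirroring the error list. */
    enum FailureKind { BLOCK_PINNED, NETWORK_ERROR, NO_DISK_SPACE, GEN_STAMP_MISMATCH }

    /** The coordinator never retries locally; it only decides what to report. */
    static MoveStatus statusFor(FailureKind kind) {
        switch (kind) {
            case BLOCK_PINNED:
                // A pinned block cannot be relocated, so asking the NN to retry is pointless.
                return MoveStatus.SUCCESS_NO_RETRY;
            case NETWORK_ERROR:
            case NO_DISK_SPACE:
            case GEN_STAMP_MISMATCH:
            default:
                // Possibly transient (or the file was reopened): let the NN schedule a retry.
                return MoveStatus.FAILURE_RETRY;
        }
    }

    public static void main(String[] args) {
        for (FailureKind k : FailureKind.values()) {
            System.out.println(k + " -> " + statusFor(k));
        }
    }
}
```

Keeping the retry decision at the NN means the coordinator does not need its own retry state. The coordinator only has to classify the failure once per attempt.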