Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.20-append
Fix Version/s: None
Description
We saw an issue where multiple datanodes tried to recover a block at the same time, and all of them failed. I think the problem is in FSDataset.tryUpdateBlock. It renames blk_B_OldGS to blk_B_OldGS_tmpNewGS first, and only afterwards checks that the generation stamp is moving upwards. The check does reject invalid updateBlock calls. However, the rename has already happened by then and is not undone, so the meta file no longer has its expected name. Every later updateBlock call on that block then fails with a "Meta file not found" error.
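The ordering problem can be reproduced with a toy model. The Java sketch below is hypothetical and is not the real FSDataset code. The class TryUpdateBlockSketch, its methods, and the in-memory map that stands in for the on-disk meta files are all invented for illustration. The rule that a valid update must strictly increase the generation stamp is also an assumption. The sketch compares the rename-then-check ordering described above with a check-then-rename ordering.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the meta-file rename described in the issue.
// Files are modeled as names in a map; this is NOT the real HDFS FSDataset code.
class TryUpdateBlockSketch {

    // Block id -> name of the meta file currently "on disk" for that block.
    private final Map<Long, String> metaFiles = new HashMap<>();
    // Block id -> generation stamp recorded for that block.
    private final Map<Long, Long> genStamps = new HashMap<>();

    static String metaName(long blockId, long gs) {
        return "blk_" + blockId + "_" + gs + ".meta";
    }

    static String tmpMetaName(long blockId, long oldGs, long newGs) {
        return "blk_" + blockId + "_" + oldGs + "_tmp" + newGs + ".meta";
    }

    void createReplica(long blockId, long gs) {
        metaFiles.put(blockId, metaName(blockId, gs));
        genStamps.put(blockId, gs);
    }

    String metaFile(long blockId) {
        return metaFiles.get(blockId);
    }

    // Fails if blk_B_OldGS.meta is not present under its expected name.
    private void requireMetaFile(long blockId, long oldGs) {
        String expected = metaName(blockId, oldGs);
        if (!expected.equals(metaFiles.get(blockId))) {
            throw new IllegalStateException("Meta file not found: " + expected);
        }
    }

    // Assumption: a valid update must strictly increase the generation stamp.
    private static void requireUpwards(long oldGs, long newGs) {
        if (newGs <= oldGs) {
            throw new IllegalStateException(
                "Generation stamp must move upwards: " + oldGs + " -> " + newGs);
        }
    }

    // Ordering described in the issue: rename first, then check the stamp.
    void updateRenameFirst(long blockId, long newGs) {
        long oldGs = genStamps.get(blockId);
        requireMetaFile(blockId, oldGs);
        metaFiles.put(blockId, tmpMetaName(blockId, oldGs, newGs)); // the "rename"
        requireUpwards(oldGs, newGs); // a rejected call leaves the tmp name behind
        metaFiles.put(blockId, metaName(blockId, newGs));
        genStamps.put(blockId, newGs);
    }

    // Reordered: check the stamp before touching the meta file.
    void updateCheckFirst(long blockId, long newGs) {
        long oldGs = genStamps.get(blockId);
        requireUpwards(oldGs, newGs); // reject before any rename happens
        requireMetaFile(blockId, oldGs);
        metaFiles.put(blockId, tmpMetaName(blockId, oldGs, newGs));
        // (a real implementation would update the block and meta contents here)
        metaFiles.put(blockId, metaName(blockId, newGs));
        genStamps.put(blockId, newGs);
    }

    public static void main(String[] args) {
        for (boolean checkFirst : new boolean[] {false, true}) {
            TryUpdateBlockSketch dn = new TryUpdateBlockSketch();
            dn.createReplica(1L, 3L);
            // An invalid update (3 -> 2) followed by a valid one (3 -> 4).
            for (long newGs : new long[] {2L, 4L}) {
                try {
                    if (checkFirst) {
                        dn.updateCheckFirst(1L, newGs);
                    } else {
                        dn.updateRenameFirst(1L, newGs);
                    }
                    System.out.println("update to " + newGs + " ok -> " + dn.metaFile(1L));
                } catch (IllegalStateException e) {
                    System.out.println("update to " + newGs + " failed: " + e.getMessage());
                }
            }
        }
    }
}
```

With the rename-first ordering, the rejected update to stamp 2 leaves blk_1_3_tmp2.meta behind. The following valid update to stamp 4 therefore fails with "Meta file not found: blk_1_3.meta". With the check-first ordering, the rejected call leaves blk_1_3.meta untouched and the update to 4 succeeds. Renaming the file back when the check fails would be another way to keep the state consistent. This sketch only shows the reordering and does not claim which approach HDFS actually adopted.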