Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.8.0
Fix Version/s: None
Component/s: None
Description
Overlapping lease recoveries for the same file will cause an NPE in the DatanodeManager while it builds the BlockRecoveryCommand, possibly losing other recovery commands.
- client1 calls recoverLease, file is added to DN1's recovery queue
- client2 calls recoverLease, file is added to DN2's recovery queue
- one DN heartbeats, gets the block recovery command, and completes block synchronization before the other DN heartbeats, i.e. the file is closed.
- the other DN heartbeats, takes the block from its recovery queue, assumes it is still under construction, and gets an NPE calling getExpectedStorageLocations() because the block's BlockUnderConstructionFeature is now null.
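The race above can be modeled in a few lines. This is a hypothetical, self-contained Java sketch, not HDFS code: `Block`, `UcFeature`, `heartbeat` and the two per-DN queues are illustrative stand-ins for `BlockInfo`, its `BlockUnderConstructionFeature`, and the DataNode recovery queues. It queues the same block on two DNs, lets the first DN finish synchronization (modeled by clearing the under-construction feature), and shows that the second DN's heartbeat then dereferences a null feature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class LeaseRecoveryRace {
    // Hypothetical stand-in for BlockUnderConstructionFeature.
    static final class UcFeature {
        final List<String> expectedLocations = List.of("DN1", "DN2");
        List<String> getExpectedStorageLocations() { return expectedLocations; }
    }

    // Hypothetical stand-in for BlockInfo: ucFeature becomes null once the block is complete.
    static final class Block {
        final long id;
        UcFeature ucFeature = new UcFeature();
        Block(long id) { this.id = id; }
    }

    // Stand-in for a heartbeat: drain the DN's queue and read each block's expected locations.
    static List<String> heartbeat(Queue<Block> recoveryQueue) {
        List<String> targets = new ArrayList<>();
        Block b;
        while ((b = recoveryQueue.poll()) != null) {
            // Unchecked dereference, as in the excerpt: NPE if the block was already completed.
            targets.addAll(b.ucFeature.getExpectedStorageLocations());
        }
        return targets;
    }

    /** Returns true if the second DN's heartbeat hits a NullPointerException. */
    static boolean secondHeartbeatThrowsNpe() {
        Block block = new Block(1L);
        Queue<Block> dn1Queue = new ArrayDeque<>();
        Queue<Block> dn2Queue = new ArrayDeque<>();
        dn1Queue.add(block);    // client1 calls recoverLease
        dn2Queue.add(block);    // client2 calls recoverLease

        heartbeat(dn1Queue);    // DN1 gets the recovery command...
        block.ucFeature = null; // ...and completes synchronization; the file is closed

        try {
            heartbeat(dn2Queue); // DN2 still finds the block in its queue
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on second heartbeat: " + secondHeartbeatThrowsNpe());
    }
}
```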
```java
// check lease recovery
BlockInfo[] blocks = nodeinfo.getLeaseRecoveryCommand(Integer.MAX_VALUE);
if (blocks != null) {
  BlockRecoveryCommand brCommand = new BlockRecoveryCommand(
      blocks.length);
  for (BlockInfo b : blocks) {
    BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
    // Java assertions are disabled by default, so a null uc falls through to the NPE below
    assert uc != null;
    final DatanodeStorageInfo[] storages = uc.getExpectedStorageLocations();
    // ...
  }
  // ...
}
```
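This is "ok" for the NN state if only one block was queued, since that block no longer needs recovery. If multiple blocks were queued, all of those recoveries are lost: the blocks have already been taken off the recovery queue, and the NPE aborts building the BlockRecoveryCommand for the rest of them. Recovery will not occur until the client explicitly retries or the lease monitor recovers the lease.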
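The multi-block case, and one possible mitigation, can be illustrated with the same kind of hypothetical model. `buildCommandUnchecked` mirrors the excerpt's loop: the queue is drained first, so an NPE on any completed block discards the whole command, including blocks that still need recovery. `buildCommandSkippingCompleted` is an assumed mitigation, not a committed patch: it skips blocks whose under-construction feature is already gone, so the remaining recoveries are still sent. All names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class LostRecoveries {
    // Hypothetical stand-in for BlockUnderConstructionFeature.
    record UcFeature(List<String> expectedStorageLocations) {}

    // Hypothetical stand-in for BlockInfo; uc == null means the block is already complete.
    record Block(long id, UcFeature uc) {}

    // Takes every block off the queue up front, as the heartbeat path does.
    static List<Block> drain(Queue<Block> queue) {
        List<Block> drained = new ArrayList<>(queue);
        queue.clear();
        return drained;
    }

    // Mirrors the excerpt: any completed block throws an NPE and the whole command is lost.
    static List<Long> buildCommandUnchecked(Queue<Block> queue) {
        List<Long> command = new ArrayList<>();
        for (Block b : drain(queue)) {
            b.uc().expectedStorageLocations(); // NPE when uc is null
            command.add(b.id());
        }
        return command;
    }

    // Assumed mitigation: skip blocks that are no longer under construction.
    static List<Long> buildCommandSkippingCompleted(Queue<Block> queue) {
        List<Long> command = new ArrayList<>();
        for (Block b : drain(queue)) {
            if (b.uc() == null) {
                continue; // already recovered through another DN; nothing to do
            }
            b.uc().expectedStorageLocations();
            command.add(b.id());
        }
        return command;
    }

    static Queue<Block> sampleQueue() {
        UcFeature uc = new UcFeature(List.of("DN1", "DN2"));
        Queue<Block> q = new ArrayDeque<>();
        q.add(new Block(1L, null)); // completed by an overlapping recovery
        q.add(new Block(2L, uc));   // still needs recovery
        q.add(new Block(3L, uc));   // still needs recovery
        return q;
    }

    /** Number of recoveries actually sent by the unchecked loop (0 if it throws). */
    static int sentUnchecked() {
        Queue<Block> q = sampleQueue();
        try {
            return buildCommandUnchecked(q).size();
        } catch (NullPointerException e) {
            return 0; // the queue was already drained, so blocks 2 and 3 are lost as well
        }
    }

    static List<Long> sentSkipping() {
        return buildCommandSkippingCompleted(sampleQueue());
    }

    public static void main(String[] args) {
        System.out.println("unchecked sent: " + sentUnchecked());
        System.out.println("skipping sent: " + sentSkipping());
    }
}
```

Running `main` prints `unchecked sent: 0` followed by `skipping sent: [2, 3]`: in this model, skipping completed blocks keeps the other queued recoveries instead of dropping them along with the failed one.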