- Inside DataNode#handleVolumeFailures(), removing a failed volume is a two-step process:
- First, the volume is removed from the volumes list.
- Later, its replicas are scrubbed from the volume map.
- A concurrent thread generating block reports may read the replicaMap in between and look up a volume ID that no longer exists (see the sketch below).
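The following is a minimal, self-contained sketch of that interleaving, not the actual Hadoop classes; the sleeps only force the bad timing for demonstration:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified model: one thread removes a failed volume in two steps while
// another builds a block report in between and dereferences a missing volume.
public class VolumeRemovalRace {
    static final List<String> volumes = new CopyOnWriteArrayList<>();
    static final Map<String, String> replicaMap = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        volumes.add("vol-1");
        replicaMap.put("blk_1", "vol-1"); // the replica lives on vol-1

        Thread failureHandler = new Thread(() -> {
            volumes.remove("vol-1"); // step 1: drop the volume from the list
            sleep(50);               // window where the replica map is still stale
            replicaMap.values().removeIf("vol-1"::equals); // step 2: scrub replicas
        });

        Thread blockReporter = new Thread(() -> {
            sleep(10); // land inside the window between step 1 and step 2
            for (Map.Entry<String, String> e : replicaMap.entrySet()) {
                String vol = volumes.stream()
                        .filter(v -> v.equals(e.getValue()))
                        .findFirst().orElse(null); // volume already gone -> null
                System.out.println(e.getKey() + " on " + vol.toUpperCase()); // NPE
            }
        });

        failureHandler.start();
        blockReporter.start();
        failureHandler.join();
        blockReporter.join();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}
```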
A fix was made for that, and we have been running it on our clusters since Hadoop 2.7. Analyzing the current code shows that the bug still applies to trunk.
- The path DataNode#removeVolumes() is safe because the two-step process in FsDatasetImpl.removeVolumes() (FsDatasetImpl.java#L577) is protected by datasetWriteLock.
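As a rough illustration of why that path is safe, here is a sketch of the locking discipline, with a plain ReentrantReadWriteLock standing in for datasetWriteLock (illustrative names, not the real FsDatasetImpl code):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Both removal steps run inside one write-lock critical section, so a
// block-report thread holding the read lock observes either both steps
// applied or neither, never the intermediate state.
class SafeRemovalSketch {
    private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock();

    void removeVolume(String volumeId, List<String> volumes,
                      Map<String, String> replicaMap) {
        datasetLock.writeLock().lock();
        try {
            volumes.remove(volumeId);                       // step 1
            replicaMap.values().removeIf(volumeId::equals); // step 2
        } finally {
            datasetLock.writeLock().unlock();
        }
    }

    void getBlockReports(Runnable buildReport) {
        datasetLock.readLock().lock();
        try {
            buildReport.run(); // runs entirely before or after the removal
        } finally {
            datasetLock.readLock().unlock();
        }
    }
}
```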
- The path DataNode#handleVolumeFailures() is not safe because the failed volume is removed from the volumes list without acquiring datasetWriteLock (FsVolumeList#239).
This race condition can cause the caller of getBlockReports() to throw an NPE when a replica under recovery (RUR) refers to a volume that has already been removed (FsDatasetImpl.java#L1976).
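A hedged sketch of the fix direction (illustrative names, not the actual patch): perform the list removal and the replica scrub in handleVolumeFailures() inside the same dataset write lock that removeVolumes() already takes:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: close the window by doing both removal steps for each failed
// volume in one write-lock critical section, mirroring removeVolumes().
class VolumeFailureFixSketch {
    private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock();

    void handleVolumeFailures(Set<String> failedVolumes,
                              List<String> volumes,
                              Map<String, String> replicaMap) {
        datasetLock.writeLock().lock();
        try {
            for (String vol : failedVolumes) {
                volumes.remove(vol);                       // step 1
                replicaMap.values().removeIf(vol::equals); // step 2, same critical section
            }
        } finally {
            datasetLock.writeLock().unlock();
        }
    }
}
```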