Description
When a disk is removed from an actively writing DataNode, the BlockReceiver working on that disk throws ReplicaNotFoundException, because the replica has already been removed from the DataNode's in-memory replica map:
2015-03-26 08:02:43,154 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed volume: /data/2/dfs/dn/current
2015-03-26 08:02:43,163 INFO org.apache.hadoop.hdfs.server.common.Storage: Removing block level storage: /data/2/dfs/dn/current/BP-51301509-10.20.202.114-1427296597742
2015-03-26 08:02:43,163 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-51301509-10.20.202.114-1427296597742:blk_1073742979_2160
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getReplicaInfo(FsDatasetImpl.java:615)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1362)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1281)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1241)
	at java.lang.Thread.run(Thread.java:745)
FsVolumeList#removeVolume waits for all threads to release their FsVolumeReference on the volume being removed. However, PacketResponder#finalizeBlock() does the following:
private void finalizeBlock(long startTime) throws IOException {
  BlockReceiver.this.close();
  final long endTime = ClientTraceLog.isInfoEnabled() ? System.nanoTime() : 0;
  block.setNumBytes(replicaInfo.getNumBytes());
  datanode.data.finalizeBlock(block);
The FsVolumeReference is released inside BlockReceiver.this.close() before datanode.data.finalizeBlock(block) is called. That leaves a window in which removeVolume() can complete and drop the replica from memory, so the subsequent finalizeBlock() call fails with ReplicaNotFoundException.
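To make the ordering constraint concrete, here is a minimal, self-contained sketch of the reference-counting contract involved. VolumeSketch and FinalizeOrderSketch are simplified stand-ins invented for illustration, not the real HDFS classes; the point is only that any work touching the replica must happen while the reference is still held, since removeVolume() is free to proceed the moment the last reference is released.

import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for a volume with FsVolumeReference-style counting.
class VolumeSketch {
    private final AtomicInteger refCount = new AtomicInteger();
    private volatile boolean removed = false;

    /** Take a reference; removeVolume() waits until it is released. */
    synchronized AutoCloseable obtainReference() {
        refCount.incrementAndGet();
        return () -> {
            synchronized (this) {
                refCount.decrementAndGet();
                notifyAll();
            }
        };
    }

    /** Analogous to FsVolumeList#removeVolume: block until all references are gone. */
    synchronized void removeVolume() throws InterruptedException {
        while (refCount.get() > 0) {
            wait();
        }
        removed = true; // after this point the replica is gone from memory
    }

    void finalizeBlock() {
        if (removed) {
            throw new IllegalStateException("Cannot append to a non-existent replica");
        }
    }
}

public class FinalizeOrderSketch {
    public static void main(String[] args) throws Exception {
        VolumeSketch volume = new VolumeSketch();

        // Buggy order (mirrors the report): release the reference first,
        // letting removeVolume() proceed, then finalize -> may fail:
        //   ref.close();            // BlockReceiver.this.close()
        //   volume.finalizeBlock(); // datanode.data.finalizeBlock(block)

        // Safe order: finalize while the reference is still held.
        try (AutoCloseable ref = volume.obtainReference()) {
            volume.finalizeBlock(); // replica guaranteed to still be in memory
        }
        volume.removeVolume(); // now free to remove the volume
        System.out.println("finalized before reference release; removal succeeded");
    }
}

Under this contract, a fix would keep the FsVolumeReference held until after finalizeBlock() completes, for example by closing the receiver only after finalization, or by taking a fresh reference around the finalization step.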