  Hadoop HDFS / HDFS-7996

After swapping a volume, BlockReceiver reports ReplicaNotFoundException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:

      Description

      When a disk is removed from a DataNode that is actively writing to it, the BlockReceiver working on that disk throws ReplicaNotFoundException, because the replicas on the disk have already been removed from memory:

      2015-03-26 08:02:43,154 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed volume: /data/2/dfs/dn/current
      2015-03-26 08:02:43,163 INFO org.apache.hadoop.hdfs.server.common.Storage: Removing block level storage: /data/2/dfs/dn/current/BP-51301509-10.20.202.114-1427296597742
      2015-03-26 08:02:43,163 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
      org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-51301509-10.20.202.114-1427296597742:blk_1073742979_2160
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getReplicaInfo(FsDatasetImpl.java:615)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1362)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1281)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1241)
              at java.lang.Thread.run(Thread.java:745)
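The failure can be illustrated with a simplified, hypothetical model: removing a volume drops every in-memory replica stored on it, so a later lookup by block ID finds nothing and throws. `ReplicaMapModel`, `ReplicaNotFound`, and the methods below are illustrative stand-ins, not the real FsDatasetImpl or ReplicaMap API; the real exception message also carries the block pool ID and generation stamp.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the DataNode's in-memory replica map. */
public class ReplicaMapModel {
    /** Illustrative stand-in for ReplicaNotFoundException. */
    static class ReplicaNotFound extends RuntimeException {
        ReplicaNotFound(String msg) {
            super(msg);
        }
    }

    // blockId -> path of the volume that stores the replica
    private final Map<Long, String> replicas = new HashMap<>();

    void addReplica(long blockId, String volume) {
        replicas.put(blockId, volume);
    }

    /** Removing a volume drops every replica stored on it from memory. */
    void removeVolume(String volume) {
        replicas.values().removeIf(v -> v.equals(volume));
    }

    /** Lookup that fails once the replica is gone, as getReplicaInfo() does in the trace. */
    String getReplicaInfo(long blockId) {
        String volume = replicas.get(blockId);
        if (volume == null) {
            throw new ReplicaNotFound("Cannot append to a non-existent replica blk_" + blockId);
        }
        return volume;
    }

    public static void main(String[] args) {
        ReplicaMapModel map = new ReplicaMapModel();
        map.addReplica(1073742979L, "/data/2/dfs/dn/current");
        map.removeVolume("/data/2/dfs/dn/current"); // the disk is hot-swapped out
        try {
            map.getReplicaInfo(1073742979L);
        } catch (ReplicaNotFound e) {
            // prints "Cannot append to a non-existent replica blk_1073742979"
            System.out.println(e.getMessage());
        }
    }
}
```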
      

      FsVolumeList#removeVolume waits for all threads to release their FsVolumeReference on the volume being removed. However, PacketResponder#finalizeBlock() does the following:

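      private void finalizeBlock(long startTime) throws IOException {
            // Closing the receiver also releases its FsVolumeReference.
            BlockReceiver.this.close();
            final long endTime = ClientTraceLog.isInfoEnabled() ? System.nanoTime()
                : 0;
            block.setNumBytes(replicaInfo.getNumBytes());
            // The volume may already have been removed at this point, so the
            // replica lookup inside finalizeBlock() can fail.
            datanode.data.finalizeBlock(block);
            // ... (remainder of method omitted)
      }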
      

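      The FsVolumeReference is released in BlockReceiver.this.close(), before datanode.data.finalizeBlock(block) is called. Once that reference is gone, FsVolumeList#removeVolume no longer has to wait, so the volume and its replicas can be removed from memory before finalizeBlock() looks the replica up, which produces the ReplicaNotFoundException above.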
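A minimal sketch of why the ordering matters, assuming a hypothetical reference-counted volume (`RefCountedVolume`) whose `remove()` blocks until the count reaches zero, loosely mirroring how FsVolumeList#removeVolume waits on FsVolumeReference holders. It is not the change in the attached patches; it only contrasts releasing the reference before finalizing, as the excerpt above does, with holding it until finalization completes. In the early-release case the sketch joins the remover thread before finalizing, to force the unlucky interleaving deterministically.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the close()-before-finalizeBlock() ordering bug. */
public class VolumeRemovalRace {
    /** Illustrative volume whose removal waits until no references remain. */
    static class RefCountedVolume {
        private int refCount = 0;
        private final Set<Long> replicas = new HashSet<>();

        synchronized void addReplica(long blockId) {
            replicas.add(blockId);
        }

        synchronized void obtainReference() {
            refCount++;
        }

        synchronized void releaseReference() {
            refCount--;
            notifyAll(); // wake a remover waiting for refCount == 0
        }

        /** Blocks until unreferenced, then drops the volume's replicas from memory. */
        synchronized void remove() throws InterruptedException {
            while (refCount > 0) {
                wait(); // releases the monitor while waiting
            }
            replicas.clear();
        }

        /** Fails if the replica is no longer in memory, like finalizeBlock() in the trace. */
        synchronized void finalizeBlock(long blockId) {
            if (!replicas.contains(blockId)) {
                throw new IllegalStateException("non-existent replica blk_" + blockId);
            }
        }
    }

    private static void awaitTermination(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static boolean tryFinalize(RefCountedVolume volume, long blockId) {
        try {
            volume.finalizeBlock(blockId);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Returns true if finalization succeeds while a volume removal is in progress. */
    static boolean run(boolean releaseBeforeFinalize) {
        RefCountedVolume volume = new RefCountedVolume();
        long blockId = 1073742979L;
        volume.addReplica(blockId);
        volume.obtainReference(); // the writer holds the volume while receiving

        Thread remover = new Thread(() -> {
            try {
                volume.remove();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        remover.start(); // the disk is being hot-swapped out concurrently

        if (releaseBeforeFinalize) {
            volume.releaseReference(); // like BlockReceiver.this.close()
            awaitTermination(remover); // force the remover to win the race
            return tryFinalize(volume, blockId);
        }
        boolean ok = tryFinalize(volume, blockId); // remover is still blocked
        volume.releaseReference(); // only now may the removal proceed
        awaitTermination(remover);
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("release before finalize: " + run(true));  // false
        System.out.println("release after finalize: " + run(false)); // true
    }
}
```

In this model, holding the reference until finalization completes only delays the removal; the remover can no longer clear the replica out from under the writer.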

        Attachments

        1. HDFS-7996.000.patch
          6 kB
          Lei (Eddy) Xu
        2. HDFS-7996.001.patch
          6 kB
          Lei (Eddy) Xu
        3. HDFS-7996.002.patch
          7 kB
          Lei (Eddy) Xu


            People

            • Assignee: Lei (Eddy) Xu
            • Reporter: Lei (Eddy) Xu
            • Votes: 0
            • Watchers: 5
