Hadoop HDFS / HDFS-7996

After swapping a volume, BlockReceiver reports ReplicaNotFoundException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: datanode
    • Labels: None

    Description

      When a disk is removed from a DataNode that is actively writing, the BlockReceiver working on that disk throws ReplicaNotFoundException because the replicas have already been removed from memory:

      2015-03-26 08:02:43,154 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed volume: /data/2/dfs/dn/current
      2015-03-26 08:02:43,163 INFO org.apache.hadoop.hdfs.server.common.Storage: Removing block level storage: /data/2/dfs/dn/current/BP-51301509-10.20.202.114-1427296597742
      2015-03-26 08:02:43,163 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
      org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-51301509-10.20.202.114-1427296597742:blk_1073742979_2160
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getReplicaInfo(FsDatasetImpl.java:615)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1362)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1281)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1241)
              at java.lang.Thread.run(Thread.java:745)
      

      FsVolumeList#removeVolume waits until all threads have released their FsVolumeReference on the volume being removed. However, PacketResponder#finalizeBlock() calls:

      private void finalizeBlock(long startTime) throws IOException {
            BlockReceiver.this.close();
            final long endTime = ClientTraceLog.isInfoEnabled() ? System.nanoTime()
                : 0;
            block.setNumBytes(replicaInfo.getNumBytes());
            datanode.data.finalizeBlock(block);
      

      BlockReceiver.this.close() releases the FsVolumeReference before datanode.data.finalizeBlock(block) is called, so the volume can be removed, and its replicas dropped from memory, between the two calls.
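
      The ordering problem can be illustrated with a small, single-threaded toy model. Everything in it (ToyVolume, tryRemove, finishWrite, the block ID) is hypothetical and is not Hadoop code. In particular, tryRemove returns false where the real removeVolume would wait, and the "finalize before release" ordering is only one way to avoid the failure in this model, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the HDFS classes; not Hadoop code. */
public class VolumeRefSketch {

    /** Toy volume: an in-memory replica map guarded by a reference count. */
    static class ToyVolume {
        private final Map<String, Long> replicas = new HashMap<>();
        private int refCount = 0;

        void addReplica(String blockId, long numBytes) {
            replicas.put(blockId, numBytes);
        }

        void obtainReference() { refCount++; }

        void releaseReference() { refCount--; }

        /** Mimics volume removal: proceeds only when no references are held. */
        boolean tryRemove() {
            if (refCount > 0) {
                return false; // the real code would wait here instead
            }
            replicas.clear();
            return true;
        }

        /** Mimics finalizing a block: fails if the replica is no longer in memory. */
        void finalizeBlock(String blockId) {
            if (!replicas.containsKey(blockId)) {
                throw new IllegalStateException(
                    "Cannot append to a non-existent replica " + blockId);
            }
        }
    }

    /** Returns true if finalizeBlock succeeded under the given ordering. */
    static boolean finishWrite(boolean releaseBeforeFinalize) {
        ToyVolume volume = new ToyVolume();
        volume.addReplica("blk_1", 42L);
        volume.obtainReference(); // the writer holds a reference while writing
        try {
            if (releaseBeforeFinalize) {
                volume.releaseReference();      // like close() running first
                volume.tryRemove();             // a removal slips in here
                volume.finalizeBlock("blk_1");  // replica is already gone
            } else {
                volume.tryRemove();             // blocked: reference still held
                volume.finalizeBlock("blk_1");  // replica is still present
                volume.releaseReference();
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("release-then-finalize succeeded: " + finishWrite(true));
        System.out.println("finalize-then-release succeeded: " + finishWrite(false));
    }
}
```

      In this model, releasing the reference first lets the removal clear the replica map before the lookup, which mirrors the exception in the log above. Holding the reference until after the lookup keeps the removal from proceeding.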

      Attachments

        1. HDFS-7996.002.patch
          7 kB
          Lei (Eddy) Xu
        2. HDFS-7996.001.patch
          6 kB
          Lei (Eddy) Xu
        3. HDFS-7996.000.patch
          6 kB
          Lei (Eddy) Xu

          People

            Assignee: Lei (Eddy) Xu (eddyxu)
            Reporter: Lei (Eddy) Xu (eddyxu)
            Votes: 0
            Watchers: 6
