Hadoop HDFS / HDFS-11472

Fix inconsistent replica size after a data pipeline failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 2.7.4, 3.0.0-beta1, 2.8.2
    • Component/s: datanode
    • Labels:
      None

      Description

      We observed a case where a replica's on-disk length is less than its acknowledged length, breaking an assumption in the recovery code.

      2017-01-08 01:41:03,532 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain replica info for block (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
      java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
        getNumBytes()     = 27530
        getBytesOnDisk()  = 27006
        getVisibleLength()= 27268
        getVolume()       = /data/6/hdfs/datanode/current
        getBlockFile()    = /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
        bytesAcked=27268
        bytesOnDisk=27006
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
              at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
              at java.lang.Thread.run(Thread.java:745)
      

      It turns out that if an exception is thrown within BlockReceiver#receivePacket, the in-memory replica's on-disk length may not be updated, even though the data has already been written to disk.

      For example, here is one exception we observed:

      2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
      java.nio.channels.ClosedByInterruptException
              at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
              at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
              at java.lang.Thread.run(Thread.java:745)
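
      As an illustration, here is a minimal, self-contained model of the failure mode (all names are hypothetical, not the actual BlockReceiver code): the data reaches the block file, but an exception fires before the in-memory counter is updated, so bytesOnDisk falls behind what the pipeline acknowledges.

```java
// Toy model of the write path: data is flushed to the block file first,
// and the in-memory ReplicaBeingWritten counter is updated afterwards.
// All names here are illustrative, not the real datanode classes.
public class ReplicaStateSketch {
    static class Replica {
        long bytesOnDisk; // last length recorded in memory
        long bytesAcked;  // length acknowledged downstream
    }

    // Simulates one receivePacket call; returns the new real file length.
    // If 'interrupted' is true, an exception is assumed to fire after the
    // disk write but before the in-memory update, so bytesOnDisk is stale.
    static long receivePacket(Replica r, long fileLen, long packetLen,
                              boolean interrupted) {
        fileLen += packetLen;        // the data reaches the disk anyway
        if (!interrupted) {
            r.bytesOnDisk = fileLen; // normal path: bookkeeping catches up
        }
        return fileLen;
    }

    public static void main(String[] args) {
        Replica r = new Replica();
        long fileLen = receivePacket(r, 0, 27006, false);
        fileLen = receivePacket(r, fileLen, 524, true); // interrupted write
        r.bytesAcked = 27268; // pipeline acked bytes that did hit the disk
        // The broken invariant from the log: bytesOnDisk < bytesAcked,
        // even though the file itself is long enough (27530 bytes).
        System.out.println(r.bytesOnDisk + " " + r.bytesAcked + " " + fileLen);
    }
}
```

      Running this reproduces the numbers from the log: bytesOnDisk = 27006, bytesAcked = 27268, actual file length 27530.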
      

      An exception can be thrown at various points within BlockReceiver#receivePacket, and for various reasons, so it makes little sense to special-case this particular exception. Instead, we should improve the replica recovery code to handle the case where the on-disk size is less than the acknowledged size, and update the in-memory checksum accordingly.
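
      The recovery-side handling could look roughly like the following sketch (hypothetical names, not the actual patch): when the recorded on-disk length is behind the acknowledged length, trust the block file's real length instead of throwing, as long as it covers every acknowledged byte.

```java
// Hypothetical reconciliation; reconcile() stands in for the check in
// FsDatasetImpl#initReplicaRecovery that currently throws
// "THIS IS NOT SUPPOSED TO HAPPEN" when bytesOnDisk < visible length.
public class RecoverySketch {
    static long reconcile(long bytesOnDisk, long bytesAcked, long blockFileLen) {
        if (bytesOnDisk >= bytesAcked) {
            return bytesOnDisk; // invariant holds, nothing to repair
        }
        // The in-memory counter fell behind a write that did reach the disk.
        // Trust the block file, provided it covers all acknowledged bytes;
        // a real fix would also recompute the last chunk's checksum here.
        if (blockFileLen >= bytesAcked) {
            return bytesAcked;
        }
        // Acknowledged data is genuinely missing: this is real corruption.
        throw new IllegalStateException("on-disk data shorter than acked length");
    }

    public static void main(String[] args) {
        // Numbers from the log above: recorded 27006, acked 27268, file 27530.
        System.out.println(RecoverySketch.reconcile(27006, 27268, 27530));
    }
}
```

      With the values from the log, recovery would proceed with the acknowledged length of 27268 bytes rather than failing.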

        Attachments

        1. HDFS-11472.001.patch
          10 kB
          Wei-Chiu Chuang
        2. HDFS-11472.002.patch
          10 kB
          Wei-Chiu Chuang
        3. HDFS-11472.003.patch
          10 kB
          Wei-Chiu Chuang
        4. HDFS-11472.004.patch
          8 kB
          Erik Krogen
        5. HDFS-11472.005.patch
          8 kB
          Erik Krogen
        6. HDFS-11472.testcase.patch
          2 kB
          Wei-Chiu Chuang
        7. HDFS-11472-branch-2.005.patch
          8 kB
          Erik Krogen
        8. HDFS-11472-branch-2.7.005.patch
          7 kB
          Erik Krogen
        9. HDFS-11472-branch-2.8.005.patch
          8 kB
          Erik Krogen


            People

            • Assignee:
              xkrogen Erik Krogen
              Reporter:
              jojochuang Wei-Chiu Chuang
            • Votes:
              0
              Watchers:
              14

              Dates

              • Created:
                Updated:
                Resolved: