Hadoop HDFS / HDFS-10490

Client may never recover replica after a timeout during sending packet


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      For a newly created replica, the meta file is created in the constructor of BlockReceiver (for the WRITE_BLOCK op). Its header is written lazily (buffered in memory first by BufferedOutputStream).
      If subsequent packets fail to deliver (e.g. under extreme network conditions), the header may never get flushed until the stream is closed.
      However, BlockReceiver does not call close until block receiving finishes or an exception is encountered. Also, under extreme network conditions, neither the RST nor the FIN may be delivered in time.
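      The hazard, as a minimal standalone sketch (not the actual BlockReceiver code; the file name and buffer size here are made up):

        import java.io.BufferedOutputStream;
        import java.io.FileOutputStream;
        import java.io.IOException;

        // The header is "written", but only into the in-memory buffer: until
        // flush() or close() runs, the on-disk meta file remains 0 bytes long.
        public class LazyHeaderSketch {
          public static void main(String[] args) throws IOException {
            byte[] header = new byte[7]; // a meta header is only a few bytes
            BufferedOutputStream out =
                new BufferedOutputStream(new FileOutputStream("demo.meta"), 8192);
            out.write(header);
            // No flush here: if the writer now blocks on the network forever,
            // any concurrent reader of demo.meta observes an empty file.
          }
        }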

      In this case, if the client initiates a transferBlock to a new datanode (in addDatanode2ExistingPipeline), the existing datanode will see an empty meta file if its BlockReceiver has not closed in time.
      Then, after HDFS-3429, a default DataChecksum (NULL, 512) is used during the transfer. So when the client then tries to recover the pipeline after the transfer completes, it may encounter the following exception:

      java.io.IOException: Client requested checksum DataChecksum(type=CRC32C, chunkSize=4096) when appending to an existing block with different chunk size: DataChecksum(type=NULL, chunkSize=512)
              at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.createStreams(ReplicaInPipeline.java:230)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:226)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:798)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:76)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
              at java.lang.Thread.run(Thread.java:745)
      

      This will repeat until the datanode replacement policy is exhausted.
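      For reference, the HDFS-3429 fallback that produces the NULL/512 checksum behaves roughly like the sketch below when the meta file is empty (a paraphrase from my reading of the code; exact names may differ across versions):

        import org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader;
        import org.apache.hadoop.util.DataChecksum;
        import java.io.DataInputStream;
        import java.io.IOException;

        public class ChecksumFallbackSketch {
          // If the meta file is too short to hold a header, assume a default
          // DataChecksum(type=NULL, chunkSize=512); this is what the new
          // datanode records during transfer, producing the mismatch above.
          static DataChecksum checksumFor(DataInputStream metaIn, long metaFileLength)
              throws IOException {
            if (metaFileLength < BlockMetadataHeader.getHeaderSize()) {
              return DataChecksum.newDataChecksum(DataChecksum.Type.NULL, 512);
            }
            return BlockMetadataHeader.readHeader(metaIn).getChecksum();
          }
        }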

      Also note that, with bad luck (as in my case), 20k clients may all be doing this at once. It is, to some extent, a DDoS attack on the NameNode (because of the getAdditionalDatanode calls).

      I suggest we flush immediately after the header is written, preventing anybody from seeing an empty meta file and thus avoiding the issue.
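      A minimal sketch of the idea (the attached patch is the authoritative change; the class and file names here are made up):

        import org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader;
        import org.apache.hadoop.util.DataChecksum;
        import java.io.BufferedOutputStream;
        import java.io.DataOutputStream;
        import java.io.FileOutputStream;
        import java.io.IOException;

        public class EagerHeaderFlush {
          public static void main(String[] args) throws IOException {
            DataChecksum checksum =
                DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, 512);
            DataOutputStream checksumOut = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("demo.meta")));
            BlockMetadataHeader.writeHeader(checksumOut, checksum);
            // The suggested addition: push the header to disk immediately, so
            // a concurrent reader can never observe a 0-byte meta file.
            checksumOut.flush();
          }
        }

      This costs one extra small write per block, but it closes the window in which the meta file exists yet is empty.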

      Attachments

        1. HDFS-10490.patch (0.8 kB, He Tianyi)
        2. HDFS-10490.0001.patch (5 kB, He Tianyi)

              People

                Assignee: Unassigned
                Reporter: He Tianyi
                Votes: 0
                Watchers: 7
