Hadoop HDFS / HDFS-14646

Standby NameNode should not upload fsimage to an inappropriate NameNode.


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: hdfs

    Description

      Problem Description:
      In a multi-NameNode deployment, when a Standby NameNode (SNN) uploads an FsImage, it PUTs the image to every other NameNode, whether or not the peer is the Active NameNode (ANN). Even if the peer NN rejects the upload immediately (for example with TransferResult.NOT_ACTIVE_NAMENODE_FAILURE or TransferResult.OLD_TRANSACTION_ID_FAILURE), the local SNN does not abort the PUT: it streams the entire FsImage to the peer and reads the peer's reply only after the transfer has completed.
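
      The ordering problem can be sketched with a small, self-contained Java model. It does not touch the real TransferFsImage or ImageServlet code: the class PutThenReadModel, its Peer type, and the use of HTTP 409 as the rejection code are hypothetical stand-ins. Each peer has already decided its reply before any body byte arrives (as when it calls sendError() right away), yet the sender reads that reply only after copying the whole image in 4096-byte segments, the segment size visible in the log in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the current ordering: stream the whole image, then read the reply.
 * Illustration only, not the actual TransferFsImage implementation.
 */
public class PutThenReadModel {

  /** Stand-in for a peer NameNode whose reply is fixed before the body arrives. */
  static final class Peer {
    final String name;
    final int replyCode;
    long bytesReceived; // bytes the peer had to accept even if it rejected the upload

    Peer(String name, int replyCode) {
      this.name = name;
      this.replyCode = replyCode;
    }
  }

  /** Output stream that only counts the bytes delivered to the peer. */
  static OutputStream sinkFor(Peer peer) {
    return new OutputStream() {
      @Override
      public void write(int b) {
        peer.bytesReceived++;
      }

      @Override
      public void write(byte[] b, int off, int len) {
        peer.bytesReceived += len;
      }
    };
  }

  /** Copies the entire image in 4096-byte segments and only then reads the reply. */
  static int putThenReadReply(byte[] image, Peer peer) throws IOException {
    byte[] segment = new byte[4096];
    try (InputStream in = new ByteArrayInputStream(image);
        OutputStream out = sinkFor(peer)) {
      int n;
      while ((n = in.read(segment)) != -1) {
        out.write(segment, 0, n);
      }
    }
    return peer.replyCode; // the rejection was known before the first byte was sent
  }

  /** Sum of bytes streamed to peers that did not answer 200 OK. */
  static long wastedBytes(byte[] image, List<Peer> peers) throws IOException {
    long wasted = 0;
    for (Peer p : peers) {
      if (putThenReadReply(image, p) != HttpURLConnection.HTTP_OK) {
        wasted += p.bytesReceived;
      }
    }
    return wasted;
  }

  public static void main(String[] args) throws IOException {
    byte[] image = new byte[1_000_000]; // 1 MB stand-in for an fsimage
    List<Peer> peers = Arrays.asList(
        new Peer("active", HttpURLConnection.HTTP_OK),
        new Peer("other-standby", HttpURLConnection.HTTP_CONFLICT)); // illustrative rejection
    System.out.println("bytes wasted on rejecting peers: " + wastedBytes(image, peers));
  }
}
```

      Running main prints "bytes wasted on rejecting peers: 1000000": the other standby receives the full image even though its answer was available up front.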

      Depending on the Jetty version, this behavior has different consequences:

      1. Under Hadoop 2.7.2 (with Jetty 6.1.26)
      After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection stays open, and the data the SNN sends is consumed by the Jetty framework itself on the peer NN side. The SNN therefore keeps sending the FsImage to the peer NN for nothing, wasting time and bandwidth. In a relatively large HDFS cluster the FsImage often reaches about 30 GB, so this waste is significant.

      2. Under the newest release-3.2.0-RC1 (with Jetty 9.3.24) and trunk (with Jetty 9.3.27)
      After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN gets an "Error writing request body to server" exception, as shown below. Note that reproducing this requires a relatively large FsImage (on the order of 10 MB):

      2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_0000000000003364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes.
       java.io.IOException: Error writing request body to server
       at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
       at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
       at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
       at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_0000000000003364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
       java.io.IOException: Error writing request body to server
       at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
       at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
       at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)  
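
      To put the Jetty 6 case (item 1) in numbers, the back-of-the-envelope calculation below estimates how long an SNN spends streaming an image that a peer has already rejected. The 1 Gbit/s link, the absence of throttling or competing traffic, and reading "30 GB" as 30,000,000,000 bytes are assumptions made only for this example; WastedTransfer is a hypothetical helper.

```java
/** Back-of-the-envelope cost of streaming an fsimage to peers that reject it. */
public class WastedTransfer {

  /** Seconds needed to push {@code bytes} over a link of {@code bitsPerSecond}. */
  static double transferSeconds(long bytes, double bitsPerSecond) {
    return bytes * 8.0 / bitsPerSecond;
  }

  /** Total seconds spent when each of {@code rejectingPeers} receives the whole image. */
  static double wastedSeconds(long imageBytes, int rejectingPeers, double bitsPerSecond) {
    return rejectingPeers * transferSeconds(imageBytes, bitsPerSecond);
  }

  public static void main(String[] args) {
    long image = 30_000_000_000L;  // ~30 GB fsimage (decimal GB assumed)
    double link = 1_000_000_000.0; // assumed 1 Gbit/s of usable bandwidth
    // e.g. with three NameNodes, the uploading SNN also PUTs to the other standby
    System.out.printf("1 rejecting peer:  %.0f s%n", wastedSeconds(image, 1, link));
    System.out.printf("2 rejecting peers: %.0f s%n", wastedSeconds(image, 2, link));
  }
}
```

      Under these assumptions each rejecting peer costs about 240 s (4 minutes) of fully used link time for every image upload.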

                        

      Solution:
      A Standby NameNode should not upload an FsImage to an inappropriate NameNode. Before putting an FsImage to a peer NN, the SNN needs to check whether the upload is actually needed at this time.

      Specifically, the local SNN should establish an HTTP connection with the peer NN, send the PUT request, and then immediately read the response (this is the key point). If the peer NN does not reply with HTTP_OK, the local SNN should not upload the image at this time.
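
      The proposed ordering can be sketched with a self-contained Java model. ReadReplyFirstModel, Peer.replyToRequest(), and the 409 rejection code are hypothetical; replyToRequest() stands in for "send the PUT request, then read the response status", and the sketch does not show how a real HTTP client obtains a status before sending the body. One standard HTTP/1.1 mechanism for that is the "Expect: 100-continue" request header, but this issue does not say which mechanism the attached patches use.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the proposed ordering: send the request, read the peer's
 * status immediately, and stream the image only if the status is 200 OK.
 */
public class ReadReplyFirstModel {

  /** Stand-in for a peer NameNode; it can decide from the request alone. */
  static final class Peer {
    final String name;
    final int replyCode;
    long bytesReceived;

    Peer(String name, int replyCode) {
      this.name = name;
      this.replyCode = replyCode;
    }

    /** Models "send the PUT request, then read the response status". */
    int replyToRequest() {
      return replyCode;
    }
  }

  /** Streams the image to the peer in 4096-byte segments. */
  static void streamImage(byte[] image, Peer peer) throws IOException {
    OutputStream out = new OutputStream() {
      @Override
      public void write(int b) {
        peer.bytesReceived++;
      }

      @Override
      public void write(byte[] b, int off, int len) {
        peer.bytesReceived += len;
      }
    };
    for (int off = 0; off < image.length; off += 4096) {
      out.write(image, off, Math.min(4096, image.length - off));
    }
    out.close();
  }

  /** Uploads only to peers that accept; returns the names of peers that were skipped. */
  static List<String> uploadToPeers(byte[] image, List<Peer> peers) throws IOException {
    List<String> skipped = new ArrayList<>();
    for (Peer p : peers) {
      int status = p.replyToRequest(); // read the response first: the key point
      if (status != HttpURLConnection.HTTP_OK) {
        skipped.add(p.name); // no image bytes are sent to this peer
        continue;
      }
      streamImage(image, p);
    }
    return skipped;
  }

  public static void main(String[] args) throws IOException {
    byte[] image = new byte[1_000_000]; // 1 MB stand-in for an fsimage
    Peer active = new Peer("active", HttpURLConnection.HTTP_OK);
    Peer standby = new Peer("other-standby", HttpURLConnection.HTTP_CONFLICT); // illustrative
    List<String> skipped = uploadToPeers(image, Arrays.asList(active, standby));
    System.out.println("skipped " + skipped + ", active received " + active.bytesReceived
        + " bytes, other standby received " + standby.bytesReceived + " bytes");
  }
}
```

      Running main prints "skipped [other-standby], active received 1000000 bytes, other standby received 0 bytes". With the reply read first, a rejecting peer costs one request/response exchange instead of a full image transfer.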

      Attachments

        1. HDFS-14646.000.patch (17 kB, Xudong Cao)
        2. HDFS-14646.001.patch (32 kB, Xudong Cao)
        3. HDFS-14646.002.patch (32 kB, Hemanth Boyina)

            People

              Assignee: Xudong Cao (xudongcao)
              Reporter: Xudong Cao (xudongcao)
              Votes: 0
              Watchers: 11
