  1. Hadoop Common
  2. HADOOP-3069

A failure on SecondaryNameNode truncates the primary NameNode image.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.16.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When the primary name-node pulls the new image from the secondary
      and the transfer fails for some reason, the primary still treats the new image,
      which may be only partially transferred or not transferred at all,
      as valid and rolls it into the new file system image, which will then be either corrupted or empty.
      The problem is that the error message from the secondary node does not reach the primary.
      This happens because TransferFsImage.getFileServer() closes the connection output stream
      in its finally block. The secondary then tries to send the error reply, which the primary can no longer receive,
      and this causes the following exception on the secondary:

      08/03/21 12:16:52 ERROR NameNode.Secondary: java.io.FileNotFoundException: \hadoop-data\hdfs\namesecondary\destimage.tmp (The system cannot find the file specified)
      08/03/21 12:16:56 WARN /: /getimage?getimage=1: 
      java.lang.IllegalStateException: Committed
      	at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
      	at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
      	at org.apache.hadoop.dfs.SecondaryNameNode$GetImageServlet.doGet(SecondaryNameNode.java:485)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
      	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
      	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
      	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
      	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
      	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
      	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
      	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
      	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
      	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
      	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
      	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
      	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
      	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
      

      But the exception does not affect the behavior of the primary node. Since the stream is closed, the primary thinks
      the file transfer finished successfully and proceeds accordingly.
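
To illustrate why the receiving side cannot tell a failed transfer from a finished one, here is a minimal, self-contained sketch. It is not the Hadoop code: SilentTruncationDemo, sendImage, failingAfter and transfer are hypothetical names, and an in-memory buffer stands in for the HTTP connection. The only behavior taken from the description is that the sender closes the output stream in a finally block, even when it fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class SilentTruncationDemo {

    /** Sender modeled on the description: the stream is closed in finally, even on failure. */
    public static void sendImage(InputStream image, OutputStream out) throws IOException {
        try {
            byte[] buf = new byte[4];
            int n;
            while ((n = image.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            // Closing here ends the stream cleanly whether or not the copy succeeded.
            out.close();
        }
    }

    /** Hypothetical helper: an input that throws after failAfter bytes, simulating an I/O error. */
    public static InputStream failingAfter(final byte[] data, final int failAfter) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() throws IOException {
                if (pos >= data.length) {
                    return -1;
                }
                if (pos >= failAfter) {
                    throw new IOException("simulated read failure");
                }
                return data[pos++] & 0xff;
            }
        };
    }

    /** Returns the bytes a receiver that only reads until end of stream would accept as the image. */
    public static byte[] transfer(InputStream image) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try {
            sendImage(image, wire);
        } catch (IOException e) {
            // Too late to report the error in-band: the stream is already closed.
        }
        return wire.toByteArray();
    }
}
```

In this model, a source that fails partway yields a shorter byte array and a source that fails immediately yields an empty one, and in both cases the receiver has no signal that anything went wrong.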
      There are 2 bugs that need to be fixed here.

      1. The error message should be delivered to the primary, and the primary should not corrupt its image in case of an error.
      2. The doGet() method of both HttpServlet-s should catch not only IOException-s but any exception.
        If we miss an NPE or a SecurityException, the main image will be truncated.
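
One possible receiver-side safeguard for item 1 can be sketched as follows. This is an illustration under assumptions, not a description of the attached patch: SafeImageTransfer and receiveImage are hypothetical names, and the sketch assumes the sender advertises the image length out of band (for example in a response header), which the description above does not state. The idea is to write to a temporary file and replace the existing image only when the full length has arrived.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeImageTransfer {

    /**
     * Copies the stream into a temporary file next to target and promotes it
     * only if exactly expectedLength bytes arrived (expectedLength is an
     * assumed, out-of-band value). Returns true if target was replaced;
     * otherwise the existing target is left untouched.
     */
    public static boolean receiveImage(InputStream in, long expectedLength, Path target)
            throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        long received = 0;
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    received += n;
                }
            }
            if (received != expectedLength) {
                // Short or empty transfer: keep the current image.
                return false;
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } finally {
            // Removes the temporary file on failure; does nothing after a successful move.
            Files.deleteIfExists(tmp);
        }
    }
}
```

For item 2, the same principle applies on the sending side: any exception, not only IOException, would need to be turned into an error reply before the response stream is closed.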

      Attachments

        1. TruncatePrimaryImageBug.patch
          11 kB
          Konstantin Shvachko
        2. TruncatePrimaryImageBug.patch
          11 kB
          Konstantin Shvachko

            People

              Assignee: Konstantin Shvachko (shv)
              Reporter: Konstantin Shvachko (shv)