Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 0.13.0
- None
- None
- Reviewed
Description
When the primary name-node pulls the new image from the secondary
and the transfer fails for some reason, the primary treats the new image,
which may be only partially transferred or not transferred at all,
as valid and rolls it into the new file system image, which will then be either corrupted or empty.
The problem is that the error message from the secondary node does not reach the primary.
This happens because TransferFsImage.getFileServer() closes the connection output stream
in its finally block. The secondary later tries to send the error reply, which the primary can no longer receive,
and this causes the following exception on the secondary:
08/03/21 12:16:52 ERROR NameNode.Secondary: java.io.FileNotFoundException: \hadoop-data\hdfs\namesecondary\destimage.tmp (The system cannot find the file specified)
08/03/21 12:16:56 WARN /: /getimage?getimage=1:
java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
    at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
    at org.apache.hadoop.dfs.SecondaryNameNode$GetImageServlet.doGet(SecondaryNameNode.java:485)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
But the exception does not affect the behavior of the primary node. Since the stream is closed, the primary thinks
the file transfer finished successfully and proceeds accordingly.
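To illustrate the stream-ownership problem described above, here is a minimal sketch of a sender that closes only the file it opened and leaves the connection output stream to its caller, so the caller can still report an error on that stream if the copy fails. The class and method names (ImageSender, sendFile) are hypothetical and this is not the actual TransferFsImage code or the committed patch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of Hadoop.
class ImageSender {
    /**
     * Copies localFile to out and returns the number of bytes written.
     * Only the file opened here is closed; 'out' is deliberately left
     * open so that the caller can still send an error reply on it if
     * this method throws (for example, FileNotFoundException).
     */
    static long sendFile(OutputStream out, File localFile) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        try (InputStream in = new FileInputStream(localFile)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

If the file is missing, the exception propagates before anything is written, and the caller's stream is untouched and still usable for an error response.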
There are 2 bugs that need to be fixed here.
- The error message should be delivered to the primary, and the primary should not corrupt its image in case of an error.
- The doGet() method of both HttpServlet-s should catch not only IOException-s but any exception.
If we miss an NPE or SecurityException, the main image will be truncated.
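The two fixes listed above could be sketched on the receiving side roughly as follows: download into a temporary file, check that the transfer is complete, and only then replace the real image, treating any exception (not just IOException) as a failed transfer. This is a sketch under assumptions, not the committed patch: ImageReceiver, receiveImage, and the expectedLength parameter are hypothetical, and how the primary would learn the expected length (for example, from a response header) is not specified in this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Hadoop.
class ImageReceiver {
    /**
     * Reads the image from 'in' into 'tmp' and moves it over 'image'
     * only if exactly expectedLength bytes arrived. Returns true on
     * success; on any failure the existing image is left untouched.
     */
    static boolean receiveImage(InputStream in, long expectedLength,
                                Path tmp, Path image) {
        try {
            long received = Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            if (received != expectedLength) {
                throw new IOException("incomplete transfer: got " + received
                        + " of " + expectedLength + " bytes");
            }
            // The transfer is verified; only now replace the real image.
            Files.move(tmp, image, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (Exception e) {
            // Deliberately broad: an NPE or SecurityException must also
            // count as a failed transfer rather than slip through.
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException ignored) {
                // Best-effort cleanup of the partial file.
            }
            return false;
        }
    }
}
```

The design choice here is that the existing image is only ever replaced by a file that has passed the completeness check, so a short or failed transfer cannot produce a truncated or empty image.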
Attachments
Issue Links
- is part of
- HADOOP-2585 Automatic namespace recovery from the secondary image.
- Closed