
Key: HADOOP-3069
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Konstantin Shvachko
Reporter: Konstantin Shvachko
Votes: 0
Watchers: 0

Hadoop Common

A failure on SecondaryNameNode truncates the primary NameNode image.

Created: 21/Mar/08 09:21 PM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: 0.13.0
Fix Version/s: 0.16.3

Time Tracking:
Not Specified

File Attachments:
  TruncatePrimaryImageBug.patch (text file, 11 kB, licensed for inclusion in ASF works), 2008-04-09 05:56 PM, Konstantin Shvachko
  TruncatePrimaryImageBug.patch (text file, 11 kB, licensed for inclusion in ASF works), 2008-04-09 01:27 AM, Konstantin Shvachko
Issue Links:
Incorporates
 

Hadoop Flags: Reviewed
Resolution Date: 09/Apr/08 09:37 PM


Description
When the primary name-node pulls the new image from the secondary
and the transfer fails for some reason, the primary treats the new image,
which may be only partially transferred or not transferred at all,
as valid and rolls it into the new file system image, which ends up either corrupted or empty.
The problem is that the error message from the secondary node does not reach the primary.
This happens because TransferFsImage.getFileServer() closes the connection output stream
in its finally block. The secondary then sends the error reply, which the primary cannot receive,
and this causes the following exception on the secondary:
08/03/21 12:16:52 ERROR NameNode.Secondary: java.io.FileNotFoundException: \hadoop-data\hdfs\namesecondary\destimage.tmp (The system cannot find the file specified)
08/03/21 12:16:56 WARN /: /getimage?getimage=1: 
java.lang.IllegalStateException: Committed
	at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
	at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
	at org.apache.hadoop.dfs.SecondaryNameNode$GetImageServlet.doGet(SecondaryNameNode.java:485)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

But the exception does not affect the behavior of the primary node. Since the stream is closed, the primary thinks
the file transfer finished successfully and proceeds accordingly.
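
The failure order described above can be reproduced with a toy model. This is only a sketch: ToyResponse is a hypothetical stand-in for a servlet response (it is not the Jetty class), and the status code 500 is an arbitrary choice for illustration. It shows that once the output stream is closed in a finally block, a later sendError() is rejected with "Committed", so the receiver sees a normal-looking, truncated reply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class CommittedResponseDemo {
    /** Hypothetical stand-in for an HTTP servlet response; not the real Jetty class. */
    public static class ToyResponse {
        private final ByteArrayOutputStream body = new ByteArrayOutputStream();
        private boolean committed = false;
        private int status = 200;

        public void write(byte[] data) { body.write(data, 0, data.length); }

        /** Closing the output stream commits the response: status and body are final. */
        public void closeOutput() { committed = true; }

        /** Like a servlet response, refuses to change the status after commit. */
        public void sendError(int code) {
            if (committed) {
                throw new IllegalStateException("Committed");
            }
            status = code;
            body.reset();
        }

        public int status() { return status; }
        public int bodyLength() { return body.size(); }
    }

    /** Writes some bytes, fails, closes the stream in finally, then tries to report the error. */
    public static ToyResponse serveWithFailure(byte[] partial) {
        ToyResponse resp = new ToyResponse();
        try {
            try {
                resp.write(partial);
                throw new IOException("simulated failure on the sending side");
            } finally {
                resp.closeOutput(); // the problematic step: commits before the error is reported
            }
        } catch (IOException e) {
            try {
                resp.sendError(500); // arbitrary code for the sketch
            } catch (IllegalStateException ise) {
                // The error reply is lost; the receiver only ever sees status 200.
            }
        }
        return resp;
    }
}
```

In this model the receiver ends up with status 200 and whatever bytes were written before the failure, which matches the symptom reported here.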
There are 2 bugs that need to be fixed here.

  1. The error message should be delivered to the primary, and the primary should not corrupt its image in case of an error.
  2. The doGet() method of both HttpServlet-s should catch not only IOException-s but any exception.
    If we miss an NPE or SecurityException, the main image will be truncated.
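
One possible way to protect the receiving side (item 1) is sketched below. This is not the attached patch. It assumes the sender advertises the image length up front (for example in a header), and the names SafeImageReceiver, receive and expectedLength are hypothetical. The idea is to write to a temporary file and replace the real image only when exactly the advertised number of bytes has arrived.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeImageReceiver {
    /**
     * Copies the stream into a temporary sibling of target and replaces target
     * only if exactly expectedLength bytes were received. On any failure the
     * existing target file is left untouched.
     */
    public static void receive(InputStream in, long expectedLength, Path target)
            throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        long received = 0;
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                received += n;
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // discard the partial download
            throw e;
        }
        if (received != expectedLength) {
            // A closed stream alone is not proof of success: check the length.
            Files.deleteIfExists(tmp);
            throw new IOException("Incomplete image: expected " + expectedLength
                    + " bytes, received " + received);
        }
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With this shape, an empty or short reply raises an IOException on the receiver instead of silently replacing a good image.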


Subversion Commits
Repository Revision Date User Message
ASF #646548 Wed Apr 09 21:28:18 UTC 2008 shv HADOOP-3069. Primary name-node should not truncate image when transfering it from the secondary. Contributed by Konstantin Shvachko.
Files Changed
MODIFY /hadoop/core/trunk/src/java/org/apache/hadoop/dfs/TransferFsImage.java
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/dfs/TestCheckpoint.java
MODIFY /hadoop/core/trunk/src/java/org/apache/hadoop/dfs/SecondaryNameNode.java
MODIFY /hadoop/core/trunk/src/java/org/apache/hadoop/dfs/GetImageServlet.java
MODIFY /hadoop/core/trunk/CHANGES.txt

Repository Revision Date User Message
ASF #646551 Wed Apr 09 21:30:32 UTC 2008 shv HADOOP-3069. Primary name-node should not truncate image when transferring it from the secondary. Contributed by Konstantin Shvachko.
Files Changed
MODIFY /hadoop/core/branches/branch-0.17/src/java/org/apache/hadoop/dfs/TransferFsImage.java
MODIFY /hadoop/core/branches/branch-0.17/src/java/org/apache/hadoop/dfs/SecondaryNameNode.java
MODIFY /hadoop/core/branches/branch-0.17/src/java/org/apache/hadoop/dfs/GetImageServlet.java
MODIFY /hadoop/core/branches/branch-0.17/CHANGES.txt
MODIFY /hadoop/core/branches/branch-0.17/src/test/org/apache/hadoop/dfs/TestCheckpoint.java

Repository Revision Date User Message
ASF #646556 Wed Apr 09 21:37:35 UTC 2008 shv HADOOP-3069. Primary name-node should not truncate image when transferring it from the secondary. Contributed by Konstantin Shvachko.
Files Changed
MODIFY /hadoop/core/branches/branch-0.16/src/java/org/apache/hadoop/dfs/TransferFsImage.java
MODIFY /hadoop/core/branches/branch-0.16/src/java/org/apache/hadoop/dfs/SecondaryNameNode.java
MODIFY /hadoop/core/branches/branch-0.16/src/java/org/apache/hadoop/dfs/GetImageServlet.java
MODIFY /hadoop/core/branches/branch-0.16/CHANGES.txt
MODIFY /hadoop/core/branches/branch-0.16/src/test/org/apache/hadoop/dfs/TestCheckpoint.java