
Key: HADOOP-4542
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Raghu Angadi
Reporter: Robert Chansler
Votes: 0
Watchers: 0
Hadoop Common

Fault in TestDistributedUpgrade

Created: 29/Oct/08 11:54 PM   Updated: 08/Jul/09 04:43 PM
Component/s: test
Affects Version/s: 0.18.0
Fix Version/s: 0.18.3

Time Tracking:
Not Specified

File Attachments:
HADOOP-4542.patch (0.9 kB), 2008-11-25 11:39 PM, Raghu Angadi. Licensed for inclusion in ASF works.
HADOOP-4542.patch (0.7 kB), 2008-11-25 12:10 AM, Raghu Angadi. Licensed for inclusion in ASF works.

Release Note: TestDistributedUpgrade used to succeed for the wrong reasons.
Resolution Date: 26/Nov/08 07:37 PM


Description
A TestDistributedUpgrade subtest checks that the Name Node does not start when a distributed upgrade is required. In 0.18, the subtest fails when the Name Node does start. The fault is with the test, not HDFS. Not a problem in 0.19.

Robert Chansler added a comment - 29/Oct/08 11:54 PM
Konstantin writes:
This is the test problem. Directory names are messed up.
The name-node code is fine.
The test itself is fixed in 0.19 and works correctly.
I would not worry about this failure in 0.18.

Raghu Angadi added a comment - 18/Nov/08 06:17 PM
I am not able to reproduce this in 0.18. Which subtest fails? Any info on the jira that fixed this for 0.19 would be useful.

Raghu Angadi added a comment - 25/Nov/08 12:10 AM

The attached simple patch makes testDistributedUpgrade() a no-op. This easy fix is preferred since:

  • This is going only to 0.18
  • There is no requirement for DistributedUpgrade in 0.18
  • Already fixed in 0.19 and trunk (not exactly sure by which patch).
  • Even if I fix it (from a patch from 0.19), it is hard for me to reproduce.

Raghu Angadi added a comment - 25/Nov/08 06:55 PM
'ant test-patch' on 0.18 :
     [exec] +1 overall.

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

Konstantin Shvachko added a comment - 25/Nov/08 11:27 PM
Here is the exception thrown by 0.18 for TestDistributedUpgrade:
2008-11-25 22:50:18,060 ERROR fs.FSNamesystem (FSNamesystem.java:<init>(275)) - FSNamesystem initialization failed.
org.apache.hadoop.dfs.InconsistentFSStateException: Directory /home/shv/branch-0.18/build/test/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
	at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:211)
	at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
	at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
	at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:264)
	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:93)
	at org.apache.hadoop.dfs.TestDistributedUpgrade.startNameNodeShouldFail(TestDistributedUpgrade.java:54)
	at org.apache.hadoop.dfs.TestDistributedUpgrade.testDistributedUpgrade(TestDistributedUpgrade.java:97)

And here is the correct exception, which should be thrown in this case and is thrown in 0.20:

2008-11-25 22:53:37,165 ERROR namenode.FSNamesystem (FSNamesystem.java:<init>(282)) - FSNamesystem initialization failed.
java.io.IOException: 
File system image contains an old layout version -7.
An upgrade to version -18 is required.
Please restart NameNode with -upgrade option.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:312)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:299)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:280)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:169)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:247)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:907)
	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:275)
	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:168)
	at org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade.startNameNodeShouldFail(TestDistributedUpgrade.java:63)
	at org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade.testDistributedUpgrade(TestDistributedUpgrade.java:110)

The problem is that in 0.18 MiniDFSCluster is configured with the storage in /build/test/dfs/name, while TestDFSUpgradeFromImage unpacks the image into /build/test/data/dfs/name1. This was fixed by HADOOP-3965 or HADOOP-3948. This should be controlled by the manageDfsDirs parameter.


Raghu Angadi added a comment - 25/Nov/08 11:39 PM
Thanks Konstantin.

So on 0.18 the test passes, but always for the wrong reason. The attached patch fixes that. This fix was part of HADOOP-2885.
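
To illustrate how a "should fail" test can pass for the wrong reason, here is a minimal, self-contained sketch. It does not use the Hadoop classes: StartupFailureCheck, Starter, and both check methods are hypothetical names made up for this example, and a plain IOException stands in for both exceptions quoted in the stack traces above. The loose check accepts any startup failure, so a missing storage directory looks like success; the strict check also inspects the failure message.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration only; these are not Hadoop classes.
class StartupFailureCheck {
    /** Something that tries to start and may fail with an IOException. */
    interface Starter {
        void start() throws IOException;
    }

    /** Loose check: any IOException counts as the expected failure. */
    static boolean failsAtAll(Starter s) {
        try {
            s.start();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    /** Strict check: the failure must mention that an upgrade is required. */
    static boolean failsBecauseUpgradeRequired(Starter s) {
        try {
            s.start();
            return false;
        } catch (IOException e) {
            String msg = e.getMessage();
            return msg != null && msg.contains("upgrade to version");
        }
    }

    public static void main(String[] args) {
        // Failure caused by pointing at the wrong storage directory.
        Starter missingDir = () -> {
            throw new IOException("storage directory does not exist or is not accessible.");
        };
        // Failure the subtest is actually meant to observe.
        Starter oldLayout = () -> {
            throw new IOException("File system image contains an old layout version -7.\n"
                    + "An upgrade to version -18 is required.");
        };

        System.out.println(failsAtAll(missingDir));                  // true (wrong reason)
        System.out.println(failsBecauseUpgradeRequired(missingDir)); // false
        System.out.println(failsBecauseUpgradeRequired(oldLayout));  // true
    }
}
```

Matching on message text is only one way to tighten such a check; matching on a specific exception type would serve the same purpose where one is available.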


Konstantin Shvachko added a comment - 25/Nov/08 11:45 PM
Yes, and on Hudson the name-node does not fail (although it should) because some previous test does not clean up the directory build/test/dfs/name, which still contains a legal image.
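
As a side note on leftover state between tests, the sketch below shows one generic way a test could remove a stale directory before it runs. This is not what the attached patch does (the patch re-points the storage directory); TestDirCleaner and deleteRecursively are hypothetical names, and the directory used in main is a temporary one created only for the demonstration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Hadoop.
class TestDirCleaner {
    /**
     * Recursively deletes dir if it exists.
     * Returns true if the directory existed and was removed, false if it was absent.
     */
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a directory left behind by an earlier test.
        Path leftover = Files.createTempDirectory("dfs-name");
        Files.createFile(leftover.resolve("fsimage"));

        System.out.println(deleteRecursively(leftover)); // true
        System.out.println(Files.exists(leftover));      // false
    }
}
```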

Raghu Angadi added a comment - 25/Nov/08 11:48 PM
Right. Thanks for looking into the root cause of this.

The patch could be smaller, but I kept it the same as in 0.19 for consistency.


Konstantin Shvachko added a comment - 25/Nov/08 11:57 PM
+1.
This will re-point the name-node to the correct storage directory, cause it to fail with the correct exception, and let the Hudson build succeed.

Raghu Angadi added a comment - 26/Nov/08 06:54 PM
Thanks Konstantin. I will commit this to 0.18. 'ant test-patch' :
     [exec] +1 overall.

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

Raghu Angadi added a comment - 26/Nov/08 07:37 PM
I just committed this.