Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.3
    • Component/s: test
    • Labels:
      None
    • Release Note:
      TestDistributedUpgrade used to succeed for the wrong reasons.

      Description

      A TestDistributedUpgrade subtest checks that the Name Node does not start when a distributed upgrade is required. In 0.18, the subtest fails when the Name Node does start. The fault is with the test, not HDFS. Not a problem in 0.19.

      1. HADOOP-4542.patch
        0.7 kB
        Raghu Angadi
      2. HADOOP-4542.patch
        0.9 kB
        Raghu Angadi

        Activity

        Robert Chansler added a comment -

        Konstantin writes:
        This is a problem with the test. The directory names are messed up.
        The name-node code is fine.
        The test itself is fixed in 0.19 and works correctly.
        I would not worry about this failure in 0.18.

        Raghu Angadi added a comment -

        I am not able to reproduce this in 0.18. Which subtest fails? Any info on the jira that fixed this for 0.19 would be useful.

        Raghu Angadi added a comment -

        The attached simple patch makes testDistributedUpgrade() a no-op. This easy fix is preferred because:

        • This is going only to 0.18.
        • There is no requirement for DistributedUpgrade in 0.18.
        • It is already fixed in 0.19 and trunk (not exactly sure by which patch).
        • Even if I fix it (with a patch from 0.19), it is hard for me to reproduce.
        Raghu Angadi added a comment -

        'ant test-patch' on 0.18:

             [exec] +1 overall.
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        
        Konstantin Shvachko added a comment -

        Here is the exception thrown by 0.18 for TestDistributedUpgrade:

        2008-11-25 22:50:18,060 ERROR fs.FSNamesystem (FSNamesystem.java:<init>(275)) - FSNamesystem initialization failed.
        org.apache.hadoop.dfs.InconsistentFSStateException: Directory /home/shv/branch-0.18/build/test/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        	at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:211)
        	at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        	at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:294)
        	at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:273)
        	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)
        	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
        	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
        	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
        	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:264)
        	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:93)
        	at org.apache.hadoop.dfs.TestDistributedUpgrade.startNameNodeShouldFail(TestDistributedUpgrade.java:54)
        	at org.apache.hadoop.dfs.TestDistributedUpgrade.testDistributedUpgrade(TestDistributedUpgrade.java:97)
        

        And here is the correct exception, which should be thrown in this case and is thrown in 0.20:

        2008-11-25 22:53:37,165 ERROR namenode.FSNamesystem (FSNamesystem.java:<init>(282)) - FSNamesystem initialization failed.
        java.io.IOException: 
        File system image contains an old layout version -7.
        An upgrade to version -18 is required.
        Please restart NameNode with -upgrade option.
        	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:312)
        	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:299)
        	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:280)
        	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:169)
        	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:247)
        	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:907)
        	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:275)
        	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:168)
        	at org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade.startNameNodeShouldFail(TestDistributedUpgrade.java:63)
        	at org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade.testDistributedUpgrade(TestDistributedUpgrade.java:110)
        

        The problem is that in 0.18 MiniDFSCluster is configured with its storage in /build/test/dfs/name, while TestDFSUpgradeFromImage unpacks the image into /build/test/data/dfs/name1. This was fixed by HADOOP-3965 or HADOOP-3948. This should be controlled by the manageDfsDirs parameter.
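
        The two stack traces above show why a "startup must fail" check can pass for the wrong reason. The following is a minimal, self-contained sketch of that effect, not the actual test code: startNameNode and both check methods are hypothetical stand-ins, and the layout versions are simply the ones quoted in the 0.20 log.

```java
import java.io.IOException;

public class WrongReasonDemo {

    /**
     * Hypothetical stand-in for starting a name-node. It fails either because
     * the storage directory is missing or because the image layout is old.
     */
    static void startNameNode(boolean dirExists, int imageLayout, int currentLayout)
            throws IOException {
        if (!dirExists) {
            throw new IOException("storage directory does not exist or is not accessible");
        }
        if (imageLayout != currentLayout) {
            throw new IOException("An upgrade to version " + currentLayout + " is required");
        }
    }

    /** Loose check: any startup failure is accepted. */
    static boolean failsAtAll(boolean dirExists, int imageLayout, int currentLayout) {
        try {
            startNameNode(dirExists, imageLayout, currentLayout);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    /** Strict check: only the "upgrade required" failure is accepted. */
    static boolean failsBecauseUpgradeRequired(boolean dirExists, int imageLayout,
                                               int currentLayout) {
        try {
            startNameNode(dirExists, imageLayout, currentLayout);
            return false;
        } catch (IOException e) {
            return e.getMessage().contains("upgrade");
        }
    }

    public static void main(String[] args) {
        // Wrong directory: the loose check is satisfied, the strict one is not.
        System.out.println(failsAtAll(false, -7, -18));
        System.out.println(failsBecauseUpgradeRequired(false, -7, -18));
        // Correct directory with an old image: the strict check is satisfied.
        System.out.println(failsBecauseUpgradeRequired(true, -7, -18));
    }
}
```

        With the loose check, pointing the name-node at a missing directory looks like a pass; checking the reason for the failure separates the two cases.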

        Raghu Angadi added a comment -

        Thanks Konstantin.

        So on 0.18 the test passes, but always for the wrong reason. The attached patch fixes that. This fix was part of HADOOP-2885.

        Konstantin Shvachko added a comment -

        Yes, and on Hudson the name-node does not fail (although it should) because some previous test does not clean up the directory build/test/dfs/name, which still contains a legal image.
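
        One general way to keep a leftover image from an earlier test out of the picture is to wipe the storage directory before the test starts. The sketch below only illustrates that idea: it is not what the attached patch does, CleanStorageDir and deleteRecursively are hypothetical names, and it uses java.nio.file (Java 7 or later) rather than any Hadoop test utility.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CleanStorageDir {

    /** Deletes dir and everything under it; does nothing if dir is absent. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a storage directory left behind by a previous test.
        Path stale = Files.createTempDirectory("name");
        Files.createFile(stale.resolve("fsimage"));

        deleteRecursively(stale);
        System.out.println(Files.exists(stale));
    }
}
```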

        Raghu Angadi added a comment -

        Right. Thanks for looking into the root cause of this.

        The patch could be smaller, but I kept it the same as in 0.19 for consistency.

        Konstantin Shvachko added a comment -

        +1.
        This will re-point the name-node to the correct storage directory, cause it to fail with the correct exception, and let the Hudson build succeed.

        Raghu Angadi added a comment -

        Thanks Konstantin. I will commit this to 0.18. 'ant test-patch':

             [exec] +1 overall.
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        
        Raghu Angadi added a comment -

        I just committed this.


          People

          • Assignee:
            Raghu Angadi
            Reporter:
            Robert Chansler
          • Votes:
            0
            Watchers:
            0
