Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4462

2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: 2.0.3-alpha, 0.23.7
    • Component/s: namenode
    • Labels:
      None

      Description

      The 2NN currently has logic to detect when its on-disk FS metadata needs an upgrade with respect to the NN's metadata (i.e. the layout versions are different) and in this case it will proceed with the checkpoint despite storage signatures not matching precisely if the BP ID and Cluster ID do match exactly. However, in situations where we're upgrading from versions of HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints will always fail with an error like the following:

      13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent checkpoint fields.
      LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = BP-1520616013-172.21.3.106-1359680537136.
      Expecting respectively: -19; 403832480; 0; ; .
      
      1. HDFS-4462.patch
        6 kB
        Aaron T. Myers
      2. HDFS-4462.patch
        8 kB
        Aaron T. Myers
      3. HDFS-4462.patch
        11 kB
        Aaron T. Myers
      4. HDFS-4462.patch
        9 kB
        Aaron T. Myers
      5. HDFS-4462.patch
        9 kB
        Aaron T. Myers

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #517 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/517/)
        HDFS-4462. 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. (atm) (Revision 1442620)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442620
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #517 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/517/ ) HDFS-4462 . 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. (atm) (Revision 1442620) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442620 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1335 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1335/)
        HDFS-4462. 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1335 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1335/ ) HDFS-4462 . 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1307 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1307/)
        HDFS-4462. 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1307 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1307/ ) HDFS-4462 . 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #118 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/118/)
        HDFS-4462. 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375)

        Result = FAILURE
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #118 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/118/ ) HDFS-4462 . 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3318 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3318/)
        HDFS-4462. 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375)

        Result = SUCCESS
        atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3318 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3318/ ) HDFS-4462 . 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS. Contributed by Aaron T. Myers. (Revision 1442375) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442375 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageInfo.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CheckpointSignature.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSecondaryNameNodeUpgrade.java
        Hide
        Aaron T. Myers added a comment -

        I've just committed this to trunk and branch-2. Thanks a lot for the reviews/testing, Chris and Todd.

        Show
        Aaron T. Myers added a comment - I've just committed this to trunk and branch-2. Thanks a lot for the reviews/testing, Chris and Todd.
        Hide
        Aaron T. Myers added a comment -

        Since test-patch came back clean after implementing Todd's suggestion, I'm going to commit this momentarily based on Todd's +1.

        Show
        Aaron T. Myers added a comment - Since test-patch came back clean after implementing Todd's suggestion, I'm going to commit this momentarily based on Todd's +1.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567875/HDFS-4462.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3943//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3943//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567875/HDFS-4462.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3943//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3943//console This message is automatically generated.
        Hide
        Aaron T. Myers added a comment -

        Thanks a lot for the review and the suggestion, Todd. I agree that your proposed change is correct and makes things a bit clearer. Attached patch implements that.

        Show
        Aaron T. Myers added a comment - Thanks a lot for the review and the suggestion, Todd. I agree that your proposed change is correct and makes things a bit clearer. Attached patch implements that.
        Hide
        Todd Lipcon added a comment -

        The patch looks correct, but my brain was too small to comprehend the new 'if' statement. Does the following diff vs your patch look right to you?

        -    if ((checkpointImage.getNamespaceID() == 0) ||
        -        (((dstStorage.versionSupportsFederation() &&
        -            sig.isSameCluster(checkpointImage)) ||
        -          (!dstStorage.versionSupportsFederation() &&
        -            sig.namespaceIdMatches(checkpointImage))) &&
        -         !sig.storageVersionMatches(checkpointImage.getStorage()))) {
        +    boolean isFreshCheckpointer = (checkpointImage.getNamespaceID() == 0);
        +    boolean sameCluster =
        +      (dstStorage.versionSupportsFederation() && sig.isSameCluster(checkpointImage)) ||
        +      (!dstStorage.versionSupportsFederation() && sig.namespaceIdMatches(checkpointImage));
        +
        +    if (isFreshCheckpointer ||
        +        (sameCluster && !sig.storageVersionMatches(checkpointImage.getStorage()))) {
        

        I think it makes the code a lot easier to understand. If my change looks right, +1 with it incorporated into your patch.

        Show
        Todd Lipcon added a comment - The patch looks correct, but my brain was too small to comprehend the new 'if' statement. Does the following diff vs your patch look right to you? - if ((checkpointImage.getNamespaceID() == 0) || - (((dstStorage.versionSupportsFederation() && - sig.isSameCluster(checkpointImage)) || - (!dstStorage.versionSupportsFederation() && - sig.namespaceIdMatches(checkpointImage))) && - !sig.storageVersionMatches(checkpointImage.getStorage()))) { + boolean isFreshCheckpointer = (checkpointImage.getNamespaceID() == 0); + boolean sameCluster = + (dstStorage.versionSupportsFederation() && sig.isSameCluster(checkpointImage)) || + (!dstStorage.versionSupportsFederation() && sig.namespaceIdMatches(checkpointImage)); + + if (isFreshCheckpointer || + (sameCluster && !sig.storageVersionMatches(checkpointImage.getStorage()))) { I think it makes the code a lot easier to understand. If my change looks right, +1 with it incorporated into your patch.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567678/HDFS-4462.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567678/HDFS-4462.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//console This message is automatically generated.
        Hide
        Chris Nauroth added a comment -

        +1 for the new patch

        I confirmed that it fixed the test failure.

        Show
        Chris Nauroth added a comment - +1 for the new patch I confirmed that it fixed the test failure.
        Hide
        Aaron T. Myers added a comment -

        Missed a test failure from the last patch. This patch should fix the test failure,

        Show
        Aaron T. Myers added a comment - Missed a test failure from the last patch. This patch should fix the test failure,
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567641/HDFS-4462.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.TestStartupOptionUpgrade

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567641/HDFS-4462.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestStartupOptionUpgrade +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//console This message is automatically generated.
        Hide
        Chris Nauroth added a comment -

        +1 for the new patch

        Tests pass with the new patch too.

        Thank you for addressing the extremely paranoid feedback.

        Show
        Chris Nauroth added a comment - +1 for the new patch Tests pass with the new patch too. Thank you for addressing the extremely paranoid feedback.
        Hide
        Aaron T. Myers added a comment -

        Thanks a lot for the review, Chris, and for running those additional tests.

        Your suggestion does seem pretty paranoid (odds are 1 over 2^31), but better to be overly conservative in cases such as this.

        Please take a look at the updated patch. This patch expressly checks to see if the local metadata's layout version supports federation or not, and only compares the namespace IDs if it doesn't support federation. If federation is supported, all three fields are compared.

        Show
        Aaron T. Myers added a comment - Thanks a lot for the review, Chris, and for running those additional tests. Your suggestion does seem pretty paranoid (odds are 1 over 2^31), but better to be overly conservative in cases such as this. Please take a look at the updated patch. This patch expressly checks to see if the local metadata's layout version supports federation or not, and only compares the namespace IDs if it doesn't support federation. If federation is supported, all three fields are compared.
        Hide
        Chris Nauroth added a comment -

        Hi, Aaron. The code looks good. I applied the patch to branch-2 and ran multiple test suites related to checkpoints and 2NN.

        -  boolean isSameCluster(FSImage si) {
        -    return namespaceID == si.getStorage().namespaceID &&
        -      clusterID.equals(si.getClusterID()) &&
        -      blockpoolID.equals(si.getBlockPoolID());
        +  boolean namespaceIdMatches(FSImage si) {
        +    return namespaceID == si.getStorage().namespaceID;
           }
        

        Considering that namespace ID is an integer, whereas cluster ID is based on a GUID, it seems there is higher likelihood of accidental collision. Then, CheckpointSignature#validateStorageInfo could misidentify a match. It's still highly unlikely (but non-zero).

        I'm wondering if a safer change would be (pseudo-code):

        if namespace ID + cluster ID + blockpool ID are defined on both
          compare all 3 fields
        else if only namespace ID is defined on one of them
          compare only namespace ID
        

        This would keep the logic the same for upgrades between 2 post-federation versions, and just change the logic for the case of pre-fed -> post-fed.

        Or am I being too paranoid?

        Show
        Chris Nauroth added a comment - Hi, Aaron. The code looks good. I applied the patch to branch-2 and ran multiple test suites related to checkpoints and 2NN. - boolean isSameCluster(FSImage si) { - return namespaceID == si.getStorage().namespaceID && - clusterID.equals(si.getClusterID()) && - blockpoolID.equals(si.getBlockPoolID()); + boolean namespaceIdMatches(FSImage si) { + return namespaceID == si.getStorage().namespaceID; } Considering that namespace ID is an integer, whereas cluster ID is based on a GUID, it seems there is higher likelihood of accidental collision. Then, CheckpointSignature#validateStorageInfo could misidentify a match. It's still highly unlikely (but non-zero). I'm wondering if a safer change would be (pseudo-code): if namespace ID + cluster ID + blockpool ID are defined on both compare all 3 fields else if only namespace ID is defined on one of them compare only namespace ID This would keep the logic the same for upgrades between 2 post-federation versions, and just change the logic for the case of pre-fed -> post-fed. Or am I being too paranoid?
        Hide
        Aaron T. Myers added a comment -

        Arun C Murthy Blocker? Probably not. Pretty good to have? I think so. There's a pretty simple work-around: when upgrading from a pre-federation version of HDFS, blow away your 2NN checkpoint dirs before starting up your 2NN again. A problem will arise if an admin doesn't notice that all of their 2NN checkpoints are failing post-upgrade.

        Regardless, it's a pretty simple change - I'm hoping it can get committed today.

        Show
        Aaron T. Myers added a comment - Arun C Murthy Blocker? Probably not. Pretty good to have? I think so. There's a pretty simple work-around: when upgrading from a pre-federation version of HDFS, blow away your 2NN checkpoint dirs before starting up your 2NN again. A problem will arise if an admin doesn't notice that all of their 2NN checkpoints are failing post-upgrade. Regardless, it's a pretty simple change - I'm hoping it can get committed today.
        Hide
        Arun C Murthy added a comment -

        Aaron T. Myers Is this a 2.0.3 blocker? Tx.

        Show
        Arun C Murthy added a comment - Aaron T. Myers Is this a 2.0.3 blocker? Tx.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567534/HDFS-4462.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567534/HDFS-4462.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//console This message is automatically generated.
        Hide
        Aaron T. Myers added a comment -

        That's what I get for making one last change to the patch without running the tests again.

        Attached patch should fix the test failures.

        Show
        Aaron T. Myers added a comment - That's what I get for making one last change to the patch without running the tests again. Attached patch should fix the test failures.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567483/HDFS-4462.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.TestSecondaryWebUi
        org.apache.hadoop.hdfs.server.namenode.TestStartup
        org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
        org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
        org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3930//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3930//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567483/HDFS-4462.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestSecondaryWebUi org.apache.hadoop.hdfs.server.namenode.TestStartup org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3930//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3930//console This message is automatically generated.
        Hide
        Aaron T. Myers added a comment -

        Here's a patch which addresses the issue by only comparing the namespace IDs to determine if this is the same NN/2NN pair, instead of the NS IDs and BP IDs/Cluster IDs. NS IDs predate federation so this allows upgrades from pre-federation versions of HDFS. This patch also makes sure that we definitely reload the downloaded fsimage from disk in the case that we've identified that the 2NN's metadata does not match the layout version of the NN's.

        Show
        Aaron T. Myers added a comment - Here's a patch which addresses the issue by only comparing the namespace IDs to determine if this is the same NN/2NN pair, instead of the NS IDs and BP IDs/Cluster IDs. NS IDs predate federation so this allows upgrades from pre-federation versions of HDFS. This patch also makes sure that we definitely reload the downloaded fsimage from disk in the case that we've identified that the 2NN's metadata does not match the layout version of the NN's.

          People

          • Assignee:
            Aaron T. Myers
            Reporter:
            Aaron T. Myers
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development