  1. Hadoop HDFS
  2. HDFS-1623 High Availability Framework for HDFS NN
  3. HDFS-2974

HA: MiniDFSCluster does not delete standby NN name dirs during format

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • HA branch (HDFS-1623)
    • HA branch (HDFS-1623)
    • ha, test
    • None

    Description

      When the MiniDFSCluster is formatting an HA cluster, it formats the first NN, and then copies the contents of that directory to the second NN's name dirs. However, the second NN's name dirs are not emptied first, and thus a previous test's state may interfere with the test.
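
      The copy-without-clearing problem described above can be illustrated with a minimal sketch. This is not the attached patch and does not use MiniDFSCluster's real code; the class name StandbyDirCopySketch and the methods deleteRecursively, copyRecursively and seedStandby are hypothetical names chosen for illustration, and only java.nio.file is used. The idea is simply that the standby directory is emptied before the formatted directory is copied into it, so files left over from an earlier test cannot survive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not the actual MiniDFSCluster code.
public class StandbyDirCopySketch {

    /** Deletes dir and everything under it; does nothing if dir is absent. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Copies the tree rooted at src into dst, creating dst if needed. */
    static void copyRecursively(Path src, Path dst) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(src)) {
            // Files.walk visits a directory before its entries.
            paths = walk.collect(Collectors.toList());
        }
        for (Path p : paths) {
            Path target = dst.resolve(src.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(target);
            } else {
                Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    /** Empties the standby dir first, then copies the formatted dir into it. */
    static void seedStandby(Path formattedDir, Path standbyDir) throws IOException {
        deleteRecursively(standbyDir); // the step the description says was missing
        copyRecursively(formattedDir, standbyDir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("nn-sketch");
        Path active = Files.createDirectories(root.resolve("name1/current"));
        Files.write(active.resolve("VERSION"), "fresh".getBytes());
        Path standby = Files.createDirectories(root.resolve("name2/current"));
        Files.write(standby.resolve("edits_stale"), "left by an earlier test".getBytes());

        seedStandby(root.resolve("name1"), root.resolve("name2"));
        System.out.println("stale file present: " + Files.exists(standby.resolve("edits_stale")));
    }
}
```

      If the deleteRecursively call in seedStandby is dropped, the stale file remains next to the freshly copied ones, which is the kind of cross-test interference this issue reports.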

      Attachments

        1. HDFS-2974-HDFS-1623.patch
          2 kB
          Aaron Myers

        Activity

          atm Aaron Myers added a comment -

          Here's a patch which addresses the issue.

          This was discovered because of the commit of HDFS-2952. I didn't notice this problem because when I ran the HA tests, it just so happened that TestDFSUpgradeWithHA was run last, and thus it did not interfere with any other test. To test this patch, I ran the following, which results in TestDFSUpgradeWithHA running before the other test:

          mvn -Dtest=TestDFSUpgradeWithHA,TestDNFencingWithReplication test
          

          Without this patch, TestDNFencingWithReplication should fail.

          atm Aaron Myers added a comment -

          Whoops! Looks like I uploaded the wrong patch last time. Here's the right one.

          I also discovered that this fix caused TestCheckpoint to fail, because of a bug in TestCheckpoint. In one of the TestCheckpoint cases, we restart a mini cluster to assert that having a locked storage directory will cause the NN to fail to start. However, the test case erroneously had the NN format the name dir on start. The test was only passing because of the bug in MiniDFSCluster. The attached patch fixes this bug as well.

          atm Aaron Myers added a comment -

          Oh, also, I ran the full HDFS test suite. The only failures were TestCheckpoint (fixed in the latest patch) and TestBalancerWithHANameNodes, which also fails when run on its own without this patch.

          tlipcon Todd Lipcon added a comment -

          +1

          atm Aaron Myers added a comment -

          Thanks a lot for the review, Todd. I've just committed this to the HA branch.

          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-HAbranch-build #83 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/83/)
          HDFS-2974. MiniDFSCluster does not delete standby NN name dirs during format. Contributed by Aaron T. Myers. (Revision 1291126)

          Result = UNSTABLE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291126
          Files :

          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java

          People

            atm Aaron Myers
            atm Aaron Myers
            Votes: 0
            Watchers: 2
