Hadoop HDFS / HDFS-11714

Newly added NN storage directory won't get initialized and cause space exhaustion

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.3
    • Fix Version/s: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When an empty namenode storage directory is detected on normal NN startup, it may not be fully initialized. The new directory is still part of the "in-service" NNStorage, and when a checkpoint image is uploaded, a copy is also written there. However, the retention manager cannot purge old files from it because it lacks a VERSION file. This causes fsimages to pile up in the directory. With a big namespace, the disk fills up on the order of days or weeks.
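
      A minimal, purely illustrative sketch of the failure mode (not Hadoop code; the class, the canPurge() helper, and the /data/namedir_new path are invented): a purge pass that requires current/VERSION never runs in the auto-formatted directory, so every checkpoint copy written there stays forever.

      // Illustrative sketch only, not the actual retention manager code.
      import java.io.File;

      public class MissingVersionCheck {
        /** Like the real retention pass, refuse to purge without a VERSION file. */
        static boolean canPurge(File storageDir) {
          return new File(storageDir, "current/VERSION").exists();
        }

        public static void main(String[] args) {
          // Hypothetical path of a newly added NN storage directory.
          File newDir = new File(args.length > 0 ? args[0] : "/data/namedir_new");
          if (!canPurge(newDir)) {
            System.out.println(newDir + " has no current/VERSION; fsimage copies"
                + " written here will never be purged.");
          }
        }
      }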

      Attachments

      1. HDFS-11714.trunk.patch
        7 kB
        Kihwal Lee
      2. HDFS-11714.v2.branch-2.patch
        6 kB
        Kihwal Lee
      3. HDFS-11714.v2.trunk.patch
        6 kB
        Kihwal Lee
      4. HDFS-11714.v3.branch-2.patch
        6 kB
        Kihwal Lee
      5. HDFS-11714.v3.trunk.patch
        6 kB
        Kihwal Lee

        Activity

        vinodkv Vinod Kumar Vavilapalli added a comment -

        2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11663 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11663/)
        HDFS-11714. Newly added NN storage directory won't get initialized and (kihwal: rev 4cfc8664362ed04b01872e854715a36dad9408a6)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
        kihwal Kihwal Lee added a comment -

        Thanks Zhe Zhang. I've committed this to trunk through branch-2.7.

        zhz Zhe Zhang added a comment -

        Kihwal Lee In most other JIRAs we don't wait for pre-commit Jenkins on lower branch patches. So I would say pls go ahead with the commits.

        kihwal Kihwal Lee added a comment - edited

        The branch-2 patch is identical except for the variables in the test case. The trunk version uses an array of namenodes, nns[], whereas branch-2 uses individual variables nn0 and nn1. The patch was tested up to branch-2.7. This area of code hasn't changed at all since 2.7.

        Zhe Zhang do you think it needs to go through a separate precommit?

        kihwal Kihwal Lee added a comment -

        You beat me. Thanks.

        kihwal Kihwal Lee added a comment -

        TestPipelinesFailover and TestBalancer are passing. The remaining failures are covered by HADOOP-14368.
        Zhe Zhang, please review the updated patches.

        zhz Zhe Zhang added a comment -

        Thanks Kihwal! +1 on v3 trunk patch.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 13m 31s trunk passed
        +1 compile 0m 47s trunk passed
        +1 checkstyle 0m 36s trunk passed
        +1 mvnsite 0m 52s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        -1 findbugs 1m 38s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1 javadoc 0m 40s trunk passed
        +1 mvninstall 0m 48s the patch passed
        +1 compile 0m 45s the patch passed
        +1 javac 0m 45s the patch passed
        +1 checkstyle 0m 33s the patch passed
        +1 mvnsite 0m 50s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 43s the patch passed
        +1 javadoc 0m 38s the patch passed
        -1 unit 67m 53s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        93m 34s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.namenode.TestMetadataVersionOutput
          hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.namenode.TestStartup



        Subsystem Report/Notes
        Docker Image: yetus/hadoop:0ac17dc
        JIRA Issue HDFS-11714
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865812/HDFS-11714.v3.trunk.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 60593acdecd6 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 07b98e7
        Default Java 1.8.0_121
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19252/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19252/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19252/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19252/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        Filed HDFS-11729.

        kihwal Kihwal Lee added a comment -

        Attaching new patches.

        kihwal Kihwal Lee added a comment -

        I will file a jira for the retention manager improvement.

        zhz Zhe Zhang added a comment -

        Thanks for the clarification Kihwal Lee. +1 on the latest patch pending cosmetic comments #2~3. Comment #1 is up to you.

        The equivalent code for non-HA case (saveNamespace) also unconditionally overwrites existing VERSION. The reasoning is, regardless of previous state, now it has the up-to-date checkpoint, so it should have an accompanying VERSION file. So it is expected to overwrite if a VERSION already exists. I don't think we need to do anything here.

        Agreed.

        At minimum, it already logs a WARN. What do you think should be done?

        My question was more on the retention behavior: should the retention manager catch the exception, do some logging, and keep purging old fsimage files? In either case, I don't think it should block this JIRA – this JIRA already fixed the most immediate issue.
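
        A minimal sketch of that option, with invented names (RetentionSweepSketch, purgeDir); this is not the retention manager's actual code, just an illustration of "log and keep purging":

        // Hypothetical sketch of the follow-up idea; not part of this patch.
        import java.io.File;
        import java.io.IOException;
        import java.util.List;

        class RetentionSweepSketch {
          void purgeOldImages(List<File> storageDirs) {
            for (File dir : storageDirs) {
              try {
                purgeDir(dir);  // may fail if VERSION is missing
              } catch (IOException e) {
                // Log and continue: one bad directory should not block
                // retention in the healthy directories.
                System.err.println("Skipping purge in " + dir + ": " + e.getMessage());
              }
            }
          }

          private void purgeDir(File dir) throws IOException {
            if (!new File(dir, "current/VERSION").exists()) {
              throw new IOException("No VERSION file in " + dir);
            }
            // ... delete fsimage files older than the retention threshold ...
          }
        }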

        kihwal Kihwal Lee added a comment -

        What if a VERSION file already exists in the directory for some reason? Should we at least print a WARN for further investigation?

        The equivalent code for non-HA case (saveNamespace) also unconditionally overwrites existing VERSION. The reasoning is, regardless of previous state, now it has the up-to-date checkpoint, so it should have an accompanying VERSION file. So it is expected to overwrite if a VERSION already exists. I don't think we need to do anything here.

        On the retention manager, is it the right behavior to skip purging old image files if VERSION is missing? Should we do a follow-on fix to handle the case where the VERSION file is lost for some other reason (mis-operation etc.)?

        At minimum, it already logs a WARN. What do you think should be done? Report a storage error by calling reportErrorsOnDirectory()? This will cause the storage dir to be in the "failed" list, which will be recovered later online. The recovery check should be made to check for existence of VERSION then.

        zhz Zhe Zhang added a comment -

        Thanks Kihwal Lee for the fix. The overall logic of adding VERSION at checkpoint time LGTM. A few minors:

        1. The new newDirs variable could be hard to understand without the full context of this JIRA. Can we add some Javadoc at declaration time?
        2. filesInCurrent in the test is unnecessary
        3. "If ha is enabled" => HA
        4. What if a VERSION file already exists in the directory for some reason? Should we at least print a WARN for further investigation?
          +      try {
          +        storage.writeProperties(sd);
          
        5. On the retention manager, is it the right behavior to skip purging old image files if VERSION is missing? Should we do a follow-on fix to handle the case where the VERSION file is lost for some other reason (mis-operation etc.)?
        daryn Daryn Sharp added a comment -

        +1

        kihwal Kihwal Lee added a comment -

        These are passing.

        -------------------------------------------------------
         T E S T S
        -------------------------------------------------------
        Running org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
        Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.099 sec - in org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
        Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
        Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.628 sec - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
        

        The rest are broken by YARN-679. Filed HADOOP-14368.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 15s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 15m 0s trunk passed
        +1 compile 0m 49s trunk passed
        +1 checkstyle 0m 39s trunk passed
        +1 mvnsite 1m 1s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        -1 findbugs 1m 52s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1 javadoc 0m 42s trunk passed
        +1 mvninstall 0m 56s the patch passed
        +1 compile 0m 56s the patch passed
        +1 javac 0m 56s the patch passed
        +1 checkstyle 0m 37s the patch passed
        +1 mvnsite 0m 57s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 57s the patch passed
        +1 javadoc 0m 41s the patch passed
        -1 unit 77m 58s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 24s The patch does not generate ASF License warnings.
        106m 42s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestClientProtocolForPipelineRecovery
          hadoop.hdfs.server.namenode.TestMetadataVersionOutput
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.server.namenode.TestStartup



        Subsystem Report/Notes
        Docker Image: yetus/hadoop:0ac17dc
        JIRA Issue HDFS-11714
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865593/HDFS-11714.v2.trunk.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux eead047cc235 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 64f68cb
        Default Java 1.8.0_121
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19247/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19247/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19247/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        Attaching updated patches. Everything is confined to FSImage as you suggested. I think it is safe. The branch-2 patch only differs slightly in the new test.

        daryn Daryn Sharp added a comment -

        In FSImage, please correct the inventive word: "tirggered". It would be nice if initNewDirs was encapsulated in FSImage since it's not something the servlet should need to know about but I'm not familiar enough with the code to know what will break. Up to you.

        And we need the branch-2 patch.

        kihwal Kihwal Lee added a comment -

        None of the findbugs warnings were introduced by the patch.
        TestPipelinesFailover passes when run multiple times on my machine.
        The TestNameNodeMetadataConsistency failure is not caused by this patch. See HDFS-11396.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 25s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 14m 39s trunk passed
        +1 compile 0m 47s trunk passed
        +1 checkstyle 0m 37s trunk passed
        +1 mvnsite 0m 54s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        -1 findbugs 1m 43s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1 javadoc 0m 39s trunk passed
        +1 mvninstall 0m 47s the patch passed
        +1 compile 0m 45s the patch passed
        +1 javac 0m 45s the patch passed
        +1 checkstyle 0m 34s the patch passed
        +1 mvnsite 0m 50s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 44s the patch passed
        +1 javadoc 0m 37s the patch passed
        -1 unit 69m 14s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 21s The patch does not generate ASF License warnings.
        96m 22s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
          hadoop.hdfs.server.namenode.ha.TestPipelinesFailover



        Subsystem Report/Notes
        Docker Image: yetus/hadoop:0ac17dc
        JIRA Issue HDFS-11714
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865433/HDFS-11714.trunk.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c7e969c175d9 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 9460721
        Default Java 1.8.0_121
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19223/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19223/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19223/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19223/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        kihwal Kihwal Lee added a comment -

        The VERSION file can be created in an empty storage directory in other ways. Doing a full upgrade or finalizing a rolling upgrade will cause an unconditional write of VERSION to all storage directories.

        Attaching a patch. It saves any new directory to a set, and when a checkpoint is written, a VERSION file is also written there. This is roughly equivalent to the non-HA mechanism, where saveNamespace() causes creation of a VERSION file.
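
        As a rough sketch of that mechanism (illustrative only, not the patch itself; newDirs, VersionWriter, and writeVersionFile() are invented names):

        // Sketch: remember directories formatted at startup; write their VERSION
        // file once the first checkpoint image has been saved or uploaded.
        import java.io.File;
        import java.util.HashSet;
        import java.util.Set;

        class NewStorageDirTracker {
          private final Set<File> newDirs = new HashSet<>();

          /** A storage directory that startup found empty and formatted. */
          void rememberNewDir(File dir) {
            newDirs.add(dir);
          }

          /** Called after a checkpoint image has been written to all directories. */
          void onCheckpointSaved(VersionWriter writer) {
            for (File dir : newDirs) {
              writer.writeVersionFile(dir);
            }
            newDirs.clear();  // from now on these behave like normal directories
          }

          interface VersionWriter {
            void writeVersionFile(File dir);
          }
        }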

        kihwal Kihwal Lee added a comment -

        The original design pre-HA was to just create the directory structure under the new directory. Then the inspector reports that some directories are new. This causes the namesystem to call saveNamespace(), which unconditionally writes a VERSION file in all storage directories. This still happens in non-HA mode.

        For HA, the first part still happens, but it does not trigger saveNamespace() automatically.

        [main] INFO namenode.FSImage: Storage directory /xxx/hadoop/var/hdfs/namedir1 is not formatted.
        [main] INFO namenode.FSImage: Formatting ...
        ...
        WARN namenode.FSImage: Storage directory Storage Directory/xxx/hadoop/var/hdfs/namedir1 contains no VERSION file. Skipping...
        

        The last line appears when an fsimage is being searched for and loaded.

        When a checkpoint is uploaded, the retention manager fails to delete old files in the directory.

        INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_00000001234567890123 size 20000000000000 bytes.
        INFO namenode.FSImageTransactionalStorageInspector: No version file in /xxx/hadoop/var/hdfs/namedir1
        INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 1234567890122
        INFO namenode.NNStorageRetentionManager: Purging old image
        

          People

          • Assignee: kihwal Kihwal Lee
          • Reporter: kihwal Kihwal Lee
          • Votes: 0
          • Watchers: 8
