Hadoop HDFS / HDFS-9655

NN should start JVM pause monitor before loading fsimage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:

      Description

      We have seen many cases where NameNode startup was either extremely slow or hung outright. Most were caused by a heap too small for the size of the metadata, and were resolved by increasing the heap size.

      However, it took the support team some time to find the root cause. JVM pause warning messages would greatly assist such diagnosis, but the NN starts the JVM pause monitor only after it has loaded the fsimage and edits.

      Proposal: start the JVM pause monitor before loading the fsimage and edits.
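
      For readers unfamiliar with the technique, a pause monitor of this kind can be sketched as a thread that sleeps for a fixed interval and checks how much longer than requested the sleep took. The sketch below is a minimal illustration under stated assumptions, not Hadoop's JvmPauseMonitor: the class name, method names, message text, and the 500 ms / 1000 ms values are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of sleep-based pause detection.
 * Not Hadoop's JvmPauseMonitor; all names and thresholds are assumptions.
 */
final class PauseMonitorSketch implements Runnable {
  private final long sleepMs;          // how long each probe sleeps
  private final long warnThresholdMs;  // extra delay that triggers a warning
  private volatile boolean running = true;

  PauseMonitorSketch(long sleepMs, long warnThresholdMs) {
    this.sleepMs = sleepMs;
    this.warnThresholdMs = warnThresholdMs;
  }

  /** Time spent beyond the requested sleep, in milliseconds; never negative. */
  static long extraSleepMs(long startNanos, long endNanos, long sleepMs) {
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    return Math.max(0L, elapsedMs - sleepMs);
  }

  @Override
  public void run() {
    while (running) {
      long start = System.nanoTime();
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve interrupt status and exit
        return;
      }
      // A long GC or host stall shows up as a sleep that overran its interval.
      long extra = extraSleepMs(start, System.nanoTime(), sleepMs);
      if (extra > warnThresholdMs) {
        System.err.println("WARN: detected pause of approximately " + extra + "ms");
      }
    }
  }

  void stop() {
    running = false;
  }

  public static void main(String[] args) throws InterruptedException {
    PauseMonitorSketch monitor = new PauseMonitorSketch(500, 1000);
    Thread t = new Thread(monitor, "pause-monitor-sketch");
    t.setDaemon(true);   // do not keep the JVM alive for the monitor
    t.start();
    Thread.sleep(2000);  // stand-in for real work
    monitor.stop();
    t.join();
  }
}
```

      Because the probe only reports pauses that occur while it is running, a monitor started after a heap-starved load cannot report the pauses that slowed that load.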

        Activity

        jzhuge John Zhuge added a comment -

        Patch 001:

        • Start JVM pause monitor before loading fsimage during NN startup.
        • Log an info message when JVM pause monitor starts.
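
        The ordering that the two points above describe can be illustrated roughly as follows. This is a hedged sketch only: `StartupOrderSketch`, `startPauseMonitor`, `loadImageAndEdits`, and the log wording are hypothetical stand-ins and do not reproduce the code in NameNode.java or the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup sequence; method names are illustrative, not NameNode's API. */
final class StartupOrderSketch {
  private final List<String> events = new ArrayList<>();

  private void startPauseMonitor() {
    // A real monitor would run a pause-detection loop; an empty task stands in here.
    Thread monitor = new Thread(() -> { }, "pause-monitor");
    monitor.setDaemon(true);
    monitor.start();
    events.add("pause monitor started");
    // Info-level message so the log shows when monitoring began (wording assumed).
    System.out.println("INFO: Starting JVM pause monitor");
  }

  private void loadImageAndEdits() {
    // Stand-in for the long, allocation-heavy fsimage/edits load.
    events.add("fsimage and edits loaded");
  }

  /** Starts the monitor first so pauses during the load are observable. */
  List<String> initialize() {
    startPauseMonitor();
    loadImageAndEdits();
    return events;
  }
}
```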
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      0m 0s     Docker mode activated.
        +1    @author     0m 0s     The patch does not contain any @author tags.
        -1    test4tests  0m 0s     The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        0     mvndep      0m 19s    Maven dependency ordering for branch
        +1    mvninstall  7m 11s    trunk passed
        +1    compile     6m 2s     trunk passed with JDK v1.8.0_66
        +1    compile     6m 47s    trunk passed with JDK v1.7.0_91
        +1    checkstyle  0m 57s    trunk passed
        +1    mvnsite     1m 53s    trunk passed
        +1    mvneclipse  0m 27s    trunk passed
        +1    findbugs    3m 35s    trunk passed
        +1    javadoc     2m 0s     trunk passed with JDK v1.8.0_66
        +1    javadoc     2m 48s    trunk passed with JDK v1.7.0_91
        0     mvndep      0m 17s    Maven dependency ordering for patch
        +1    mvninstall  2m 20s    the patch passed
        +1    compile     6m 10s    the patch passed with JDK v1.8.0_66
        +1    javac       6m 10s    the patch passed
        +1    compile     7m 24s    the patch passed with JDK v1.7.0_91
        +1    javac       7m 24s    the patch passed
        +1    checkstyle  1m 8s     the patch passed
        +1    mvnsite     2m 8s     the patch passed
        +1    mvneclipse  0m 29s    the patch passed
        +1    whitespace  0m 0s     Patch has no whitespace issues.
        +1    findbugs    4m 27s    the patch passed
        +1    javadoc     2m 18s    the patch passed with JDK v1.8.0_66
        +1    javadoc     3m 1s     the patch passed with JDK v1.7.0_91
        +1    unit        7m 45s    hadoop-common in the patch passed with JDK v1.8.0_66.
        -1    unit        67m 27s   hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        -1    unit        8m 50s    hadoop-common in the patch failed with JDK v1.7.0_91.
        -1    unit        69m 3s    hadoop-hdfs in the patch failed with JDK v1.7.0_91.
        +1    asflicense  0m 25s    Patch does not generate ASF License warnings.
                          216m 47s



        Reason                            Tests
        JDK v1.8.0_66 Failed junit tests  hadoop.hdfs.TestLeaseRecoveryStriped
                                          hadoop.hdfs.server.namenode.TestRecoverStripedBlocks
        JDK v1.7.0_91 Failed junit tests  hadoop.security.ssl.TestReloadingX509TrustManager
                                          hadoop.ipc.TestIPC
                                          hadoop.hdfs.shortcircuit.TestShortCircuitCache
                                          hadoop.hdfs.qjournal.client.TestQuorumJournalManager
                                          hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot



        Subsystem                   Report/Notes
        Docker                      Image:yetus/hadoop:0ca8df7
        JIRA Patch URL              https://issues.apache.org/jira/secure/attachment/12782730/HDFS-9655.001.patch
        JIRA Issue                  HDFS-9655
        Optional Tests              asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname                       Linux 5bd3f4127b2d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool                  maven
        Personality                 /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision                trunk / da77f42
        Default Java                1.7.0_91
        Multi-JDK versions          /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
        findbugs                    v3.0.0
        unit                        https://builds.apache.org/job/PreCommit-HDFS-Build/14141/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit                        https://builds.apache.org/job/PreCommit-HDFS-Build/14141/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_91.txt
        unit                        https://builds.apache.org/job/PreCommit-HDFS-Build/14141/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
        unit test logs              https://builds.apache.org/job/PreCommit-HDFS-Build/14141/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
                                    https://builds.apache.org/job/PreCommit-HDFS-Build/14141/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_91.txt
                                    https://builds.apache.org/job/PreCommit-HDFS-Build/14141/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
        JDK v1.7.0_91 Test Results  https://builds.apache.org/job/PreCommit-HDFS-Build/14141/testReport/
        modules                     C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
        Max memory used             76MB
        Powered by                  Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
        Console output              https://builds.apache.org/job/PreCommit-HDFS-Build/14141/console

        This message was automatically generated.

        jzhuge John Zhuge added a comment -

        Unit test failures seem unrelated.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9142 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9142/)
        HDFS-9655. NN should start JVM pause monitor before loading fsimage. (lei: rev 2ec438e8f7cd77cb48fd1264781e60a48e331908)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        eddyxu Lei (Eddy) Xu added a comment -

        +1. The test failures do not seem relevant to me.

        Thanks a lot for working on this, John Zhuge!

        Committed to trunk, branch-2 and branch-2.8.

        jzhuge John Zhuge added a comment -

        Thanks Lei (Eddy) Xu.


          People

          • Assignee: jzhuge John Zhuge
          • Reporter: jzhuge John Zhuge
          • Votes: 0
          • Watchers: 8
