Hadoop HDFS / HDFS-10941

Improve BlockManager#processMisReplicatesAsync log

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      BlockManager#processMisReplicatesAsync is the daemon thread running inside the namenode to handle mis-replicated blocks. As shown below, it emits a trace log for each block in the cluster being processed (10000 blocks per iteration, after sleeping 10s).

        MisReplicationResult res = processMisReplicatedBlock(block);
        if (LOG.isTraceEnabled()) {
          LOG.trace("block " + block + ": " + res);
        }
      

      However, this is not very useful: dumping every block in the cluster overwhelms the namenode log without adding much information, assuming the majority of blocks are not over/under replicated. This ticket is opened to improve the log for easier troubleshooting of block replication issues by either:

      1) adding a debug log for blocks that get an under/over replicated result during processMisReplicatedBlock()

      2) or changing the trace log to cover only blocks that get a non-OK result during processMisReplicatedBlock()
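
      The second option can be sketched roughly as follows. This is only an illustration, not the committed patch: the class name, the simplified Result enum and the logNonOk helper are hypothetical, and java.util.logging (with FINEST standing in for trace) is used only to keep the sketch self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the loop in processMisReplicatesAsync.
class MisReplicationLogSketch {
  // Simplified result type (assumption); the real enum may have more values.
  enum Result { OK, UNDER_REPLICATED, OVER_REPLICATED }

  private static final Logger LOG =
      Logger.getLogger("MisReplicationLogSketch");

  // Issues one trace-level log call per non-OK block and returns how many
  // calls were issued, so the behaviour is easy to check.
  static int logNonOk(String[] blocks, Result[] results) {
    int logged = 0;
    for (int i = 0; i < blocks.length; i++) {
      if (results[i] == Result.OK) {
        continue; // the common case stays silent
      }
      // Parameterized form: the message is only formatted if FINEST is enabled.
      LOG.log(Level.FINEST, "block {0}: {1}",
          new Object[] {blocks[i], results[i]});
      logged++;
    }
    return logged;
  }
}
```

      With three blocks of which one is UNDER_REPLICATED, logNonOk issues a single log call instead of three.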

      1. HDFS-10941.001.patch
        2 kB
        Chen Liang
      2. HDFS-10941.002.patch
        2 kB
        Chen Liang
      3. HDFS-10941.002.patch
        2 kB
        Chen Liang
      4. HDFS-10941.003.patch
        2 kB
        Chen Liang

        Activity

        brahmareddy Brahma Reddy Battula added a comment -

        Zhe Zhang, it looks like CHANGES.txt was not updated for this JIRA in branch-2.7. Can you please update it?

        zhz Zhe Zhang added a comment -

        Thanks for the work Chen. I just backported to branch-2.7.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10825 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10825/)
        HDFS-10941. Improve BlockManager#processMisReplicatesAsync log. (xyao: rev 4484b48498b2ab2a40a404c487c7a4e875df10dc)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        xyao Xiaoyu Yao added a comment -

        Thanks Chen Liang for the contribution and Xiaobing Zhou for the reviews. I've committed the fix to trunk, branch-2.8 and branch-2.

        xyao Xiaoyu Yao added a comment -

        Thanks Chen Liang for the update. The latest patch LGTM. +1 and I will commit it shortly.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 23s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    mvninstall  7m 39s   trunk passed
        +1    compile     0m 45s   trunk passed
        +1    checkstyle  0m 29s   trunk passed
        +1    mvnsite     0m 58s   trunk passed
        +1    mvneclipse  0m 14s   trunk passed
        +1    findbugs    1m 47s   trunk passed
        +1    javadoc     0m 40s   trunk passed
        +1    mvninstall  0m 46s   the patch passed
        +1    compile     0m 42s   the patch passed
        +1    javac       0m 42s   the patch passed
        +1    checkstyle  0m 25s   the patch passed
        +1    mvnsite     0m 48s   the patch passed
        +1    mvneclipse  0m 10s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 45s   the patch passed
        +1    javadoc     0m 37s   the patch passed
        -1    unit        82m 24s  hadoop-hdfs in the patch failed.
        +1    asflicense  0m 20s   The patch does not generate ASF License warnings.
                          102m 3s

        Reason              Tests
        Failed junit tests  hadoop.hdfs.tools.TestDelegationTokenFetcher
                            hadoop.hdfs.TestPersistBlocks
                            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
                            hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength

        Subsystem       Report/Notes
        Docker          Image:yetus/hadoop:e809691
        JIRA Issue      HDFS-10941
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12838588/HDFS-10941.003.patch
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux 0bb9a235a3bc 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / 503e73e
        Default Java    1.8.0_101
        findbugs        v3.0.0
        unit            https://builds.apache.org/job/PreCommit-HDFS-Build/17529/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/17529/testReport/
        modules         C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/17529/console
        Powered by      Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        vagarychen Chen Liang added a comment -

        Thanks Xiaoyu Yao for the comments! Uploaded v003 patch.

        xyao Xiaoyu Yao added a comment -

        Thanks Chen Liang for the update. The v3 patch has a potential perf issue with the wrapper approach. The toString() and string concat cost will always be there even with the if (LOG.isTraceEnabled()) guard inside the wrapper.

        I would suggest we leverage the slf4j parameterized logging like below to avoid it without the wrapper.
        More detail about slf4j logging performance can be found here: http://www.slf4j.org/faq.html#logging_performance.

               case UNDER_REPLICATED:
                 LOG.trace("under replicated block: {} result: {}", block, res);
                 nrUnderReplicated++;
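
        To illustrate the point about the wrapper, here is a small self-contained sketch. It is a stand-in under stated assumptions, not the patch itself: it uses java.util.logging instead of slf4j (FINEST in place of trace, {0} in place of {}), and CountingBlock, traceWrapper, eagerCalls and deferredCalls are hypothetical names introduced only for this example. The counter in toString() makes the formatting cost observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLoggingSketch {
  // Hypothetical block type that counts how often it is rendered as text.
  static final class CountingBlock {
    int toStringCalls = 0;

    @Override
    public String toString() {
      toStringCalls++;
      return "blk_1";
    }
  }

  private static final Logger LOG = Logger.getLogger("LazyLoggingSketch");

  // Wrapper with the level guard inside: by the time the guard runs,
  // the caller has already built the message string.
  static void traceWrapper(String message) {
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest(message);
    }
  }

  // Eager variant: concatenation happens at the call site.
  static int eagerCalls() {
    LOG.setLevel(Level.INFO); // FINEST is disabled
    CountingBlock block = new CountingBlock();
    traceWrapper("under replicated block: " + block);
    return block.toStringCalls;
  }

  // Deferred variant: the argument is passed as an object and is only
  // formatted if the record is actually logged.
  static int deferredCalls() {
    LOG.setLevel(Level.INFO); // FINEST is disabled
    CountingBlock block = new CountingBlock();
    LOG.log(Level.FINEST, "under replicated block: {0}", block);
    return block.toStringCalls;
  }
}
```

        With the finest level disabled, eagerCalls() returns 1 because the string was built before the guard could skip it, while deferredCalls() returns 0.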
        
        vagarychen Chen Liang added a comment -

        The failed tests seem unrelated. TestEncryptionZones.testStartFileRetry never failed in local test runs. The other three tests fail randomly both with and without the patch, so the tests themselves appear to be flaky.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 11s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    mvninstall  8m 35s   trunk passed
        +1    compile     0m 51s   trunk passed
        +1    checkstyle  0m 28s   trunk passed
        +1    mvnsite     1m 1s    trunk passed
        +1    mvneclipse  0m 14s   trunk passed
        +1    findbugs    1m 51s   trunk passed
        +1    javadoc     0m 41s   trunk passed
        +1    mvninstall  0m 53s   the patch passed
        +1    compile     0m 44s   the patch passed
        +1    javac       0m 44s   the patch passed
        +1    checkstyle  0m 24s   the patch passed
        +1    mvnsite     0m 50s   the patch passed
        +1    mvneclipse  0m 10s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 47s   the patch passed
        +1    javadoc     0m 36s   the patch passed
        -1    unit        77m 33s  hadoop-hdfs in the patch failed.
        +1    asflicense  0m 20s   The patch does not generate ASF License warnings.
                          98m 26s

        Reason              Tests
        Failed junit tests  hadoop.hdfs.server.datanode.TestDataNodeUUID
                            hadoop.hdfs.TestEncryptionZones
                            hadoop.hdfs.server.datanode.TestDirectoryScanner
                            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure

        Subsystem       Report/Notes
        Docker          Image:yetus/hadoop:9560f25
        JIRA Issue      HDFS-10941
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12836985/HDFS-10941.002.patch
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux 258639f113f8 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / 7534aee
        Default Java    1.8.0_101
        findbugs        v3.0.0
        unit            https://builds.apache.org/job/PreCommit-HDFS-Build/17414/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/17414/testReport/
        modules         C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/17414/console
        Powered by      Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        vagarychen Chen Liang added a comment -

        Re-uploaded the v002 patch to try to trigger Jenkins.

        xiaobingo Xiaobing Zhou added a comment -

        +1 pending on Jenkins. Thanks.

        vagarychen Chen Liang added a comment -

        Thanks Xiaoyu Yao for the comments! Uploaded v002 patch to fix it. The failed unit tests do not seem to be related.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 18s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    mvninstall  6m 53s   trunk passed
        +1    compile     0m 44s   trunk passed
        +1    checkstyle  0m 27s   trunk passed
        +1    mvnsite     0m 51s   trunk passed
        +1    mvneclipse  0m 12s   trunk passed
        +1    findbugs    1m 38s   trunk passed
        +1    javadoc     0m 39s   trunk passed
        +1    mvninstall  0m 44s   the patch passed
        +1    compile     0m 41s   the patch passed
        +1    javac       0m 41s   the patch passed
        +1    checkstyle  0m 24s   the patch passed
        +1    mvnsite     0m 50s   the patch passed
        +1    mvneclipse  0m 9s    the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 46s   the patch passed
        +1    javadoc     0m 36s   the patch passed
        -1    unit        72m 7s   hadoop-hdfs in the patch failed.
        +1    asflicense  0m 21s   The patch does not generate ASF License warnings.
                          90m 31s

        Reason              Tests
        Failed junit tests  hadoop.hdfs.web.TestWebHDFS
                            hadoop.hdfs.TestEncryptionZones
                            hadoop.security.TestPermission

        Subsystem       Report/Notes
        Docker          Image:yetus/hadoop:9560f25
        JIRA Issue      HDFS-10941
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12835631/HDFS-10941.001.patch
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux e2f50c04aef8 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / ac35ee9
        Default Java    1.8.0_101
        findbugs        v3.0.0
        unit            https://builds.apache.org/job/PreCommit-HDFS-Build/17331/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/17331/testReport/
        modules         C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/17331/console
        Powered by      Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xiaobingo Xiaobing Zhou added a comment -

        Thanks for the patch, Chen Liang. Can you add a shared function and move the logs into it? It will keep the code clean.

        xyao Xiaoyu Yao added a comment -

        Thanks Chen Liang for working on this. Patch LGTM. +1 pending Jenkins.

        vagarychen Chen Liang added a comment -

        To minimize useless log messages, I am taking the 2nd approach suggested by Xiaoyu Yao.


          People

          • Assignee:
            vagarychen Chen Liang
            Reporter:
            xyao Xiaoyu Yao
          • Votes:
            0
            Watchers:
            9
