  1. Hadoop HDFS
  2. HDFS-11194

Maintain aggregated peer performance metrics on NameNode

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The metrics collected in HDFS-10917 should be reported to and aggregated on the NameNode as part of heartbeat messages. This will make it easy to expose them through JMX to users who are interested in them.

      1. HDFS-11194.01.patch
        111 kB
        Arpit Agarwal
      2. HDFS-11194.02.patch
        121 kB
        Arpit Agarwal
      3. HDFS-11194.03.patch
        121 kB
        Arpit Agarwal
      4. HDFS-11194.04.patch
        121 kB
        Arpit Agarwal
      5. HDFS-11194.05.patch
        121 kB
        Arpit Agarwal
      6. HDFS-11194.06.patch
        121 kB
        Arpit Agarwal
      7. HDFS-11194-03-04.delta
        10 kB
        Arpit Agarwal

        Activity

        apurtell Andrew Purtell added a comment - - edited

        It would be an interesting exercise to make a back-of-the-envelope calculation of how much online storage would be required for the aggregate metrics of, say, 2000 DataNodes. The patch on HDFS-10917 adds 6 MutableQuantiles. Would it be just these, or all 60 or so metrics in DataNodeMetrics.java? Assume each metric is a simple long counter for the sake of argument. That would require 8 * 60 * 2000 = ~1 MB of storage. Of course there will be data structure overheads, the quantile metrics are not single longs, etc. As an operator I can report that maintaining the stability of NameNodes (2.7.x) with respect to avoiding full GC or the Linux OOM killer is already a nontrivial exercise. Perhaps this feature, if implemented, could be made optional, but it would be better to take an approach like Accumulo's and implement a separate metrics aggregation service for the task that can be comfortably run somewhere other than the NN. (See https://accumulo.apache.org/1.7/accumulo_user_manual#_monitor)

        drankye Kai Zheng added a comment -

        For the trade-off, would it make sense to keep such new metrics off-heap? We have had many discussions like this, but moving the existing inode structures off-heap is hard, so that work is blocked. It may be a good fit for new memory-consuming data structures in the NameNode.

        arpitagarwal Arpit Agarwal added a comment -

        Attached a patch. This builds upon the downstream peer latencies collected by DataNodes during the write pipeline (HDFS-10917).

        Over time, the DataNodes will have sufficient samples to determine which peers are slow relative to the rest. This logic should be conservative with some high/low thresholds as safeguards so the set of outliers is tiny compared to all peers. These peers can be reported to the NameNode occasionally, allowing it to detect the top N slow nodes ranked by the number of peers that found them slow.
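The outlier logic described above could look roughly like the following. This is only a sketch under stated assumptions: the comment does not say which statistic the patch uses, so the median plus a multiple of the median absolute deviation (MAD) is an assumption, and the class name, constants and thresholds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flag peers whose average latency is a high outlier. */
final class SlowPeerOutlierSketch {
  // Assumed safeguards; the values are illustrative, not taken from the patch.
  static final double DEVIATION_MULTIPLIER = 3.0; // how far above the median counts as slow
  static final double MIN_SLOW_LATENCY_MS = 5.0;  // never flag a peer at or below this latency
  static final int MIN_PEERS = 10;                // with too few peers, report nothing

  /** Median of a non-empty list (the input is copied before sorting). */
  static double median(List<Double> values) {
    List<Double> sorted = new ArrayList<>(values);
    Collections.sort(sorted);
    int n = sorted.size();
    return (n % 2 == 1)
        ? sorted.get(n / 2)
        : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
  }

  /** Returns peers whose latency exceeds max(lower bound, median + k * MAD). */
  static Map<String, Double> findOutliers(Map<String, Double> avgLatencyMs) {
    Map<String, Double> slow = new HashMap<>();
    if (avgLatencyMs.size() < MIN_PEERS) {
      return slow;
    }
    List<Double> latencies = new ArrayList<>(avgLatencyMs.values());
    double med = median(latencies);
    List<Double> deviations = new ArrayList<>();
    for (double v : latencies) {
      deviations.add(Math.abs(v - med));
    }
    double mad = median(deviations);
    double threshold =
        Math.max(MIN_SLOW_LATENCY_MS, med + DEVIATION_MULTIPLIER * mad);
    for (Map.Entry<String, Double> e : avgLatencyMs.entrySet()) {
      if (e.getValue() > threshold) {
        slow.put(e.getKey(), e.getValue());
      }
    }
    return slow;
  }
}
```

The fixed lower bound and the minimum peer count play the role of the "safeguards" mentioned above: they keep a uniformly fast or very small set of peers from producing spurious outliers.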

        The attached patch looks large but most of it is plumbing. The interesting changes are in two classes:

        1. SlowNodeDetector (on the DataNode) – Find high outliers given aggregate peer latencies.
        2. SlowPeerTracker (on the NameNode) – Accumulate reports from DataNodes and expose the top N (currently 5) slow nodes via NameNode JMX, an idea borrowed from HDFS-6982.
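
As a rough illustration of the NameNode side, the ranking "by the number of peers that found them slow" can be sketched as below. This is not the patch's SlowPeerTracker: the class and method names are hypothetical, and report expiry and JSON/JMX export are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a NameNode-side tracker; not the patch's class. */
final class SlowPeerTrackerSketch {
  // slow node -> set of DataNodes that reported it as slow
  private final Map<String, Set<String>> reportersBySlowNode = new HashMap<>();

  /** Record that {@code reporter} found every node in {@code slowNodes} slow. */
  synchronized void addReport(String reporter, Set<String> slowNodes) {
    for (String slowNode : slowNodes) {
      reportersBySlowNode
          .computeIfAbsent(slowNode, k -> new HashSet<>())
          .add(reporter);
    }
  }

  /** Top {@code n} slow nodes by number of distinct reporters; ties break by name. */
  synchronized List<String> topN(int n) {
    List<String> nodes = new ArrayList<>(reportersBySlowNode.keySet());
    nodes.sort(Comparator
        .comparingInt((String node) -> reportersBySlowNode.get(node).size())
        .reversed()
        .thenComparing(Comparator.naturalOrder()));
    return new ArrayList<>(nodes.subList(0, Math.min(n, nodes.size())));
  }
}
```

Keeping a set of reporters per slow node means a DataNode that repeats the same report does not inflate the count.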

        The idea of collecting peer statistics to find slow nodes also came up at the HDFS BoF at a Hadoop Summit (proposed by Allen W., I think). The statistical analysis has ideas from Tsz Wo Nicholas Sze.

        Thank you for the comments Andrew Purtell and Kai Zheng. All of the above is off by default. Assuming 3% of the nodes in the cluster are flagged as outliers by each node (any higher and we need to further tone down the outlier detection), then in a 3000 node cluster the expected NN state is 3000 * (3000 * 3%) * 25 bytes/report ~ 7MB.
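
For reference, the two estimates in this thread (8 * 60 * 2000 for holding every DataNode metric as a long, and 3000 * (3000 * 3%) * 25 for slow-peer reports only) work out as follows. The class and method names are made up for illustration, and both figures ignore data structure overhead.

```java
/** Back-of-the-envelope NameNode memory estimates from this discussion. */
final class PeerMetricsMemoryEstimate {
  /** Bytes needed if every metric of every DataNode were one 8-byte long. */
  static long allMetricsBytes(int dataNodes, int metricsPerNode) {
    return 8L * metricsPerNode * dataNodes;
  }

  /** Bytes needed if each DataNode reports a fixed percentage of peers as slow. */
  static long slowPeerReportBytes(int dataNodes, int outlierPercent, int bytesPerReport) {
    long outliersPerNode = (long) dataNodes * outlierPercent / 100;
    return dataNodes * outliersPerNode * bytesPerReport;
  }

  public static void main(String[] args) {
    System.out.println(allMetricsBytes(2000, 60));        // 960000 bytes, about 1 MB
    System.out.println(slowPeerReportBytes(3000, 3, 25)); // 6750000 bytes, about 7 MB
  }
}
```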

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 18 new or modified test files.
        0 mvndep 0m 57s Maven dependency ordering for branch
        +1 mvninstall 12m 42s trunk passed
        +1 compile 9m 43s trunk passed
        +1 checkstyle 1m 52s trunk passed
        +1 mvnsite 2m 36s trunk passed
        +1 mvneclipse 0m 55s trunk passed
        +1 findbugs 4m 40s trunk passed
        +1 javadoc 1m 57s trunk passed
        0 mvndep 0m 15s Maven dependency ordering for patch
        +1 mvninstall 2m 6s the patch passed
        +1 compile 9m 34s the patch passed
        +1 cc 9m 34s the patch passed
        -1 javac 9m 34s root generated 1 new + 690 unchanged - 0 fixed = 691 total (was 690)
        -0 checkstyle 1m 52s root: The patch generated 22 new + 1604 unchanged - 5 fixed = 1626 total (was 1609)
        +1 mvnsite 2m 56s the patch passed
        +1 mvneclipse 0m 55s the patch passed
        -1 whitespace 0m 0s The patch has 19 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        -1 findbugs 1m 38s hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
        -1 findbugs 1m 44s hadoop-hdfs-project/hadoop-hdfs-client generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
        -1 findbugs 2m 4s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
        -1 javadoc 0m 48s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
        -1 unit 9m 9s hadoop-common in the patch failed.
        +1 unit 1m 1s hadoop-hdfs-client in the patch passed.
        -1 unit 68m 47s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 35s The patch does not generate ASF License warnings.
        141m 40s



        Reason Tests
        FindBugs module:hadoop-common-project/hadoop-common
          Inconsistent synchronization of org.apache.hadoop.metrics2.lib.RollingAverages.currentSnapshot; locked 57% of time. Unsynchronized access at RollingAverages.java:[line 226]
        FindBugs module:hadoop-hdfs-project/hadoop-hdfs-client
          Redundant nullcheck of org.apache.hadoop.hdfs.server.protocol.SlowPeerReports.slowPeers, which is known to be non-null in org.apache.hadoop.hdfs.server.protocol.SlowPeerReports.equals(Object). Redundant null check at SlowPeerReports.java:[line 100]
          Redundant nullcheck of org.apache.hadoop.hdfs.server.protocol.SlowPeerReports.slowPeers, which is known to be non-null in org.apache.hadoop.hdfs.server.protocol.SlowPeerReports.hashCode(). Redundant null check at SlowPeerReports.java:[line 106]
        FindBugs module:hadoop-hdfs-project/hadoop-hdfs
          org.apache.hadoop.hdfs.server.blockmanagement.SlowPeerTracker$ReportForJson defines compareTo(SlowPeerTracker$ReportForJson) and uses Object.equals(). At SlowPeerTracker.java:[lines 236-241]
        Failed junit tests hadoop.metrics2.lib.TestRollingAverages
          hadoop.hdfs.server.datanode.TestBPOfferService
          hadoop.tools.TestHdfsConfigFields
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11194
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845009/HDFS-11194.01.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
        uname Linux 2335ac3c55e2 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / f216276
        Default Java 1.8.0_111
        findbugs v3.0.0
        javac https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/diff-compile-javac-root.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/diff-checkstyle-root.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/whitespace-eol.txt
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs-client.html
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
        javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/17981/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17981/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17981/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 20 new or modified test files.
        0 mvndep 0m 15s Maven dependency ordering for branch
        +1 mvninstall 12m 26s trunk passed
        +1 compile 9m 33s trunk passed
        +1 checkstyle 1m 52s trunk passed
        +1 mvnsite 2m 33s trunk passed
        +1 mvneclipse 0m 54s trunk passed
        +1 findbugs 4m 37s trunk passed
        +1 javadoc 1m 57s trunk passed
        0 mvndep 0m 15s Maven dependency ordering for patch
        +1 mvninstall 1m 51s the patch passed
        +1 compile 9m 10s the patch passed
        +1 cc 9m 10s the patch passed
        -1 javac 9m 10s root generated 1 new + 690 unchanged - 0 fixed = 691 total (was 690)
        -0 checkstyle 1m 54s root: The patch generated 12 new + 1666 unchanged - 5 fixed = 1678 total (was 1671)
        +1 mvnsite 2m 29s the patch passed
        +1 mvneclipse 0m 54s the patch passed
        -1 whitespace 0m 0s The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        +1 xml 0m 2s The patch has no ill-formed XML file.
        +1 findbugs 5m 5s the patch passed
        +1 javadoc 1m 55s the patch passed
        +1 unit 9m 25s hadoop-common in the patch passed.
        +1 unit 1m 1s hadoop-hdfs-client in the patch passed.
        -1 unit 63m 55s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 33s The patch does not generate ASF License warnings.
        134m 6s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11194
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845104/HDFS-11194.02.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
        uname Linux 190b84f76e01 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 95c2c24
        Default Java 1.8.0_111
        findbugs v3.0.0
        javac https://builds.apache.org/job/PreCommit-HDFS-Build/17987/artifact/patchprocess/diff-compile-javac-root.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17987/artifact/patchprocess/diff-checkstyle-root.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17987/artifact/patchprocess/whitespace-eol.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/17987/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17987/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17987/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 20 new or modified test files.
        0 mvndep 0m 16s Maven dependency ordering for branch
        +1 mvninstall 13m 52s trunk passed
        +1 compile 10m 59s trunk passed
        +1 checkstyle 1m 58s trunk passed
        +1 mvnsite 2m 43s trunk passed
        +1 mvneclipse 0m 58s trunk passed
        +1 findbugs 4m 50s trunk passed
        +1 javadoc 1m 55s trunk passed
        0 mvndep 0m 14s Maven dependency ordering for patch
        +1 mvninstall 1m 52s the patch passed
        +1 compile 9m 18s the patch passed
        +1 cc 9m 18s the patch passed
        +1 javac 9m 18s the patch passed
        -0 checkstyle 2m 21s root: The patch generated 7 new + 1666 unchanged - 5 fixed = 1673 total (was 1671)
        +1 mvnsite 2m 38s the patch passed
        +1 mvneclipse 0m 54s the patch passed
        -1 whitespace 0m 0s The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        +1 xml 0m 2s The patch has no ill-formed XML file.
        +1 findbugs 5m 9s the patch passed
        +1 javadoc 1m 58s the patch passed
        +1 unit 8m 27s hadoop-common in the patch passed.
        +1 unit 1m 0s hadoop-hdfs-client in the patch passed.
        +1 unit 62m 30s hadoop-hdfs in the patch passed.
        +1 asflicense 0m 33s The patch does not generate ASF License warnings.
        135m 59s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11194
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845117/HDFS-11194.03.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
        uname Linux 20947df75f90 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 95c2c24
        Default Java 1.8.0_111
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17990/artifact/patchprocess/diff-checkstyle-root.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17990/artifact/patchprocess/whitespace-eol.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17990/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17990/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xyao Xiaoyu Yao added a comment -

        Thanks Arpit Agarwal for working on this and all for the discussion. I have the following comments on the production side changes. Still reviewing the unit test changes and will post my comments on that soon.

        1. BlockReceiver.java
        NIT: Line 848: "&& mirrorAddr != null" can be removed.
        Line 849: can be simplified with "peerMetrics.addSendPacketDownstream"

        2. BPServiceActor.java
        Line 1146: NIT: heatbeatTime can be renamed to slowPeerReportTime, or the parameter can be
        removed by hiding the monotonicNow() call inside scheduleNextSlowPeerReport().

        3. DatanodeManager.java
        Line 52-53: NIT: avoid import *
        import org.apache.hadoop.util.*;
        import org.apache.hadoop.util.Timer;

        Line 180. The comment seems incomplete.

        Line 212. We should instantiate slowPeerTracker only if dataNodePeerStatsEnabled
        is true.

        Line 1653-1660: NIT: can we tweak the code to avoid calling slowPeers.getSlowPeers()
        multiple times in the worst case and maybe avoid the if (LOG.isDebugEnabled()) with
        parameterized logging?

        Line 1659: can we use nodeinfo.getIpcAddr() since the datanode
        has registered?

        4. DataNodePeerMetrics.java
        Line 142-143: Correct me if I'm wrong, but it looks like the comment is for the stats Map
        on Line 137.

        5. DatanodeProtocol.proto
        Line 398-405. This is a very good document. Can we add a field indicating the
        DN aggregate mechanism? This way the NN can enforce consistent aggregation
        across all the datanodes. This can be done in a separate ticket.

        6. DFSConfigKeys.java
        Line 677: document for dfs.datanode.slow.peers.report.interval? We can open
        separate ticket for it.

        7. RollingAverage.java
        Great catch on some missing synchronized on rollOverAvgs.
        NIT: Line 264: missing @param for minSamples

        8. SlowNodeDetector.java
        Line 99-108: We can make this an interface to allow different aggregation methods
        (median, 90th percentile) for outlier detection. This can be done in a separate ticket.
        We can also use Median/Percentile class from apache common to implement
        different aggregation.

        Line 127: we need to guard the tracing with if (LOG.isTraceEnabled()) to
        avoid the implicit sorted.toString() overhead.

        9. SlowPeerReports.java
        Line 44: NIT: typo consistenly -> consistently
        Line 144: NIT: the documentation needs to be updated to match the code, which returns
        a map -> sortedset of string.
        Line 190: Can we make MAX_NODES_TO_REPORT configurable? This can be fixed in
        a separate ticket.

        xyao Xiaoyu Yao added a comment -

        TestHeartbeatHandling.java
        Line 60: is the 300_000 a typo or special usage of timeout rule?

          
        public Timeout testTimeout = new Timeout(300_000);
        

        TestSlowPeerTracker.java
        Line 54: same as above.

        arpitagarwal Arpit Agarwal added a comment -

        Thanks for the review Xiaoyu Yao. The v4 patch addresses your feedback. A few comments below:

        Line 212. we should instantiate slowPeerTracker only if dataNodePeerStatsEnabled is true.

        We only instantiate it when dataNodePeerStatsEnabled is true. Let me know if I misunderstood your comment.

        Line 1653-1660: NIT: can we tweak the code to avoid calling slowPeers.getSlowPeers() multiple times in the worst case and maybe avoid the if (LOG.isDebugEnabled()) with parameterized logging?

        Fixed. Can't change the logging since DatanodeManager has not been converted to slf4j yet.

        Line 1659: can we use nodeinfo.getIpcAddr() since the datanode has registered?

        That overload is private. Did you mean use nodeInfo.getIpcAddr(true)? I used IP addresses everywhere since that's what the DataNodes report on their peers.

        Great catch on some missing synchronized on rollOverAvgs.

        The synchronization was fine since the callers always get the lock but it was triggering a findbugs false positive.

        We can make this an interface to allow different aggregation methods (median, 90th percentile) for outlier detection.

        Thanks. We shouldn't need an interface here as the SlowNodeDetector is agnostic to which aggregation is used.
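
        One way to picture a detector that does not care which aggregation is used is to pass the aggregation in as a function. The sketch below is only an illustration and is not the SlowNodeDetector logic from the patch: the class name, the countAbove helper, the multiplier, and the sample latencies are all hypothetical.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

public class AggregationSketch {
  /** Median of a non-empty array (mean of the two middle values if even). */
  static double median(double[] values) {
    requireNonEmpty(values);
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }

  /** Nearest-rank percentile of a non-empty array, with p in (0, 100]. */
  static double percentile(double[] values, double p) {
    requireNonEmpty(values);
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  /**
   * Hypothetical outlier count: how many samples exceed
   * aggregate(samples) * multiplier. The aggregation is a parameter,
   * so this method never needs to know whether it is a median or a percentile.
   */
  static long countAbove(double[] samples,
      ToDoubleFunction<double[]> aggregate, double multiplier) {
    double threshold = aggregate.applyAsDouble(samples) * multiplier;
    return Arrays.stream(samples).filter(v -> v > threshold).count();
  }

  private static void requireNonEmpty(double[] values) {
    if (values == null || values.length == 0) {
      throw new IllegalArgumentException("need at least one sample");
    }
  }

  public static void main(String[] args) {
    double[] latenciesMs = {10, 12, 11, 13, 90};  // made-up sample data
    System.out.println(countAbove(latenciesMs, AggregationSketch::median, 3.0));
    System.out.println(countAbove(latenciesMs, v -> percentile(v, 90), 3.0));
  }
}
```

        With the made-up data, the median-based threshold flags the 90 ms sample, while the 90th-percentile threshold flags nothing, which shows why the choice of aggregation matters even though the calling code stays the same.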

        we need to guard the tracing with if (LOG.isTraceEnabled()) to avoid the implicit sorted.toString() overhead.

        I think the sorted.toString method will not be called unless the trace level is enabled. Here's the implementation of Logger#trace from Log4jLoggerAdapter:

        public void trace(String format, Object... arguments) {
          if (isTraceEnabled()) {
            FormattingTuple ft = MessageFormatter.arrayFormat(format, arguments);
            logger.log(FQCN, traceCapable ? Level.TRACE : Level.DEBUG, ft
                .getMessage(), ft.getThrowable());
          }
        }
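
        The effect of that level check can be reproduced without any logging library. The following is a minimal sketch, assuming a stand-in logger (MiniLogger, a hypothetical class written for this example, not part of slf4j or log4j) that formats its arguments only inside the guard, as the adapter above does. It counts how often toString() runs on the logged object.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;

public class LazyTraceDemo {
  /** Counts how often toString() is invoked on the logged argument. */
  static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

  /** Stand-in for an object that is expensive to print, e.g. a sorted list. */
  static final class Expensive {
    @Override
    public String toString() {
      TO_STRING_CALLS.incrementAndGet();
      return "[expensive contents]";
    }
  }

  /** Hypothetical mini logger that mirrors the level check shown above. */
  static final class MiniLogger {
    private final boolean traceEnabled;

    MiniLogger(boolean traceEnabled) {
      this.traceEnabled = traceEnabled;
    }

    void trace(String format, Object... arguments) {
      if (traceEnabled) {
        // Formatting, and therefore toString(), happens only inside the guard.
        String msg = format;
        for (Object arg : arguments) {
          msg = msg.replaceFirst("\\{\\}",
              Matcher.quoteReplacement(String.valueOf(arg)));
        }
        System.out.println(msg);
      }
    }
  }

  /** Logs one message and returns how many toString() calls that caused. */
  static int callsFor(boolean traceEnabled) {
    TO_STRING_CALLS.set(0);
    new MiniLogger(traceEnabled).trace("sorted={}", new Expensive());
    return TO_STRING_CALLS.get();
  }

  public static void main(String[] args) {
    System.out.println("trace disabled: " + callsFor(false) + " toString() calls");
    System.out.println("trace enabled:  " + callsFor(true) + " toString() calls");
  }
}
```

        The saving applies only when the object itself is passed as the argument. Writing trace("sorted={}", sorted.toString()) would build the string at the call site regardless of the log level, and in that case an explicit isTraceEnabled() guard would still help.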
        

        Line 60: is the 300_000 a typo or special usage of timeout rule?

        That's for readability to help count zeros.
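
        As a small illustration of the literal syntax: Java 7 and later allow underscores between digits, and the compiler ignores them. The sketch below assumes the timeout value is in milliseconds, as with JUnit 4's Timeout(int) constructor; the class and method names are made up for the example.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutLiteralDemo {
  // Underscores only group digits visually; the value is the same as 300000.
  static final int TIMEOUT_MS = 300_000;

  /** Converts the timeout to minutes, assuming it is given in milliseconds. */
  static long timeoutMinutes() {
    return TimeUnit.MILLISECONDS.toMinutes(TIMEOUT_MS);
  }

  public static void main(String[] args) {
    System.out.println(TIMEOUT_MS == 300000);
    System.out.println(timeoutMinutes() + " minutes");
  }
}
```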

        arpitagarwal Arpit Agarwal added a comment -

        HDFS-11194-03-04.delta has the changes from the v03 -> v04 patch for reviewing.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 20 new or modified test files.
        0 mvndep 1m 13s Maven dependency ordering for branch
        +1 mvninstall 12m 27s trunk passed
        +1 compile 9m 49s trunk passed
        +1 checkstyle 1m 52s trunk passed
        +1 mvnsite 2m 33s trunk passed
        +1 mvneclipse 0m 56s trunk passed
        +1 findbugs 4m 48s trunk passed
        +1 javadoc 1m 59s trunk passed
        0 mvndep 0m 14s Maven dependency ordering for patch
        +1 mvninstall 1m 55s the patch passed
        +1 compile 9m 16s the patch passed
        +1 cc 9m 16s the patch passed
        +1 javac 9m 16s the patch passed
        -0 checkstyle 1m 53s root: The patch generated 7 new + 1666 unchanged - 5 fixed = 1673 total (was 1671)
        +1 mvnsite 2m 32s the patch passed
        +1 mvneclipse 0m 54s the patch passed
        -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 5m 8s the patch passed
        +1 javadoc 1m 58s the patch passed
        +1 unit 8m 27s hadoop-common in the patch passed.
        +1 unit 1m 0s hadoop-hdfs-client in the patch passed.
        -1 unit 83m 45s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 40s The patch does not generate ASF License warnings.
        154m 55s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
          hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
        Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11194
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845901/HDFS-11194.04.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
        uname Linux d2dc9ae0acbf 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 8850c05
        Default Java 1.8.0_111
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18042/artifact/patchprocess/diff-checkstyle-root.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/18042/artifact/patchprocess/whitespace-eol.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/18042/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18042/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18042/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xiaobingo Xiaobing Zhou added a comment - - edited

        Thank you Arpit Agarwal for the patch.
        I've some comments.

        1. RollingAverages#getStats is using non-rolling mean states, it should use rolling ones instead. See RollingAverages#snapshot for calculation of rolling averages.
        2. DFS_METRICS_ROLLING_AVERAGES_WINDOW_SIZE_DEFAULT should be changed to the same naming with DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY
        3. These parameters can be changed to be configurable.
          SlowNodeDetector#minOutlierDetectionPeers
          DataNodePeerMetrics#LOW_THRESHOLD_MS
          SlowPeerTracker#MAX_NODES_TO_REPORT
        4. In BlockReceiver#receivePacket, after trackSendPacketToLastNodeInPipeline(duration); It may need to change a bit, e.g. if (duration > DataNodePeerMetrics#LOW_THRESHOLD_MS), or the warning message could be removed altogether.
        arpitagarwal Arpit Agarwal added a comment - - edited

        Thanks for the review Xiaobing Zhou. The v05 patch addresses your feedback.

        RollingAverages#getStats is using non-rolling mean states, it should use rolling ones instead. See RollingAverages#snapshot for calculation of rolling averages.

        Nice catch, fixed.
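
        To make the distinction concrete, here is a minimal sketch of a rolling mean over the last few windows next to a lifetime (non-rolling) mean. It is not the RollingAverages implementation from the patch: the class, its methods, the window count, and the sample values are hypothetical and chosen only to show how the two numbers diverge.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RollingMeanDemo {
  private final int numWindows;
  /** Closed windows, oldest first; each entry is {sum, sampleCount}. */
  private final Deque<double[]> windows = new ArrayDeque<>();
  private double currentSum;
  private long currentCount;
  private double lifetimeSum;
  private long lifetimeCount;

  RollingMeanDemo(int numWindows) {
    this.numWindows = numWindows;
  }

  /** Records one sample in the window that is currently open. */
  synchronized void add(double value) {
    currentSum += value;
    currentCount++;
    lifetimeSum += value;
    lifetimeCount++;
  }

  /** Closes the open window and drops the oldest one beyond numWindows. */
  synchronized void rollOver() {
    windows.addLast(new double[] {currentSum, currentCount});
    if (windows.size() > numWindows) {
      windows.removeFirst();
    }
    currentSum = 0;
    currentCount = 0;
  }

  /** Mean over the retained closed windows only. */
  synchronized double rollingMean() {
    double sum = 0;
    double count = 0;
    for (double[] w : windows) {
      sum += w[0];
      count += w[1];
    }
    return count == 0 ? 0 : sum / count;
  }

  /** Mean over every sample ever recorded. */
  synchronized double lifetimeMean() {
    return lifetimeCount == 0 ? 0 : lifetimeSum / lifetimeCount;
  }

  /** Runs the made-up scenario and returns {rollingMean, lifetimeMean}. */
  static double[] demo() {
    RollingMeanDemo m = new RollingMeanDemo(2);
    m.add(10); m.add(10); m.rollOver();    // window 1, later evicted
    m.add(10); m.add(10); m.rollOver();    // window 2
    m.add(100); m.add(100); m.rollOver();  // window 3, a recent slowdown
    return new double[] {m.rollingMean(), m.lifetimeMean()};
  }

  public static void main(String[] args) {
    double[] r = demo();
    System.out.println("rolling=" + r[0] + " lifetime=" + r[1]);
  }
}
```

        In this scenario the rolling mean covers only windows 2 and 3, so the recent slowdown dominates it, while the lifetime mean is diluted by older samples. That dilution is the reason a report meant to reflect current peer latency would want the rolling value.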

        DFS_METRICS_ROLLING_AVERAGES_WINDOW_SIZE_DEFAULT should be changed to the same naming with DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY

        Fixed.

        These parameters can be changed to be configurable.

        Can we discuss making them configurable in a separate Jira? It probably makes sense to do so but I also want to limit new config parameters.

        In BlockReceiver#receivePacket, after trackSendPacketToLastNodeInPipeline(duration); It may need to change a bit, e.g. if (duration > DataNodePeerMetrics#LOW_THRESHOLD_MS), or the warning message could be removed altogether.

        Yes. We could check that dfs.datanode.slow.io.warning.threshold.ms is greater than DataNodePeerMetrics#LOW_THRESHOLD_MS. Let's discuss that separately too. I am not documenting the new config settings for now.

        xyao Xiaoyu Yao added a comment -

        Thanks Arpit Agarwal for updating the patch. Patch v5 looks good to me. +1 pending Jenkins.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 19s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 20 new or modified test files.
        0 mvndep 2m 1s Maven dependency ordering for branch
        +1 mvninstall 16m 31s trunk passed
        +1 compile 15m 15s trunk passed
        +1 checkstyle 2m 9s trunk passed
        +1 mvnsite 3m 14s trunk passed
        +1 mvneclipse 1m 3s trunk passed
        +1 findbugs 5m 43s trunk passed
        +1 javadoc 2m 18s trunk passed
        0 mvndep 0m 16s Maven dependency ordering for patch
        +1 mvninstall 2m 23s the patch passed
        +1 compile 13m 15s the patch passed
        +1 cc 13m 15s the patch passed
        +1 javac 13m 15s the patch passed
        -0 checkstyle 2m 3s root: The patch generated 10 new + 1666 unchanged - 5 fixed = 1676 total (was 1671)
        +1 mvnsite 2m 55s the patch passed
        +1 mvneclipse 0m 58s the patch passed
        -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        +1 xml 0m 1s The patch has no ill-formed XML file.
        +1 findbugs 5m 57s the patch passed
        +1 javadoc 2m 12s the patch passed
        +1 unit 8m 47s hadoop-common in the patch passed.
        +1 unit 1m 8s hadoop-hdfs-client in the patch passed.
        -1 unit 83m 42s hadoop-hdfs in the patch failed.
        +1 asflicense 1m 1s The patch does not generate ASF License warnings.
        174m 43s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestDFSClientRetries
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
        Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11194
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12849024/HDFS-11194.05.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
        uname Linux dbf5d69c5698 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / a2c5012
        Default Java 1.8.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18245/artifact/patchprocess/diff-checkstyle-root.txt
        whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/18245/artifact/patchprocess/whitespace-eol.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/18245/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18245/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18245/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xiaobingo Xiaobing Zhou added a comment -

        Thanks for the v5 patch, Arpit Agarwal. I have some minor comments; otherwise LGTM.

        1. there is minor diff of 'S' in AVERAGE
          DFSConfigKeys.DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY
          DFSConfigKeys.DFS_METRICS_ROLLING_AVERAGES_WINDOW_LENGTH_DEFAULT
        2. some checkstyle issues
        Show
        xiaobingo Xiaobing Zhou added a comment - Thanks for v5 patch. Arpit Agarwal . Having some minor comments, otherwise LGTM. there is minor diff of 'S' in AVERAGE DFSConfigKeys.DFS_METRICS_ROLLING_AVERAGE_WINDOW_LENGTH_KEY DFSConfigKeys.DFS_METRICS_ROLLING_AVERAGES_WINDOW_LENGTH_DEFAULT some check style issues
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        Thanks for the reviews Xiaoyu Yao and Xiaobing Zhou.

        The v06 patch fixes the setting name and a few valid checkstyle issues. The test failures are unrelated to the patch and don't repro locally.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 44s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 20 new or modified test files.
        0 mvndep 2m 54s Maven dependency ordering for branch
        +1 mvninstall 15m 47s trunk passed
        +1 compile 13m 1s trunk passed
        +1 checkstyle 1m 55s trunk passed
        +1 mvnsite 2m 43s trunk passed
        +1 mvneclipse 0m 58s trunk passed
        +1 findbugs 4m 58s trunk passed
        +1 javadoc 1m 58s trunk passed
        0 mvndep 0m 14s Maven dependency ordering for patch
        +1 mvninstall 1m 54s the patch passed
        +1 compile 10m 36s the patch passed
        +1 cc 10m 36s the patch passed
        +1 javac 10m 36s the patch passed
        -0 checkstyle 1m 56s root: The patch generated 9 new + 1666 unchanged - 5 fixed = 1675 total (was 1671)
        +1 mvnsite 2m 34s the patch passed
        +1 mvneclipse 0m 56s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 xml 0m 2s The patch has no ill-formed XML file.
        +1 findbugs 5m 10s the patch passed
        +1 javadoc 2m 0s the patch passed
        +1 unit 8m 41s hadoop-common in the patch passed.
        +1 unit 1m 8s hadoop-hdfs-client in the patch passed.
        -1 unit 98m 36s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 39s The patch does not generate ASF License warnings.
        180m 54s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          hadoop.hdfs.server.namenode.ha.TestHAAppend
        Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11194
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12849166/HDFS-11194.06.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml
        uname Linux d9da0c59cf56 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / a33ce45
        Default Java 1.8.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18252/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/18252/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18252/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: .
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18252/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 44s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 20 new or modified test files. 0 mvndep 2m 54s Maven dependency ordering for branch +1 mvninstall 15m 47s trunk passed +1 compile 13m 1s trunk passed +1 checkstyle 1m 55s trunk passed +1 mvnsite 2m 43s trunk passed +1 mvneclipse 0m 58s trunk passed +1 findbugs 4m 58s trunk passed +1 javadoc 1m 58s trunk passed 0 mvndep 0m 14s Maven dependency ordering for patch +1 mvninstall 1m 54s the patch passed +1 compile 10m 36s the patch passed +1 cc 10m 36s the patch passed +1 javac 10m 36s the patch passed -0 checkstyle 1m 56s root: The patch generated 9 new + 1666 unchanged - 5 fixed = 1675 total (was 1671) +1 mvnsite 2m 34s the patch passed +1 mvneclipse 0m 56s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 2s The patch has no ill-formed XML file. +1 findbugs 5m 10s the patch passed +1 javadoc 2m 0s the patch passed +1 unit 8m 41s hadoop-common in the patch passed. +1 unit 1m 8s hadoop-hdfs-client in the patch passed. -1 unit 98m 36s hadoop-hdfs in the patch failed. +1 asflicense 0m 39s The patch does not generate ASF License warnings. 
180m 54s Reason Tests Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes   hadoop.hdfs.server.namenode.ha.TestHAAppend Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HDFS-11194 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12849166/HDFS-11194.06.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml uname Linux d9da0c59cf56 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / a33ce45 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18252/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/18252/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18252/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18252/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        arpitagarwal Arpit Agarwal added a comment -

        Pushed to trunk.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11168 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11168/)
        HDFS-11194. Maintain aggregated peer performance metrics on NameNode. (arp: rev b57368b6f893cb27d77fc9425e116f1312f4790f)

        • (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/SlowNodeDetector.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java
        • (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/protocol/SlowPeerReports.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeLifeline.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
        • (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SlowPeerTracker.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocolPB/TestPBHelper.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/lib/TestRollingAverages.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
        • (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/metrics/TestDataNodeOutlierDetectionViaMetrics.java
        • (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/metrics/TestSlowNodeDetector.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/lib/RollingAverages.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodePeerMetrics.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodePeerMetrics.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/InternalDataNodeTestUtils.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestHeartbeatHandling.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
        • (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSlowPeerTracker.java
        arpitagarwal Arpit Agarwal added a comment -

        Cherry-picked to branch-2. Had to add a few template type arguments to compile with Java 7. Reran affected unit tests.
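
        For readers unfamiliar with this kind of backport fix, here is a minimal sketch of why code that compiles on Java 8 may need explicit type arguments on Java 7. It is an illustration only and is not taken from the patch: the class TypeArgumentSketch and the helper countEntries are hypothetical names. Java 8 infers generic type arguments from the target parameter type, while javac 7 does not do so for a generic method call used as an argument.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical example; none of these names come from HDFS-11194.
public class TypeArgumentSketch {

    // Stand-in for any method that accepts a parameterized type.
    static int countEntries(Map<String, Double> latencies) {
        return latencies.size();
    }

    public static void main(String[] args) {
        // Java 8 infers <String, Double> from the parameter type.
        // javac 7 infers Map<Object, Object> here and rejects the call.
        int inferred = countEntries(Collections.emptyMap());

        // Explicit type arguments compile on both Java 7 and Java 8.
        int explicit = countEntries(Collections.<String, Double>emptyMap());

        System.out.println(inferred + " " + explicit);
    }
}
```

        The explicit form is more verbose, but it behaves identically on both compilers, which is presumably why it suits a branch that still has to build with Java 7.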

        andrew.wang Andrew Wang added a comment -

        This is a really cool feature. Would someone mind adding a release note on how to configure and view these new metrics?

        arpitagarwal Arpit Agarwal added a comment -

        We'll have a release note and docs out in time for alpha3.

        andrew.wang Andrew Wang added a comment -

        Great, thanks Arpit!


          People

          • Assignee:
            arpitagarwal Arpit Agarwal
            Reporter:
            xiaobingo Xiaobing Zhou
          • Votes:
            0
            Watchers:
            18


              Development