Hadoop HDFS / HDFS-15745

Make DataNodePeerMetrics#LOW_THRESHOLD_MS and MIN_OUTLIER_DETECTION_NODES configurable


Details

    • Reviewed

    Description

      When I enabled DataNodePeerMetrics to find slow peers in the cluster, it reported many slow peers even though the ReportingNodes' averageDelay was very low and the flagged nodes were in fact healthy. I think so many slow peers are generated because DataNodePeerMetrics#LOW_THRESHOLD_MS is too small (only 5 ms) and is not configurable. The default slow-I/O warning log threshold is 300 ms (DFSConfigKeys.DFS_DATANODE_SLOW_IO_WARNING_THRESHOLD_DEFAULT = 300), so DataNodePeerMetrics#LOW_THRESHOLD_MS should not be lower than 300 ms; otherwise the NameNode receives a lot of invalid slow-peer information.
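      To illustrate why a 5 ms floor produces noisy reports, here is a minimal sketch of median-based outlier detection. It assumes a scheme loosely modeled on HDFS's OutlierDetector: a peer is flagged when its average latency exceeds the largest of a fixed low threshold, a multiple of the median, and the median plus a multiple of the median absolute deviation (MAD). The class name SlowPeerSketch, the multipliers, and the sample latencies are hypothetical and are not taken from the patches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of median/MAD slow-peer detection. The constants and
 * structure are assumptions for illustration, not the Hadoop implementation.
 */
class SlowPeerSketch {
    // Assumed multipliers for this illustration.
    static final double MEDIAN_MULTIPLIER = 3.0;
    static final double DEVIATION_MULTIPLIER = 3.0;
    static final double MAD_MULTIPLIER = 1.4826;

    /** Median of an already sorted list. */
    static double median(List<Double> sorted) {
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** Scaled median absolute deviation of an already sorted list. */
    static double mad(List<Double> sorted) {
        double m = median(sorted);
        List<Double> dev = new ArrayList<>();
        for (double v : sorted) dev.add(Math.abs(v - m));
        Collections.sort(dev);
        return median(dev) * MAD_MULTIPLIER;
    }

    /** Returns peers whose average latency (ms) exceeds the computed upper limit. */
    static Map<String, Double> outliers(Map<String, Double> latencies,
                                        int minNodes, double lowThresholdMs) {
        Map<String, Double> slow = new HashMap<>();
        if (latencies.size() < minNodes) return slow; // too few peers to judge
        List<Double> sorted = new ArrayList<>(latencies.values());
        Collections.sort(sorted);
        double med = median(sorted);
        // The low threshold acts as a floor: nothing at or below it is flagged.
        double limit = Math.max(lowThresholdMs, med * MEDIAN_MULTIPLIER);
        limit = Math.max(limit, med + DEVIATION_MULTIPLIER * mad(sorted));
        for (Map.Entry<String, Double> e : latencies.entrySet()) {
            if (e.getValue() > limit) slow.put(e.getKey(), e.getValue());
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical data: ten peers between 1.0 and 1.9 ms, one at 12 ms.
        Map<String, Double> lat = new HashMap<>();
        for (int i = 0; i < 10; i++) lat.put("dn" + i, 1.0 + i * 0.1);
        lat.put("dn10", 12.0);
        System.out.println(outliers(lat, 10, 5.0).keySet());   // [dn10]
        System.out.println(outliers(lat, 10, 300.0).keySet()); // []
    }
}
```

      In this sketch, a 5 ms floor reports the 12 ms peer as slow even though it is far below the 300 ms slow-I/O warning threshold, while a 300 ms floor suppresses it. The minNodes check plays the role of MIN_OUTLIER_DETECTION_NODES: with fewer peers than that, no outliers are reported at all.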

      Attachments

        1. HDFS-15745-001.patch (4 kB, Haibin Huang)
        2. HDFS-15745-002.patch (5 kB, Haibin Huang)
        3. HDFS-15745-003.patch (5 kB, Haibin Huang)
        4. HDFS-15745-branch-3.1.001.patch (5 kB, Haibin Huang)
        5. HDFS-15745-branch-3.2.001.patch (5 kB, Haibin Huang)
        6. HDFS-15745-branch-3.3.001.patch (5 kB, Haibin Huang)
        7. image-2020-12-22-17-00-50-796.png (429 kB, Haibin Huang)


          People

            Assignee: Haibin Huang (huanghaibin)
            Reporter: Haibin Huang (huanghaibin)
            Votes: 0
            Watchers: 6


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 50m
