Hadoop HDFS / HDFS-14783

Expired SlowPeersReport stays in the NameNode's JMX


    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The SlowPeersReport in the NameNode's JMX shows which DataNodes are slow. It is derived from the average latency of packets sent between pairs of DataNodes. For example, if dn1 takes too long on average to send packets to dn2 (longer than upperLimitLatency), the NameNode's JMX shows a SlowPeersReport like this:

      "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
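      To make the report format concrete, here is a minimal Java sketch of how per-peer average latencies could be turned into the JSON shown above. It is not HDFS code: the class and method names (SlowPeersReportSketch, buildReport, toJson) are hypothetical, the input map stands in for whatever latency data the DataNodes report, and upperLimitLatency is treated as a fixed threshold for simplicity (how HDFS actually derives it is outside the scope of this issue). Map.of requires Java 9 or later.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class SlowPeersReportSketch {

  /**
   * Hypothetical aggregation: slow node -> nodes that reported it as slow.
   * avgLatencyMs maps each reporting DataNode to (peer -> average packet latency in ms).
   * A peer is flagged when its average exceeds upperLimitLatencyMs (assumed fixed here).
   */
  static Map<String, TreeSet<String>> buildReport(
      Map<String, Map<String, Double>> avgLatencyMs, double upperLimitLatencyMs) {
    // TreeMap/TreeSet keep the output order deterministic.
    Map<String, TreeSet<String>> report = new TreeMap<>();
    for (Map.Entry<String, Map<String, Double>> reporter : avgLatencyMs.entrySet()) {
      for (Map.Entry<String, Double> peer : reporter.getValue().entrySet()) {
        if (peer.getValue() > upperLimitLatencyMs) {
          report.computeIfAbsent(peer.getKey(), k -> new TreeSet<>()).add(reporter.getKey());
        }
      }
    }
    return report;
  }

  /** Renders the report in the same JSON shape as the SlowPeersReport example. */
  static String toJson(Map<String, TreeSet<String>> report) {
    StringBuilder sb = new StringBuilder("[");
    boolean firstNode = true;
    for (Map.Entry<String, TreeSet<String>> e : report.entrySet()) {
      if (!firstNode) sb.append(',');
      firstNode = false;
      sb.append("{\"SlowNode\":\"").append(e.getKey()).append("\",\"ReportingNodes\":[");
      boolean firstReporter = true;
      for (String r : e.getValue()) {
        if (!firstReporter) sb.append(',');
        firstReporter = false;
        sb.append('"').append(r).append('"');
      }
      sb.append("]}");
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    // dn1 -> dn2 is slow (250 ms); the other links are fast.
    Map<String, Map<String, Double>> latencies = Map.of(
        "dn1", Map.of("dn2", 250.0, "dn3", 5.0),
        "dn3", Map.of("dn2", 4.0));
    System.out.println(toJson(buildReport(latencies, 100.0)));
  }
}
```

      With dn1's average latency to dn2 at 250 ms against a 100 ms threshold, main prints [{"SlowNode":"dn2","ReportingNodes":["dn1"]}], matching the example report. If several DataNodes flag the same peer, they all appear under ReportingNodes.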
      

      However, suppose dn1 sends a few packets to dn2 slowly at first and then sends none to dn2 for a long time. The SlowPeersReport above then stays in the NameNode's JMX. This report is likely stale, because the network between dn1 and dn2 may have returned to normal, yet the report remains until dn1 next sends packets to dn2. To address this, I use a timestamp to record when each org.apache.hadoop.metrics2.util.SampleStat is created, and compute the average latency only from SampleStats that are still valid, as judged by their timestamps.
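      The proposed fix can be illustrated with a minimal Java sketch, assuming samples are grouped into fixed time windows, each window is a stat stamped with its creation time, and the average ignores stats older than an expiry period. This is not the code from HDFS-14783-001.patch: TimestampedStat is a hypothetical stand-in for SampleStat plus the proposed timestamp, and ExpiringLatencyAverage, windowMs and expiryMs are illustrative names and parameters. Time is passed in explicitly and is assumed never to go backwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExpiringLatencyAverage {

  /** Hypothetical stand-in for SampleStat plus the creation timestamp this issue proposes. */
  static final class TimestampedStat {
    final long createdAtMs;
    long numSamples;
    double totalMs;

    TimestampedStat(long createdAtMs) {
      this.createdAtMs = createdAtMs;
    }

    void add(double latencyMs) {
      numSamples++;
      totalMs += latencyMs;
    }
  }

  // Stats are appended in creation order, so the oldest is always at the front.
  private final Deque<TimestampedStat> stats = new ArrayDeque<>();
  private final long windowMs; // how long one stat collects samples (assumed)
  private final long expiryMs; // stats created longer ago than this are ignored (assumed)

  ExpiringLatencyAverage(long windowMs, long expiryMs) {
    this.windowMs = windowMs;
    this.expiryMs = expiryMs;
  }

  /** Records one packet latency, starting a new stat once the current window has ended. */
  void record(double latencyMs, long nowMs) {
    TimestampedStat current = stats.peekLast();
    if (current == null || nowMs - current.createdAtMs >= windowMs) {
      current = new TimestampedStat(nowMs);
      stats.addLast(current);
    }
    current.add(latencyMs);
  }

  /** Average over non-expired stats only; NaN when no recent samples remain. */
  double average(long nowMs) {
    // Discard expired stats from the front (oldest first).
    while (!stats.isEmpty() && nowMs - stats.peekFirst().createdAtMs > expiryMs) {
      stats.pollFirst();
    }
    long n = 0;
    double total = 0;
    for (TimestampedStat s : stats) {
      n += s.numSamples;
      total += s.totalMs;
    }
    return n == 0 ? Double.NaN : total / n;
  }

  public static void main(String[] args) {
    ExpiringLatencyAverage dn1ToDn2 = new ExpiringLatencyAverage(60_000, 300_000);
    dn1ToDn2.record(400, 0); // slow packets at the beginning
    dn1ToDn2.record(600, 1_000);
    System.out.println(dn1ToDn2.average(2_000));   // 500.0
    System.out.println(dn1ToDn2.average(600_000)); // NaN: the slow samples have expired
  }
}
```

      In main, the two slow samples give an average of 500.0 ms while they are recent; once they are older than the 300-second expiry, the average becomes NaN instead of keeping the stale 500 ms. Because NaN compares false against any threshold, a peer with no recent samples would no longer be flagged as slow.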

        Attachments

        1. HDFS-14783-001.patch (8 kB, Haibin Huang)
        2. HDFS-14783 (5 kB, Haibin Huang)


            People

            • Assignee:
              Haibin Huang (huanghaibin)
              Reporter:
              Haibin Huang (huanghaibin)
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated: