Hadoop HDFS
HDFS-443

New metrics in namenode to capture lost heartbeats.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The lost-heartbeats metric will count the heartbeats that the namenode did not receive within a heartbeat interval. This will help the operations team detect network issues or misbehaving datanodes early.

      1. HDFS-443.patch
        6 kB
        Jitendra Nath Pandey
      2. HDFS-443-2.patch
        5 kB
        Jitendra Nath Pandey
      3. HDFS-443-3.patch
        3 kB
        Jitendra Nath Pandey

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #23 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/23/).
        Add a new metric, numExpiredHeartbeats, to the Namenode. Contributed by Jitendra Nath Pandey.

        Tsz Wo Nicholas Sze added a comment -

        I have committed this. Thanks, Jitendra!

        Jitendra Nath Pandey added a comment -

        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Suresh Srinivas added a comment -

        +1 for the patch

        Jitendra Nath Pandey added a comment -

        1. Modified TestDatanodeReport.java to serve as the unit test and removed the new unit test file.
        2. Renamed the metric to ExpiredHeartbeats; the variable is now numExpiredHeartbeats.

        Suresh Srinivas added a comment -
        1. Is it a good idea to reuse the existing test case TestDatanodeReport.java for testing the newly added metric? At least we should set heartbeat.recheck.interval to 500 ms.
        2. Would it be better to rename LostHeartbeats to ExpiredHeartbeats, and the variable numLostHeartbeats to numExpiredHeartbeats?
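To illustrate why a shorter heartbeat.recheck.interval matters for a unit test, here is a rough sketch of the expiry arithmetic. It assumes the namenode derives the expiry interval as twice the recheck interval plus ten heartbeat intervals, and that the defaults are a 5 minute recheck interval and a 3 second heartbeat interval. Neither the formula nor the defaults are stated in this issue, and the class and method names are hypothetical.

```java
/** Illustrative arithmetic only; not code from the patch. */
public class HeartbeatExpirySketch {

    /**
     * Assumed formula: expiry = 2 * recheck interval + 10 * heartbeat interval.
     * The recheck interval is in milliseconds, the heartbeat interval in seconds.
     */
    public static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1000;
    }

    public static void main(String[] args) {
        // Assumed defaults: 5 minute recheck, 3 second heartbeat.
        System.out.println(expireIntervalMs(5 * 60 * 1000, 3)); // 630000 ms
        // Recheck interval lowered to 500 ms, as suggested above.
        System.out.println(expireIntervalMs(500, 3));           // 31000 ms
    }
}
```

Under these assumptions, a test that waits for a datanode to expire would need over ten minutes with the defaults, but only about half a minute with a 500 ms recheck interval.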
        Jitendra Nath Pandey added a comment -

        1. A lost heartbeat is counted only when the heartbeat for a datanode expires, i.e. when the namenode assumes that the datanode is dead.
        2. In the unit test, the thread sleeps for twice the heartbeat expiry interval before checking the metric.
        3. This patch is against hadoop-hdfs-trunk, which was created after the split.
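The counting rule in points 1 and 2 can be sketched with a toy monitor. This is a minimal illustration, not the code in FSNamesystem.java: the class ExpiredHeartbeatSketch and its methods are hypothetical, and time is passed in explicitly instead of sleeping so the example stays deterministic.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of a heartbeat monitor; hypothetical, not the HDFS implementation. */
public class ExpiredHeartbeatSketch {
    private final long expireIntervalMs;
    // Last heartbeat time (ms) for each datanode currently considered alive.
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
    private long numExpiredHeartbeats = 0;

    public ExpiredHeartbeatSketch(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    /** Record a heartbeat from a datanode at the given time. */
    public void heartbeat(String datanodeId, long nowMs) {
        lastHeartbeatMs.put(datanodeId, nowMs);
    }

    /** Declare dead every datanode whose heartbeat has expired, counting each once. */
    public void recheck(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > expireIntervalMs) {
                it.remove();            // the datanode is now assumed dead
                numExpiredHeartbeats++; // counted on expiry, not per missed heartbeat
            }
        }
    }

    public long getNumExpiredHeartbeats() {
        return numExpiredHeartbeats;
    }

    public static void main(String[] args) {
        ExpiredHeartbeatSketch monitor = new ExpiredHeartbeatSketch(1000);
        monitor.heartbeat("dn1", 0);
        monitor.heartbeat("dn2", 0);
        monitor.heartbeat("dn1", 1500); // dn1 keeps reporting, dn2 goes silent
        monitor.recheck(2000);          // check at twice the expiry interval
        System.out.println(monitor.getNumExpiredHeartbeats()); // prints 1
    }
}
```

Only dn2 is counted: dn1 reported late but within the expiry interval, so its delayed heartbeat does not increment the metric.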

        Jitendra Nath Pandey added a comment -

        Files modified
        1. hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        2. hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java

        New file added for unit test
        1. test/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/TestFSMetricLostHeartbeats.java


          People

          • Assignee:
            Jitendra Nath Pandey
            Reporter:
            Jitendra Nath Pandey