Hadoop HDFS / HDFS-443

New metrics in namenode to capture lost heartbeats.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The lost-heartbeats metric will count the heartbeats that the namenode did not receive within a heartbeat interval. This will help the operations team detect network issues or bad datanode behavior early on.

      1. HDFS-443.patch
        6 kB
        Jitendra Nath Pandey
      2. HDFS-443-2.patch
        5 kB
        Jitendra Nath Pandey
      3. HDFS-443-3.patch
        3 kB
        Jitendra Nath Pandey

        Activity

        jnp Jitendra Nath Pandey added a comment -

        Files modified
        1. hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        2. hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java

        New file added for unit test
        1. test/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/TestFSMetricLostHeartbeats.java

        jnp Jitendra Nath Pandey added a comment -

        1. A lost heartbeat is counted only when the heartbeat for a datanode expires, i.e., when the namenode assumes that the datanode is dead.
        2. In the unit test, the thread sleeps for twice the heartbeat expiry interval before checking the metrics.
        3. This patch is against hadoop-hdfs-trunk, created after the split.
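
The counting rule in point 1 can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class `ExpiredHeartbeatSketch`, its methods, and the use of explicit timestamps instead of a monitor thread are all hypothetical, chosen only to show that the counter advances once per datanode expiry rather than once per missed heartbeat.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical sketch, not the HDFS-443 patch: counts one expiry each time
 * a datanode's last heartbeat becomes older than the expiry interval.
 */
public class ExpiredHeartbeatSketch {
    private final long expireIntervalMs;
    // Last heartbeat time per datanode id (illustrative stand-in for datanode state).
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
    private int numExpiredHeartbeats = 0;

    public ExpiredHeartbeatSketch(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    /** Record a heartbeat from a datanode at the given time. */
    public synchronized void heartbeat(String datanodeId, long nowMs) {
        lastHeartbeatMs.put(datanodeId, nowMs);
    }

    /** Declare dead every datanode whose heartbeat has expired, counting each once. */
    public synchronized void heartbeatCheck(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > expireIntervalMs) {
                it.remove();            // the node is now treated as dead
                numExpiredHeartbeats++; // counted at expiry, not per missed beat
            }
        }
    }

    public synchronized int getNumExpiredHeartbeats() {
        return numExpiredHeartbeats;
    }

    public static void main(String[] args) {
        ExpiredHeartbeatSketch monitor = new ExpiredHeartbeatSketch(1_000L);
        monitor.heartbeat("dn1", 0L);
        monitor.heartbeat("dn2", 0L);
        monitor.heartbeat("dn1", 900L); // dn1 keeps reporting, dn2 goes silent
        monitor.heartbeatCheck(1_500L); // only dn2 is past the expiry interval
        System.out.println(monitor.getNumExpiredHeartbeats()); // prints 1
    }
}
```

Passing timestamps in explicitly keeps the sketch deterministic; a test against a real cluster would instead have to wait past the expiry interval, as point 2 describes.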

        sureshms Suresh Srinivas added a comment -
        1. Is it a good idea to reuse the existing test case TestDatanodeReport.java for testing the newly added metric? At least we should set heartbeat.recheck.interval to 500ms.
        2. Would it be better to rename LostHeartbeats to ExpiredHeartbeats, and the variable numLostHeartbeats to numExpiredHeartbeats?
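
To see why a shorter heartbeat.recheck.interval matters for a test that sleeps for twice the expiry interval, here is a small worked calculation. It rests on assumptions not stated in this thread: that the namenode derives the expiry interval as 2 × recheck interval + 10 × heartbeat interval, and that the defaults are a 5 minute recheck interval and a 3 second heartbeat interval. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch. The formula and the default values below are
 * assumptions about the namenode of this era, not taken from the patch.
 */
public class HeartbeatExpirySketch {
    /** Assumed formula: expiry = 2 * recheck interval + 10 * heartbeat interval. */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1000L;
    }

    public static void main(String[] args) {
        long assumedDefault = expireIntervalMs(5 * 60 * 1000L, 3L); // assumed defaults
        long withShortRecheck = expireIntervalMs(500L, 3L);         // recheck set to 500ms

        System.out.println(assumedDefault);       // prints 630000
        System.out.println(withShortRecheck);     // prints 31000
        // A test sleeping for twice the expiry interval would wait this long:
        System.out.println(2 * withShortRecheck); // prints 62000
    }
}
```

Under these assumptions, the test's sleep drops from about 21 minutes to about one minute when the recheck interval is 500ms.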
        jnp Jitendra Nath Pandey added a comment -

        1. Modified TestDatanodeReport.java to serve as the unit test and removed the new unit test file.
        2. Renamed the metric to ExpiredHeartbeats and the variable to numExpiredHeartbeats.

        sureshms Suresh Srinivas added a comment -

        +1 for the patch

        jnp Jitendra Nath Pandey added a comment -

        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        szetszwo Tsz Wo Nicholas Sze added a comment -

        I have committed this. Thanks, Jitendra!

        hudson Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #23 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/23/)
        Add a new metrics numExpiredHeartbeats to the Namenode. Contributed by Jitendra Nath Pandey


          People

          • Assignee:
            jnp Jitendra Nath Pandey
            Reporter:
            jnp Jitendra Nath Pandey