Hadoop Common / HADOOP-5897

Add more Metrics to Namenode to capture heap usage

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: metrics
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Recently we had GC issues where the Namenode used more heap than usual. The data in the current metrics showed no growth that would explain the heap usage. This issue adds more stats, such as:

      • Counter to track blocks that are pending deletion
      • BlocksMap hashmap capacity
      • Counter to track excess number of blocks

      Attachments

      1. stats.patch (21 kB, Suresh Srinivas)
      2. stats.patch (21 kB, Suresh Srinivas)
      3. stats.1.patch (21 kB, Suresh Srinivas)
      4. 5897.rel20.patch (20 kB, Suresh Srinivas)

        Activity

        Suresh Srinivas added a comment -
        1. Moved the CorruptBlocks metric from NameNodeMetrics to FSNamesystemMetrics, where other related metrics are maintained.
        2. Added the following metrics:
          • PendingDeletionBlocks - tracks the number of blocks pending deletion
          • ExcessBlocks - blocks that have more replicas than the replication factor required for the file
          • BlockCapacity - the capacity of the hashmap where the blocks are stored
        3. Added test cases for FSNamesystemMetrics.

        The new metrics track data structures that can grow to a large size, and will help correlate them with the heap growth seen in GC logs.
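        The metrics listed above report the current size of namesystem data structures. A minimal sketch of how such values might be copied into a metrics record on each periodic update follows, assuming a pull model in which the metrics code reads the namesystem on every update. `NamesystemMetricsSketch`, `BlockStatsSource`, `FixedSource`, and `snapshot()` are hypothetical names for illustration, not the classes or methods in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the names below are illustrative, not the patch's classes.
public class NamesystemMetricsSketch {

  /** Hypothetical view of the namesystem state that the metrics read. */
  public interface BlockStatsSource {
    long getPendingDeletionBlocks();
    long getExcessBlocks();
    long getCorruptBlocks();
    long getBlockCapacity();
  }

  /** Fixed-value source, handy for tests and the example below. */
  public static final class FixedSource implements BlockStatsSource {
    private final long pending, excess, corrupt, capacity;

    public FixedSource(long pending, long excess, long corrupt, long capacity) {
      this.pending = pending;
      this.excess = excess;
      this.corrupt = corrupt;
      this.capacity = capacity;
    }

    public long getPendingDeletionBlocks() { return pending; }
    public long getExcessBlocks() { return excess; }
    public long getCorruptBlocks() { return corrupt; }
    public long getBlockCapacity() { return capacity; }
  }

  private final BlockStatsSource source;

  public NamesystemMetricsSketch(BlockStatsSource source) {
    this.source = source;
  }

  /** Assumed to be called periodically by a metrics timer; copies the current values. */
  public Map<String, Long> snapshot() {
    Map<String, Long> record = new LinkedHashMap<>();
    record.put("PendingDeletionBlocks", source.getPendingDeletionBlocks());
    record.put("ExcessBlocks", source.getExcessBlocks());
    record.put("CorruptBlocks", source.getCorruptBlocks());
    record.put("BlockCapacity", source.getBlockCapacity());
    return record;
  }

  public static void main(String[] args) {
    NamesystemMetricsSketch metrics =
        new NamesystemMetricsSketch(new FixedSource(10, 3, 0, 256));
    // prints {PendingDeletionBlocks=10, ExcessBlocks=3, CorruptBlocks=0, BlockCapacity=256}
    System.out.println(metrics.snapshot());
  }
}
```

        Reading the values on each update keeps the metrics code free of bookkeeping; the namesystem only has to expose cheap getters.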

        Suresh Srinivas added a comment -

        Uploading a newer patch; the unit test in the previous patch did not compile.

        Konstantin Shvachko added a comment -
        1. Why do we need to store capacity and loadFactor in BlocksMap?
          These are HashMap parameters and can be retrieved directly from the HashMap.
          Or am I missing something here?
        2. Should we make pendingReplicationBlocksCount and other block counts volatile?
        Suresh Srinivas added a comment -

        > These are HashMap parameters and can be retrieved directly from the HashMap.

        Unfortunately, HashMap's accessors for capacity and loadFactor are package-private. Beats me why that choice was made. The HashMap implementation is closely tied to the capacity being a power of two, and the code added to calculate the capacity should be future-proof.
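        The reasoning above can be illustrated by reproducing `java.util.HashMap`'s sizing rule outside the map: the table length is the smallest power of two that is at least the requested initial capacity, and it doubles once the number of entries exceeds `capacity * loadFactor`. This is a minimal sketch under those assumptions; `CapacityTracker` and its methods are hypothetical names, and the exact resize trigger used here (`size > threshold`) may differ between JDK versions, which is the kind of drift a duplicated calculation has to guard against.

```java
// Hypothetical sketch: mirrors HashMap's assumed sizing rule so the current
// table capacity can be reported, since HashMap does not expose it publicly.
public class CapacityTracker {
  static final int MAXIMUM_CAPACITY = 1 << 30; // HashMap's maximum table length

  private final float loadFactor;
  private int capacity;
  private int size;

  public CapacityTracker(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0 || !(loadFactor > 0)) {
      throw new IllegalArgumentException("bad capacity or load factor");
    }
    this.loadFactor = loadFactor;
    this.capacity = roundUpToPowerOfTwo(initialCapacity);
  }

  /** Smallest power of two >= n, capped at MAXIMUM_CAPACITY. */
  static int roundUpToPowerOfTwo(int n) {
    int c = 1;
    while (c < n && c < MAXIMUM_CAPACITY) {
      c <<= 1;
    }
    return c;
  }

  /** Call after inserting a key that was not already present. */
  public void onAdd() {
    size++;
    if (size > (int) (capacity * loadFactor) && capacity < MAXIMUM_CAPACITY) {
      capacity <<= 1; // the table doubles when the threshold is exceeded
    }
  }

  /** HashMap never shrinks its table, so removal only lowers the size. */
  public void onRemove() {
    if (size > 0) {
      size--;
    }
  }

  public int getCapacity() {
    return capacity;
  }

  public static void main(String[] args) {
    CapacityTracker t = new CapacityTracker(100, 0.75f);
    System.out.println(t.getCapacity()); // 128
    for (int i = 0; i < 97; i++) {
      t.onAdd();
    }
    System.out.println(t.getCapacity()); // 256 (threshold 96 exceeded by the 97th entry)
  }
}
```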

        I will make the other counters volatile.
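        Making the counters `volatile` lets a metrics thread read the latest values without taking the namesystem lock. A minimal sketch, assuming the counters are only modified while a lock is held: `volatile` guarantees visibility (and atomic reads and writes of `long` values), but it does not make `+=` atomic, so updates still go through the lock. `BlockCounters` and its methods are hypothetical names.

```java
// Hypothetical sketch: writers update under a lock, readers read volatile fields lock-free.
public class BlockCounters {
  private final Object lock = new Object(); // stands in for the namesystem lock

  private volatile long pendingDeletionBlocks;
  private volatile long excessBlocks;

  public void addPendingDeletion(long n) {
    synchronized (lock) { pendingDeletionBlocks += n; } // read-modify-write needs the lock
  }

  public void removePendingDeletion(long n) {
    synchronized (lock) { pendingDeletionBlocks -= n; }
  }

  public void addExcess(long n) {
    synchronized (lock) { excessBlocks += n; }
  }

  public void removeExcess(long n) {
    synchronized (lock) { excessBlocks -= n; }
  }

  // Called from the metrics thread without the lock; volatile makes the latest write visible.
  public long getPendingDeletionBlocks() { return pendingDeletionBlocks; }

  public long getExcessBlocks() { return excessBlocks; }

  public static void main(String[] args) throws InterruptedException {
    BlockCounters counters = new BlockCounters();
    Runnable writer = () -> {
      for (int i = 0; i < 1000; i++) {
        counters.addPendingDeletion(1);
      }
    };
    Thread a = new Thread(writer);
    Thread b = new Thread(writer);
    a.start();
    b.start();
    a.join();
    b.join();
    System.out.println(counters.getPendingDeletionBlocks()); // 2000
  }
}
```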

        Suresh Srinivas added a comment -

        The uploaded patch incorporates the suggested changes.

        Konstantin Shvachko added a comment -

        I see: the methods are not accessible in HashMap, so, although it is ugly, there is no way around it other than to duplicate them in our code.
        +1

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12410190/stats.1.patch
        against trunk revision 784318.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/console

        This message is automatically generated.

        Suresh Srinivas added a comment -

        The failed tests are unrelated to this patch:
          • TestJobHistory - tracked in HADOOP-5920
          • TestHdfsProxy - tracked in HADOOP-5837
        Konstantin Shvachko added a comment -

        I just committed this. Thank you Suresh.

        Hudson added a comment -

        Integrated in Hadoop-trunk #869 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/869/ )
        Suresh Srinivas added a comment -

        Attaching a backport patch, as this change is required on release 0.20.

        Suresh Srinivas added a comment -

        Patch passes all the unit tests.

        Konstantin Shvachko added a comment -

        +1
        Committed to branch 0.20.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #9 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/9/)
        . Promote new name-node metrics to branch 0.20.


          People

          • Assignee: Suresh Srinivas
          • Reporter: Suresh Srinivas
          • Votes: 1
          • Watchers: 6
