Key: HADOOP-5897
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Suresh Srinivas
Reporter: Suresh Srinivas
Votes: 1
Watchers: 5

Hadoop Common

Add more Metrics to Namenode to capture heap usage

Created: 22/May/09 05:19 PM   Updated: 08/Jul/09 04:43 PM
Component/s: metrics
Affects Version/s: None
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments (text files, all licensed for inclusion in ASF works):
  Name               Date                  Uploaded by       Size
  5897.rel20.patch   2009-06-26 05:41 PM   Suresh Srinivas   20 kB
  stats.1.patch      2009-06-08 11:56 PM   Suresh Srinivas   21 kB
  stats.patch        2009-05-29 12:23 AM   Suresh Srinivas   21 kB
  stats.patch        2009-05-28 05:45 PM   Suresh Srinivas   21 kB

Hadoop Flags: Reviewed
Resolution Date: 15/Jun/09 11:32 PM


Description
Recently we had GC issues where the Namenode used more heap than usual. The data in the current metrics showed no growth that would justify the heap usage. This issue adds more stats, such as:
  • Counter to track blocks that are pending deletion
  • BlocksMap hashmap capacity
  • Counter to track excess number of blocks


Suresh Srinivas added a comment - 28/May/09 05:45 PM
  1. Moved CorruptBlocks metrics from NameNodeMetrics to FSNamesystemMetrics, where other related metrics are maintained.
  2. Added the following metrics:
    • PendingDeletionBlocks - tracks the number of blocks pending deletion
    • ExcessBlocks - blocks that have more replicas than the required replication factor defined for the file
    • BlockCapacity - this is the capacity of the hashmap where the blocks are stored
  3. Added testcases for testing FSNamesystemMetrics

The new metrics track the data structures that can grow large and will help in correlating them with the heap growth indicated in GC logs.
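
The ExcessBlocks definition given above (blocks with more replicas than the required replication factor) can be illustrated with a small sketch. This is not code from the patch: the class name, the method name, and the use of a single replication factor for all blocks are assumptions made only to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the ExcessBlocks definition; not the patch's code. */
final class ExcessBlockExample {

    /**
     * Counts blocks whose live replica count exceeds the required replication.
     * Assumption: one replication factor applies to every block in the map.
     */
    static int countExcessBlocks(Map<String, Integer> liveReplicas, int replication) {
        int excess = 0;
        for (int replicas : liveReplicas.values()) {
            if (replicas > replication) {
                excess++;
            }
        }
        return excess;
    }

    public static void main(String[] args) {
        Map<String, Integer> liveReplicas = new LinkedHashMap<>();
        liveReplicas.put("blk_1", 3); // exactly replicated
        liveReplicas.put("blk_2", 4); // one replica too many
        liveReplicas.put("blk_3", 2); // under-replicated, not excess
        System.out.println("ExcessBlocks = " + countExcessBlocks(liveReplicas, 3));
    }
}
```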


Suresh Srinivas made changes - 28/May/09 05:45 PM
Field Original Value New Value
Attachment stats.patch [ 12409288 ]
Suresh Srinivas added a comment - 29/May/09 12:23 AM
Uploading a newer patch. In the previous patch the unit test did not compile.

Suresh Srinivas made changes - 29/May/09 12:23 AM
Attachment stats.patch [ 12409326 ]
Konstantin Shvachko added a comment - 08/Jun/09 06:41 PM
  1. Why do we need to store capacity and loadFactor in BlocksMap?
    These are HashMap parameters and can be retrieved directly from the HashMap.
    Or am I missing something here?
  2. Should we make pendingReplicationBlocksCount and other block counts volatile?

gary murry added a comment - 08/Jun/09 06:49 PM
I will be out of office June 12.

-Gary


Suresh Srinivas added a comment - 08/Jun/09 10:58 PM

These are HashMap parameters and can be retrieved directly from the HashMap.

Unfortunately, HashMap's access to capacity and loadFactor is package-private. Beats me why that choice was made. The HashMap implementation is closely tied to the capacity being a power of two, so the added code that calculates the capacity should be future-proof.
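
A minimal sketch of how such a capacity estimate might be maintained outside of HashMap follows. It is not the code from the attached patch: the class name CapacityTrackingMap and its methods are hypothetical, and it assumes only the documented HashMap behaviour that the table size is a power of two and grows once the number of entries exceeds capacity times load factor.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical wrapper that estimates the table capacity of a java.util.HashMap. */
final class CapacityTrackingMap<K, V> {
    // Largest power of two that fits in a positive int; the estimate stops here.
    private static final int MAX_CAPACITY = 1 << 30;

    private final Map<K, V> map;
    private final float loadFactor;
    private int capacity; // our own estimate; HashMap does not expose its table size

    CapacityTrackingMap(int initialCapacity, float loadFactor) {
        // Round the requested capacity up to the next power of two.
        int cap = 1;
        while (cap < initialCapacity && cap < MAX_CAPACITY) {
            cap <<= 1;
        }
        this.capacity = cap;
        this.loadFactor = loadFactor;
        this.map = new HashMap<>(initialCapacity, loadFactor);
    }

    V put(K key, V value) {
        V old = map.put(key, value);
        // Double the estimate whenever the size passes the resize threshold.
        while (capacity < MAX_CAPACITY && map.size() > (int) (capacity * loadFactor)) {
            capacity <<= 1;
        }
        return old;
    }

    V remove(K key) {
        // HashMap does not shrink its table on removal, so the estimate stays put.
        return map.remove(key);
    }

    int size() {
        return map.size();
    }

    int getCapacity() {
        return capacity;
    }

    public static void main(String[] args) {
        CapacityTrackingMap<Integer, Integer> m = new CapacityTrackingMap<>(16, 0.75f);
        for (int i = 0; i < 13; i++) {
            m.put(i, i);
        }
        System.out.println("size=" + m.size() + " estimated capacity=" + m.getCapacity());
    }
}
```

Updating the estimate on every put, rather than lazily when the metric is read, keeps it correct even if entries are removed between reads, because the high-water mark of the map size is never missed.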

I will make the other counters volatile.
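
The volatile change can be sketched as follows. This assumes, purely for illustration, that writers update the counts while holding a lock and that a metrics reader polls them without taking it; the class and method names are hypothetical and not taken from the patch.

```java
/** Hypothetical counter holder; names are illustrative, not from the patch. */
final class BlockCounters {
    // volatile makes the latest written value visible to readers that do not
    // hold the lock, and guarantees that reads of the 64-bit value are atomic.
    private volatile long pendingDeletionBlocksCount;

    // Writers are serialized by the monitor, so the read-modify-write is safe.
    synchronized void blockScheduledForDeletion() {
        pendingDeletionBlocksCount++;
    }

    synchronized void blockDeleted() {
        pendingDeletionBlocksCount--;
    }

    // Lock-free read, suitable for a periodic metrics update.
    long getPendingDeletionBlocks() {
        return pendingDeletionBlocksCount;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockCounters counters = new BlockCounters();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                counters.blockScheduledForDeletion();
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("PendingDeletionBlocks = " + counters.getPendingDeletionBlocks());
    }
}
```

Note that volatile alone would not make the increment atomic; in this sketch that job is done by the synchronized writers, and volatile only covers visibility for the unlocked reader.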


Suresh Srinivas added a comment - 08/Jun/09 11:56 PM
The uploaded patch incorporates the suggested changes.

Suresh Srinivas made changes - 08/Jun/09 11:56 PM
Attachment stats.1.patch [ 12410190 ]
Konstantin Shvachko added a comment - 09/Jun/09 06:20 PM
I see, the methods are not accessible in HashMap, so although it is ugly, there is no way around duplicating them in our code.
+1

Suresh Srinivas made changes - 10/Jun/09 07:14 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 13/Jun/09 08:13 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12410190/stats.1.patch
against trunk revision 784318.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/497/console

This message is automatically generated.


Suresh Srinivas added a comment - 15/Jun/09 05:02 PM
The failed tests are unrelated to this patch:
  • TestJobHistory - tracked in HADOOP-5920
  • TestHdfsProxy - tracked in HADOOP-5837

Repository Revision Date User Message
ASF #785025 Mon Jun 15 23:24:51 UTC 2009 shv HADOOP-5897. Add name-node metrics to capture java heap usage. Contributed by Suresh Srinivas.
Files Changed
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
MODIFY /hadoop/core/trunk/CHANGES.txt
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
ADD /hadoop/core/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java
ADD /hadoop/core/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics

Konstantin Shvachko added a comment - 15/Jun/09 11:32 PM
I just committed this. Thank you Suresh.

Konstantin Shvachko made changes - 15/Jun/09 11:32 PM
Resolution Fixed [ 1 ]
Fix Version/s 0.21.0 [ 12313563 ]
Hadoop Flags [Reviewed]
Status Patch Available [ 10002 ] Resolved [ 5 ]
Tsz Wo (Nicholas), SZE made changes - 16/Jun/09 11:30 PM
Component/s metrics [ 12310971 ]
Component/s dfs [ 12310710 ]
Hudson added a comment - 17/Jun/09 07:19 PM

Suresh Srinivas added a comment - 26/Jun/09 05:41 PM
Attaching a backport patch, as this change is required on release 20.

Suresh Srinivas made changes - 26/Jun/09 05:41 PM
Attachment 5897.rel20.patch [ 12411942 ]
Suresh Srinivas added a comment - 26/Jun/09 05:42 PM
Patch passes all the unit tests.

Repository Revision Date User Message
ASF #788899 Fri Jun 26 22:50:08 UTC 2009 shv HADOOP-5897. Merge -r 785024:785025 from trunk to branch 0.20.
Files Changed
MODIFY /hadoop/common/branches/branch-0.20
MODIFY /hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/CorruptReplicasMap.java
MODIFY /hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlocksMap.java
MODIFY /hadoop/common/branches/branch-0.20/CHANGES.txt
MODIFY /hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/FSNamesystemMetrics.java
ADD /hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
MODIFY /hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
ADD /hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics

Repository Revision Date User Message
ASF #788900 Fri Jun 26 22:52:51 UTC 2009 shv HADOOP-5897. Promote new name-node metrics to branch 0.20.
Files Changed
MODIFY /hadoop/common/trunk/CHANGES.txt

Konstantin Shvachko added a comment - 26/Jun/09 10:54 PM
+1
Committed to branch 0.20.

Hudson added a comment - 27/Jun/09 11:09 AM
Integrated in Hadoop-Common-trunk #9 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/9/)
Promote new name-node metrics to branch 0.20.

Owen O'Malley made changes - 08/Jul/09 04:43 PM
Component/s dfs [ 12310710 ]