
Key: HADOOP-3058
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Lohit Vijayarenu
Reporter: Marco Nicosia
Votes: 0
Watchers: 0
Hadoop Common

Hadoop DFS to report more replication metrics

Created: 20/Mar/08 05:44 AM   Updated: 08/Jul/09 04:43 PM
Component/s: metrics
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  • HADOOP-3058-2.patch, 2008-05-03 07:29 AM, Lohit Vijayarenu, 24 kB
  • HADOOP-3058-3.patch, 2008-05-07 09:09 AM, Lohit Vijayarenu, 22 kB
  • HADOOP-3058.patch, 2008-05-02 07:03 AM, Lohit Vijayarenu, 24 kB

Hadoop Flags: Reviewed
Release Note: Added FSNamesystem status metrics.
Resolution Date: 07/May/08 08:38 PM


Description
Currently, the namenode and each datanode report 'blocksreplicatedpersec.'

We'd like to be able to graph pending replications vs. the number of under-replicated blocks vs. replications per second, so that we can get a better idea of the replication activity within the DFS.



Marco Nicosia added a comment - 14/Apr/08 09:56 PM
Now that this has been scheduled for a release, I realize that there were a few metrics that I should also have asked for. Hopefully these are not significant changes in scope.

In addition to the above, I forgot to ask for the most basic stats. It's very important that the NN send metrics on the number of files and blocks in the system, so that we can trend these over time. Including the number of directories would be a bonus.


Lohit Vijayarenu added a comment - 02/May/08 07:03 AM
Attaching a patch which adds FSNamesystem status metrics.
Since these are neither time-varying int nor time-varying rate metrics, I use MetricsLongValue, similar to MetricsIntValue, through its set and get methods. The metrics are recorded under the FSNamesystem record and are listed here:
  • FilesTotal
  • BlocksTotal
  • CapacityTotal
  • CapacityUsed
  • CapacityRemaining
  • TotalLoad
  • PendingReplicationBlocks
  • UnderReplicatedBlocks
  • ScheduledReplicationBlocks

Tested this using FileContext to log these while FSNamesystem was reporting, and I could see the values being updated.
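
To illustrate the kind of metric described above, here is a minimal sketch of a long-valued gauge that only stores the latest value through set and get, with no rate or per-interval delta. The names StatusGaugeSketch, LongGauge, and StatusRecord are hypothetical and are not the classes in the patch; the sample values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatusGaugeSketch {

    /** Hypothetical long-valued gauge: keeps the latest value, no rate or delta. */
    static final class LongGauge {
        private long value;

        synchronized void set(long newValue) {
            value = newValue;
        }

        synchronized long get() {
            return value;
        }
    }

    /** Hypothetical record grouping named gauges, similar in spirit to one metrics record. */
    static final class StatusRecord {
        private final Map<String, LongGauge> gauges = new LinkedHashMap<>();

        /** Returns the gauge for the given name, creating it on first use. */
        LongGauge gauge(String name) {
            return gauges.computeIfAbsent(name, k -> new LongGauge());
        }

        /** Copies the current gauge values, as a metrics context might when it emits a record. */
        Map<String, Long> snapshot() {
            Map<String, Long> out = new LinkedHashMap<>();
            for (Map.Entry<String, LongGauge> e : gauges.entrySet()) {
                out.put(e.getKey(), e.getValue().get());
            }
            return out;
        }
    }

    public static void main(String[] args) {
        StatusRecord record = new StatusRecord();
        record.gauge("FilesTotal").set(120L);            // made-up sample value
        record.gauge("UnderReplicatedBlocks").set(7L);   // made-up sample value
        record.gauge("UnderReplicatedBlocks").set(3L);   // a later set simply overwrites
        System.out.println(record.snapshot());
    }
}
```

A gauge like this suits values such as FilesTotal or CapacityUsed, where the current level matters rather than how much it changed since the last reporting interval.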


Raghu Angadi added a comment - 02/May/08 10:14 PM
+1. Patch looks fine. I have no idea whether this has a real penalty on namenode performance; most likely it does not. Several simple integer and long operations in the critical paths are replaced by methods that synchronize on a different object.

Hadoop QA added a comment - 03/May/08 01:13 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381301/HADOOP-3058.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 03/May/08 07:29 AM
FSNamesystem has two constructors, and registerMBean was being called in only one of them. This was causing an NPE; I fixed it in the updated patch.
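
The failure mode described here can be sketched as follows. This is only an illustration with hypothetical names (ConstructorInitSketch, Namesystem, StatusBean, registerBean); it does not reproduce the actual constructors or the actual fix in the patch. The idea is that every constructor runs the same registration step, so the status object is never left null.

```java
public class ConstructorInitSketch {

    /** Hypothetical stand-in for the status object set up during registration. */
    static final class StatusBean {
        long filesTotal;
    }

    /** Hypothetical namesystem with two constructors, as in the comment above. */
    static final class Namesystem {
        private final String name;
        private StatusBean bean; // stays null if registerBean() is skipped

        Namesystem(String name) {
            this.name = name;
            registerBean();
        }

        Namesystem(String name, long initialFiles) {
            this.name = name;
            // Without this call, fileAdded() below would throw a NullPointerException
            // for instances built through this constructor.
            registerBean();
            bean.filesTotal = initialFiles;
        }

        private void registerBean() {
            bean = new StatusBean();
        }

        void fileAdded() {
            bean.filesTotal++;
        }

        long filesTotal() {
            return bean.filesTotal;
        }

        String name() {
            return name;
        }
    }

    public static void main(String[] args) {
        Namesystem fresh = new Namesystem("fresh");
        Namesystem loaded = new Namesystem("loaded", 10L);
        fresh.fileAdded();
        loaded.fileAdded();
        System.out.println(fresh.name() + "=" + fresh.filesTotal());
        System.out.println(loaded.name() + "=" + loaded.filesTotal());
    }
}
```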

Hadoop QA added a comment - 04/May/08 04:06 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381365/HADOOP-3058-2.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 05/May/08 06:50 PM
Yes, I also agree that this adds more operations. The frequently updated metrics are:
  • filesTotal, which is updated whenever we add/delete files.
  • blocksTotal, which is updated whenever we add/delete blocks.
I guess it should be fine in the above cases.

A few metrics replace the global variables that track DFS capacity. These were updated once on each heartbeat, which should be fine.
Another set of operations is done by ReplicationMonitor in ComputeDatanodeWork(), which should also be fine.


Raghu Angadi added a comment - 05/May/08 07:21 PM

I would think the heartbeat variables will be updated thousands of times every second. I think a better approach would be to update the heavyweight metric variables only inside FSNamesystemMetrics.doUpdates(), which gets called every 5 seconds or so. This way these stats become pretty much free, and it also sets a good precedent for new metrics.

Lohit Vijayarenu added a comment - 07/May/08 09:09 AM
Thanks Raghu. I have attached an updated patch in which we maintain local counters in FSNamesystem as before. The FSNamesystemMetrics object is updated only when doUpdates() is invoked.
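
A minimal sketch of this approach, assuming a simplified design: the hot path touches only plain counters, and a separate metrics object copies them when its periodic update method runs. The names DeferredMetricsSketch, Namesystem, and NamesystemMetrics, and the fields shown, are hypothetical stand-ins rather than the code in the patch; only the doUpdates() name comes from the discussion.

```java
public class DeferredMetricsSketch {

    /** Hypothetical namesystem: the hot path only updates cheap local counters. */
    static final class Namesystem {
        private long capacityUsed;
        private long pendingReplicationBlocks;

        /** Called very often, for example once per heartbeat; no metrics object is touched. */
        synchronized void heartbeat(long usedDelta) {
            capacityUsed += usedDelta;
        }

        synchronized void setPendingReplicationBlocks(long n) {
            pendingReplicationBlocks = n;
        }

        synchronized long getCapacityUsed() {
            return capacityUsed;
        }

        synchronized long getPendingReplicationBlocks() {
            return pendingReplicationBlocks;
        }
    }

    /** Hypothetical metrics holder: published values change only in doUpdates(). */
    static final class NamesystemMetrics {
        private final Namesystem ns;
        private volatile long capacityUsed;
        private volatile long pendingReplicationBlocks;

        NamesystemMetrics(Namesystem ns) {
            this.ns = ns;
        }

        /** Assumed to be invoked periodically (every few seconds) by a metrics timer. */
        void doUpdates() {
            capacityUsed = ns.getCapacityUsed();
            pendingReplicationBlocks = ns.getPendingReplicationBlocks();
        }

        long publishedCapacityUsed() {
            return capacityUsed;
        }

        long publishedPendingReplicationBlocks() {
            return pendingReplicationBlocks;
        }
    }

    public static void main(String[] args) {
        Namesystem ns = new Namesystem();
        NamesystemMetrics metrics = new NamesystemMetrics(ns);
        for (int i = 0; i < 1000; i++) {
            ns.heartbeat(1L); // many cheap updates between metric pushes
        }
        ns.setPendingReplicationBlocks(5L);
        metrics.doUpdates(); // one copy per reporting period
        System.out.println(metrics.publishedCapacityUsed());
        System.out.println(metrics.publishedPendingReplicationBlocks());
    }
}
```

The trade-off in this sketch is that published values can lag the real counters by up to one update period, in exchange for keeping metric bookkeeping out of the frequently executed paths.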

Hadoop QA added a comment - 07/May/08 07:39 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381578/HADOOP-3058-3.patch
against trunk revision 654128.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/console

This message is automatically generated.


Raghu Angadi added a comment - 07/May/08 08:31 PM
I just committed this. Thanks Lohit!

Hudson added a comment - 08/May/08 12:23 PM