Hadoop Common / HADOOP-3058

Hadoop DFS to report more replication metrics

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: metrics
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added FSNamesystem status metrics.

      Description

      Currently, the namenode and each datanode report 'blocksreplicatedpersec'.

      We'd like to be able to graph pending replications vs. the number of under-replicated blocks vs. replications per second, so that we can get a better idea of the replication activity within the DFS.

      1. HADOOP-3058-3.patch
        22 kB
        Lohit Vijayarenu
      2. HADOOP-3058-2.patch
        24 kB
        Lohit Vijayarenu
      3. HADOOP-3058.patch
        24 kB
        Lohit Vijayarenu

        Issue Links

          Activity

          Marco Nicosia created issue -
          Robert Chansler made changes -
          Field Original Value New Value
          Component/s dfs [ 12310710 ]
          Marco Nicosia added a comment -

          Now that this has been scheduled for a release, I realize that there were a few metrics that I should also have asked for. Hopefully these are not significant changes in scope.

          In addition to the above, I forgot to ask for the most basic stats. It's very important that the NN send metrics on the number of files and blocks in the system, so that we can trend these over time. Including the number of directories would be a bonus.

          Lohit Vijayarenu made changes -
          Assignee lohit vijayarenu [ lohit ]
          Robert Chansler made changes -
          Link This issue is related to HADOOP-3323 [ HADOOP-3323 ]
          Lohit Vijayarenu added a comment -

          Attaching a patch which adds FSNamesystem status metrics.
          Since these are neither time-varying ints nor time-varying rates, I use MetricsLongValue, which is similar to MetricsIntValue, through its set and get methods. The metrics are recorded in the FSNamesystem record and are listed below:

          • FilesTotal
          • BlocksTotal
          • CapacityTotal
          • CapacityUsed
          • CapacityRemaining
          • TotalLoad
          • PendingReplicationBlocks
          • UnderReplicatedBlocks
          • ScheduledReplicationBlocks

          Tested this using FileContext to log these while FSNamesystem was reporting and I could see the values being updated
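
The set/get style described in this comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: `LongGauge` and `StatusRecord` are hypothetical names standing in for MetricsLongValue and the FSNamesystem record, and `render()` only approximates what a file-based sink might log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a long-valued metric that is set, not incremented. */
class LongGauge {
  private final String name;
  private long value;

  LongGauge(String name) { this.name = name; }

  String getName() { return name; }

  // Synchronized so that a reporting thread always reads a consistent value.
  synchronized void set(long newValue) { value = newValue; }
  synchronized long get() { return value; }
}

/** Hypothetical record that groups several status gauges under one record name. */
class StatusRecord {
  private final String recordName;
  private final Map<String, LongGauge> gauges = new LinkedHashMap<>();

  StatusRecord(String recordName, String... metricNames) {
    this.recordName = recordName;
    for (String n : metricNames) {
      gauges.put(n, new LongGauge(n));
    }
  }

  // Both methods assume the metric name was passed to the constructor.
  void set(String metric, long value) { gauges.get(metric).set(value); }
  long get(String metric) { return gauges.get(metric).get(); }

  /** Renders the record as a single line, in the order the metrics were declared. */
  String render() {
    StringBuilder sb = new StringBuilder(recordName).append(':');
    for (LongGauge g : gauges.values()) {
      sb.append(' ').append(g.getName()).append('=').append(g.get());
    }
    return sb.toString();
  }
}
```

A gauge fits these values because a status metric such as FilesTotal is an absolute figure that is overwritten on each report, rather than a count or rate accumulated over an interval.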

          Lohit Vijayarenu made changes -
          Attachment HADOOP-3058.patch [ 12381301 ]
          Lohit Vijayarenu made changes -
          Release Note This JIRA adds new FSNamesystem status metrics.
          Status Open [ 1 ] Patch Available [ 10002 ]
          Raghu Angadi added a comment -

          +1 Patch looks fine. I have no idea whether this has a real penalty on namenode performance; most likely it does not. Multiple simple integer and long operations in the critical paths are replaced by methods that synchronize on a different object.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381301/HADOOP-3058.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/console

          This message is automatically generated.

          Lohit Vijayarenu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Lohit Vijayarenu added a comment -

          FSNamesystem has two constructors, and registerMBean was being called in only one of them. This was causing an NPE; it is fixed in the updated patch.
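
The failure mode described in this comment can be illustrated with a small sketch. The class names below (`StatusBean`, `BuggyNamesystem`, `FixedNamesystem`) are hypothetical and only mimic the shape of the problem. They are not taken from the patch, and the real registerMBean does more than create an object.

```java
/** Hypothetical object standing in for whatever the registration step sets up. */
class StatusBean {
  long filesTotal() { return 0L; }
}

/** Buggy shape: only one of the two constructors performs the registration. */
class BuggyNamesystem {
  private StatusBean bean;  // stays null when the no-argument constructor is used

  BuggyNamesystem(String conf) { bean = new StatusBean(); }

  BuggyNamesystem() { /* registration was forgotten on this path */ }

  // Throws NullPointerException if the bean was never registered.
  long report() { return bean.filesTotal(); }
}

/** Fixed shape: both constructors go through the same registration step. */
class FixedNamesystem {
  private final StatusBean bean;

  FixedNamesystem(String conf) { bean = register(); }

  FixedNamesystem() { bean = register(); }

  private static StatusBean register() { return new StatusBean(); }

  long report() { return bean.filesTotal(); }
}
```

Declaring the field `final`, as in the fixed variant, makes the compiler reject any constructor that forgets the assignment, which is one way to keep this class of bug from reappearing.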

          Lohit Vijayarenu made changes -
          Attachment HADOOP-3058-2.patch [ 12381365 ]
          Lohit Vijayarenu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381365/HADOOP-3058-2.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/console

          This message is automatically generated.

          Lohit Vijayarenu added a comment -

          Yes, I agree that this adds more operations. The frequently updated metrics are:

          • filesTotal, which is updated whenever we add or delete files.
          • blocksTotal, which is updated whenever we add or delete blocks.

          I guess it should be fine in the above cases.

          A few metrics replace the global variables that tracked DFS capacity. These were updated once on each heartbeat, which should be fine.
          Another set of operations is done by ReplicationMonitor in ComputeDatanodeWork(), which should also be fine.

          Raghu Angadi added a comment -


          I would think the heartbeat variables will be updated thousands of times every second. I think a better approach would be to update the heavyweight metric variables only inside FSNamesystemMetrics.doUpdates(), which gets called every 5 seconds or so. This way these stats become pretty much free, and it also sets a good precedent for new metrics.
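
The approach suggested in this comment can be sketched as follows, under stated assumptions: `SketchNamesystem` and `SketchNamesystemMetrics` are hypothetical classes, the field names are illustrative, and the periodic timer that would call `doUpdates()` is left out. The point is only that the hot path touches plain fields, while the metrics object copies them on a slow schedule.

```java
/** Hypothetical namesystem that keeps cheap local counters on the hot path. */
class SketchNamesystem {
  private long capacityUsed;
  private long totalLoad;

  // Called on every heartbeat: plain field updates under the namesystem's own lock,
  // with no calls into the metrics object.
  synchronized void handleHeartbeat(long usedDelta, long loadDelta) {
    capacityUsed += usedDelta;
    totalLoad += loadDelta;
  }

  synchronized long getCapacityUsed() { return capacityUsed; }
  synchronized long getTotalLoad() { return totalLoad; }
}

/** Hypothetical metrics holder that is refreshed only when doUpdates() runs. */
class SketchNamesystemMetrics {
  private final SketchNamesystem fsn;
  private long capacityUsed;
  private long totalLoad;
  private int updateCount;

  SketchNamesystemMetrics(SketchNamesystem fsn) { this.fsn = fsn; }

  // Intended to be invoked by a metrics timer every few seconds (timer not shown).
  synchronized void doUpdates() {
    capacityUsed = fsn.getCapacityUsed();
    totalLoad = fsn.getTotalLoad();
    updateCount++;
  }

  synchronized long capacityUsed() { return capacityUsed; }
  synchronized long totalLoad() { return totalLoad; }
  synchronized int updateCount() { return updateCount; }
}
```

With this shape, a thousand heartbeats cost a thousand field additions and one snapshot, and the reported values lag the live counters by at most one update period.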

          Lohit Vijayarenu added a comment -

          Thanks Raghu, I have attached an updated patch in which we maintain local counters in FSNamesystem as before. The FSNamesystemMetrics object is updated only when doUpdates() is invoked.

          Lohit Vijayarenu made changes -
          Attachment HADOOP-3058-3.patch [ 12381578 ]
          Lohit Vijayarenu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Lohit Vijayarenu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381578/HADOOP-3058-3.patch
          against trunk revision 654128.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          I just committed this. Thanks Lohit!

          Raghu Angadi made changes -
          Hadoop Flags [Reviewed]
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.18.0 [ 12312972 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #484 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/484/ )

          Robert Chansler made changes -
          Release Note This JIRA adds new FSNamesystem status metrics. Added FSNamesystem status metrics.
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Transition                    Time In Source Status   Execution Times   Last Executer      Last Execution Date
          Patch Available -> Open       5d 2h 3m                2                 Lohit Vijayarenu   07/May/08 10:10
          Open -> Patch Available       43d 1h 23m              3                 Lohit Vijayarenu   07/May/08 10:10
          Patch Available -> Resolved   11h 27m                 1                 Raghu Angadi       07/May/08 21:38
          Resolved -> Closed            106d 23h 12m            1                 Nigel Daley        22/Aug/08 20:50

            People

            • Assignee:
              Lohit Vijayarenu
              Reporter:
              Marco Nicosia
            • Votes:
              0
              Watchers:
              0
