Hadoop Common / HADOOP-3058

Hadoop DFS to report more replication metrics

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: metrics
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added FSNamesystem status metrics.

      Description

      Currently, the namenode and each datanode report 'blocksreplicatedpersec'.

      We'd like to be able to graph pending replications vs. the number of under-replicated blocks vs. replications per second, so that we can get a better idea of the replication activity within the DFS.

      1. HADOOP-3058.patch
        24 kB
        Lohit Vijayarenu
      2. HADOOP-3058-2.patch
        24 kB
        Lohit Vijayarenu
      3. HADOOP-3058-3.patch
        22 kB
        Lohit Vijayarenu


          Activity

          Marco Nicosia added a comment -

          Now that this has been scheduled for a release, I realize that there were a few metrics that I should also have asked for. Hopefully these are not significant changes in scope.

          In addition to the above, I forgot to ask for the most basic stats. It's very important that the NN send metrics on the number of files and blocks in the system, so that we can trend these over time. Including the number of directories would be a bonus.

          Lohit Vijayarenu added a comment -

          Attaching a patch which adds FSNamesystem status metrics.
          Since these are neither time-varying ints nor time-varying rates, I use MetricsLongValue, which is similar to MetricsIntValue, through its set and get methods. The metrics are recorded under the FSNamesystem record and are listed below:

          • FilesTotal
          • BlocksTotal
          • CapacityTotal
          • CapacityUsed
          • CapacityRemaining
          • TotalLoad
          • PendingReplicationBlocks
          • UnderReplicatedBlocks
          • ScheduledReplicationBlocks

          Tested this by using FileContext to log these metrics while FSNamesystem was reporting, and I could see the values being updated.
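          A minimal sketch of the gauge-style metric described above, assuming a value that is simply overwritten on each update rather than accumulated over an interval. The names LongGauge and FsStatusRecordSketch, and the plain map used as a "record", are hypothetical stand-ins; this is not the MetricsLongValue or metrics-record API from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical gauge: holds one long that is overwritten (not accumulated)
// on each set(). Illustrative only, not Hadoop's MetricsLongValue source.
class LongGauge {
  private final String name;
  private long value;

  LongGauge(String name) {
    this.name = name;
  }

  synchronized void set(long newValue) {
    value = newValue;
  }

  synchronized long get() {
    return value;
  }

  // Copies the current value into a record, keyed by the metric name.
  synchronized void pushTo(Map<String, Long> record) {
    record.put(name, value);
  }
}

// Hypothetical stand-in for the "FSNamesystem" record; only three of the
// metrics listed above are shown.
class FsStatusRecordSketch {
  final Map<String, Long> record = new LinkedHashMap<>();
  final LongGauge filesTotal = new LongGauge("FilesTotal");
  final LongGauge blocksTotal = new LongGauge("BlocksTotal");
  final LongGauge underReplicatedBlocks = new LongGauge("UnderReplicatedBlocks");

  // Publishes the latest value of every gauge into the record.
  void publish() {
    filesTotal.pushTo(record);
    blocksTotal.pushTo(record);
    underReplicatedBlocks.pushTo(record);
  }
}
```

          Because a gauge reports the current level rather than a per-interval count, setting FilesTotal to 10 and then 12 publishes 12, which is what a trend graph of files in the system needs.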

          Raghu Angadi added a comment -

          +1. The patch looks fine. I am not sure whether this has a real penalty on namenode performance; most likely it does not. Multiple simple integer and long operations in the critical paths are replaced by methods that synchronize on a different object.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381301/HADOOP-3058.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2365/console

          This message is automatically generated.

          Lohit Vijayarenu added a comment -

          FSNamesystem has two constructors, and registerMBean was being called in only one of them. This was causing an NPE; fixed in the updated patch.
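          A minimal sketch of the bug class described above and one way to avoid it, assuming registration is the step both constructors must share. NamesystemSketch, initialize(), and describeMetrics() are hypothetical; the actual FSNamesystem constructors and the patch's fix are not reproduced here.

```java
// Hypothetical class with two constructors. If only one of them called
// registerMBean(), objects built through the other would keep a null
// metricsBean and fail later with a NullPointerException.
class NamesystemSketch {
  private Object metricsBean; // stays null if registration is skipped

  // Primary constructor.
  NamesystemSketch(String config) {
    initialize(config);
  }

  // Secondary constructor (e.g. used by tools or tests).
  NamesystemSketch() {
    initialize("default");
  }

  // Shared initialization: every constructor goes through here, so none
  // of them can skip registration.
  private void initialize(String config) {
    registerMBean(config);
  }

  // Placeholder for a real MBean registration.
  private void registerMBean(String config) {
    metricsBean = "bean:" + config;
  }

  // Would throw NullPointerException if registerMBean() had not run.
  String describeMetrics() {
    return metricsBean.toString();
  }
}
```

          Chaining one constructor to the other with this(...) works equally well; either way, a constructor added later cannot silently skip registration.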

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381365/HADOOP-3058-2.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2387/console

          This message is automatically generated.

          Lohit Vijayarenu added a comment -

          Yes, I agree that this adds more operations. The frequently updated metrics are:

          • filesTotal, which is updated whenever we add or delete files.
          • blocksTotal, which is updated whenever we add or delete blocks.

          I guess this should be fine in both cases.

          A few metrics regarding DFS capacity are now maintained by updating global variables. These are updated once per heartbeat, which should be fine.
          Another set of operations is done by ReplicationMonitor in ComputeDatanodeWork(), which should also be fine.
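          A minimal sketch of why a once-per-heartbeat capacity update can stay cheap, assuming each datanode's previous report is available so its old contribution can be subtracted before the new one is added. CapacityTotals, NodeReport, and updateStats() are hypothetical names chosen for illustration, not code from the patch.

```java
// Hypothetical sketch: cluster-wide capacity totals kept as plain fields and
// adjusted incrementally, so each heartbeat costs O(1) regardless of how
// many datanodes there are.
class CapacityTotals {
  // One datanode's most recent heartbeat report.
  static final class NodeReport {
    final long capacity;
    final long used;
    final long remaining;
    final int load;

    NodeReport(long capacity, long used, long remaining, int load) {
      this.capacity = capacity;
      this.used = used;
      this.remaining = remaining;
      this.load = load;
    }
  }

  private long capacityTotal;
  private long capacityUsed;
  private long capacityRemaining;
  private int totalLoad;

  // Adds (isAdded == true) or removes one node's contribution.
  private void updateStats(NodeReport r, boolean isAdded) {
    int sign = isAdded ? 1 : -1;
    capacityTotal += sign * r.capacity;
    capacityUsed += sign * r.used;
    capacityRemaining += sign * r.remaining;
    totalLoad += sign * r.load;
  }

  // Called once per heartbeat: swap the node's old report for the new one.
  synchronized void heartbeat(NodeReport previous, NodeReport current) {
    if (previous != null) {
      updateStats(previous, false);
    }
    updateStats(current, true);
  }

  synchronized long getCapacityTotal() { return capacityTotal; }
  synchronized long getCapacityUsed() { return capacityUsed; }
  synchronized long getCapacityRemaining() { return capacityRemaining; }
  synchronized int getTotalLoad() { return totalLoad; }
}
```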

          Raghu Angadi added a comment -


          I would think the heartbeat variables are updated thousands of times every second. I think a better approach would be to update the heavyweight metric variables only inside FSNamesystemMetrics.doUpdates(), which gets called every 5 seconds or so. This way these stats become practically free, and it also sets a good precedent for new metrics.
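          A minimal sketch of the suggested split between hot-path counters and periodically refreshed metrics, assuming the hot paths already hold the namesystem's lock so a plain increment adds no extra synchronization. HotPathNamesystem and NamesystemMetricsSketch are hypothetical; the real doUpdates() is invoked by the metrics framework, and its actual signature is not reproduced here.

```java
// Hypothetical namesystem: the hot paths only bump plain counters.
// "synchronized" stands in for the namesystem lock the hot path already holds.
class HotPathNamesystem {
  private long filesTotal;
  private long blocksTotal;

  synchronized void addFile(int blocks) {
    filesTotal++;
    blocksTotal += blocks;
  }

  synchronized void deleteFile(int blocks) {
    filesTotal--;
    blocksTotal -= blocks;
  }

  synchronized long getFilesTotal() { return filesTotal; }
  synchronized long getBlocksTotal() { return blocksTotal; }
}

// Hypothetical metrics object: copies the counters only when doUpdates()
// is called (roughly every 5 seconds), never on each file operation.
class NamesystemMetricsSketch {
  private final HotPathNamesystem ns;
  private volatile long filesTotal;  // published values, written only in doUpdates()
  private volatile long blocksTotal;

  NamesystemMetricsSketch(HotPathNamesystem ns) {
    this.ns = ns;
  }

  void doUpdates() {
    filesTotal = ns.getFilesTotal();
    blocksTotal = ns.getBlocksTotal();
  }

  long publishedFilesTotal() { return filesTotal; }
  long publishedBlocksTotal() { return blocksTotal; }
}
```

          Between calls to doUpdates() the published values lag the live counters by up to one update period, which is an acceptable trade for trend graphs in exchange for keeping the per-operation cost to a single increment.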

          Lohit Vijayarenu added a comment -

          Thanks Raghu. I have attached an updated patch in which we maintain local counters in FSNamesystem as before; the FSNamesystemMetrics object is updated only when doUpdates() is invoked.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381578/HADOOP-3058-3.patch
          against trunk revision 654128.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2418/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          I just committed this. Thanks Lohit!

          Hudson added a comment -

          Integrated in Hadoop-trunk #484 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/484/)

            People

            • Assignee:
              Lohit Vijayarenu
              Reporter:
              Marco Nicosia
            • Votes:
              0
              Watchers:
              0
