Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-9704

Write metrics sink plugin for Hadoop/Graphite

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in additional to the current ganglia and file ones.

      1. 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch
        10 kB
        Alex Newman
      2. HADOOP-9704.patch
        9 kB
        Babak Behzad
      3. HADOOP-9704.patch
        10 kB
        Chu Tong
      4. HADOOP-9704.patch
        10 kB
        Chu Tong

        Issue Links

          Activity

          Hide
          mnikhil Nikhil Mulley added a comment -

          I was just thinking of this, it looks very much desired for Ops. Would be great to see this developed, voting for it.

          Nikhil

          Show
          mnikhil Nikhil Mulley added a comment - I was just thinking of this, it looks very much desired for Ops. Would be great to see this developed, voting for it. Nikhil
          Hide
          posix4e Alex Newman added a comment -
          Show
          posix4e Alex Newman added a comment - I did this recently: https://github.com/posix4e/GraphiteContext v1 metrics https://github.com/posix4e/GraphiteMetrics2 v2 metrics
          Hide
          posix4e Alex Newman added a comment -

          Feel free to assign to me.

          Show
          posix4e Alex Newman added a comment - Feel free to assign to me.
          Hide
          stayhf Chu Tong added a comment -

          Hi Alex,

          Sorry for the delay. Actually, I have done the code and will post it soon. I took a look at your code and they look somewhat similar to mine. Why do not we wait for me to post my version and let's see how it goes from there?

          Show
          stayhf Chu Tong added a comment - Hi Alex, Sorry for the delay. Actually, I have done the code and will post it soon. I took a look at your code and they look somewhat similar to mine. Why do not we wait for me to post my version and let's see how it goes from there?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12605615/HADOOP-9704.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3139//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3139//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3139//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12605615/HADOOP-9704.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3139//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3139//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3139//console This message is automatically generated.
          Hide
          posix4e Alex Newman added a comment -

          Fixed some nits

          Show
          posix4e Alex Newman added a comment - Fixed some nits
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12605623/0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3140//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3140//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3140//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12605623/0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3140//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3140//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3140//console This message is automatically generated.
          Hide
          stayhf Chu Tong added a comment -

          Fix some fingbugs warnings

          Show
          stayhf Chu Tong added a comment - Fix some fingbugs warnings
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12605820/HADOOP-9704.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3144//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3144//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12605820/HADOOP-9704.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3144//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3144//console This message is automatically generated.
          Hide
          vicaya Luke Lu added a comment -

          Thanks for the patch, Alex and Chu! Both of you will be credited

          Feedback for the latest patch:

          1. There is no need to do context filtering in a sink implemention, as metrics filtering is provided by framework itself for any sinks. Though currently, context filtering is either all or a particular context, though individual record and metrics can handle glob patterns. File another JIRA for support context globbing (simple to implement) if that's what you want.
          2. Use StringBuilder instead of StringBuffer as the buffer is not shared across threads, hence no need to pay the synchronized penalty.
          3. ArgumentCaptor<String> should work with a mockito mock writer (use #getAllValues to access values from multiple invocations). There is no need to do manual capture.
          Show
          vicaya Luke Lu added a comment - Thanks for the patch, Alex and Chu! Both of you will be credited Feedback for the latest patch: There is no need to do context filtering in a sink implemention, as metrics filtering is provided by framework itself for any sinks. Though currently, context filtering is either all or a particular context, though individual record and metrics can handle glob patterns. File another JIRA for support context globbing (simple to implement) if that's what you want. Use StringBuilder instead of StringBuffer as the buffer is not shared across threads, hence no need to pay the synchronized penalty. ArgumentCaptor<String> should work with a mockito mock writer (use #getAllValues to access values from multiple invocations). There is no need to do manual capture.
          Hide
          babakbehzad Babak Behzad added a comment -

          We need this feature in our company and I got the patch and used it. There was a small bug which caused Graphite to ignore a lot of metrics (some of the metrics have space in their name and Graphite do not handle them). I fixed that and I also fixed all the three feedbacks that Luke Lu mentioned in the above comment. Can someone please review this patch before I submit it?

          Show
          babakbehzad Babak Behzad added a comment - We need this feature in our company and I got the patch and used it. There was a small bug which caused Graphite to ignore a lot of metrics (some of the metrics have space in their name and Graphite do not handle them). I fixed that and I also fixed all the three feedbacks that Luke Lu mentioned in the above comment. Can someone please review this patch before I submit it?
          Hide
          posix4e Alex Newman added a comment -

          +1

          Show
          posix4e Alex Newman added a comment - +1
          Hide
          raviprak Ravi Prakash added a comment -

          What would happen if the Graphite server at the end of the socket was slow? BTW, we see this all the time.

          Show
          raviprak Ravi Prakash added a comment - What would happen if the Graphite server at the end of the socket was slow? BTW, we see this all the time.
          Hide
          raviprak Ravi Prakash added a comment -

          Also, would I open a new connection to the same graphite server for each metric?

          Show
          raviprak Ravi Prakash added a comment - Also, would I open a new connection to the same graphite server for each metric?
          Hide
          babakbehzad Babak Behzad added a comment -

          Thanks Ravi. So, in order to enable this one has to put the following three lines in etc/hadoop-metrics2.properties for each of the contexts (namenode, datanode, resource manager, etc):
          namenode.sink.graphite.server_host=localhost
          namenode.sink.graphite.server_port=2003
          namenode.sink.graphite.metrics_prefix=test.namenode

          Looking at the logs and source code, the socket connection is only created once in the init() function for each of these contexts that you enable them, so at most 5 new connections are created. This is exactly the same as other sinks such as FileSink and GangliaSink and I don't think it is an issue.

          Regarding a slow Graphite server, that's a good point. We will be able to test this soon, but again looking at the current Hadoop code, Ganglia's sink is using DatagramSocket instead of Socket. I am not sure if we need to do that later if we see problems with the current code.

          Show
          babakbehzad Babak Behzad added a comment - Thanks Ravi. So, in order to enable this one has to put the following three lines in etc/hadoop-metrics2.properties for each of the contexts (namenode, datanode, resource manager, etc): namenode.sink.graphite.server_host=localhost namenode.sink.graphite.server_port=2003 namenode.sink.graphite.metrics_prefix=test.namenode Looking at the logs and source code, the socket connection is only created once in the init() function for each of these contexts that you enable them, so at most 5 new connections are created. This is exactly the same as other sinks such as FileSink and GangliaSink and I don't think it is an issue. Regarding a slow Graphite server, that's a good point. We will be able to test this soon, but again looking at the current Hadoop code, Ganglia's sink is using DatagramSocket instead of Socket. I am not sure if we need to do that later if we see problems with the current code.
          Hide
          raviprak Ravi Prakash added a comment -

          Thanks Babak! Also, I noticed you are rethrowing a MetricsException whenever you catch an exception. Is this going to bring down the entire daemon? That would be bad. Luke Lu I noticed GraphiteSink30.java also does the same thing. Is that bad too?

          Show
          raviprak Ravi Prakash added a comment - Thanks Babak! Also, I noticed you are rethrowing a MetricsException whenever you catch an exception. Is this going to bring down the entire daemon? That would be bad. Luke Lu I noticed GraphiteSink30.java also does the same thing. Is that bad too?
          Hide
          vicaya Luke Lu added a comment -

          Ravi Prakash: Exceptions in sink impls won't bring down the daemon. The metrics system is designed to be resilient to transient back-end errors. It'll do retries according to config as well.

          Show
          vicaya Luke Lu added a comment - Ravi Prakash : Exceptions in sink impls won't bring down the daemon. The metrics system is designed to be resilient to transient back-end errors. It'll do retries according to config as well.
          Hide
          vicaya Luke Lu added a comment -

          The patch looks good overall. Thanks Babak Behzad! Please remove the tabs in the source and format according to

          https://wiki.apache.org/hadoop/CodeReviewChecklist

          Show
          vicaya Luke Lu added a comment - The patch looks good overall. Thanks Babak Behzad ! Please remove the tabs in the source and format according to https://wiki.apache.org/hadoop/CodeReviewChecklist
          Hide
          babakbehzad Babak Behzad added a comment -

          Thanks both Ravi and Luke. I fixed the tabs and the formatting and attached a new patch. Hopefully it is correct and sorry about that.

          Show
          babakbehzad Babak Behzad added a comment - Thanks both Ravi and Luke. I fixed the tabs and the formatting and attached a new patch. Hopefully it is correct and sorry about that.
          Hide
          raviprak Ravi Prakash added a comment -

          Ok. Patch lgtm. If I hear no objections, I'll check it in EOD

          Show
          raviprak Ravi Prakash added a comment - Ok. Patch lgtm. If I hear no objections, I'll check it in EOD
          Hide
          vicaya Luke Lu added a comment -

          +1 pending jenkins' results.

          Show
          vicaya Luke Lu added a comment - +1 pending jenkins' results.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12647689/HADOOP-9704.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverControllerStress

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3996//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3996//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647689/HADOOP-9704.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverControllerStress +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3996//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3996//console This message is automatically generated.
          Hide
          raviprak Ravi Prakash added a comment -

          I was able to get TestZKFailoverControllerStress to pass as well fail intermittently with the patch applied. https://issues.apache.org/jira/browse/HADOOP-9555 seems to be tracking the issue. I'll check this in momentarily

          Show
          raviprak Ravi Prakash added a comment - I was able to get TestZKFailoverControllerStress to pass as well fail intermittently with the patch applied. https://issues.apache.org/jira/browse/HADOOP-9555 seems to be tracking the issue. I'll check this in momentarily
          Hide
          raviprak Ravi Prakash added a comment -

          Hmmm... Changed my mind. Will wait up. I'm going to queue in running the test a 100 times on unpatched trunk to see if that sees intermittent failures too. The nature of the tests might mean an increase in EXTRA_TIMEOUT_SECS is in order, but that's probably a seperate JIRA and I want to understand the reasons why this patch might increase that time.

          Show
          raviprak Ravi Prakash added a comment - Hmmm... Changed my mind. Will wait up. I'm going to queue in running the test a 100 times on unpatched trunk to see if that sees intermittent failures too. The nature of the tests might mean an increase in EXTRA_TIMEOUT_SECS is in order, but that's probably a seperate JIRA and I want to understand the reasons why this patch might increase that time.
          Hide
          raviprak Ravi Prakash added a comment -

          Ok! The failure percentage of the test is practically the same on my computer with or without the patch. Committing.

          Show
          raviprak Ravi Prakash added a comment - Ok! The failure percentage of the test is practically the same on my computer with or without the patch. Committing.

            People

            • Assignee:
              Unassigned
              Reporter:
              stayhf Chu Tong
            • Votes:
              4 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development