Hadoop Common: HADOOP-11400

GraphiteSink does not reconnect to Graphite after 'broken pipe'

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.1, 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: metrics
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      After a network error, GraphiteSink does not reconnect to the Graphite server, so metrics are no longer sent.

      Here is the stack trace I see (this is from a NodeManager):

      2014-12-11 16:39:21,655 ERROR org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Got sink exception, retry in 4806ms
      org.apache.hadoop.metrics2.MetricsException: Error flushing metrics
      at org.apache.hadoop.metrics2.sink.GraphiteSinkFixed.flush(GraphiteSinkFixed.java:120)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:184)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:43)
      at org.apache.hadoop.metrics2.impl.SinkQueue.consumeAll(SinkQueue.java:87)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.publishMetricsFromQueue(MetricsSinkAdapter.java:129)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1.run(MetricsSinkAdapter.java:88)
      Caused by: java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
      at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
      at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
      at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
      at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
      at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
      at org.apache.hadoop.metrics2.sink.GraphiteSinkFixed.flush(GraphiteSinkFixed.java:118)
      ... 5 more
      2014-12-11 16:39:26,463 ERROR org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Got sink exception and over retry limit, suppressing further error messages
      org.apache.hadoop.metrics2.MetricsException: Error flushing metrics
      at org.apache.hadoop.metrics2.sink.GraphiteSinkFixed.flush(GraphiteSinkFixed.java:120)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:184)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:43)
      at org.apache.hadoop.metrics2.impl.SinkQueue.consumeAll(SinkQueue.java:87)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.publishMetricsFromQueue(MetricsSinkAdapter.java:129)
      at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1.run(MetricsSinkAdapter.java:88)
      Caused by: java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
      at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
      at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
      at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
      at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
      at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
      at org.apache.hadoop.metrics2.sink.GraphiteSinkFixed.flush(GraphiteSinkFixed.java:118)
      ... 5 more

      GraphiteSinkFixed.java is simply GraphiteSink.java from Hadoop 2.6.0 (with the fix for https://issues.apache.org/jira/browse/HADOOP-11182). I use this copy because I cannot simply upgrade Hadoop (I am using CDH5).

      GraphiteSink uses an OutputStreamWriter that is created only in the init method (which is probably called only once per application run), and there is no reconnection logic.
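
One possible shape for the missing reconnection logic is sketched below. This is an illustration only and is not taken from the attached patch: the class name `ReconnectingGraphiteWriter` and its methods are hypothetical, and the sketch assumes it is acceptable to drop the connection on any IOException and reopen it lazily on the next write, so that the retry already performed by MetricsSinkAdapter gets a fresh socket.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hadoop): a writer that reopens its socket after a failure. */
public class ReconnectingGraphiteWriter implements Closeable {
  private final String host;
  private final int port;
  private Socket socket;
  private Writer writer;

  public ReconnectingGraphiteWriter(String host, int port) {
    this.host = host;
    this.port = port;
  }

  /** Returns true while a writer is open; false before first use and after any failure. */
  public boolean isConnected() {
    return writer != null;
  }

  /** Opens the socket and writer lazily, so a dropped connection is rebuilt on the next write. */
  private void connectIfNeeded() throws IOException {
    if (writer != null) {
      return;
    }
    socket = new Socket(host, port);
    writer = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
  }

  /** Writes one line; on failure the connection is discarded so the caller's retry reconnects. */
  public void write(String line) throws IOException {
    try {
      connectIfNeeded();
      writer.write(line);
    } catch (IOException e) {
      close();
      throw e;
    }
  }

  /** Flushes buffered data; a failure here (e.g. "Broken pipe") also discards the connection. */
  public void flush() throws IOException {
    if (writer == null) {
      return; // nothing buffered, nothing to flush
    }
    try {
      writer.flush();
    } catch (IOException e) {
      close();
      throw e;
    }
  }

  /** Closes the writer and socket, ignoring errors, and resets state for a later reconnect. */
  @Override
  public void close() {
    try {
      if (writer != null) {
        writer.close();
      }
    } catch (IOException ignored) {
      // the connection is already unusable
    }
    try {
      if (socket != null) {
        socket.close();
      }
    } catch (IOException ignored) {
      // the connection is already unusable
    }
    writer = null;
    socket = null;
  }
}
```

With a helper like this, a sink's flush could still wrap the IOException in a MetricsException as it does today; the difference is that the next attempt would start from a new connection instead of reusing the broken stream.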

        Attachments

        1. HADOOP-11400.patch
          15 kB
          Kamil Gorlo

            People

            • Assignee: Kamil Gorlo (kgs)
            • Reporter: Kamil Gorlo (kgs)
            • Votes: 0
            • Watchers: 5
