Spark / SPARK-2769

Ganglia Support Broken / Not working

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment:

      Linux Red Hat 6.4 on Spark 1.1.0

      Description

      Hi all,
      I've built Spark 1.1.0 with sbt, with Ganglia enabled and Hadoop version 2.4.0.

      No issues there: Spark works fine on Hadoop 2.4.0, and Ganglia (GraphiteSink) is installed.

      I've added the following to metrics.properties:

      *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
      *.sink.graphite.host=HOSTNAME
      *.sink.graphite.port=8649
      *.sink.graphite.period=1
      *.sink.graphite.prefix=aa
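
For reference, each key above follows the `[instance].sink.[name].[option]` shape used by Spark's metrics configuration, where `*` as the instance applies the sink to every instance. The following is a minimal sketch of that split in Scala. `MetricsKeySketch`, `SinkKey` and `parse` are hypothetical names chosen for illustration; this is not Spark's own parser.

```scala
object MetricsKeySketch {
  // Hypothetical holder for the three parts of a sink key.
  final case class SinkKey(instance: String, sink: String, option: String)

  // instance, then the literal ".sink.", then the sink name, then the option.
  private val SinkPattern = """([^.]+)\.sink\.([^.]+)\.(.+)""".r

  // Returns None for keys that are not sink settings.
  def parse(key: String): Option[SinkKey] = key match {
    case SinkPattern(instance, sink, option) => Some(SinkKey(instance, sink, option))
    case _                                   => None
  }
}
```

With the keys above, `MetricsKeySketch.parse("*.sink.graphite.port")` yields instance `*`, sink `graphite` and option `port`.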

      With this configuration I get the following error message:

      14/07/31 05:39:00 WARN graphite.GraphiteReporter: Unable to report to Graphite
      java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
      at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
      at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
      at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
      at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
      at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
      at java.io.BufferedWriter.flush(BufferedWriter.java:254)
      at com.codahale.metrics.graphite.Graphite.send(Graphite.java:77)
      at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:254)
      at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:156)
      at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:107)
      at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:86)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      From looking at the code I see the following.

      val graphite: Graphite = new Graphite(new InetSocketAddress(host, port))

      val reporter: GraphiteReporter = GraphiteReporter.forRegistry(registry)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .convertRatesTo(TimeUnit.SECONDS)
        .prefixedWith(prefix)
        .build(graphite)
      https://github.com/apache/spark/blob/87bd1f9ef7d547ee54a8a83214b45462e0751efb/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala#L69

      Followed by

      override def start() {
        reporter.start(pollPeriod, pollUnit)
      }

      I noticed that the error occurs when we first try to send a message, but nowhere do I see graphite.connect() being called:

      https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L62

      It seems to fail in the send function:
      https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L77

      With "this.writer" not initialized, "writer.write" will fail.

      The GraphiteBuilder doesn't call it either when creating the "reporter" object.
      https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/GraphiteReporter.java#L113

      Maybe I'm looking in the wrong place or passing in the wrong values, but the lack of logging makes me think it is a bug.

      EDIT:
      Found out where connect() gets called:
      https://github.com/dropwizard/metrics/blob/master/metrics-graphite/src/main/java/com/codahale/metrics/graphite/GraphiteReporter.java#L153

      and this is called from here

      https://github.com/dropwizard/metrics/blob/99dc540c2cbe6bb3be304e20449fb641c7f5382a/metrics-core/src/main/java/com/codahale/metrics/ScheduledReporter.java#L98

      which is called from here

      https://github.com/dropwizard/metrics/blob/99dc540c2cbe6bb3be304e20449fb641c7f5382a/metrics-core/src/main/java/com/codahale/metrics/ScheduledReporter.java#L98

      but the issue still stands. :/
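
The call order described in the EDIT above (connect first, then one send per metric, then close) can be sketched with plain JDK sockets. This is an illustration under assumptions, not the dropwizard implementation: `PlainTextClient` and `roundTrip` are hypothetical names, the line layout `<name> <value> <timestamp>` is the Graphite plaintext format, and the listener is a throwaway local socket rather than a real Graphite or Ganglia daemon.

```scala
import java.io.{BufferedReader, BufferedWriter, InputStreamReader, OutputStreamWriter}
import java.net.{InetSocketAddress, ServerSocket, Socket}
import java.nio.charset.StandardCharsets.UTF_8

object GraphiteLifecycleSketch {
  // Hypothetical stand-in for a plaintext metrics client.
  final class PlainTextClient(address: InetSocketAddress) {
    private var socket: Socket = null
    private var writer: BufferedWriter = null

    // Opens the TCP connection and creates the writer used by send().
    def connect(): Unit = {
      socket = new Socket(address.getAddress, address.getPort)
      writer = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream, UTF_8))
    }

    // Writes one metric per line and flushes it to the socket.
    def send(name: String, value: String, timestamp: Long): Unit = {
      require(writer != null, "connect() must be called before send()")
      writer.write(s"$name $value $timestamp\n")
      writer.flush()
    }

    def close(): Unit = {
      if (socket != null) socket.close()
      socket = null
      writer = null
    }
  }

  // Runs one connect/send/close cycle against a local listener
  // and returns the single line that listener received.
  def roundTrip(name: String, value: String, timestamp: Long): String = {
    val server = new ServerSocket(0) // port 0 picks any free port
    try {
      val client = new PlainTextClient(new InetSocketAddress("127.0.0.1", server.getLocalPort))
      client.connect()
      val peer = server.accept()
      try {
        client.send(name, value, timestamp)
        client.close()
        new BufferedReader(new InputStreamReader(peer.getInputStream, UTF_8)).readLine()
      } finally peer.close()
    } finally server.close()
  }
}
```

For example, `GraphiteLifecycleSketch.roundTrip("aa.jvm.heap.used", "42", 1406785140L)` returns `aa.jvm.heap.used 42 1406785140`; the metric name and timestamp are made up. If the peer closes its end before or during the writes, the flush is where a `SocketException` such as the Broken pipe in the trace above would typically surface.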

      EDIT 2:

      My ports are open and listening:

      [root@rtr-dev-spark4 ~]# lsof -i :8649
      COMMAND PID   USER    FD TYPE DEVICE  SIZE/OFF NODE NAME
      gmond   32173 ganglia 5u IPv4 3480253 0t0      UDP  rtr-dev-spark4.ord2012:8649
      gmond   32173 ganglia 6u IPv4 3480255 0t0      TCP  rtr-dev-spark4.ord2012:8649 (LISTEN)
      gmond   32173 ganglia 7u IPv4 3480257 0t0      UDP  rtr-dev-spark4.ord2012:55523->rtr-dev-spark4.ord2012:8649
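
A listening port only shows that something accepts connections, not which protocol it speaks. One way to check is to connect, send nothing, and look at what the listener volunteers. The sketch below does that in Scala; `ListenerProbeSketch` and `firstBytes` are hypothetical names, and the interpretation rests on an assumption: gmond's TCP channel normally answers a new connection with an XML dump and then closes, whereas a listener that accepts Graphite plaintext normally stays silent and waits for input.

```scala
import java.net.{InetSocketAddress, Socket, SocketTimeoutException}
import java.nio.charset.StandardCharsets.UTF_8

object ListenerProbeSketch {
  // Connects without sending anything and returns up to maxBytes of whatever
  // the peer writes first; returns "" if it stays silent until the timeout
  // or closes without writing.
  def firstBytes(host: String, port: Int, maxBytes: Int = 64, timeoutMs: Int = 2000): String = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      socket.setSoTimeout(timeoutMs)
      val buffer = new Array[Byte](maxBytes)
      val count =
        try socket.getInputStream.read(buffer)
        catch { case _: SocketTimeoutException => 0 }
      if (count <= 0) "" else new String(buffer, 0, count, UTF_8)
    } finally socket.close()
  }
}
```

Calling `ListenerProbeSketch.firstBytes("HOSTNAME", 8649)` with the host and port from metrics.properties would show whether the listener replies with XML. A reply of that kind would be consistent with the write failing later, although the probe alone does not prove the cause.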

      Regards
      Steve

            People

            • Assignee: Unassigned
            • Reporter: DarkSlice Stephen Walsh