SPARK-22052

Incorrect Metric assigned in MetricsReporter.scala

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.3.0
    • Fix Version/s: 2.1.2, 2.2.1, 2.3.0
    • Labels:
      None
    • Environment:

      Spark 2.2
      MetricsReporter.scala

      Description

      MetricsReporter.scala reports the wrong metric for processingRate-total.

      Compare the first and second registerGauge calls below. The second one, which registers processingRate-total, mistakenly reads inputRowsPerSecond instead of processedRowsPerSecond, so both gauges report the input rate.

      class MetricsReporter(
          stream: StreamExecution,
          override val sourceName: String) extends CodahaleSource with Logging {
      
        override val metricRegistry: MetricRegistry = new MetricRegistry
      
        // Metric names should not have . in them, so that all the metrics of a query are identified
        // together in Ganglia as a single metric group
        registerGauge("inputRate-total", () => stream.lastProgress.inputRowsPerSecond)
        registerGauge("processingRate-total", () => stream.lastProgress.inputRowsPerSecond)
        registerGauge("latency", () => stream.lastProgress.durationMs.get("triggerExecution").longValue())
      
        private def registerGauge[T](name: String, f: () => T)(implicit num: Numeric[T]): Unit = {
          synchronized {
            metricRegistry.register(name, new Gauge[T] {
              override def getValue: T = f()
            })
          }
        }
      }
      

      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetricsReporter.scala
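The intended mapping can be illustrated with a minimal, dependency-free sketch. It does not use Spark or the Codahale classes: `Progress`, `GaugeRegistry`, and `GaugeFixSketch` are hypothetical stand-ins introduced only for this illustration, and the only point is that processingRate-total should read processedRowsPerSecond.

```scala
// Hypothetical stand-in for the query progress object; only the two rate fields matter here.
final case class Progress(inputRowsPerSecond: Double, processedRowsPerSecond: Double)

// Hypothetical stand-in for a gauge registry: metric name -> value supplier.
final class GaugeRegistry {
  private val gauges =
    scala.collection.mutable.LinkedHashMap.empty[String, () => Double]

  // Registers (or overwrites) a gauge under the given name.
  def registerGauge(name: String, f: () => Double): Unit =
    synchronized { gauges(name) = f }

  // Evaluates the gauge on demand, as a metrics sink would on each poll.
  def value(name: String): Double =
    synchronized { gauges(name)() }
}

object GaugeFixSketch {
  // Registers the two rate gauges with the corrected mapping.
  def register(lastProgress: () => Progress): GaugeRegistry = {
    val registry = new GaugeRegistry
    registry.registerGauge("inputRate-total", () => lastProgress().inputRowsPerSecond)
    // Corrected line: read processedRowsPerSecond, not inputRowsPerSecond.
    registry.registerGauge("processingRate-total", () => lastProgress().processedRowsPerSecond)
    registry
  }
}
```

With this mapping the two gauges report different values whenever the input and processing rates differ, which is the behavior expected from the metric names.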

      After changing that line and rebuilding from source, I tested the fix by inspecting the CSV files that the metrics sink writes, as configured in the metrics properties file. Before the change, the inputRate-total and processingRate-total files were identical because both gauges read the same metric. After the change, the processingRate-total file held the correct values.
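The comparison of the two CSV files can be sketched as follows. This is only an illustration under assumptions: it takes each file's content as a string with a `t,value` header followed by one `timestamp,value` row per report (the layout I would expect from a CSV metrics sink for gauges), and `CsvGaugeCheck` is a hypothetical helper, not part of Spark.

```scala
object CsvGaugeCheck {
  // Parses CSV text with a "t,value" header into the sequence of gauge values.
  def values(csv: String): Vector[Double] =
    csv.linesIterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .drop(1)                        // skip the header row
      .map(_.split(",")(1).toDouble)  // keep the value column only
      .toVector

  // True when both files carry exactly the same value series,
  // which is the symptom described for the unfixed build.
  def sameSeries(a: String, b: String): Boolean =
    values(a) == values(b)
}
```

If `sameSeries` returns true for the inputRate-total and processingRate-total files over a long run, both gauges are most likely reading the same metric. The two rates can coincide briefly even in a fixed build, so a short sample is not conclusive.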

      Please see the attached file "Processed Rows Per Second".
      With the corrected code, the right values appear in column B.
      They match the values in the INFO StreamExecution log output printed during streaming.

            People

            • Assignee:
              Taaffy Jason Taaffe
              Reporter:
              Taaffy Jason Taaffe
            • Votes:
              0
              Watchers:
              4