  1. Spark
  2. SPARK-5847

Allow for configuring MetricsSystem's use of app ID to namespace all metrics

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.1.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      MetricsSystem currently prepends the app ID to all metrics.

      When reading Spark metrics in Graphite, I've found this to not always be desirable. Graphite is designed to track a mostly-unchanging set of metrics over time; it allocates large zeroed-out files for each metric it sees, and by default rate-limits itself from creating many of these.

      App-ID namespacing means that Graphite allocates disk space for every "metric" of every job it sees, when in reality some metrics correspond to others across jobs (e.g. driver JVM stats).

      Some common Spark usage flows would be better modeled by, e.g., namespacing metrics by spark.app.name, so that successive runs of a given job would share "metrics". That would save storage and would also allow monitoring aspects of a job's performance over time and across many runs.
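
To make the storage argument concrete, here is a rough sketch that counts the distinct Graphite-style paths produced by three runs of one job under each prefixing scheme. The app IDs, job name, and metric names are made up for illustration, and `metric_paths` is a hypothetical helper, not anything in Spark or Graphite.

```python
# Hypothetical helper: builds the distinct "<prefix>.<metric>" paths
# that a Graphite-like store would have to allocate files for.
def metric_paths(prefixes, metrics):
    return {f"{prefix}.{metric}" for prefix in prefixes for metric in metrics}

# Made-up metric names standing in for driver JVM stats.
metrics = ["driver.jvm.heap.used", "driver.jvm.heap.max"]

# Three runs of the same job: each run gets a fresh app ID but keeps its name.
app_ids = ["app-0001", "app-0002", "app-0003"]
app_names = ["nightly-etl", "nightly-etl", "nightly-etl"]

by_id = metric_paths(app_ids, metrics)      # one set of paths per run
by_name = metric_paths(app_names, metrics)  # paths shared across runs

print(len(by_id))    # 6
print(len(by_name))  # 2
```

With app-ID prefixes the number of stored series grows with every run; with app-name prefixes it stays fixed at the number of metrics.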

      There's not likely a one-size-fits-all solution here, so I'd propose letting users specify in the metrics config file whether they'd like metrics namespaced by spark.app.id, spark.app.name, or some other config param.
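
The proposal could be sketched roughly as follows. This is only an illustration of the idea, not Spark's implementation: `resolve_namespace`, `build_metric_name`, and the `namespace_key` parameter are hypothetical names, and the config is modeled as a plain dict. The assumed behavior is that the prefix is read from whichever config key the user names, falling back to spark.app.id (today's behavior) when that key is unset.

```python
def resolve_namespace(conf, namespace_key="spark.app.id"):
    """Return the metrics prefix taken from the chosen config key.

    Falls back to spark.app.id when the chosen key is not set.
    Returns None if neither key is present.
    """
    value = conf.get(namespace_key)
    if value is None:
        value = conf.get("spark.app.id")
    return value


def build_metric_name(conf, instance, source, namespace_key="spark.app.id"):
    """Join namespace, instance, and source name into a dotted metric name."""
    namespace = resolve_namespace(conf, namespace_key)
    parts = [namespace, instance, source] if namespace else [instance, source]
    return ".".join(parts)


# Made-up config values for illustration.
conf = {"spark.app.id": "app-0001", "spark.app.name": "nightly-etl"}

print(build_metric_name(conf, "driver", "jvm.heap.used"))
print(build_metric_name(conf, "driver", "jvm.heap.used", "spark.app.name"))
```

Keeping spark.app.id as the default would leave existing dashboards untouched, while users who want series shared across runs could opt in to spark.app.name or another key.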

              People

              • Assignee:
                Mark Grover (mgrover)
              • Reporter:
                Ryan Williams (rdub)
              • Votes:
                1
              • Watchers:
                8
