SPARK-5847

Allow for configuring MetricsSystem's use of app ID to namespace all metrics

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.1.0
    • Component/s: Spark Core
    • Labels: None

    Description

      MetricsSystem currently prepends the app ID to all metrics.

      When reading Spark metrics in Graphite, I've found that this is not always desirable. Graphite is designed to track a mostly unchanging set of metrics over time; it allocates a large zeroed-out file for each metric it sees, and by default rate-limits how many of these it creates.

      App-ID namespacing means that Graphite allocates disk space for every "metric" of every job it sees, when in reality some metrics correspond to one another across jobs (e.g. driver JVM stats).
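To make the storage cost concrete, here is a minimal sketch in Python. The app IDs, app name, metric names, and the `metric_path` helper are all made up for illustration and are not Spark's actual code; the sketch only counts how many distinct dotted paths Graphite would see when the prefix is a per-run app ID versus a stable name.

```python
# Illustrative only: every name and ID below is invented for this example.
def metric_path(namespace: str, instance: str, source: str) -> str:
    """Join the pieces of a metric name into a dotted, Graphite-style path."""
    return ".".join([namespace, instance, source])

# Three runs of the same job, each assigned a fresh app ID.
app_ids = ["app-0001", "app-0002", "app-0003"]
app_name = "nightly_etl"
sources = ["jvm.heap.used", "jvm.heap.max"]

# Prefix by app ID: every run produces brand-new paths.
by_app_id = {metric_path(a, "driver", s) for a in app_ids for s in sources}

# Prefix by a stable name: every run maps onto the same paths.
by_app_name = {metric_path(app_name, "driver", s) for _ in app_ids for s in sources}

print(len(by_app_id))    # 6: one path per (run, metric) pair
print(len(by_app_name))  # 2: one path per metric, shared across runs
```

Under these assumptions the number of files Graphite allocates grows with the number of runs in the first case and stays constant in the second.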

      Some common Spark usage patterns would be better modeled by, e.g., namespacing metrics by spark.app.name, so that successive runs of a given job share "metrics". That saves storage and also makes it possible to monitor aspects of a job's performance over time and across many runs.

      There's likely no one-size-fits-all solution here, so I'd propose letting users specify in the metrics config file whether they'd like metrics namespaced by spark.app.id, spark.app.name, or some other config param.
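One way the proposal could work is sketched below in Python. The setting name `metrics.namespace.param` and the `resolve_namespace` helper are hypothetical, invented here for illustration; the sketch simply looks up which config param the user chose and uses that param's value as the metrics prefix, falling back to spark.app.id so that the existing behavior is preserved when nothing is configured.

```python
from typing import Mapping

# Hypothetical setting name, invented for this sketch.
NAMESPACE_KEY = "metrics.namespace.param"


def resolve_namespace(conf: Mapping[str, str]) -> str:
    """Return the metrics prefix: the value of whichever config param the user named.

    Defaults to spark.app.id, which matches the behavior described in this issue.
    """
    param = conf.get(NAMESPACE_KEY, "spark.app.id")
    if param not in conf:
        raise KeyError(f"namespace param {param!r} is not set")
    return conf[param]


conf = {
    "spark.app.id": "app-0001",       # made-up values
    "spark.app.name": "nightly_etl",
}

# No namespace setting: prefix is the app ID, as today.
default_ns = resolve_namespace(conf)

# User opts into namespacing by app name.
named_ns = resolve_namespace({**conf, NAMESPACE_KEY: "spark.app.name"})
```

Raising on an unset param, rather than silently falling back, is a design choice of this sketch; a real implementation might prefer a fallback plus a warning.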

            People

              Assignee: Mark Grover (mgrover)
              Reporter: Ryan Williams (rdub)
              Votes: 1
              Watchers: 7
