Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.2.1
- None
Description
MetricsSystem currently prepends the app ID to all metrics.
When reading Spark metrics in Graphite, I've found that this is not always desirable. Graphite is designed to track a mostly unchanging set of metrics over time: it allocates a large zeroed-out file for each metric it sees, and by default it rate-limits how many new ones it creates.
With app-ID namespacing, Graphite allocates disk space for every "metric" of every job it sees, even though some metrics correspond to one another across jobs (e.g. driver JVM stats).
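To make the storage cost concrete, here is a rough sketch that counts the distinct metric paths a backend would have to store under each scheme. The app IDs, the app name "nightly-etl", and the metric suffixes are made up for illustration, and the dotted layout is only an assumption about how the names look.

```python
# Hypothetical app IDs, app name, and metric suffixes, for illustration only.
def metric_paths(namespaces, suffixes):
    """Return the set of distinct metric paths a backend would have to store."""
    return {f"{ns}.{suffix}" for ns in namespaces for suffix in suffixes}


suffixes = ["driver.jvm.heap.used", "driver.jvm.heap.max"]

# Three runs of the same job, each with its own app ID.
by_app_id = metric_paths(["app-0001", "app-0002", "app-0003"], suffixes)

# The same three runs, namespaced by a shared app name instead.
by_app_name = metric_paths(["nightly-etl"] * 3, suffixes)

print(len(by_app_id))    # 6
print(len(by_app_name))  # 2
```

With per-run IDs the number of stored paths grows with every run, while a shared name keeps it constant.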
Some common Spark usage flows would be better modeled by, e.g., namespacing metrics by spark.app.name, so that successive runs of a given job share "metrics". This saves storage and also allows monitoring aspects of a job's performance over time and across many runs.
There is unlikely to be a one-size-fits-all solution here, so I'd propose letting users specify in the metrics config file whether they'd like metrics namespaced by spark.app.id, spark.app.name, or some other config param.
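A minimal sketch of what this proposal could look like, written in Python for brevity rather than as MetricsSystem code. The setting name `namespace.key`, the helper functions, and the sample config values are all hypothetical; the only behavior taken from this issue is that the namespace defaults to spark.app.id.

```python
def resolve_namespace(spark_conf, metrics_conf):
    """Pick the metric namespace from a config key named in the metrics config.

    `namespace.key` is a hypothetical setting, not an existing Spark option.
    When it is absent, fall back to spark.app.id (the current behavior).
    """
    key = metrics_conf.get("namespace.key", "spark.app.id")
    if key not in spark_conf:
        raise KeyError(f"namespace key {key!r} is not set in the Spark config")
    return spark_conf[key]


def build_metric_name(namespace, instance, source):
    """Join the namespace, instance, and source into one dotted metric name."""
    return ".".join([namespace, instance, source])


# Sample values, made up for illustration.
spark_conf = {"spark.app.id": "app-0001", "spark.app.name": "nightly-etl"}

default_ns = resolve_namespace(spark_conf, {})
name_ns = resolve_namespace(spark_conf, {"namespace.key": "spark.app.name"})

print(build_metric_name(default_ns, "driver", "jvm.heap.used"))
# app-0001.driver.jvm.heap.used
print(build_metric_name(name_ns, "driver", "jvm.heap.used"))
# nightly-etl.driver.jvm.heap.used
```

Raising on a missing key rather than silently falling back is one possible design choice; it surfaces a misconfigured namespace early instead of scattering metrics under an unexpected prefix.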
Issue Links
- is duplicated by
  - SPARK-4544 Spark JVM Metrics doesn't have context. (Resolved)
  - SPARK-10610 Using AppName instead of AppId in the name of all metrics (Resolved)
- links to