Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-45439

Reduce memory usage of LiveStageMetrics.accumIdsToMetricType



    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 4.0.0
    • 4.0.0
    • SQL


      This PR aims to reduce the memory consumption of LiveStageMetrics.accumIdsToMetricType, which should help to reduce driver memory usage when running complex SQL queries that contain many operators and run many jobs.

      In SQLAppStatusListener, the LiveStageMetrics.accumIdsToMetricType field holds a map which is used to look up the type of accumulators in order to perform conditional processing of a stage’s metrics.

      Currently, that field is derived from LiveExecutionData.metrics, which contains metrics for all operators used anywhere in the query. Whenever a job is submitted, we construct a fresh map containing all metrics that have ever been registered for that SQL query. If a query runs a single job, this isn't an issue: in that case, all LiveStageMetrics instances will hold the same immutable accumIdsToMetricType.

      The problem arises if we have a query that runs many jobs (e.g. a complex query with many joins which gets divided into many jobs due to AQE): in that case, each job submission results in a new accumIdsToMetricType map being created.

      This PR fixes this by changing accumIdsToMetricType to be a mutable ConcurrentHashMap which is shared across all LivestageMetrics instances belonging to the same LiveExecutionData.

      The modified classes are private and are used only in SQLAppStatusListener, so I don't think this change poses any realistic risk of binary incompatibility risks to third party code.


        Issue Links



              joshrosen Josh Rosen
              joshrosen Josh Rosen
              0 Vote for this issue
              2 Start watching this issue