Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-26221

Improve Spark SQL instrumentation and metrics

    XMLWordPrintableJSON

    Details

    • Type: Umbrella
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      This is an umbrella ticket for various small improvements for better metrics and instrumentation. Some thoughts:

       

      Differentiate query plan that’s writing data out, vs returning data to the driver

      • I.e. ETL & report generation vs interactive analysis
      • This is related to the data sink item below. We need to make sure from the query plan we can tell what a query is doing

      Data sink: Have an operator for data sink, with metrics that can tell us:

      • Write time
      • Number of records written
      • Size of output written
      • Number of partitions modified
      • Metastore update time
      • Also track number of records for collect / limit

      Scan

      • Track file listing time (start and end so we can construct timeline, not just duration)
      • Track metastore operation time
      • Track IO decoding time for row-based input sources; Need to make sure overhead is low

      Shuffle

      • Track read time and write time
      • Decide if we can measure serialization and deserialization

      Client fetch time

      • Sometimes a query take long to run because it is blocked on the client fetching result (e.g. using a result iterator). Record the time blocked on client so we can remove it in measuring query execution time.

      Make it easy to correlate queries with jobs, stages, and tasks belonging to a single query, e.g. dump execution id in task logs?

      Better logging:

      • Enable logging the query execution id and TID in executor logs, and query execution id in driver logs.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                rxin Reynold Xin
                Reporter:
                rxin Reynold Xin
              • Votes:
                1 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated: