SPARK-46094

Support Executor JVM Profiling

Details

    Description

      To profile a Spark application, a user or developer has to run the Spark job locally on a development machine and use a tool such as Java Flight Recorder, YourKit, or async-profiler to record profiling information. Because profiling can be expensive, the profiler is typically attached to the Spark JVM process after the process has started and is stopped once sufficient profiling data has been collected.

      The developer's environment is frequently different from the production environment, so profiles captured locally may not yield accurate information.

      However, profiling is hard when a Spark application runs as a distributed job on a cluster, where the developer may have limited access to the nodes on which the executor processes run. Also, in environments like Kubernetes, where executor pods may be removed as soon as the job completes, retrieving the profiling information from each executor pod can be difficult.

      This feature adds a low-overhead sampling profiler, such as async-profiler, as a built-in capability of the Spark job that can be turned on using only user-configurable parameters. (async-profiler is a low-overhead profiler that can be invoked programmatically and is available as a single multi-platform jar for Linux and macOS.)
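
      As a rough illustration of "turned on using only user-configurable parameters", the sketch below reads two settings from a plain map and builds profiler command strings. The configuration key names and the default options are assumptions made for this example, not names confirmed by this issue. The command strings follow async-profiler's comma-separated argument form and would be handed to its Java API (one.profiler.AsyncProfiler.getInstance().execute(...)), which is deliberately not called here so the sketch stays dependency-free.

```java
import java.util.Map;

// Illustrative sketch only. The key names below are assumed for this example.
final class ProfilingConf {
    static final String ENABLED = "spark.executor.profiling.enabled"; // assumed key
    static final String OPTIONS = "spark.executor.profiling.options"; // assumed key

    final boolean enabled;
    final String options;

    ProfilingConf(Map<String, String> conf) {
        // Profiling stays off unless the user explicitly enables it.
        this.enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED, "false"));
        // Assumed default: CPU sampling every 10 ms.
        this.options = conf.getOrDefault(OPTIONS, "event=cpu,interval=10ms");
    }

    // Command string that starts a session writing to outputFile.
    String startCommand(String outputFile) {
        return "start," + options + ",file=" + outputFile;
    }

    // Command string that stops the session and flushes outputFile.
    String stopCommand(String outputFile) {
        return "stop,file=" + outputFile;
    }
}
```

      With such a holder, the profiler arguments stay an opaque string supplied by the user, so new profiler options need no code change on the Spark side.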

      In addition, for convenience, the feature would save profiling output files to the distributed file system so that information from all executors is available in a single place.
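
      The idea of collecting every executor's output in one place can be sketched as follows. This is a minimal example under stated assumptions: a directory reachable through java.nio.file stands in for the distributed file system (a real implementation would more likely use a Hadoop-compatible file system client), and the per-application directory layout and file naming are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only; layout and names are assumptions.
final class ProfileUploader {
    // Hypothetical layout: <sharedDir>/<appId>/profile-exec-<executorId>.jfr
    static Path targetFor(Path sharedDir, String appId, String executorId) {
        return sharedDir.resolve(appId).resolve("profile-exec-" + executorId + ".jfr");
    }

    // Copies one executor's local profile into the shared directory and returns the target path.
    static Path upload(Path localFile, Path sharedDir, String appId, String executorId)
            throws IOException {
        Path target = targetFor(sharedDir, appId, executorId);
        Files.createDirectories(target.getParent());
        return Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

      Including the executor ID in the file name keeps uploads from different executors from overwriting each other, which matters when pods disappear right after the job completes.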

      The feature would add an executor plugin that adds no overhead unless enabled and that accepts the profiler arguments as a configuration parameter.
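
      The "no overhead unless enabled" behavior can be sketched with a simplified plugin lifecycle. Everything here is an assumption for illustration: the SketchExecutorPlugin interface is a local stand-in rather than Spark's actual plugin interface, the configuration keys are assumed names, and the profiler is represented by a callback that receives command strings instead of a real profiler call.

```java
import java.util.Map;
import java.util.function.Consumer;

// Local stand-in for an executor plugin lifecycle; not Spark's actual interface.
interface SketchExecutorPlugin {
    void init(Map<String, String> conf);
    void shutdown();
}

// Illustrative sketch only.
final class ProfilerPluginSketch implements SketchExecutorPlugin {
    private final Consumer<String> profiler; // receives commands; a real plugin would invoke the profiler
    private boolean running = false;

    ProfilerPluginSketch(Consumer<String> profiler) {
        this.profiler = profiler;
    }

    @Override
    public void init(Map<String, String> conf) {
        // Assumed key name. When disabled, return before touching the profiler at all.
        if (!Boolean.parseBoolean(conf.getOrDefault("spark.executor.profiling.enabled", "false"))) {
            return;
        }
        // Assumed key name; the arguments are passed through unchanged.
        String options = conf.getOrDefault("spark.executor.profiling.options", "event=cpu");
        profiler.accept("start," + options);
        running = true;
    }

    @Override
    public void shutdown() {
        // Only stop a session that this plugin actually started.
        if (running) {
            profiler.accept("stop");
            running = false;
        }
    }
}
```

      Because the disabled path returns before any profiler interaction, an executor that loads the plugin without enabling it pays only for a single configuration lookup.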

            People

              parthc Parth Chandra