[SPARK-46094] Support Executor JVM Profiling - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0
Fix Version/s: 4.0.0
Component/s: Connect
Labels:
- pull-request-available

Description

To profile a Spark application, a user or developer typically has to run the Spark job locally on a development machine and use a tool such as Java Flight Recorder, YourKit, or async-profiler to record profiling information. Because profiling can be expensive, the profiler is typically attached to the Spark JVM process after the process has started and is detached once sufficient profiling data has been collected.

The developer's environment is frequently different from the production environment, so profiling there may not yield accurate information.

Profiling is hard, however, when a Spark application runs as a distributed job on a cluster, where the developer may have limited access to the nodes on which the executor processes are running. Also, in environments like Kubernetes, where executor pods may be removed as soon as the job completes, retrieving the profiling information from each executor pod can be tricky.

This feature is to add a low-overhead sampling profiler, such as async-profiler, as a built-in capability of the Spark job that can be turned on using only user-configurable parameters. (async-profiler is a low-overhead profiler that can be invoked programmatically and is available as a single multi-platform jar for Linux and macOS.)

In addition, for convenience, the feature would save profiling output files to the distributed file system so that information from all executors can be available in a single place.

The feature would add an executor plugin that does not add any overhead unless enabled and can be configured to accept profiler arguments as a configuration parameter.
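As a rough illustration of what "turned on using only user configurable parameters" could look like, the sketch below builds a set of Spark properties and renders them as `spark-submit` arguments. The plugin class name and the `spark.executor.profiling.*` keys are assumptions that do not appear in this issue text and should be checked against the documentation of the Spark release in use. The `hdfs:///tmp/profiles` path and the helper functions are hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: the plugin class and the spark.executor.profiling.* keys below are
# not stated in this issue; verify them against your Spark release.
PLUGIN_CLASS = "org.apache.spark.executor.profiler.ExecutorProfilerPlugin"


def profiler_conf(dfs_dir: str, options: str, fraction: float = 0.1) -> Dict[str, str]:
    """Build Spark properties that would enable the executor profiler plugin.

    dfs_dir  -- distributed file system directory for collected profiles
    options  -- profiler arguments, passed through as one configuration value
    fraction -- share of executors to profile, between 0.0 and 1.0
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return {
        "spark.plugins": PLUGIN_CLASS,
        "spark.executor.profiling.enabled": "true",
        "spark.executor.profiling.dfsDir": dfs_dir,
        "spark.executor.profiling.options": options,
        "spark.executor.profiling.fraction": str(fraction),
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as a flat list of `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical output directory and profiler arguments.
conf = profiler_conf("hdfs:///tmp/profiles", "event=cpu,interval=10ms")
submit_args = to_submit_args(conf)
print(" ".join(submit_args))
```

Keeping the profiler arguments in a single string mirrors the description above, where the plugin accepts them as one configuration parameter; leaving the enabling property unset would keep the plugin inactive.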

Issue Links

causes

SPARK-48127 Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules

Resolved

links to

GitHub Pull Request #44021

GitHub Pull Request #45353

People

Assignee: Parth Chandra

Reporter: Parth Chandra

Votes: 0

Watchers: 2

Dates

Created: 24/Nov/23 20:03

Updated: 04/May/24 01:25

Resolved: 15/Jan/24 21:47