Spark / SPARK-40281

Memory Profiler on Executors


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      This ticket proposes to implement PySpark memory profiling on executors. See the design document for more details.

      Many factors affect a PySpark program’s performance. Memory, one of the key factors, has been missing from PySpark profiling. A PySpark program running on the Spark driver can be profiled with Memory Profiler like any normal Python process, but there has been no easy way to profile memory on Spark executors.

      PySpark UDFs, one of the most popular Python APIs, enable users to run custom code on top of the Apache Spark™ engine. However, it is difficult to optimize UDFs without understanding memory consumption.

      This ticket proposes to introduce a PySpark memory profiler that profiles memory on executors. It reports total memory usage and pinpoints which lines of code in a UDF account for the most memory. That will help users optimize PySpark UDFs and reduce the likelihood of out-of-memory errors.
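      The core idea above is attributing memory usage to individual lines of a UDF. As a self-contained illustration of that idea, here is a sketch using only the standard-library tracemalloc module. This is not the PySpark API (the shipped executor profiler builds on the third-party memory-profiler package), and `udf_like` is just a hypothetical stand-in for a UDF body.

```python
# Illustration only: line-level memory attribution with the stdlib
# tracemalloc module. NOT the PySpark executor profiler itself;
# udf_like is a hypothetical stand-in for a UDF body.
import tracemalloc

def udf_like(n):
    big = [0] * n            # the allocation we expect to dominate
    small = list(range(10))  # a comparatively tiny allocation
    return big, small

tracemalloc.start()
result = udf_like(100_000)   # keep objects alive so the snapshot sees them
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group traced allocations by source line, largest first.
stats = snapshot.statistics("lineno")
for stat in stats[:3]:
    frame = stat.traceback[0]
    print(f"{frame.filename}:{frame.lineno} -> {stat.size} bytes")
```

      The top entry corresponds to the `big = [0] * n` line, i.e. the line responsible for the most allocated memory. In the released feature, profiling is enabled via the `spark.python.profile.memory` configuration, and per-UDF line-by-line results are printed on the driver with `sc.show_profiles()`, per the Spark 3.4 documentation.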

      Sub-Tasks

        There are no Sub-Tasks for this issue.


          People

            Assignee: Xinrong Meng
            Reporter: Xinrong Meng
            Votes: 2
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved: