SPARK-42162

Memory usage on executors increased drastically for a complex query with a large number of addition operations


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      With the recent changes in expression canonicalization, a complex query with a large number of Add operations ends up consuming 10x more memory on the executors.

      The reason is that, with the new changes, the canonicalization process generates a lot of intermediate objects, especially for complex queries with a large number of commutative operators. In this specific case, a heap histogram analysis shows that the extra memory is held by a large number of Add objects.
      This issue did not occur before PR #37851.
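
To illustrate how canonicalizing commutative operators can multiply intermediate objects, here is a minimal toy model in Python. It is not Spark's Catalyst code and does not claim to reproduce what PR #37851 does. The `Leaf` and `Add` classes and the `canonicalize_per_node`, `canonicalize_once`, and `count_allocations` helpers are hypothetical names introduced only for this sketch. It assumes one plausible pattern: every `Add` node in a left-deep chain flattens its operands, sorts them, and rebuilds a fresh chain, instead of doing so once at the root.

```python
# Toy model only: NOT Spark's Catalyst code. It counts how many Add nodes are
# allocated while canonicalizing a left-deep chain c0 + c1 + ... + c(n-1).
from dataclasses import dataclass
from functools import reduce

allocations = 0  # number of Add nodes constructed so far


@dataclass(frozen=True)
class Leaf:
    name: str


class Add:
    def __init__(self, left, right):
        global allocations
        allocations += 1
        self.left, self.right = left, right


def operands(expr):
    """Flatten a tree of Adds into the list of its leaf operands."""
    if isinstance(expr, Add):
        return operands(expr.left) + operands(expr.right)
    return [expr]


def rebuild(leaves):
    """Sort the operands and fold them back into a left-deep Add chain."""
    ordered = sorted(leaves, key=lambda leaf: leaf.name)
    return reduce(Add, ordered)


def canonicalize_per_node(expr):
    """Canonicalize the children first, then flatten and rebuild at this node too."""
    if not isinstance(expr, Add):
        return expr
    left = canonicalize_per_node(expr.left)
    right = canonicalize_per_node(expr.right)
    return rebuild(operands(left) + operands(right))


def canonicalize_once(expr):
    """Flatten and rebuild a single time, at the root only."""
    return rebuild(operands(expr)) if isinstance(expr, Add) else expr


def count_allocations(canonicalize, n):
    """Return how many Add nodes `canonicalize` allocates for an n-operand chain."""
    global allocations
    chain = reduce(Add, [Leaf(f"c{i:04d}") for i in range(n)])
    allocations = 0  # ignore the nodes of the input expression itself
    canonicalize(chain)
    return allocations


if __name__ == "__main__":
    n = 200
    print("per node:", count_allocations(canonicalize_per_node, n))  # 19900
    print("once:    ", count_allocations(canonicalize_once, n))      # 199
```

In this model the per-node strategy allocates n(n-1)/2 `Add` nodes for an n-operand chain, while the single-pass strategy allocates n-1. If the intermediate results are also cached on each node, they stay reachable, which is consistent with a heap histogram dominated by `Add` objects.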

      The high memory usage causes the executors to miss heartbeats, which results in task failures.

          People

            Assignee: Supun Nakandala (scnakandala)
            Reporter: Supun Nakandala (scnakandala)
            Votes: 0
            Watchers: 5