Spark / SPARK-30246

Spark on Yarn External Shuffle Service Memory Leak


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: Shuffle, Spark Core
    • Labels: None
    • Environment:
      hadoop 2.7.3
      spark 2.4.3
      jdk 1.8.0_60

    Description

      In our large, busy YARN cluster, which deploys the Spark external shuffle service as a YARN NodeManager (NM) auxiliary service, we encountered OOM errors on some NMs.
      After dumping the heap, I found StreamState objects still on the heap even though the applications they belonged to had already finished.
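
The symptom above (per-stream state that outlives its application) can be illustrated with a minimal sketch. This is not Spark's OneForOneStreamManager code and makes no claim about the root cause fixed in this issue. The class `SketchStreamManager`, its `State` type, and all method names are hypothetical. The sketch assumes state is kept in a map keyed by stream ID and is removed only when the associated channel is reported as terminated, so an entry whose cleanup never runs stays reachable after its application has finished.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the Spark implementation.
class SketchStreamManager {

    /** Stand-in for per-stream state: owning app, associated channel, buffered bytes. */
    static final class State {
        final String appId;
        final String channelId;
        final byte[] buffered;

        State(String appId, String channelId, int bytes) {
            this.appId = appId;
            this.channelId = channelId;
            this.buffered = new byte[bytes];
        }
    }

    private final AtomicLong nextStreamId = new AtomicLong();
    private final Map<Long, State> streams = new ConcurrentHashMap<>();

    /** Registers a stream and returns its ID; the state stays in the map until removed. */
    long registerStream(String appId, String channelId, int bytes) {
        long id = nextStreamId.getAndIncrement();
        streams.put(id, new State(appId, channelId, bytes));
        return id;
    }

    /** Removes every stream tied to the closed channel and returns how many were removed. */
    int connectionTerminated(String channelId) {
        int before = streams.size();
        streams.values().removeIf(s -> s.channelId.equals(channelId));
        return before - streams.size();
    }

    /** Counts states whose owning app is no longer active, i.e. leak candidates. */
    long countOrphaned(Set<String> activeApps) {
        return streams.values().stream()
                .filter(s -> !activeApps.contains(s.appId))
                .count();
    }

    public static void main(String[] args) {
        SketchStreamManager manager = new SketchStreamManager();
        manager.registerStream("app-1", "channel-1", 1024);
        manager.registerStream("app-2", "channel-2", 1024);

        // app-1 has finished, but its channel was never reported as terminated.
        Set<String> active = Collections.singleton("app-2");
        System.out.println("orphaned before cleanup: " + manager.countOrphaned(active));

        // Once the termination callback runs, the orphaned state is released.
        manager.connectionTerminated("channel-1");
        System.out.println("orphaned after cleanup: " + manager.countOrphaned(active));
    }
}
```

In a heap dump, entries like the ones counted by `countOrphaned` would show up as state objects retained by the manager's map, which is the kind of retention this report describes.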

      Here are some related figures:

      The heap dump below shows that memory consumption consists mainly of two parts:
      (1) OneForOneStreamManager: 4,429,796,424 bytes (77.11%)
      (2) PoolChunk: 1,059,201,712 bytes (18.44%)

      Digging into the OneForOneStreamManager shows that some StreamState objects still remain:

      Incoming references to StreamState::associatedChannel:

          People

            henriquedsg89 Henrique dos Santos Goulart
            UncleHuang uncle-huang
            Votes: 1
            Watchers: 9