Spark / SPARK-33710

Shuffle index Guava cache causes OOM and YARN NodeManager GC alarms


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Shuffle, YARN
    • Labels: None

    Description

      On CDH 6.3, the YARN NodeManager runs GC frequently and eventually generates a heap dump file because of memory overflow.


       

      Analyzing the dump with the Memory Analyzer Tool (MAT) points to the shuffle index module.

       

      The shuffle index cache uses Guava with a memory-based limit, but that limit does not account for the cache keys, so a large amount of path information accumulates in memory. When each cached ShuffleIndexInformation is very small, the number of keys becomes very large, which eventually leads to memory overflow. I think this is a defect: the size of each key should also be counted toward the 100 MB limit.
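
A minimal sketch of the accounting problem described above, written against the JDK only. `WeightedCacheSketch`, `EntryWeigher`, and the path format are hypothetical stand-ins for illustration; this is not Guava's cache or Spark's code. It compares a weigher that counts only the value with one that also counts the key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy weight-bounded cache (hypothetical stand-in, not Guava's implementation). */
final class WeightedCacheSketch {
    /** Maps a cache entry to the weight charged against the limit. */
    interface EntryWeigher {
        int weigh(String indexFilePath, byte[] indexData);
    }

    // Charges only the cached value, as the report describes.
    static final EntryWeigher VALUE_ONLY = (path, data) -> data.length;

    // Assumed alternative: charge the key's bytes as well.
    static final EntryWeigher KEY_AND_VALUE =
        (path, data) -> path.getBytes(StandardCharsets.UTF_8).length + data.length;

    private final long maxWeight;
    private final EntryWeigher weigher;
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>();
    private long currentWeight = 0;

    WeightedCacheSketch(long maxWeight, EntryWeigher weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    void put(String path, byte[] data) {
        byte[] old = entries.remove(path);
        if (old != null) {
            currentWeight -= weigher.weigh(path, old);
        }
        entries.put(path, data);
        currentWeight += weigher.weigh(path, data);
        // Evict in insertion order until the accounted weight fits the limit.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentWeight -= weigher.weigh(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    int size() {
        return entries.size();
    }

    /** Bytes really held by keys, whether or not the weigher counted them. */
    long keyBytes() {
        long total = 0;
        for (String path : entries.keySet()) {
            total += path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** Hypothetical fixed-length index file path. */
    static String path(int i) {
        return String.format("/tmp/hypothetical/shuffle_%010d_0.index", i);
    }

    /** Inserts {@code count} entries whose values are {@code valueBytes} long. */
    static WeightedCacheSketch fill(long maxWeight, EntryWeigher w, int count, int valueBytes) {
        WeightedCacheSketch cache = new WeightedCacheSketch(maxWeight, w);
        for (int i = 0; i < count; i++) {
            cache.put(path(i), new byte[valueBytes]);
        }
        return cache;
    }

    public static void main(String[] args) {
        WeightedCacheSketch a = fill(1600, VALUE_ONLY, 1000, 16);
        WeightedCacheSketch b = fill(1600, KEY_AND_VALUE, 1000, 16);
        System.out.println("value-only:    entries=" + a.size() + " keyBytes=" + a.keyBytes());
        System.out.println("key-and-value: entries=" + b.size() + " keyBytes=" + b.keyBytes());
    }
}
```

With the value-only weigher, the bytes held by keys can exceed the configured limit because they are never charged. With the key-inclusive weigher, keys and values together stay under it. Object headers and other per-entry overhead are ignored here, so real heap usage would be higher in both cases.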

       

      According to MAT, the Guava LocalCache held by ExternalShuffleBlockHandler takes up 82.88% of the heap memory.

       

       

      The analysis shows a very large number of shuffle index file paths in memory, taking up more than 400 MB. These paths are the keys of the shuffle index cache in ExternalShuffleBlockResolver. After reading the source code, it appears that the cache management has a defect: the 100 MB limit does not include the size of the keys.
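
A rough back-of-the-envelope estimate shows how keys can outgrow a limit that only counts values. The per-entry sizes below are assumed values chosen for illustration, not measurements from the heap dump, and `KeyOverheadEstimate` is a hypothetical helper.

```java
/** Estimates key memory that a value-only weight limit never sees. */
final class KeyOverheadEstimate {
    /** Entries admitted before eviction starts when only values are weighed. */
    static long admittedEntries(long limitBytes, long valueBytesPerEntry) {
        return limitBytes / valueBytesPerEntry;
    }

    /** Heap held by the keys of those entries. */
    static long unaccountedKeyBytes(long limitBytes, long valueBytesPerEntry,
                                    long keyBytesPerEntry) {
        return admittedEntries(limitBytes, valueBytesPerEntry) * keyBytesPerEntry;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // the 100 MB limit from the report
        long valueBytes = 64;            // assumed weight of one small index entry
        long keyBytes = 256;             // assumed retained size of one path key
        long entries = admittedEntries(limit, valueBytes);
        long keys = unaccountedKeyBytes(limit, valueBytes, keyBytes);
        System.out.println("entries admitted: " + entries);
        System.out.println("key memory (MiB): " + keys / (1024 * 1024));
    }
}
```

Under these assumptions, key memory is the limit multiplied by the ratio of key size to value size, so the smaller the index entries are, the worse the overshoot becomes.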

       

       

       


          People

            Assignee: Unassigned
            Reporter: tianlun liangtianlun
