Spark / SPARK-33710

Shuffle index Guava cache causes OOM; YARN NodeManager GC alarm


    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Shuffle, YARN
    • Labels:
      None

      Description

      On CDH 6.3, the YARN NodeManager runs GC frequently and eventually generates a heap dump file because of memory overflow (OOM).


       

      Analysis of the dump with the Memory Analyzer Tool (MAT) locates the problem in the shuffle index module.

       

      The Guava cache enforces its memory limit without accounting for the cache keys, so a large amount of path information accumulates in memory. If each ShuffleIndexInformation in the cache is very small, the number of keys becomes very large and eventually leads to memory overflow. I think this is a defect: the size of the keys should be counted toward the 100 MB limit.
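
      The effect can be reproduced with a toy model. The sketch below is not Guava and not Spark code: it is a stdlib-only LRU cache with a weight limit, and the class name, weigher functions, path pattern, and sizes are all hypothetical. It only illustrates that when the weigher counts values alone, the bytes held by keys are not bounded by the configured limit.

```python
from collections import OrderedDict


class WeightBoundedCache:
    """Toy LRU cache that evicts oldest entries once total weight exceeds max_weight."""

    def __init__(self, max_weight, weigher):
        self.max_weight = max_weight
        self.weigher = weigher          # function (key, value) -> int
        self.entries = OrderedDict()    # insertion order doubles as LRU order here
        self.total_weight = 0

    def put(self, key, value):
        if key in self.entries:
            self.total_weight -= self.weigher(key, self.entries.pop(key))
        self.entries[key] = value
        self.total_weight += self.weigher(key, value)
        # Evict from the oldest end until the accounted weight fits again.
        while self.total_weight > self.max_weight and self.entries:
            old_key, old_value = self.entries.popitem(last=False)
            self.total_weight -= self.weigher(old_key, old_value)

    def key_bytes(self):
        # Memory held by keys, which the value-only weigher never sees.
        return sum(len(k) for k in self.entries)


def value_only(key, value):
    # Models the behaviour described in the report: keys are free.
    return len(value)


def key_and_value(key, value):
    # Models the change proposed in the report: keys count toward the limit.
    return len(key) + len(value)


def fill(weigher, max_weight=1000, n=500, value_size=8):
    cache = WeightBoundedCache(max_weight, weigher)
    for i in range(n):
        # Hypothetical path; real keys are index file paths on the NodeManager.
        path = "/hypothetical/nm-local-dir/shuffle_%d_0.index" % i
        cache.put(path, b"x" * value_size)
    return cache


unbounded_keys = fill(value_only)
bounded_keys = fill(key_and_value)
print(len(unbounded_keys.entries), unbounded_keys.key_bytes())
print(len(bounded_keys.entries), bounded_keys.key_bytes())
```

      With the value-only weigher the cache is "full" by its own accounting while its keys occupy several times the limit; with the key-and-value weigher the keys and values together stay within it.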

       

      According to MAT, the Guava local cache used by ExternalShuffleBlockHandler takes up 82.88% of the heap memory.

       

       

      The analysis shows a very large number of shuffle index paths in memory, which together take up more than 400 MB. These paths are the keys of the shuffle index cache in ExternalShuffleBlockResolver. The source code suggests a defect in the cache management: the 100 MB limit does not include the keys in its accounting.
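
      A rough estimate shows how key memory can exceed the limit by a wide margin. This is a back-of-the-envelope sketch, not a measurement: the value size (16 bytes, i.e. two 8-byte offsets for a single reduce partition) and the per-key footprint (200 bytes) are assumptions chosen for illustration, and the function name is hypothetical.

```python
def retained_bytes(limit, value_bytes, key_bytes, count_keys):
    """Return (entries, key_memory) for a cache that is full by its own accounting."""
    per_entry_weight = value_bytes + (key_bytes if count_keys else 0)
    entries = limit // per_entry_weight
    return entries, entries * key_bytes


LIMIT = 100 * 1024 * 1024  # the 100 MB limit mentioned in the report
VALUE_BYTES = 16           # assumption: a very small index entry
KEY_BYTES = 200            # assumption: rough in-heap footprint of one path key

# Keys not counted: the limit only caps the values.
entries_now, keys_now = retained_bytes(LIMIT, VALUE_BYTES, KEY_BYTES, count_keys=False)
# Keys counted: keys and values share the same limit.
entries_fix, keys_fix = retained_bytes(LIMIT, VALUE_BYTES, KEY_BYTES, count_keys=True)

print(entries_now, keys_now)
print(entries_fix, keys_fix)
```

      Under these assumed sizes, the uncounted keys alone come to more than ten times the configured limit, while counting them keeps the total within it. Real numbers depend on the actual path lengths and index sizes.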

       

       

       


            People

            • Assignee:
              Unassigned
              Reporter:
              tianlun liangtianlun
            • Votes:
              0
              Watchers:
              4
