  Spark / SPARK-33710

Shuffle index Guava cache causes OOM and GC alarms in the YARN NodeManager



    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Shuffle, YARN
    • Labels:


      On CDH 6.3, the YARN NodeManager runs garbage collection frequently and eventually runs out of memory, producing a heap dump.


      Analyzing the dump with the Memory Analyzer Tool (MAT) points to the shuffle index module.


      The shuffle index cache is a Guava cache with a memory limit, but the limit does not account for the cache keys, so a large amount of path information accumulates in memory. When each cached ShuffleIndexInformation is very small, the number of keys becomes very large and eventually causes an out-of-memory error. I think this is a defect: the size of the keys should be counted toward the 100MB limit.
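To make the effect concrete, here is a minimal sketch that models the two weighing strategies with plain Java, without Guava. It is not Spark's code: the class, the method names, the 16-byte index size, and the estimate of 2 bytes per path character (ignoring object headers) are all assumptions chosen for illustration.

```java
import java.util.function.ToLongBiFunction;

public class ShuffleIndexWeigherSketch {
    // Assumed cap, mirroring the 100MB limit mentioned in this report.
    static final long CACHE_LIMIT_BYTES = 100L * 1024 * 1024;

    // Value-only weigher: counts just the size of the cached index data.
    static final ToLongBiFunction<String, Long> VALUE_ONLY =
        (path, indexBytes) -> indexBytes;

    // Key-aware weigher: also counts a rough estimate of the path's footprint
    // (assumption: 2 bytes per character, object headers ignored).
    static final ToLongBiFunction<String, Long> KEY_AWARE =
        (path, indexBytes) -> indexBytes + estimateKeyBytes(path);

    // Rough, hypothetical estimate of the heap used by one path key.
    static long estimateKeyBytes(String path) {
        return 2L * path.length();
    }

    // Number of identical entries that fit under the cap for a given weigher.
    static long maxEntries(ToLongBiFunction<String, Long> weigher,
                           String samplePath, long indexBytes) {
        return CACHE_LIMIT_BYTES / weigher.applyAsLong(samplePath, indexBytes);
    }

    // Heap used by the keys alone, which a value-only weigher never sees.
    static long untrackedKeyBytes(String samplePath, long entries) {
        return entries * estimateKeyBytes(samplePath);
    }

    public static void main(String[] args) {
        // Hypothetical index file path; real paths depend on the cluster layout.
        String path = "/data/yarn/nm/usercache/user/appcache/app_1/blockmgr-x/0a/shuffle_0_0_0.index";
        long indexBytes = 16L; // assumed size of a very small index entry

        long valueOnly = maxEntries(VALUE_ONLY, path, indexBytes);
        long keyAware = maxEntries(KEY_AWARE, path, indexBytes);

        System.out.println("entries admitted (value-only weigher): " + valueOnly);
        System.out.println("untracked key bytes at that count:     "
            + untrackedKeyBytes(path, valueOnly));
        System.out.println("entries admitted (key-aware weigher):  " + keyAware);
    }
}
```

Under these assumptions, the value-only weigher admits many more entries than the key-aware one, and the memory held by the keys can exceed the nominal cap many times over, which is consistent with the heap usage described in this report.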


      According to MAT, the Guava local cache held by ExternalShuffleBlockHandler takes up 82.88% of the heap.



      The analysis shows a very large number of shuffle index file paths in memory, together taking up more than 400 MB. These paths are the keys of the shuffle index cache in ExternalShuffleBlockResolver. The source code suggests a defect in the cache management: the 100MB limit does not count the keys.







            • Assignee:
              tianlun liangtianlun
            • Votes:
              0
            • Watchers:
              4


              • Created: