Spark / SPARK-39569

Spark shuffle index cache ignores the weight of the index path


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0, 3.1.2
    • Fix Version/s: None
    • Component/s: Shuffle
    • Labels: None

    Description

      We hit the same OOM problem as SPARK-33206. That PR fixed the incorrect weight calculation when the external shuffle service caches ShuffleIndexInformation, but we noticed that the weigher ignores the cache key, which is the index file path (filePath, a String):
       
      shuffleIndexCache = CacheBuilder.newBuilder()
            .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
            // Only the value's retained size is counted; the filePath key is not weighed.
            .weigher((Weigher<String, ShuffleIndexInformation>)
              (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
            .build(indexCacheLoader);
       
      In our case the index path can be longer than 100 characters, e.g. /data/data2/yarn/nm/usercache/hive/appcache/application_1654741161919_1249246/blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index. These path keys account for a lot of memory in a jmap heap dump. Should we include the key size when calculating the weight, so that the cache's memory accounting is more accurate?
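A minimal sketch of what a key-aware weight could look like, written in plain Java with no Guava dependency so it runs on its own. KeyAwareWeigher, estimateStringSize and weigh are hypothetical names, not Spark APIs. The per-object sizes are assumptions for a 64-bit HotSpot JVM (JDK 9+) with compressed oops and compact strings; real footprints vary by JVM version and flags. Under those assumptions, the 149-character example path above adds roughly 192 bytes per entry that the current weigher does not count.

```java
/**
 * Hypothetical helper (not part of Spark): estimates the heap footprint of a
 * String cache key so a weigher can count it alongside the cached value.
 * ASSUMPTION: 64-bit HotSpot, JDK 9+, compressed oops, compact strings enabled.
 */
public final class KeyAwareWeigher {
    // Assumed shallow size of a java.lang.String object (header + fields, padded).
    private static final int STRING_SHALLOW_SIZE = 24;
    // Assumed header size of a byte[] (object header + 4-byte length field).
    private static final int BYTE_ARRAY_HEADER = 16;

    private KeyAwareWeigher() {}

    /** Rounds a size up to the assumed 8-byte object alignment. */
    static long align8(long size) {
        return (size + 7) & ~7L;
    }

    /** Approximate retained size of a String key, in bytes. */
    static long estimateStringSize(String s) {
        // Compact strings store Latin-1 text at 1 byte/char, otherwise UTF-16 at 2 bytes/char.
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        long payload = (long) s.length() * (latin1 ? 1 : 2);
        return STRING_SHALLOW_SIZE + align8(BYTE_ARRAY_HEADER + payload);
    }

    /** Key estimate plus the value's retained size, clamped to int because a Guava Weigher returns int. */
    static int weigh(String filePath, int retainedIndexSize) {
        long total = estimateStringSize(filePath) + retainedIndexSize;
        return (int) Math.min(Integer.MAX_VALUE, total);
    }

    public static void main(String[] args) {
        String path = "/data/data2/yarn/nm/usercache/hive/appcache/application_1654741161919_1249246/"
            + "blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index";
        System.out.println("key chars: " + path.length());                        // prints "key chars: 149"
        System.out.println("estimated key bytes: " + estimateStringSize(path));   // prints "estimated key bytes: 192"
    }
}
```

Plugged into the builder from the description, the weigher lambda would become (filePath, indexInfo) -> KeyAwareWeigher.weigh(filePath, indexInfo.getRetainedMemorySize()). A JVM-specific estimate like this is only approximate; a simpler proxy such as a fixed per-entry overhead plus filePath.length() would also bring the key into the accounting, at the cost of some precision.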


          People

            Assignee: Unassigned
            Reporter: chen zhejia (chenzhejia)
            Votes: 0
            Watchers: 1
