Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.0, 3.1.2
None
None
Description
We hit the same OOM problem described in SPARK-33206. The PR for that issue fixed the incorrect weight calculation used when the external shuffle service caches ShuffleIndexInformation, but we noticed that the weigher still ignores the cache key, which is the index file path (a String):
shuffleIndexCache = CacheBuilder.newBuilder()
    .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
    .weigher((Weigher<String, ShuffleIndexInformation>)
        (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
    .build(indexCacheLoader);
In our case the index path can be longer than 100 characters, e.g. /data/data2/yarn/nm/usercache/hive/appcache/application_1654741161919_1249246/blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index. These key strings account for a lot of memory in our jmap dump, yet none of it is counted against the cache's maximum weight. Should the weigher also include the key size so that the weight is more accurate?
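
To make the suggestion concrete, here is a rough sketch of a key-aware weight. It is not the actual Spark code: the class name `KeyAwareWeigher`, the helper `estimateKeyBytes`, and the constant `STRING_OVERHEAD_BYTES` are hypothetical, and the per-string cost (a fixed overhead plus 2 bytes per character) is only an assumed approximation that varies with JVM version and settings.

```java
public class KeyAwareWeigher {
    // Assumed fixed cost of a String object plus its backing array header.
    // This is a hypothetical ballpark, not a measured value.
    static final int STRING_OVERHEAD_BYTES = 40;

    // Rough estimate of the heap used by the cache key (the index file path).
    // Assumes 2 bytes per character; JVMs with compact strings may use 1.
    static int estimateKeyBytes(String filePath) {
        return STRING_OVERHEAD_BYTES + 2 * filePath.length();
    }

    // Weight of one cache entry: estimated key size plus the retained size
    // of the cached index information. addExact fails loudly on int overflow.
    static int weigh(String filePath, int retainedIndexBytes) {
        return Math.addExact(estimateKeyBytes(filePath), retainedIndexBytes);
    }

    public static void main(String[] args) {
        String path = "/data/data2/yarn/nm/usercache/hive/appcache/"
            + "application_1654741161919_1249246/"
            + "blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/"
            + "shuffle_0_1794_0.index";
        // Hypothetical retained size for a small index file.
        int retainedIndexBytes = 176;
        System.out.println("key bytes (estimated): " + estimateKeyBytes(path));
        System.out.println("entry weight: " + weigh(path, retainedIndexBytes));
    }
}
```

With such a helper, the weigher lambda above could become `(filePath, indexInfo) -> estimateKeyBytes(filePath) + indexInfo.getRetainedMemorySize()`. When the index files are small, the key can be a large fraction of each entry, which would match what we see in the jmap dump.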