[SPARK-39569] Spark Shuffle Index Cache ignore the weight of index Path - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.0, 3.1.2
Fix Version/s: None
Component/s: Shuffle
Labels: None

Description

We ran into the same OOM problem described in SPARK-33206. The PR for that issue fixed the incorrect weight calculation used when the external shuffle service caches ShuffleIndexInformation, but we noticed that the weigher still ignores the cache key, which is the index file path (a String):

shuffleIndexCache = CacheBuilder.newBuilder()
    .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
    .weigher((Weigher<String, ShuffleIndexInformation>)
        (filePath, indexInfo) -> indexInfo.getRetainedMemorySize())
    .build(indexCacheLoader);

In our case the index path can be longer than 100 characters, e.g. /data/data2/yarn/nm/usercache/hive/appcache/application_1654741161919_1249246/blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index. These keys account for a large share of the memory shown in our jmap dump. Should the key size be included when calculating the weight, so that the result is more accurate?
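
One possible shape for a key-aware weight is sketched below. It is only an illustration of the idea, not a proposed patch: the class name KeyAwareIndexWeigher, the helper names, and the 40-byte per-String overhead are assumptions, and the one-byte-per-character estimate assumes a JVM with compact strings (Java 9+) and Latin-1 paths. On Java 8 each character would take two bytes.

```java
// Hypothetical helper: estimates a cache entry's weight as key size + value size.
final class KeyAwareIndexWeigher {
    // Assumed fixed cost of a String (object header, fields, backing array header).
    // The real figure depends on the JVM and its settings.
    static final int ASSUMED_STRING_OVERHEAD_BYTES = 40;

    // Rough footprint of the file path key, assuming one byte per character.
    static int estimateKeySize(String filePath) {
        return ASSUMED_STRING_OVERHEAD_BYTES + filePath.length();
    }

    // Sums key and value sizes in a long, then clamps to the int range
    // because a cache weigher has to return an int.
    static int weigh(String filePath, int indexRetainedSize) {
        long total = (long) estimateKeySize(filePath) + indexRetainedSize;
        return (int) Math.min(total, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        String path = "/data/data2/yarn/nm/usercache/hive/appcache/"
            + "application_1654741161919_1249246/"
            + "blockmgr-6b0f7db0-7d55-4270-ad3d-42fe70b5694e/37/shuffle_0_1794_0.index";
        int valueSize = 176; // made-up retained size of one index entry, for illustration only
        System.out.println("estimated key bytes:   " + estimateKeySize(path));
        System.out.println("estimated entry bytes: " + weigh(path, valueSize));
    }
}
```

With such a helper, the weigher lambda in the snippet above could become (filePath, indexInfo) -> KeyAwareIndexWeigher.weigh(filePath, indexInfo.getRetainedMemorySize()). For small index files the key estimate can be comparable to the value size, which is why ignoring it skews the cache's total weight.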

People

Assignee: Unassigned

Reporter: chen zhejia

Votes: 0

Watchers: 1

Dates

Created: 23/Jun/22 15:05

Updated: 23/Jun/22 15:05