Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.2.2
- Fix Version/s: None
Description
I've analyzed a heap dump of the Spark History Server with jxray (www.jxray.com) and found that 42% of the heap is wasted on duplicate strings. The biggest sources of these strings are the name and value data fields of AccumulableInfo objects:
7. Duplicate Strings: overhead 42.1%

    Total strings    Unique strings    Duplicate values    Overhead
    13,732,278       729,234           354,032             867,177K (42.1%)

  Expensive data fields:

    318,421K (15.4%), 3669685 / 100% dup strings (8 unique), 3669685 dup backing arrays:
      ↖org.apache.spark.scheduler.AccumulableInfo.name
    178,994K (8.7%), 3674403 / 99% dup strings (35640 unique), 3674403 dup backing arrays:
      ↖scala.Some.x
    168,601K (8.2%), 3401960 / 92% dup strings (175826 unique), 3401960 dup backing arrays:
      ↖org.apache.spark.scheduler.AccumulableInfo.value
That is, 15.4% of the heap is wasted by AccumulableInfo.name and 8.2% is wasted by AccumulableInfo.value.
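For clarity, a duplicate string here means a distinct String object whose contents equal those of another, each copy carrying its own backing char[] array. A minimal illustration (the accumulator name below is just an example value):

    // Two strings with identical contents but separate heap objects:
    // equal by value, distinct by reference, each with its own char[].
    val a = new String("internal.metrics.executorRunTime".toCharArray)
    val b = new String("internal.metrics.executorRunTime".toCharArray)
    assert(a == b)    // same contents
    assert(!(a eq b)) // two objects: this is the duplication jxray reports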
It turns out that the problem has been partially addressed in Spark 2.3+, via the weakIntern(String) method discussed below.
However, this code has two minor problems:
- Strings for AccumulableInfo.value are not interned in the above code; only AccumulableInfo.name is.
- For interning, the weakIntern(String) method uses a Guava interner (stringInterner = Interners.newWeakInterner[String]()). This is an old-fashioned, less efficient way of interning strings: since a JDK 7 update several years ago, the built-in JVM String.intern() method has been much more efficient in terms of both CPU and memory (see the sketch after this list).
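For reference, the Guava-based interning amounts to roughly the following (a sketch based on the snippet quoted in this ticket, not the exact Spark source):

    import com.google.common.collect.{Interner, Interners}

    // Guava weak interner: keeps weak references to its canonical strings,
    // so entries can be garbage-collected once nothing else points to them.
    val stringInterner: Interner[String] = Interners.newWeakInterner[String]()

    def weakIntern(s: String): String = stringInterner.intern(s)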
It is therefore suggested to intern strings for AccumulableInfo.value as well, and to replace the Guava interner with String.intern().
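A sketch of what the suggested change could look like (the weakIntern name comes from this ticket; the simplified record type and the dedup helper are hypothetical, for illustration only):

    // Use the JVM's built-in String.intern() instead of a Guava interner.
    // Since JDK 7 the interned-string table lives on the Java heap rather
    // than in PermGen, and unreachable interned strings can be collected,
    // so intern() is safe to call on a high volume of strings.
    def weakIntern(s: String): String =
      if (s == null) null else s.intern()

    // Hypothetical call site: intern both fields, not just the name.
    case class AccInfo(name: String, value: String) // simplified stand-in

    def dedup(info: AccInfo): AccInfo =
      AccInfo(weakIntern(info.name), weakIntern(info.value))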