Spark / SPARK-24827

Some memory waste in History Server by strings in AccumulableInfo objects



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.2.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:


      I've analyzed a heap dump of Spark History Server with jxray (www.jxray.com) and found that 42% of the heap is wasted due to duplicate strings. The biggest sources of such strings are the name and value data fields of AccumulableInfo objects:

      7. Duplicate Strings: overhead 42.1%
         Total strings    Unique strings    Duplicate values    Overhead
         13,732,278       729,234           354,032             867,177K (42.1%)

      Expensive data fields:
         318,421K (15.4%), 3669685 / 100% dup strings (8 unique), 3669685 dup backing arrays:
         178,994K (8.7%), 3674403 / 99% dup strings (35640 unique), 3674403 dup backing arrays:
         168,601K (8.2%), 3401960 / 92% dup strings (175826 unique), 3401960 dup backing arrays:

      That is, 15.4% of the heap is wasted by AccumulableInfo.name and 8.2% is wasted by AccumulableInfo.value.
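The waste that jxray reports comes from equal strings existing as many separate heap objects, each with its own backing array. A minimal JVM illustration (plain Java, not Spark code) of how interning collapses such duplicates into one canonical instance:

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two equal strings built at runtime are distinct heap objects,
        // each carrying its own backing array.
        String a = new StringBuilder("accumulator.").append(42).toString();
        String b = new StringBuilder("accumulator.").append(42).toString();

        System.out.println(a.equals(b)); // true  -- same contents
        System.out.println(a == b);      // false -- two copies, twice the memory

        // String.intern() maps both to the single canonical instance
        // held in the JVM's string table.
        System.out.println(a.intern() == b.intern()); // true -- one copy
    }
}
```

With millions of AccumulableInfo objects, most of which repeat a small set of names (the jxray report above shows only 8 unique values behind 3.6M name strings), interning eliminates almost all of that overhead.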

      It turns out that the problem has been partially addressed in Spark 2.3+.


      However, this code has two minor problems:

      1. Strings for AccumulableInfo.value are not interned in the above code; only AccumulableInfo.name is.
      2. For interning, the weakIntern(String) method uses a Guava interner (stringInterner = Interners.newWeakInterner[String]()). This is an older, less efficient way of interning strings. Since a JDK 7 update several years ago, the built-in JVM String.intern() method has been considerably more efficient, in terms of both CPU and memory.

      It is therefore suggested to intern value as well, and to replace the Guava interner with String.intern().
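As a rough sketch of the suggested fix (the class and field names below are illustrative, not Spark's actual implementation), interning both fields on construction with String.intern() would look like:

```java
// Illustrative sketch only -- not Spark's actual AccumulableInfo.
// Both name and value are interned via the JVM's built-in String.intern(),
// replacing the Guava Interners.newWeakInterner() approach.
class AccumulableInfoSketch {
    final String name;
    final String value;

    AccumulableInfoSketch(String name, String value) {
        // String.intern() returns one canonical instance per distinct value,
        // so millions of objects with equal names/values end up sharing
        // a single string (and a single backing array) each.
        this.name  = (name  == null) ? null : name.intern();
        this.value = (value == null) ? null : value.intern();
    }
}
```

Since JDK 7, interned strings live in the main heap rather than PermGen, which removed the classic objection to using String.intern() for application data; no separate interner object needs to be created or maintained.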




            • Assignee:
              misha@cloudera.com Misha Dmitriev
            • Votes:
              1
            • Watchers:
              3


              • Created: