Spark / SPARK-24801

Empty byte[] arrays in spark.network.sasl.SaslEncryption$EncryptedMessage can waste a lot of memory


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core, YARN

    Description

      I recently analyzed another YARN NodeManager heap dump with jxray (www.jxray.com) and found that 81% of memory is wasted by empty (all-zero) byte[] arrays. Most of these arrays are referenced by org.apache.spark.network.util.ByteArrayWritableChannel.data, which in turn is referenced by org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel. Here is the full reference chain that leads to the problematic arrays:

      2,597,946K (64.1%): byte[]: 40583 / 100% of empty 2,597,946K (64.1%)
      
      ↖org.apache.spark.network.util.ByteArrayWritableChannel.data
      ↖org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel
      ↖io.netty.channel.ChannelOutboundBuffer$Entry.msg
      ↖io.netty.channel.ChannelOutboundBuffer$Entry.{next}
      ↖io.netty.channel.ChannelOutboundBuffer.flushedEntry
      ↖io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelUnsafe.outboundBuffer
      ↖io.netty.channel.socket.nio.NioSocketChannel.unsafe
      ↖org.apache.spark.network.server.OneForOneStreamManager$StreamState.associatedChannel
      ↖{java.util.concurrent.ConcurrentHashMap}.values
      ↖org.apache.spark.network.server.OneForOneStreamManager.streams
      ↖org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.streamManager
      ↖org.apache.spark.network.yarn.YarnShuffleService.blockHandler
      ↖Java Static org.apache.spark.network.yarn.YarnShuffleService.instance
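
      As a rough sanity check on the numbers in the first line of this chain, the total size can be divided by the array count. The class and method names below are made up for illustration; only the two figures come from the jxray output above.

      ```java
      // Hypothetical helper: estimates the average size of one empty byte[]
      // from the totals reported in the jxray reference chain.
      final class EmptyArrayFootprint {

          // totalKiB: total retained size in K as reported (2,597,946K)
          // arrayCount: number of empty arrays reported (40583)
          static long averageBytesPerArray(long totalKiB, long arrayCount) {
              return totalKiB * 1024L / arrayCount;
          }

          public static void main(String[] args) {
              long avg = averageBytesPerArray(2_597_946L, 40_583L);
              // Prints the average in bytes and in whole KiB.
              System.out.println(avg + " bytes per array, about " + (avg / 1024) + " KiB");
          }
      }
      ```

      The result comes out at roughly 64 KiB per array, which suggests that every queued EncryptedMessage holds one full-size, still unused block buffer. The few bytes above 65,536 are plausibly the per-array object header, though that is an assumption about the JVM in use.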

       

      Checking the code of SaslEncryption$EncryptedMessage, I see that byteChannel is always initialized eagerly in the constructor:

      this.byteChannel = new ByteArrayWritableChannel(maxOutboundBlockSize);

      So, to keep empty byte[] arrays from flooding memory, I think we should initialize byteChannel lazily, on first use. As far as I can see, it is used in only one method, private void nextChunk().
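
      A minimal sketch of the lazy initialization proposed here, under stated assumptions: SimpleByteChannel and LazyEncryptedMessage are simplified, hypothetical stand-ins for ByteArrayWritableChannel and EncryptedMessage, and this nextChunk takes a ByteBuffer argument purely so the example is self-contained. It is not the actual Spark code or the eventual patch.

      ```java
      import java.nio.ByteBuffer;

      // Hypothetical stand-in for ByteArrayWritableChannel: the backing array
      // is allocated as soon as the channel object is constructed.
      final class SimpleByteChannel {
          private final byte[] data;
          private int offset;

          SimpleByteChannel(int size) {
              this.data = new byte[size];
          }

          // Copies as many bytes as fit from src into the backing array.
          int write(ByteBuffer src) {
              int toTransfer = Math.min(src.remaining(), data.length - offset);
              src.get(data, offset, toTransfer);
              offset += toTransfer;
              return toTransfer;
          }

          void reset() {
              offset = 0;
          }
      }

      // Hypothetical simplification of EncryptedMessage: the channel is created
      // on the first call to nextChunk instead of in the constructor.
      final class LazyEncryptedMessage {
          private final int maxOutboundBlockSize;
          private SimpleByteChannel byteChannel; // stays null until first use

          LazyEncryptedMessage(int maxOutboundBlockSize) {
              this.maxOutboundBlockSize = maxOutboundBlockSize;
          }

          boolean isBufferAllocated() {
              return byteChannel != null;
          }

          // Allocates the buffer only when a chunk is actually produced.
          int nextChunk(ByteBuffer src) {
              if (byteChannel == null) {
                  byteChannel = new SimpleByteChannel(maxOutboundBlockSize);
              }
              byteChannel.reset();
              return byteChannel.write(src);
          }
      }
      ```

      With this shape, messages that are still waiting in Netty's outbound buffer hold only a null reference, and the block-size array exists only for messages that have started to be written.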

       

            People

              Assignee: Misha Dmitriev (misha@cloudera.com)
              Reporter: Misha Dmitriev (misha@cloudera.com)
              Votes: 1
              Watchers: 8
