SPARK-13510

Shuffle may throw FetchFailedException: Direct buffer memory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None

      Description

      In our cluster, when I tested Spark 1.6.0 with a SQL query, it threw an exception and the job failed.

      16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 1 blocks (915.4 MB) from 10.196.134.220:7337
      16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch from 10.196.134.220:7337 (executor id 122)
      16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to /10.196.134.220:7337
      16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection from /10.196.134.220:7337
      java.lang.OutOfMemoryError: Direct buffer memory
      	at java.nio.Bits.reserveMemory(Bits.java:658)
      	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
      	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
      	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
      	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
      	at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
      	at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
      	at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
      	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
      	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
      	at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
      	at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
      	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at java.lang.Thread.run(Thread.java:744)
      16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from /10.196.134.220:7337 is closed
      16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block shuffle_3_81_2, and will not retry (0 retries)
      

      The reason is that when a task fetches a large shuffle block (around 1 GB), it allocates the same amount of memory to receive it, so it easily throws "FetchFailedException: Direct buffer memory".
      If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it throws the following instead:

      java.lang.OutOfMemoryError: Java heap space
              at io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
              at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
              at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
              at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
      

      In MapReduce shuffle, the fetcher first checks whether the block can be cached in memory, but Spark does not.
      If the block is larger than what we can cache in memory, we should write it to disk.
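
      The check proposed above could look roughly like the sketch below. It is only an illustration of the "fit in memory, otherwise write to disk" decision, not the contents of the attached patch and not Spark's or MapReduce's actual code. The class name ShuffleFetchPlanner, its methods, and the two limits (a total in-memory budget and a per-block fraction of that budget) are all hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch: decide per fetched block whether it may be buffered
 * in memory or must be streamed to disk. All names here are assumptions.
 */
public class ShuffleFetchPlanner {
    private final long memoryLimit;        // total bytes allowed for in-memory blocks
    private final long maxSingleBlockSize; // blocks larger than this always go to disk
    private long usedMemory = 0;           // bytes currently reserved

    public ShuffleFetchPlanner(long memoryLimit, double singleBlockFraction) {
        this.memoryLimit = memoryLimit;
        this.maxSingleBlockSize = (long) (memoryLimit * singleBlockFraction);
    }

    /** Returns true and reserves the bytes if the block may be held in memory. */
    public synchronized boolean tryReserve(long blockSize) {
        if (blockSize > maxSingleBlockSize || usedMemory + blockSize > memoryLimit) {
            return false; // caller should fetch this block to disk instead
        }
        usedMemory += blockSize;
        return true;
    }

    /** Gives back a reservation once the in-memory block has been consumed. */
    public synchronized void release(long blockSize) {
        usedMemory -= blockSize;
    }

    /** Streams a block to a temp file through a small fixed-size buffer. */
    public static Path spillToDisk(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("shuffle-block-", ".data");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[64 * 1024]; // memory use is independent of block size
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return tmp;
    }
}
```

      With such a check, a 915.4 MB block like the one in the log above would fail tryReserve under any modest budget and be streamed through a 64 KB buffer, rather than requiring one allocation the size of the whole block.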

        Attachments

        1. spark-13510.diff
          57 kB
          shenh062326

              People

              • Assignee: Unassigned
              • Reporter: shenhong (shenh062326)
              • Votes: 0
              • Watchers: 11
