Flink / FLINK-2773

OutOfMemoryError on YARN Session

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Deployment / YARN
    • Labels: None

    Description

      When running a Flink program on a detached YARN session using the latest master (commit 0b3ca57b41e09937b9e63f2f443834c8ad1cf497), I observed this OutOfMemoryError:

      java.lang.Exception: The data preparation for task 'CoGroup (coGroup-A68B765B7BAB4E29BF6816965A994776)' , caused an error: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: java.lang.OutOfMemoryError: Direct buffer memory
      	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:464)
      	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:354)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:579)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: java.lang.OutOfMemoryError: Direct buffer memory
      	at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:607)
      	at org.apache.flink.runtime.operators.RegularPactTask.getInput(RegularPactTask.java:1089)
      	at org.apache.flink.runtime.operators.CoGroupDriver.prepare(CoGroupDriver.java:97)
      	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:459)
      	... 3 more
      Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: java.lang.OutOfMemoryError: Direct buffer memory
      	at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:787)
      Caused by: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: java.lang.OutOfMemoryError: Direct buffer memory
      	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:153)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:246)
      	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:224)
      	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:246)
      	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:224)
      	at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:246)
      	at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:737)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:310)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
      	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
      	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
      	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:234)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
      	... 9 more
      Caused by: java.lang.OutOfMemoryError: Direct buffer memory
      	at java.nio.Bits.reserveMemory(Bits.java:658)
      	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
      	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
      	at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
      	at io.netty.buffer.UnpooledUnsafeDirectByteBuf.capacity(UnpooledUnsafeDirectByteBuf.java:157)
      	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
      	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
      	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
      	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
      	at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
      	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:228)
      	... 10 more
      

      Since I know that this feature was working properly recently, I reverted to commit 8ca853e0f6c18be8e6b066c6ec0f23badb797323, and the problem was gone.
      The problem might have been introduced when off-heap memory support for YARN was added (commit 93c95b6a6f150a2c55dc387e4ef1d603b3ef3f22).
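
      The innermost cause in the trace above is java.nio.ByteBuffer.allocateDirect failing in Bits.reserveMemory, which means the JVM's direct buffer pool hit its limit (the limit is set with -XX:MaxDirectMemorySize). The following is a minimal diagnostic sketch, not part of Flink and not taken from this issue, that reads the current usage of that pool through the standard BufferPoolMXBean. The class name DirectMemoryProbe and the 16 MB allocation are hypothetical and chosen only for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not a Flink class.
public class DirectMemoryProbe {

    /** Returns the bytes currently used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directPoolUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directPoolUsedBytes();

        // Same allocation path as the bottom frames of the stack trace:
        // ByteBuffer.allocateDirect -> DirectByteBuffer.<init> -> Bits.reserveMemory.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        long after = directPoolUsedBytes();
        System.out.println("direct pool bytes before: " + before);
        System.out.println("direct pool bytes after:  " + after);
        System.out.println("buffer capacity:          " + buffer.capacity());
    }
}
```

      Logging this value inside a TaskManager JVM at the failing and the working commit would be one way to check whether the direct memory limit or the amount of direct memory requested changed between the two.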

            People

              Assignee: mxm Maximilian Michels
              Reporter: fhueske Fabian Hueske
              Votes: 0
              Watchers: 6
