Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-11379

OutOfMemoryError when loading large TaskDeploymentDescriptor

    XMLWordPrintableJSON

Details

    Description

      When TM loads a offloaded TDD with large size, it may throw a "java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses nio's Files.readAllBytes() to read serialized TDD. In the call stack of Files.readAllBytes() , it will allocate a direct memory buffer which's size is equal the length of the file. This will cause OutOfMemoryErro error when direct memory is not enough.

      If the length of a file is large than a maximum buffer size,  the maximum size direct-buffer should be used to read bytes of the file to avoid direct memory OutOfMemoryError.  The maximum buffer size can be 8K or others.

      The exception stack is as follows (this exception stack is from an old Flink version, but the master branch has the same problem).

      Caused by: java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:706)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
        at sun.nio.ch.IOUtil.read(IOUtil.java:195)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        at java.nio.file.Files.read(Files.java:3105)
        at java.nio.file.Files.readAllBytes(Files.java:3158)
        at org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
        at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        ... 9 more

      Attachments

        Issue Links

          Activity

            People

              sunhaibotb Haibo Sun
              sunhaibotb Haibo Sun
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m