Flink / FLINK-21552

The managed memory was not released if an exception was thrown in createPythonExecutionEnvironment


Details

    Description

      If an exception is thrown in createPythonExecutionEnvironment, the job fails, and the subsequent failover attempt fails with the following exception:

      org.apache.flink.runtime.memory.MemoryAllocationException: Could not created the shared memory resource of size 611948962. Not enough memory left to reserve from the slot's managed memory.
      at org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:536)
      at org.apache.flink.runtime.memory.SharedResources.createResource(SharedResources.java:126)
      at org.apache.flink.runtime.memory.SharedResources.getOrAllocateSharedResource(SharedResources.java:72)
      at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:555)
      at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:250)
      at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:113)
      at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:116)
      at org.apache.flink.table.runtime.operators.python.scalar.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:88)
      at org.apache.flink.table.runtime.operators.python.scalar.AbstractRowDataPythonScalarFunctionOperator.open(AbstractRowDataPythonScalarFunctionOperator.java:70)
      at org.apache.flink.table.runtime.operators.python.scalar.RowDataPythonScalarFunctionOperator.open(RowDataPythonScalarFunctionOperator.java:59)
      at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:428)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:543)
      at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:533)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
      at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
      at java.lang.Thread.run(Thread.java:834)
      Caused by: org.apache.flink.runtime.memory.MemoryReservationException: Could not allocate 611948962 bytes, only 0 bytes are remaining. This usually indicates that you are requesting more memory than you have reserved. However, when running an old JVM version it can also be caused by slow garbage collection. Try to upgrade to Java 8u72 or higher if running on an old Java version.
      at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:170)
      at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:84)
      at org.apache.flink.runtime.memory.MemoryManager.reserveMemory(MemoryManager.java:423)
      at org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:534)
      ... 17 more
      

      The root cause is that the managed memory reserved from the MemoryManager is not returned to it when the job fails because of an exception thrown in createPythonExecutionEnvironment. As a result, no managed memory is left to allocate when the task is restarted during failover, which is why the trace reports that only 0 bytes are remaining.
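The failure mode can be reproduced outside Flink with a small toy model. Everything in the sketch below is hypothetical: SimpleBudget, createEnvironment, openLeaky, openSafe, and simulateFailover are illustrative stand-ins, not Flink's MemoryManager or BeamPythonFunctionRunner APIs. The model assumes the slot's managed-memory budget outlives a failed task attempt. The leaky variant reserves memory, lets environment creation throw, and never gives the reservation back, so the restarted attempt hits the same "only 0 bytes are remaining" error; the safe variant returns the reservation when open() does not complete, so the restart succeeds.

```java
/**
 * Hypothetical toy model of the leak (not Flink code): a slot's managed-memory
 * budget that survives task restarts, and an operator open() that reserves
 * memory before creating its Python environment.
 */
public class ManagedMemoryLeakSketch {

    /** Stand-in for a slot's managed-memory budget (hypothetical). */
    static final class SimpleBudget {
        private long available;

        SimpleBudget(long totalBytes) {
            this.available = totalBytes;
        }

        void reserve(long bytes) {
            if (bytes > available) {
                throw new IllegalStateException("Could not allocate " + bytes
                        + " bytes, only " + available + " bytes are remaining.");
            }
            available -= bytes;
        }

        void release(long bytes) {
            available += bytes;
        }

        long available() {
            return available;
        }
    }

    /** Stand-in for createPythonExecutionEnvironment; throws when asked to. */
    static void createEnvironment(boolean fail) {
        if (fail) {
            throw new RuntimeException("simulated failure creating the Python environment");
        }
    }

    /** Buggy pattern: an exception after reserve() leaves the memory reserved. */
    static void openLeaky(SimpleBudget budget, long bytes, boolean fail) {
        budget.reserve(bytes);
        createEnvironment(fail);
    }

    /** Fixed pattern: hand the reservation back if open() does not complete. */
    static void openSafe(SimpleBudget budget, long bytes, boolean fail) {
        budget.reserve(bytes);
        boolean opened = false;
        try {
            createEnvironment(fail);
            opened = true;
        } finally {
            if (!opened) {
                budget.release(bytes); // runs for any exception or error thrown above
            }
        }
    }

    /** First attempt fails in createEnvironment; the restarted attempt then reserves again. */
    static String simulateFailover(SimpleBudget budget, long bytes, boolean safe) {
        try {
            if (safe) {
                openSafe(budget, bytes, true);
            } else {
                openLeaky(budget, bytes, true);
            }
        } catch (RuntimeException firstAttemptFailure) {
            // The task fails and is restarted with the same slot budget.
        }
        try {
            if (safe) {
                openSafe(budget, bytes, false);
            } else {
                openLeaky(budget, bytes, false);
            }
            return "restart succeeded";
        } catch (IllegalStateException e) {
            return "restart failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        long bytes = 611_948_962L; // request size taken from the trace above
        System.out.println("leaky: " + simulateFailover(new SimpleBudget(bytes), bytes, false));
        // leaky: restart failed: Could not allocate 611948962 bytes, only 0 bytes are remaining.
        System.out.println("safe: " + simulateFailover(new SimpleBudget(bytes), bytes, true));
        // safe: restart succeeded
    }
}
```

The success flag checked in a finally block is one way to guarantee the release on every failure path, including Errors, without catching and rethrowing a specific exception type. This issue does not say whether Flink's actual fix takes this shape.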


            People

              Xintong Song (xtsong)
              Dian Fu (dian.fu)
              Votes: 0
              Watchers: 4
