Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-17558

Partitions are released in TaskExecutor Main Thread

    XMLWordPrintableJSON

Details

    Description

      Partitions are released in the main thread of the TaskExecutor (see the stacktrace below). This can lead to missed heartbeats, timeouts of RPCs, etc. because deleting files is blocking I/O. The partitions should be released in a devoted I/O thread pool (TaskExecutor#ioExecutor is a candidate but requires a higher default thread count).

      2020-05-06T19:13:12.4383402Z "flink-akka.actor.default-dispatcher-35" #3555 prio=5 os_prio=0 tid=0x00007f7fcc071000 nid=0x1f3f9 runnable [0x00007f7fd302c000]
      2020-05-06T19:13:12.4383983Z    java.lang.Thread.State: RUNNABLE
      2020-05-06T19:13:12.4384519Z    at sun.nio.fs.UnixNativeDispatcher.unlink0(Native Method)
      2020-05-06T19:13:12.4384971Z    at sun.nio.fs.UnixNativeDispatcher.unlink(UnixNativeDispatcher.java:146)
      2020-05-06T19:13:12.4385465Z    at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:231)
      2020-05-06T19:13:12.4386000Z    at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
      2020-05-06T19:13:12.4386458Z    at java.nio.file.Files.delete(Files.java:1126)
      2020-05-06T19:13:12.4386968Z    at org.apache.flink.runtime.io.network.partition.FileChannelBoundedData.close(FileChannelBoundedData.java:93)
      2020-05-06T19:13:12.4388088Z    at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartition.checkReaderReferencesAndDispose(BoundedBlockingSubpartition.java:247)
      2020-05-06T19:13:12.4388765Z    at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartition.release(BoundedBlockingSubpartition.java:208)
      2020-05-06T19:13:12.4389444Z    - locked <0x00000000ff836d78> (a java.lang.Object)
      2020-05-06T19:13:12.4389905Z    at org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:290)
      2020-05-06T19:13:12.4390481Z    at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:80)
      2020-05-06T19:13:12.4391118Z    - locked <0x000000009d452b90> (a java.util.HashMap)
      2020-05-06T19:13:12.4391597Z    at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.releasePartitionsLocally(NettyShuffleEnvironment.java:153)
      2020-05-06T19:13:12.4392267Z    at org.apache.flink.runtime.io.network.partition.TaskExecutorPartitionTrackerImpl.stopTrackingAndReleaseJobPartitions(TaskExecutorPartitionTrackerImpl.java:62)
      2020-05-06T19:13:12.4392914Z    at org.apache.flink.runtime.taskexecutor.TaskExecutor.releaseOrPromotePartitions(TaskExecutor.java:776)
      2020-05-06T19:13:12.4393366Z    at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
      2020-05-06T19:13:12.4393813Z    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2020-05-06T19:13:12.4394257Z    at java.lang.reflect.Method.invoke(Method.java:498)
      2020-05-06T19:13:12.4394693Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:279)
      2020-05-06T19:13:12.4395202Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
      2020-05-06T19:13:12.4395686Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
      2020-05-06T19:13:12.4396165Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$72/775020844.apply(Unknown Source)
      2020-05-06T19:13:12.4396606Z    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
      2020-05-06T19:13:12.4397015Z    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
      2020-05-06T19:13:12.4397447Z    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
      2020-05-06T19:13:12.4397874Z    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
      2020-05-06T19:13:12.4398414Z    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
      2020-05-06T19:13:12.4398879Z    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
      2020-05-06T19:13:12.4399321Z    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
      2020-05-06T19:13:12.4399737Z    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
      2020-05-06T19:13:12.4400138Z    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
      2020-05-06T19:13:12.4400552Z    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
      2020-05-06T19:13:12.4400930Z    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
      2020-05-06T19:13:12.4401390Z    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
      2020-05-06T19:13:12.4401763Z    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
      2020-05-06T19:13:12.4402135Z    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
      2020-05-06T19:13:12.4402540Z    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      2020-05-06T19:13:12.4402984Z    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      2020-05-06T19:13:12.4403448Z    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      2020-05-06T19:13:12.4404096Z    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      

      Attachments

        Issue Links

          Activity

            People

              chesnay Chesnay Schepler
              gjy Gary Yao
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: