Flink / FLINK-5228

LocalInputChannel re-trigger request and release deadlock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0, 1.1.4
    • Component/s: Network
    • Labels: None

      Description

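      Concurrent release and re-triggering of a partition request can lead to a deadlock. As the thread dump below shows, the task canceler holds the SingleInputGate lock in releaseAllResources() and waits for the LocalInputChannel lock, while the timer thread that re-triggers the request holds the LocalInputChannel lock in requestSubpartition() and waits for the SingleInputGate lock in retriggerPartitionRequest(). The two threads acquire the same pair of locks in opposite order.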

      Found one Java-level deadlock:
      =============================
      "Canceler for Map -> Sink: Unnamed (1/4)":
      waiting to lock monitor 0x0000000001e27bd8 (object 0x00000000ffa1f688, a java.lang.Object),
      which is held by "Timer-3"
      "Timer-3":
      waiting to lock monitor 0x00007fdbd029ec48 (object 0x00000000ffa1f3a0, a java.lang.Object),
      which is held by "Canceler for Map -> Sink: Unnamed (1/4)"
      
      Java stack information for the threads listed above:
      ===================================================
      "Canceler for Map -> Sink: Unnamed (1/4)":
         at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.releaseAllResources(LocalInputChannel.java:240)
         - waiting to lock <0x00000000ffa1f688> (a java.lang.Object)
         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:348)
         - locked <0x00000000ffa1f3a0> (a java.lang.Object)
         at org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1280)
         at java.lang.Thread.run(Thread.java:745)
      "Timer-3":
         at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.retriggerPartitionRequest(SingleInputGate.java:307)
         - waiting to lock <0x00000000ffa1f3a0> (a java.lang.Object)
         at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:128)
         - locked <0x00000000ffa1f688> (a java.lang.Object)
         at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel$1.run(LocalInputChannel.java:148)
         at java.util.TimerThread.mainLoop(Timer.java:555)
         at java.util.TimerThread.run(Timer.java:505)
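
The lock-order inversion in the trace above can be illustrated with a small standalone sketch. The classes Gate and Channel below are hypothetical stand-ins for SingleInputGate and LocalInputChannel, not Flink code, and the approach shown (decide under the channel lock, then call into the gate only after releasing it) is one common way to avoid such an inversion. It is an assumption for illustration and not necessarily what the fixing commits do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names mirror the stack trace but this is not Flink code.
class LockOrderSketch {

    static class Gate {
        private final Object gateLock = new Object();
        private Channel channel;
        private boolean released;
        private int retriggers;

        void setChannel(Channel channel) { this.channel = channel; }

        // Canceler path: takes the gate lock, then the channel lock.
        void releaseAllResources() {
            synchronized (gateLock) {
                released = true;
                channel.releaseAllResources();
            }
        }

        // Timer path ends here: takes only the gate lock.
        void retriggerPartitionRequest() {
            synchronized (gateLock) {
                if (!released) {
                    retriggers++;
                }
            }
        }
    }

    static class Channel {
        private final Object requestLock = new Object();
        private final Gate gate;
        private boolean released;

        Channel(Gate gate) { this.gate = gate; }

        void requestSubpartition() {
            boolean retrigger;
            synchronized (requestLock) {
                // Pretend the partition is not available yet, so a retry is needed.
                retrigger = !released;
            }
            // Call into the gate only after the channel lock has been released,
            // so this thread never holds both locks at once.
            if (retrigger) {
                gate.retriggerPartitionRequest();
            }
        }

        void releaseAllResources() {
            synchronized (requestLock) {
                released = true;
            }
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Races release against re-trigger; returns false if a round does not finish in time.
    static boolean completesWithoutDeadlock(int rounds, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        try {
            for (int i = 0; i < rounds; i++) {
                Gate gate = new Gate();
                Channel channel = new Channel(gate);
                gate.setChannel(channel);
                CountDownLatch start = new CountDownLatch(1);
                CountDownLatch done = new CountDownLatch(2);
                pool.execute(() -> { awaitQuietly(start); gate.releaseAllResources(); done.countDown(); });
                pool.execute(() -> { awaitQuietly(start); channel.requestSubpartition(); done.countDown(); });
                start.countDown();
                if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(completesWithoutDeadlock(1000, 5000));
    }
}
```

If the call to gate.retriggerPartitionRequest() were moved inside the synchronized (requestLock) block, the two paths would take the locks in opposite order, which is the pattern visible in the dump.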
      

        Activity

        uce Ufuk Celebi added a comment -

        Fixed in 388acbc (release-1.1), 3229dc0 (master).


          People

          • Assignee: Ufuk Celebi (uce)
          • Reporter: Ufuk Celebi (uce)
          • Votes: 0
          • Watchers: 1
