FLINK-2134

Deadlock in SuccessAfterNetworkBuffersFailureITCase

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.9
    • Component/s: None
    • Labels: None

      Description

      I ran into the issue in a Travis run for a PR: https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt

      I can reproduce this locally by running SuccessAfterNetworkBuffersFailureITCase multiple times:

      cluster = new ForkableFlinkMiniCluster(config, false);
      for (int i = 0; i < 100; i++) {
         // run the test programs in order: ConnectedComponents, KMeans, ConnectedComponents
      }
      

      The iteration tasks wait for superstep notifications like this:

      "Join (Join at runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) (8/6)" daemon prio=5 tid=0x00007f95f374f800 nid=0x138a7 in Object.wait() [0x0000000123f2a000]
         java.lang.Thread.State: TIMED_WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	- waiting on <0x00000007f89e3440> (a java.lang.Object)
      	at org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
      	- locked <0x00000007f89e3440> (a java.lang.Object)
      	at org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
      	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
      	at java.lang.Thread.run(Thread.java:745)
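
To make the waiting pattern in this trace easier to follow, here is a minimal sketch of a monitor-based superstep latch. It is not Flink's SuperstepKickoffLatch: the class name SuperstepLatchSketch, its fields, the 2000 ms wait interval, and the helper waiterSawTermination are all assumptions made for illustration. It only shows why a waiting task appears as TIMED_WAITING inside Object.wait() while the dump also lists the monitor as "locked": the thread entered the synchronized block and then released the monitor in wait().

```java
/**
 * Hypothetical sketch of a superstep latch. NOT Flink's SuperstepKickoffLatch;
 * all names and the wait interval are assumptions for illustration.
 */
public class SuperstepLatchSketch {

    private final Object monitor = new Object();
    private int superstep = 1;      // number of the superstep currently running
    private boolean terminated;     // set once the iteration has finished

    /** Head side: start the next superstep and wake every waiting task. */
    public void triggerNextSuperstep() {
        synchronized (monitor) {
            superstep++;
            monitor.notifyAll();
        }
    }

    /** Head side: signal that the iteration is over and wake every waiting task. */
    public void signalTermination() {
        synchronized (monitor) {
            terminated = true;
            monitor.notifyAll();
        }
    }

    /**
     * Task side: block until the expected superstep has started or the
     * iteration has terminated. Returns true on termination.
     */
    public boolean awaitStartOfSuperstepOrTermination(int expected)
            throws InterruptedException {
        synchronized (monitor) {
            // Re-check the condition in a loop: the timed wait may return
            // without any notification having been sent.
            while (!terminated && superstep < expected) {
                monitor.wait(2000); // shows up as TIMED_WAITING in a thread dump
            }
            return terminated;
        }
    }

    /** Small demo: a waiter blocked on superstep 2 is released by termination. */
    public static boolean waiterSawTermination() {
        SuperstepLatchSketch latch = new SuperstepLatchSketch();
        boolean[] result = new boolean[1];
        Thread waiter = new Thread(() -> {
            try {
                result[0] = latch.awaitStartOfSuperstepOrTermination(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        latch.signalTermination();
        try {
            waiter.join(); // join makes the waiter's write to result visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }
}
```

In a latch of this shape, a task stays in the wait loop for as long as neither triggerNextSuperstep() nor signalTermination() is called. A trace like the one above would therefore be consistent with the notifying side never reaching that call, although the trace alone does not show which side is at fault.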
      

      I asked Robert Metzger to reproduce this, and it deadlocks for him as well. The deadlock does not show up on every run: it occurs only after multiple runs and only when the system is under some load.
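
Because the hang only shows up after several runs, a repeat-with-timeout harness can help pin down the run on which it occurs. The sketch below uses only java.util.concurrent. The class RepeatUntilHang, the method firstHangingRun, and the job argument are hypothetical: job stands in for one pass over the test programs and is not part of the Flink test.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: repeat a job and report the first run that hangs. */
public class RepeatUntilHang {

    /**
     * Runs the job up to 'runs' times on a daemon thread.
     * Returns the zero-based index of the first run that does not finish
     * within timeoutMillis, or -1 if every run completes in time.
     */
    public static int firstHangingRun(Runnable job, int runs, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "repro-runner");
            t.setDaemon(true); // a hung run must not keep the JVM alive
            return t;
        });
        try {
            for (int i = 0; i < runs; i++) {
                Future<?> future = executor.submit(job);
                try {
                    future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return i; // likely hang: take a thread dump at this point
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return i;
                } catch (ExecutionException e) {
                    throw new RuntimeException("run " + i + " failed", e.getCause());
                }
            }
            return -1;
        } finally {
            executor.shutdownNow(); // interrupts a run that is still blocked
        }
    }
}
```

When a run times out, a thread dump of the JVM (for example with jstack) taken before shutting down shows where the tasks are blocked, as in the trace above.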

            People

            • Assignee: Unassigned
            • Reporter: Ufuk Celebi (uce)